This is a post about week 2 of the CIS 194: Introduction to Haskell course by Brent Yorgey, from the Penn University of Pennsilvania.
This week is all about Algebraic Data Types (ADT), not to be confused with Abstract Data Types (also ADT) which are another topic.
Haskell has enumeration types (like Java, but still less verbose and more intuitive). An example:
data Food = Pizza | Bacon | Salad deriving Show
We have just declared a new (algebraic data) type called Food
, with three data constructors which are the only values of the type Food
.
We can now define new functions on the new data type using pattern matching:
isTempting:: Food -> Boolean isTempting Salad = False isTempting _ = True
But in Haskell enumeration types are only a special case of Algebraic Data Types. One common class of ADT is called sum type (a.k.a. tagged union). A simple example of ADT which is not an enumeration is this:
data OperationResult = OK | Error Integer deriving Show
Here we see that Error
is a data constructor that takes an argument of type Integer
. We can construct new OperationResult
values using the Error
data constructor:
success:: OperationResult success = OK failure:: OperationResult failure = Error 404
OK
is a value of type OperationResult
(since it's a data constructor with zero arguments), but Error
by itself it's not. We have to pass an Integer
value to it to build an OperationResult
with it.
We've just introduced polymorphic data types. Specifically, we can have type signatures with variables just as we can have function implementations with variables. The difference here is that while in actual code variables are symbols bound to values, in type variables are bound to types of values. In other words, types are actually values in type signatures. We're reasoning on a higher and more abstract level. Take a moment to contemplate this fact.
Formally, in Haskell an ADT is a type with one or more data constructors, each one of them can have zero or more arguments.
A general example that shows how to build values is:
isSafeDiv:: Double -> Double -> OperationResult isSafeDiv _ 0 = Error 1000 isSafeDiv _ _ = OK
We can also use pattern matching to make decisions based on the structure of the OperationResult
value and bind variables to the arguments:
isSuccessful:: OperationResult -> Boolean isSuccessful OK = True isSuccessful (Error n) = False
It's idiomatic in Haskell when you have an algebraic data type with a single data constructor, to name it like the data type itself. Example:
data Person = Person String String Int deriving Show
This can be done since types and data constructors live in separate namespaces.
Pattern Matching
In general, pattern matching is a way to know what data constructor has been used to create a value of a certain ADT, and to take apart its arguments. Effectively, in Haskell this is the only way to make decisions.
pat ::= _ | var | var @ (pat) | (Constructor pat1 pat2 ... patn)In order:
_
is a wildcard.- We can pattern match against literal values (for example:
OK
). - We can pattern match against a pattern, and still bind the entire value to a variable.
- We can pattern match against a data constructor (even recursively).
It's worth noting that types like Int
can be viewed like an ADT:
data Int = 0 | 1 | 2 | ...
Indeed, we can pattern match on its values. But, perhaps obviously, they are not implemented like that in the compiler.
Case expressions
A way (the only one actually) to do pattern matching is by using case expressions:
case exp of pat1 -> exp1 pat2 -> exp2 ...
For example, we could reimplement the isSuccessful
function from earlier using a case expression:
isSuccessful:: OperationResult -> Boolean isSuccessful op = case op of OK -> True (Error n) -> False
However it's more elegant to use the first version. Indeed, the syntax for doing pattern matching in a function definition is just syntactic sugar on case expressions.
Recursive algebraic data types
It's interesting to note that ADTs can be recursive. For example, let's define a list of integers:
data IntList = Empty | Cons Int IntList
This definition can be read as: "an IntList
is either an Empty
one or an Int
value followed by an IntList
". This kind of definition is quite clear and elegant (see Church encoding). For example:
-- [1,2,3] can be represented as an IntList: l:: IntList l = Cons 1 (Cons 2 (Cons 3 Empty))
A recursive data ADT naturally leads to recursive functions. For example, to calculate the sum of all the values in an IntList
:
calcSum:: IntList -> Int calcSum Empty = 0 calcSum (Cons n l) = n + calcSum l
So, we've seen so far that type signatures can have variables, and can be recursive. Sounds like we could have a Turing-complete type system... indeed, we have one. Someone even implemented a LISP interpreter that completely runs on the Haskell type system!
That's all for this week. Remember: do the exercises!