When I was reading the collection of learning resources on Haskell and tried to find a good start, I quickly realized that none of the books or tutorials are suitable for me: the easier a tutorial claims to be, the harder to really understand Haskell by reading it. What I need is a terse documentation that introduces the syntax and semantics of Haskell systematically and clearly, but unfortunately none was found.

I know I have to try the hard way: reading the Haskell language specification directly and absorb it myself. To make the process less dull and record my progress, I will write down my learning notes here incrementally.

## Overview

A Haskell program is organized with four levels: *modules*, *declarations*,
*expressions* & *lexical structures*, but the specification is organized in the
reverse order.

Haskell has *ad hoc* polymorphism (*overloading*) and *parametric* polymorphism
(Hindley-Milner type structure).

Haskell has six namespaces:

- for value
- variable
- constructor

- for type entity
- type variable
- type constructor
- type class

- module

The same name can be reused without conflicts as long as they are in different namespaces.

## Lexical Structure

A Haskell program is composed of lexemes (tokens) and whitespaces.

```
program → { lexeme | whitespace }
```

Whitespace includes:

- The complete set of whitespace in both ASCII & Unicode
- Two kinds of comments
- inline comment starts with
`--`

- nested comment wrapped by
`{- -}`

- inline comment starts with

A lexeme is one of the following:

- identifier
- qvarid: (qualified) variable identifier
- qconid: (qualified) constructor identifier
- reservedid: reserved identifier

- operator
- qvarsym: (qualified) variable (symbolic) operator
- qconsym: (qualified) constructor (symbolic) operator
- reservedop: reserved operator

- literal: integer, float, character or string literal
- special: one of special symbols
`()[]{}`,;`

A variable and a constructor is distinguished by the first character and put into different namespaces:

- identifier
- variable: lower case (including underscore)
- constructor: upper case

- operator
- variable: non-colon
- constructor: colon
`:`

A variable or constructor can contain symbol `'`

, so the common mathematical
term “x prime” can be represented as `x'`

.

By using layout rule (indentation), symbols `{};`

can be omitted in sereral
grammer productions.

## Expressions

### Parentheses

`(exp)`

is a *parenthesized expression*, and is equivalent to `exp`

.

### Function & operator application

Function is prefixed and curried, so `f x y`

means `(f x) y`

.

```
fexp → [fexp] aexp
```

All operators are infixed except prefix negation `-`

.

```
infixexp→ lexp qop infixexp
| - infixexp (prefix negation)
| lexp
qop → qvarop | qconop
```

An operator can be converted to prefix notation by parentheses. e.g.

```
(+) x y
```

Reversely, a function identifier (either variable or constructor) can be converted to an infix operator by backquotes.

```
x `op` y
```

### List & Tuple

List is constructed with `:`

.

```
1:2:3:[] or (:) 1 ((:) 2 ((:) 3 []))
```

*Arithmetic sequence* is another way to construct a list.

```
[1,3..6] == [1,3,5]
[1..6] == [1,2,3,4,5,6]
```

Tuple is constructed with `(,...,)`

.

```
(1, 2, "a") or (,,) 1 2 "a"
```

### Field label

A field label is used to give a name to a field in a datatype. It can be used to construct, extract and update the field.

A constructor with labeled fields:

```
aexp → qcon { fbind1 , … , fbindn } (n ≥ 0)
fbind → qvar = exp
```

### Pattern matching

A pattern itself is not an expression, but it is an important part of sereral
expressions, including: *lambda abstractions*, *function definitions*,
*let expressions*, *list comprehensions*, *do expressions* and *case expressions*.

Pattern matching is used to deconstruct values according to their type specification. It proceeds from left to right, and outside to inside. Attempting to match a pattern can have one of three results:

- Fail
- Succeed: returning a binding for each variable in the pattern
- Diverge: i.e. return ⊥

The syntax for patterns covers a subset of expressions.

A pattern can match against infix expressions, but limited to infix constructors
(the operator must be `qconop`

).

```
pat → lpat qconop pat (infix constructor)
```

A pattern can match against constructor functions (with or without field labels).

```
pat → ...
| lpat
lpat → apat
| gcon apat1 … apatk (arity gcon = k, k ≥ 1)
apat → ...
| gcon (arity gcon = 0)
| qcon { fpat1 , … , fpatk } (labeled pattern, k ≥ 0)
fpat → qvar = pat
```

A pattern can match against a variable, a literal, a parenthesized expression, a tuple or a list.

```
lpat → var ...
| - (integer | float) (negative literal)
apat → ...
| literal
| ( pat ) (parenthesized pattern)
| ( pat1 , … , patk ) (tuple pattern, k ≥ 2)
| [ pat1 , … , patk ] (list pattern, k ≥ 1)
```

The variables defined within the pattern can be binded, but how to name and bind
the whole pattern? This is exactly what an *as pattern* does (`var`

before `@`

is the name for the whole pattern).

```
apat → var [ @ apat] (as pattern)
```

Wildcard is used when you need a variable placeholder but do not want to bind the value to a name.

```
apat → ...
| _ (wildcard)
```

Sometimes you need a pattern that can never fail (only succeed or diverge), it is called a irrefutable pattern.

```
apat → ...
| ~ apat (irrefutable pattern)
```

Besides `~apat`

, these patterns are also irrefutable: a variable, a wildcard,
`N apat`

where N is a constructor defined by newtype and apat is irrefutable,
`var@apat`

where apat is irrefutable. All other patterns are refutable.

### Case expression

Case expression is very important because all other pattern matching expressions ultimately translate into case expressions.

A case expression has one or more alternatives.

```
lexp → case exp of { alts }
alts → alt1 ; … ; altn (n ≥ 1)
```

An alternative is either a pattern or empty. The pattern either coresponds to
an expression (body) directly, or has one or more guarded patterns (note an
optional gdpat appears at the right side of itself). A guarded pattern starts
with `|`

and is composed of one or more guards.

```
alt → pat -> exp [where decls]
| pat gdpat [where decls]
| (empty alternative)
gdpat → guards -> exp [ gdpat ]
guards → | guard1, …, guardn (n ≥ 1)
decls → { decl1 ; … ; decln } (n ≥ 0)
```

Note that each alternative can optionally have a `where`

declaration. It is used
to bind extra variables to be used in the local scope.

There are two types of guards: *pattern guard* & *boolean guard*, and local
declarations can also be introduced together with guards.

```
guard → pat <- infixexp (pattern guard)
| let decls (local declaration)
| infixexp (boolean guard)
```

This is how a case expression works:

- The expression after
`case`

is matched against each alternative till a match is found. - Then each guarded pattern in the matched alterantive is tested till one passes. A guarded pattern passes if and only if all of its guards pass.
- If successful, the conresponding expression is returned, otherwise, the next guarded pattern or alternative is tried sequentially.
- If no match can be found, the result is ⊥.

### Lambda expression

Lambda is used to construct a function without a name. Besides a variable, the input can also be any pattern.

```
lexp → \ apat1 … apatn -> exp (n ≥ 1)
```

An example:

```
(\ (x, y) -> x+y) (3, 7)
```

Also note that lambda `->`

associates to the right, e.g.

```
Integer -> Integer -> Integer
is equivalent to
Integer -> (Integer -> Integer)
```

### Let expression

A let expression introduces a nested, lexically-scoped, mutually-recursive list
of declarations. `exp`

after keyword `in`

is the value of a let expression.

```
lexp → let decls in exp
decls → { decl1 ; … ; decln } (n ≥ 0)
```

Some examples:

```
let {} in 42
let {(x,y) = (1,2)} in x+y
```

### List comprehension

A list comprehension constructs a list of elements represented by `exp`

qualified
by one or more qualifiers.

```
aexp → [ exp | qual1 , … , qualn ] (n ≥ 1)
```

There are three kinds of qualifiers.

A generator is composed of a pattern (`pat`

with type `t`

) and a list (`e`

with
type `[t]`

). The pattern is matched against and binded to each element of the
list one by one, so the binded variables can be used to generate each element of
the result list. The generators are evaluated in a nested, depth-first order,
and a failed match is just skipped over.

```
qual → pat <- e (generator)
```

A qualifier can also be a local declaration to bind auxiliary variables, or a boolean guard to exclude some elements from the list.

```
qual → ...
| let decls (local declaration)
| exp (boolean guard)
```

### Do expression

*Wait till monad is fully understood.*

### Type signature

Haskell has type inference, but an expression can be optionally specified with a
*type signature*.

```
exp → infixexp :: [context =>] type
```

## Declarations

There are top declarations (`topdecl`

) that are only allowed at the top level of
a module, and nested declarations (`decl`

) that may be used either at the top
level or in nested scopes.

```
topdecl → ...
| ...
| decl
```

### Top declarations

A top declaration starts with one of the keywords: `data`

, `type`

, `newtype`

,
`class`

, `instance`

, `default`

and `foreign`

.

`data`

, `type`

and `newtype`

is for declaring new types. `class`

is for
declaring typeclasses and `instance`

is for declaring the membership between
types and typeclasses.

They are explained later one by one.

### Nested declarations

In the last section, `decl`

appears in the let expression and where clause
without any explanations. Actually, they are nested declarations.

There are five kinds of nested declarations: *type signature*, *fixity*,
*function binding*, *pattern binding* and *empty* declartions.

A *type signature* simply specifies types for variables.

```
decl → gendecl
| ...
gendecl → vars :: [context =>] type
vars → var1 , …, varn (n ≥ 1)
```

A *fixity* declaration gives the fixity and binding precedence of one or more
operators.

```
gendecl → ...
| fixity [integer] ops
fixity → infixl | infixr | infix
ops → op1 , … , opn (n ≥ 1)
op → varop | conop
```

The right hand side (`rhs`

) of function and pattern bindings are the same.

```
decl → ...
| (funlhs | pat) rhs
```

The syntax of `rhs`

is almost the same as *case expression*, except `->`

is
replaced by `=`

.

```
rhs → = exp [where decls]
| gdrhs [where decls]
gdrhs → guards = exp [gdrhs]
```

A function can be binded in multiple function binding declarations, as long as they are contiguous and the number of patterns is the same. In each of the declaration, the left hand side supports three alternative syntaxes.

```
funlhs → var apat { apat }
| pat varop pat
| ( funlhs ) apat { apat }
```

And an example:

```
plus x y z = x+y+z
x ‘plus‘ y = \ z -> x+y+z
(x ‘plus‘ y) z = x+y+z
```

### Type

Just as an *expression* may contain variables and denotes a value, a
*type expression* may contain *type variables* and denotes a type value (but it
is evaluated at compile time, unlike an expression evaluated at run time).

Type expressions are designed to have similar representations as their corresponding expressions.

Function type can be represented in infix or prefix notation.
Like function application, *type application* (`btype`

) is also prefixed and
curried. `atype`

is the type expression excluding infix function type and
type application.

```
type → btype [-> type] (function type)
btype → [btype] atype (type application)
atype → gtycon
| ...
gtycon → ...
| (->) (function constructor)
```

A *type variable* is an identifier beginning with a lowercase letter.
A *parenthesized type* `(t)`

is identical to the type `t`

.
A *type constructor* (type template) as an identifier begins with an uppercase
letter.
Besides function type, the syntaxes for tuple and list are also special syntaxes.

```
atype → gtycon
| tyvar
| ( type1 , … , typek ) (tuple type, k ≥ 2)
| [ type ] (list type)
| ( type ) (parenthesised constructor)
gtycon → qtycon
| () (unit type)
| [] (list constructor)
| (->) (function constructor)
| (,{,}) (tupling constructors)
```

### Kind

What is `*`

, `* -> *`

…? An ordinary type has kind `*`

. A type constructor
(template) that has one type argument has kind `* -> *`

, e.g. a list. So on and
so forth.

### Context

The context (`context => type`

) is used to indicate that type `tyvar`

belongs to
class `qtycls`

.

A context is composed of zero or more class assertions.

```
context → class
| ( class1 , … , classn ) (n ≥ 0)
```

And a class assertion is either either a type variable, or the application of type variable to one or more types.

```
class → qtycls tyvar
| qtycls ( tyvar atype1 … atypen ) (n ≥ 1)
qtycls → [ modid . ] tycls
tycls → conid
tyvar → varid
```

### Algebraic data type

*algebraic data type* is named so because it is composed of other types by
product and sum (algebraic operations). It introduces a new type constructor
(`simpletype`

) with zero or more constituent data constructors (`constrs`

).

```
topdecl → ...
| data [context =>] simpletype [= constrs] [deriving]
```

The type constructor begins with upper case letter and may have zero to more type variables as parameters. The type constructor then can be used in curried type application in a type expression.

```
simpletype → tycon tyvar1 … tyvark (k ≥ 0)
```

On the right side of `=`

, sum (alternative) types are separated by `|`

.

```
constrs → constr1 | … | constrn (n ≥ 1)
```

For each `constr`

, there are three alternative syntaxes.

- A data constructor followed by zero or more
`atype`

. - An infix data constructor operator between two
`btype`

. - A data constructor followed by field declarations.

```
constr → con [!] atype1 … [!] atypek (arity con = k, k ≥ 0)
| (btype | ! atype) conop (btype | ! atype) (infix conop)
| con { fielddecl1 , … , fielddecln } (n ≥ 0)
con → conid | ( consym ) (constructor)
```

`!`

is strictness flag to indicate that the corresponding constructor argument
is eagerly evaluated.

### Type synonym

A type synonym is equivalent to its definition are completely interchangeable.

```
topdecl → type simpletype = type
```

### Newtype

A *newtype* declaration introduces a distinct type whose representation is the
same as an existing type.

```
topdecl → newtype [context =>] simpletype = newconstr [deriving]
newconstr → con atype
| con { var :: type }
```

- Unlike type synonyms,
`newtype`

may be used to define recursive types. - New instances can be defined for a type defined by
`newtype`

but may not be defined for a type synonym. - A type created by
`newtype`

has an extra level of indirection compared to an algebraic datatype, so the pattern matching rule is different.

### Ad hoc polymorphism

Ad hoc polymorphism (overloading) is supported by *typeclasses*.

A class declaration introduces a new class and the operations (methods) on it.

## Modules

*To be continued…*