What is GADT (Generalized Algebraic Data Types) in Haskell?

We assume that you are familiar with what we are doing without GADT. Say, for the simplest case, we are defining a list:

List a = Nil | Cons a (List a)

What exactly are we doing?

  • On the LHS, it defines a “type constructor” called List. It takes a type as parameter, and returns a new type. It’s essentially a function of types.
  • On the RHS, it tells us that we have two ways to construct a value of type List a. One is called Nil and the other Cons a (List a).
    • You can directly write Nil to get a value of type List a, with type a to be determined, more precisely inferred from the context.
    • In the case of Cons a (List a), it’s actually a function named Cons, taking two parameters, one with type a, the other with type List a. Again, the type of a should be inferred from the context. But this time it’s often more direct, as a is the type of the first parameter.

Then we can represent these two constructors as normal functions:

Nil :: List a
Cons :: a -> List a -> List a

We can see that they are functions both with the same return type List a. The only thing that GADT does, is allowing a specific a here. For example, we can have a constructor with a specifically set to Int, meaning it only constructs values of type List Int.

IntNil :: List Int

We can experiment with some code in the GADT flavor:

data List a where
  IntNil :: List Int
  Nil :: List a
  Cons :: a -> List a -> List a

a :: List Int
a = IntNil

-- Would raise an error for type incompatibility.
-- c_ :: List Char
-- c_ = IntNil

-- This one is compatible.
c :: List Char
c = Nil

Certainly, a does not have to be a type as simple as Int. It could be an arbitrary type, even List a or List (List a) with a still unspecified:

data Pair a b where Pair :: a -> b -> Pair a b

data List a where IntListNil :: List (List Int)
  ListNil :: List (List a)
  IntNil :: List Int
  Nil :: List a
  Cons :: a -> List a -> List a
  -- This one is invalid, since the top-level `Pair` is not `List`
  -- Bad :: a -> List a -> Pair a (List a)

So what can GADT be used for? What can be additionally done with it? Here are some scenarios as far as I know:

The most famous use case is to embed (the AST of) a language into Haskell. Suppose we have such a simple language of expressions:

data Expr = 
    ILit Int
  | BLit Bool
  | Add Expr Expr
  | Eq Expr Expr

Such a data type definition is sufficient to contain the language, but it also contains expressions beyond the language (that is, invalid expressions). For example, Bool values cannot be Added. But this definition does not prevents us from writing:

addBool :: Expr
addBool = Add (BLit True) (BLit False)

An attempt to fix this is to record the “return type” of an Expr somewhere. Remember we have done that with List a, where we record the member type a in the list type List a. We can do the same to Expr:

data Expr a = 
    ILit Int
  | BLit Bool
  | Add (Expr Int) (Expr Int)
  | Eq (Expr a) (Expr a)

This looks plausible. But there are still problems!

The crucial part is, while BLit True should be Expr Bool so that it cannot be the parameters of Add, it isn’t actually. The BLit True here is still of type Expr a with a undetermined (to be inferred). This is parallel to the case where Nil is of type List a with a undetermined. There is actually no relation between the type of the parameter of the constructor BLit and the type of the constructed value.

Here GADT comes into the play. We can constraint the return type of the constructor only with GADT:

data Expr a where
  ILit :: Int -> Expr Int
  BLit :: Bool -> Expr Bool
  Add :: Expr Int -> Expr Int -> Expr Int
  -- In the case of Eq, we can loosen the type specification into a typeclass.
  Eq :: Eq a => Expr a -> Expr a -> Expr Bool

This would achieve the goal:

-- Won't compile, since `BLit True` and `BLit False` are of type `Expr Bool`
-- addBool =  Add (BLit True) (BLit False)
addInt = Add (ILit 1) (ILit 2)
eqBool = Eq (BLit True) (BLit False)

Fantastic!

This is the so-called “initial embedding” of a language. The final version we achieved with GADT is a “tagless  initial embedding”. Why tagless? Because ADT is the so-called “tagged union”, and we are abandoning it, thus tagless.

The Expr represents a very simple language. You can well embed more complex languages such as simply type lambda calculus.

Contrary to “initial embedding”, there’s “final embedding” as well. But that’s another story.


We can even impose more restrictions on the embed of languages. Suppose we have a language describing stack operations. Push means push a number to the top of the stack. Add means pop the top two numbers and push back their sum:

data StackLang where
  Begin :: StackLang
  Push :: Int -> StackLang -> StackLang
  Add :: StackLang -> StackLang

bug = Add (Push 1 Begin)

The bug line reflects a bug that would happen. When there’s only one element in the stack, an Add could not be performed. If only we could eliminate such silly bugs on the type level!

This could be possible when the number of elements in the stack is encoded in the type StackLang. This is exactly where GADT shines:

data Z
data S n

data SStackLang t where
  SBegin :: SStackLang Z
  SPush :: Int -> SStackLang t -> SStackLang (S t)
  SAdd :: SStackLang (S (S t)) -> SStackLang (S t)

The Z (for 0) and S (for successor) here are simulations of the Peano system of natural numbers. Now compiling the following

sBug = SAdd (SPush 1 SBegin)

would raise an error:

• Couldn’t match type ‘Z’ with ‘S t’
Expected type: SStackLang (S (S t))
Actual type: SStackLang (S Z)

Safe as expected.

This is actually a simulation of the dependent type. You can define types representing “lists of length n” and so on in this way.

GADT can also convert types into values. You can do this without GADT:

data Witness a = Witness a

But you need to supply a value of type a.  What if you cannot find such a value? It’s not reasonable to require a value to encode a type, as such a value may not exist at all! (consider Void)

You can drop this requirement with GADT:

data Witness a where
  IntWitness :: Witness Int
  BoollWitness :: Witness Bool

In this way, you directly get a value in the encoded type Witness a, even if there’s no value of the original type a.

This is the so called “type witness”. But how can this be used? (to be continued)

freetiger18 :