Handling errors with Either

There are quite a few ways to indicate and handle errors in Haskell. We are going to look at one solution: using the type Either. Either is defined like this:

data Either a b
  = Left a
  | Right b

Simply put, a value of type Either a b can contain either a value of type a, or a value of type b. We can tell them apart from the constructor used.

Left True :: Either Bool b
Right 'a' :: Either a Char

With this type, we can use the Left constructor to indicate failure with some error value attached, and the Right constructor with one type to represent success with the expected result.

Since Either is polymorphic, we can use any two types to represent failure and success. It is often useful to describe the failure modes using an ADT.

For example, let's say that we want to parse a Char as a decimal digit to an Int. This operation could fail if the Character is not a digit. We can represent this error as a data type:

data ParseDigitError
  = NotADigit Char
  deriving Show

And our parsing function can have the type:

parseDigit :: Char -> Either ParseDigitError Int

Now when we implement our parsing function, we can return Left on an error describing the problem, and Right with the parsed value on successful parsing:

parseDigit :: Char -> Either ParseDigitError Int
parseDigit c =
  case c of
    '0' -> Right 0
    '1' -> Right 1
    '2' -> Right 2
    '3' -> Right 3
    '4' -> Right 4
    '5' -> Right 5
    '6' -> Right 6
    '7' -> Right 7
    '8' -> Right 8
    '9' -> Right 9
    _ -> Left (NotADigit c)

Either a is also an instance of Functor and Applicative, so we have some combinators to work with if we want to combine these kinds of computations.

For example, if we had three characters and we wanted to try and parse each of them and then find the maximum between them; we could use the applicative interface:

max3chars :: Char -> Char -> Char -> Either ParseDigitError Int
max3chars x y z =
  (\a b c -> max a (max b c))
    <$> parseDigit x
    <*> parseDigit y
    <*> parseDigit z

The Functor and Applicative interfaces of Either a allow us to apply functions to the payload values and delay the error handling to a later phase. Semantically, the first Either in order that returns a Left will be the return value. We can see how this works in the implementation of the applicative instance:

instance Applicative (Either e) where
    pure          = Right
    Left  e <*> _ = Left e
    Right f <*> r = fmap f r

At some point, someone will actually want to inspect the result and see if we get an error (with the Left constructor) or the expected value (with the Right constructor) and they can do that by pattern-matching the result.

Applicative + Traversable

The Applicative interface of Either is very powerful and can be combined with another abstraction called Traversable - for data structures that can be traversed from left to right, like a linked list or a binary tree. With these, we can combine an unspecified amount of values such as Either ParseDigitError Int, as long as they are all in a data structure that implements Traversable.

Let's see an example:

ghci> :t "1234567"
"1234567" :: String
-- remember, a String is an alias for a list of Char
ghci> :info String
type String :: *
type String = [Char]
      -- Defined in ‘GHC.Base’

ghci> :t map parseDigit "1234567"
map parseDigit "1234567" :: [Either ParseDigitError Int]
ghci> map parseDigit "1234567"
[Right 1,Right 2,Right 3,Right 4,Right 5,Right 6,Right 7]

ghci> :t sequenceA
sequenceA :: (Traversable t, Applicative f) => t (f a) -> f (t a)
-- Substitute `t` with `[]`, and `f` with `Either Error` for a specialized version

ghci> sequenceA (map parseDigit "1234567")
Right [1,2,3,4,5,6,7]

ghci> map parseDigit "1a2"
[Right 1,Left (NotADigit 'a'),Right 2]
ghci> sequenceA (map parseDigit "1a2")
Left (NotADigit 'a')

The pattern of doing map and then sequenceA is another function called traverse:

ghci> :t traverse
traverse
  :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
ghci> traverse parseDigit "1234567"
Right [1,2,3,4,5,6,7]
ghci> traverse parseDigit "1a2"
Left (NotADigit 'a')

We can use traverse on any two types where one implements the Applicative interface, like Either a or IO, and the other implements the Traversable interface, like [] (linked lists) and Map k (also known as a dictionary in other languages - a mapping from keys to values). For example, using IO and Map. Note that we can construct a Map data structure from a list of tuples using the fromList function - the first value in the tuple is the key, and the second is the type.

ghci> import qualified Data.Map as M -- from the containers package

ghci> file1 = ("output/file1.html", "input/file1.txt")
ghci> file2 = ("output/file2.html", "input/file2.txt")
ghci> file3 = ("output/file3.html", "input/file3.txt")
ghci> files = M.fromList [file1, file2, file3]
ghci> :t files :: M.Map FilePath FilePath -- FilePath is an alias of String
files :: M.Map FilePath FilePath :: M.Map FilePath FilePath

ghci> readFiles = traverse readFile
ghci> :t readFiles
readFiles :: Traversable t => t FilePath -> IO (t String)

ghci> readFiles files
fromList [("output/file1.html","I'm the content of file1.txt\n"),("output/file2.html","I'm the content of file2.txt\n"),("output/file3.html","I'm the content of file3.txt\n")]
ghci> :t readFiles files
readFiles files :: IO (Map String String)

Above, we created a function readFiles that will take a mapping from output file path to input file path and returns an IO operation that, when run will read the input files and replace their contents right there in the map! Surely this will be useful later.

Multiple errors

Note, since Either has the kind * -> * -> * (it takes two type parameters) Either cannot be an instance of Functor or Applicative: instances of these type classes must have the kind * -> *. Remember that when we look at a type class function signature like:

fmap :: Functor f => (a -> b) -> f a -> f b

And if we want to implement it for a specific type (in place of the f), we need to be able to substitute the f with the target type. If we'd try to do it with Either we would get:

fmap :: (a -> b) -> Either a -> Either b

And neither Either a or Either b are saturated, so this won't type check. For the same reason, if we'll try to substitute f with, say, Int, we'll get:

fmap :: (a -> b) -> Int a -> Int b

Which also doesn't make sense.

While we can't use Either, we can use Either e, which has the kind * -> *. Now let's try substituting f with Either e in this signature:

liftA2 :: Applicative => (a -> b -> c) -> f a -> f b -> f c

And we'll get:

liftA2 :: (a -> b -> c) -> Either e a -> Either e b -> Either e c

What this teaches us is that we can only use the applicative interface to combine two Eithers with the same type for the Left constructor.

So what can we do if we have two functions that can return different errors? There are a few approaches; the most prominent ones are:

  1. Make them return the same error type. Write an ADT that holds all possible error descriptions. This can work in some cases but isn't always ideal. For example, a user calling parseDigit shouldn't be forced to handle a possible case that the input might be an empty string
  2. Use a specialized error type for each type, and when they are composed together, map the error type of each function to a more general error type. This can be done with the function first from the Bifunctor type class

Monadic interface

The applicative interface allows us to lift a function to work on multiple Either values (or other applicative functor instances such as IO and Parser). But more often than not, we'd like to use a value from one computation that might return an error in another computation that might return an error.

For example, a compiler such as GHC operates in stages, such as lexical analysis, parsing, type-checking, and so on. Each stage depends on the output of the stage before it, and each stage might fail. We can write the types for these functions:

tokenize :: String -> Either Error [Token]

parse :: [Token] -> Either Error AST

typecheck :: AST -> Either Error TypedAST

We want to compose these functions so that they work in a chain. The output of tokenize goes to parse, and the output of parse goes to typecheck.

We know that we can lift a function over an Either (and other functors), we can also lift a function that returns an Either:

-- reminder the type of fmap
fmap :: Functor f => (a -> b) -> f a -> f b
-- specialized for `Either Error`
fmap :: (a -> b) -> Either Error a -> Either Error b

-- here, `a` is [Token] and `b` is `Either Error AST`:

> fmap parse (tokenize string) :: Either Error (Either Error AST)

While this code compiles, it isn't great, because we are building layers of Either Error, and we can't use this trick again with typecheck! typecheck expects an AST, but if we try to fmap it on fmap parse (tokenize string), the a will be Either Error AST instead.

What we would really like is to flatten this structure instead of nesting it. If we look at the kind of values Either Error (Either Error AST) could have, it looks something like this:

  • Left <error>
  • Right (Left error)
  • Right (Right <ast>)

Exercise: What if we just used pattern matching for this instead? What would this look like?

Solution
case tokenize string of
  Left err ->
    Left err
  Right tokens ->
    case parse tokens of
      Left err ->
        Left err
      Right ast ->
        typecheck ast

If we run into an error in a stage, we return that error and stop. If we succeed, we use the value in the next stage.


Flattening this structure for Either is very similar to that last part - the body of the Right tokens case:

flatten :: Either e (Either e a) -> Either e a
flatten e =
  case e of
    Left l -> Left l
    Right x -> x

Because we have this function, we can now use it on the output of fmap parse (tokenize string) :: Either Error (Either Error AST) from before:

> flatten (fmap parse (tokenize string)) :: Either Error AST

And now, we can use this function again to compose with typecheck:

> flatten (fmap typecheck (flatten (fmap parse (tokenize string)))) :: Either Error TypedAST

This flatten + fmap combination looks like a recurring pattern which we can combine into a function:

flatMap :: (a -> Either e b) -> Either a -> Either b
flatMap func val = flatten (fmap func val)

And now, we can write the code this way:

> flatMap typecheck (flatMap parse (tokenize string)) :: Either Error TypedAST

-- Or using backticks syntax to convert the function to infix form:
> typecheck `flatMap` parse `flatMap` tokenize string

-- Or create a custom infix operator: (=<<) = flatMap
> typeCheck =<< parse =<< tokenize string

This function, flatten (and flatMap as well), have different names in Haskell. They are called join and =<< (pronounced "reverse bind"), and they are the essence of another incredibly useful abstraction in Haskell.

If we have a type that can implement:

  1. The Functor interface, specifically the fmap function
  2. The Applicative interface, most importantly the pure function
  3. This join function

They can implement an instance of the Monad type class.

With functors, we were able to "lift" a function to work over the type implementing the functor type class:

fmap :: (a -> b) -> f a -> f b

With applicative functors we were able to "lift" a function of multiple arguments over multiple values of a type implementing the applicative functor type class, and also lift a value into that type:

pure :: a -> f a

liftA2 :: (a -> b -> c) -> f a -> f b -> f c

With monads we can now flatten (or "join" in Haskell terminology) types that implement the Monad interface:

join :: m (m a) -> m a

-- this is =<< with the arguments reversed, pronounced "bind"
(>>=) :: m a -> (a -> m b) -> m b

With >>=, we can write our compilation pipeline from before in a left-to-right manner, which seems to be more popular for monads:

> tokenize string >>= parse >>= typecheck

We had already met this function before when we talked about IO. Yes, IO also implements the Monad interface. The monadic interface for IO helped us with creating a proper ordering of effects.

The essence of the Monad interface is the join/>>= functions, and as we've seen we can implement >>= in terms of join, we can also implement join in terms of >>= (try it!).

The monadic interface can mean very different things for different types. For IO this is ordering of effects, for Either it is early cutoff, for Logic this means backtracking computation, etc.

Again, don't worry about analogies and metaphors; focus on the API and the laws.

Hey, did you check the monad laws? left identity, right identity, and associativity? We've already discussed a type class with exactly these laws - the Monoid type class. Maybe this is related to the famous quote about monads being just monoids in something something...

Do notation?

Remember the do notation? It turns out it works for any type that is an instance of Monad. How cool is that? Instead of writing:

pipeline :: String -> Either Error TypedAST
pipeline string =
  tokenize string >>= \tokens ->
    parse tokens >>= \ast ->
      typecheck ast

We can write:

pipeline :: String -> Either Error TypedAST
pipeline string = do
  tokens <- tokenize string
  ast <- parse tokens
  typecheck ast

And it will work! Still, in this particular case, tokenize string >>= parse >>= typecheck is so concise it can only be beaten by using >=> or <=<:

(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> a -> m c
(<=<) :: Monad m => (b -> m c) -> (a -> m b) -> a -> m c

-- compare with function composition:
(.) ::              (b ->   c) -> (a ->   b) -> a ->   c
pipeline  = tokenize >=> parse >=> typecheck

or

pipeline = typecheck <=< parse <=< tokenize

Haskell's ability to create very concise code using abstractions is great once one is familiar with the abstractions. Knowing the monad abstraction, we are now already familiar with the core composition API of many libraries - for example:

Summary

Using Either for error handling is useful for two reasons:

  1. We encode possible errors using types, and we force users to acknowledge and handle them, thus making our code more resilient to crashes and bad behaviours
  2. The Functor, Applicative, and Monad interfaces provide us with mechanisms for composing functions that might fail (almost) effortlessly - reducing boilerplate while maintaining strong guarantees about our code and delaying the need to handle errors until it is appropriate