A tutorial on the enumerator library
Kazu Yamamoto
Created: 2011/03/29
This is a tutorial on the enumerator library, which is one implementation of Enumerator/Iteratee (EI) concept discovered by Oleg Kiselyov. The author of the library is John Millikin.
EI is an API between a supplier (enumerator) and a consumer (iteratee). This API makes the following possible:
- Abstraction of data input sources
- If you implement a network code as iteratee, you can test it with just a list.
- Modular programming
- You can compose an enumerator and an iteratee.
- You can compose enumerators.
- You can compose iteratees.
- Explicit control
- If you use the getContents family, it returns immediately. You cannot do something after the end of input explicitly. You cannot use System.Timeout.timeout for getContents. Worse, the getContents family might cause errors in pure functions.
- EI is an alternative of the getContents family with strict data. You can concentrate on strict ByteString for input and can forget lazy ByteString, for instance.
An example to motivate you
Let's borrow an example from Section 9 of Real World Haskell. We implement the "find" command of UNIX.
If you are a beginner Haskell programmer from imperative programming languages, you probably implement it in the imperative way like this:
import Control.Monad import Control.Applicative import Data.List import System.Directory import System.FilePath getValidContents :: FilePath -> IO [String] getValidContents path = filter (`notElem` [".", "..", ".git", ".svn"]) <$> getDirectoryContents path isSearchableDir :: FilePath -> IO Bool isSearchableDir dir = (&&) <$> doesDirectoryExist dir <*> (searchable <$> getPermissions dir) findImperative :: FilePath -> String -> IO () findImperative dir pattern = do cnts <- map (dir </>) <$> getValidContents dir forM_ cnts $ \path -> do when (pattern `isInfixOf` path) $ putStrLn path isDirectory <- isSearchableDir path when isDirectory $ findImperative path pattern
The problem is that "findImperative" does many works. It's one of main causes of bugs. A functional programmer splits it into a collector of file names and a filter, then composes them. Here is a functional "find" (findFunctional) composing a collector (getRecursiveContents) and a filter (grep):
getRecursiveContents :: FilePath -> IO [FilePath] getRecursiveContents dir = do cnts <- map (dir </>) <$> getValidContents dir cnts' <- forM cnts $ \path -> do isDirectory <- isSearchableDir path if isDirectory then getRecursiveContents path else return [path] return . concat $ cnts' grep :: String -> [FilePath] -> [FilePath] grep pattern = filter (pattern `isInfixOf`) findFunctional :: FilePath -> String -> IO () findFunctional dir pattern = grep pattern <$> getRecursiveContents dir >>= mapM_ putStrLn
But we meet a problem again. The getRecursiveContents function returns a list of file name all at once after all directory searches are finished. Thus, the findFunctional function makes a user waiting for a while and then prints matched file names all together.
What we want is:
- Modular programming
- Printing each file names immediately when found
We solve this problem with the enumerator library later. Before that, let's study how to use the enumerator library. This tutorial assumes version 0.4.9 or later.
Actors
There are three actors in the enumerator library.
- Iteratee
- takes input and does calculation.
- is executed by "run_".
- can perform side-effects if IO is specified.
- can be considered as automaton.
- can be composed with another Iteratee by (>>=):
- Iteratee >>= Iteratee --> Iteratee
- Enumerator
- can be composed with an Iteratee by ($$) resulting in another Iteratee:
- Enumerator $$ Iteratee --> Iteratee
- carries forward the state of Iteratee (automaton) by feeding inputs.
- can be composed with another Enumerator by (<==<):
- Enumerator <==< Enumerator --> Enumerator
- can be composed with an Iteratee by ($$) resulting in another Iteratee:
- Enumeratee
- locates in between Enumerator and Iteratee, and modifies inputs.
- can be composed with Iteratee by =$ resulting in another Iteratee:
- Enumeratee =$ Iteratee --> Iteratee
- can be composed with Enumerator by $= resulting in another Enumerator:
- Enumerator $= Enumeratee --> Enumerator
A spell
The following Sections assume that the following modules are imported:
{-# LANGUAGE OverloadedStrings #-} module EnumExamples where import Control.Monad.IO.Class (liftIO) import Data.ByteString (ByteString) import qualified Data.ByteString as BS -- for OverloadedStrings import Data.ByteString.Char8 () import Data.Enumerator hiding (map, filter) import qualified Data.Enumerator.Binary as EB import qualified Data.Enumerator.List as EL import Data.Maybe
Note that we are using strict ByteString, not lazy ByteString.
Implementing an Iteratee
Let's implement our first Iteratee.
When you are using a parse combinator library, you implement a Parser from a small Parser. We can implement an Iteratee on a small Iteratee, too. Let's use EB.head.
EB.head :: Monad m => Iteratee ByteString m (Maybe Word8)
We use IO monad. So, this signature becomes:
EB.head :: Iteratee ByteString IO (Maybe Word8)
You can read this like this:
ByteString -> IO (Maybe Word8)
That is, this Iteratee takes ByteString and returns Maybe Word8. Our first Iteratee just prints input and calls itself recursively.
consumer :: Iteratee BS.ByteString IO () consumer = do mw <- EB.head case mw of Nothing -> return () Just w -> do liftIO . putStr $ "XXX " liftIO . BS.putStrLn . BS.singleton $ w consumer
Now we can run it with "run_".
> run_ consumer
Oh my! Nothing happened. It's no surprises because we don't give any input!
Implementing an Enumerator
To give input to our Iteratee, let's implement our first Enumerator. We can use enumList.
enumList :: Monad m => Integer -> [a] -> Enumerator a m b
Please ignore the first argument at this moment. We create an Enumerator whose input source is a list of ByteString:
-- OverloadedStrings allows ByteString literal. listFeeder :: Enumerator ByteString IO a listFeeder = enumList 1 [ "12", "34" ]
Now we can compose "consumer" and "listFeeder" by ($$), and can execute the resulting Iteratee by "run_":
> run_ $ listFeeder $$ consumer XXX 1 XXX 2 XXX 3 XXX 4
Yes! We made it!
Increasing input source
Let's add a file input after the list input. EB.enumFile makes an Enumerator for a given file:
EB.enumFile :: FilePath -> Enumerator ByteString IO b
Suppose a file called "FILE" contains string of "5678". Let's define an Enumerator to read this file:
fileFeeder :: Enumerator BS.ByteString IO a fileFeeder = EB.enumFile "FILE"
Now we specify both listFeeder and fileFeeder, and execute it.
> run_ $ fileFeeder $$ listFeeder $$ consumer XXX 1 XXX 2 XXX 3 XXX 4 XXX 5 XXX 6 XXX 7 XXX 8
I believe that you now consider EI is useful for you.
Since ($$) is right associative, the example above compose Enumerator and Iteratee (twice).
fileFeeder $$ (listFeeder $$ consumer)
You can also compose two Enumerators first then compose it with the Iteratee:
> run_ $ (fileFeeder <==< listFeeder) $$ consumer the same results above
Increasing jobs
An Iteratee can pass left-over input to another Iteratee. So, the second Iteratee can work after the first Iteratee finishes its job. Since our first Iteratee consumes all inputs, there is no room that another Iteratee can take left-over input. So, Let's define an Iteratee that leaves unconsumed input.
consumer2 :: Iteratee ByteString IO () consumer2 = do mw <- EB.head case mw of Nothing -> return () Just w -> do liftIO . putStr $ "YYY " liftIO . BS.putStrLn . BS.singleton $ w
Please note that "XXX " is replaced with "YYY " and it does not call itself recursively. Remember that two Iteratee can be composed by (>>=). So, we can let two Iteratees work:
> run_ $ fileFeeder $$ listFeeder $$ (consumer2 >> consumer) YYY 1 XXX 2 XXX 3 XXX 4 XXX 5 XXX 6 XXX 7 XXX 8
Note that "comuser2" does not pass any computation results to "consumer". So, we compose them by (>>).
Using a pipe
Let's invite the third actor, Enumeratee. One of the simplest Enumeratee is EB.isolate. This takes N elements out of input where N is the first arguments.
EB.isolate :: Monad m => Integer -> Enumeratee ByteString ByteString m b
We can use it like this:
> run_ $ listFeeder $$ EB.isolate 2 =$ consumer XXX 1 XXX 2
Note that the number of output lines changes from 4 to 2.
Enumeratee can be also composed with Enumerator:
> run_ $ (listFeeder $= EB.isolate 2) $$ consumer XXX 1 XXX 2
The modules of the library
- Data.Enumerator.Binary (EB)
- manages ByteString with a buffer. If you use EB.head, one character (Word8) can be taken.
- Data.Enumerator.Text (ET)
- manages Text with a line. If you use ET.head, one character (Char) can be taken.
- Data.Enumerator.List (EL)
- If you use EL.head, ByteString and Text can be taken for EB and ET, respectively. That is, an entire buffer and an entire line are available, respectively.
If you want a line-based Enumerator for ByteString, you can implemnet it as follows:
enumHandleLines :: MonadIO m => Integer -> Handle -> Enumerator ByteString m ByteString enumHandleLines n hdl = EB.enumHandle n hdl $= byteLines byteLines :: Monad m => Enumeratee ByteString ByteString m b byteLines = EB.splitWhen (== 10) -- 10 is LF
EI find
Let's go back to the first problem, the functional "find". Real World Haskell solves this problem by defining a simple API for a collector of file names and a filter. You probably notice that this API is similar to EI. Here we use the EI API instead.
It is easy to implement a filter as Enumeratee and Iteratee.
import Control.Applicative import Control.Monad import Control.Monad.IO.Class import Data.Enumerator hiding (map, filter, filterM) import qualified Data.Enumerator.List as EL import Data.List import System.Directory import System.FilePath grepE :: String -> Enumeratee String String IO b grepE pattern = EL.filter (pattern `isInfixOf`) printI :: Iteratee String IO () printI = do mx <- EL.head case mx of Nothing -> return () Just file -> do liftIO . putStrLn $ file printI
If you remember Enumerators can be composed by (<==<) and learn the internals of the library a bit, a collector of file names can be implemented as follows:
enumDir :: FilePath -> Enumerator String IO b enumDir dir = list where list (Continue k) = do (files,dirs) <- liftIO getFilesDirs if null dirs then k (Chunks files) else k (Chunks files) >>== walk dirs list step = returnI step walk dirs = foldr1 (<==<) $ map enumDir dirs getFilesDirs = do cnts <- map (dir </>) <$> getValidContents dir (,) <$> filterM doesFileExist cnts <*> filterM isSearchableDir cnts
We can implement "findEnum" by composition:
findEnum :: FilePath -> String -> IO () findEnum dir pattern = run_ $ enumDir dir $$ grepE pattern =$ printI
The findEnum function works as exactly what we want.
Be free from termination condition
Let's modify "findEnum" so that it finishes after displaying N elements. We can make it just by inserting EL.isolate:
findEnum :: FilePath -> String -> Integer -> IO () findEnum dir pattern n = run_ $ enumDir dir $$ grepE pattern =$ EL.isolate n =$ printI
This example clearly shows that the Enumerator "enumDir" is free from termination condition. It's a heart of functional programming!
Futher readings
This tutorial concentrates on how to use the enumerator library. If you want to know how the enumerator library is implemented, I strongly recommend to read Yesod Book: Enumerator Package by Michael Snoyman.