How To Write A Program In Haskell
My name is Chris. I teach Haskell to people who are new to programming as well as to long-time coders. Haskell is a general-purpose programming language that is most useful to mere mortals.
I’m going to show you how to write a package in Haskell and interact with the code inside of it.

Installing tools for writing Haskell code

We’re going to use Stack to manage our project dependencies and compiler, build our code, and run our tests.
After you’ve finished the install instructions, stack should be in your path. GHCi is the REPL (read-eval-print loop) for Haskell, though as often as not, you’ll use stack ghci to invoke a REPL that is aware of your project and its dependencies.

What we’re going to make

We’re going to write a little CSV parser for some baseball data. I don’t care a whit about baseball, but it was the best example of free data I could find.

Project layout

There’s not a prescribed project layout, but there are a few guidelines I would advise following.
One is to learn from an established library. Kmett’s library is not only a fantastic library in its own right, but is also a great resource for people wanting to see how to structure a Haskell project, write and generate Haddock documentation, and organize namespaces. It follows guidelines on what namespaces and categories to use. There is an alternative namespacing pattern that uses a top-level eponymous namespace. Other popular projects are also worth a look for examples of how to organize non-trivial Haskell projects.

Once we’ve finished laying out our project, the Cabal file will sit at the top level and our code will live under src. I’m also going to add the gitignore from GitHub’s gitignore repository plus some additions for Haskell so we don’t accidentally check in unnecessary build artifacts or other things inessential to the project. This should go into a file named .gitignore at the top level of your bassbull project.

    dist
    dist-*
    cabal-dev
    *.o
    *.hi
    *.chi
    *.chs.h
    *.dyn_o
    *.dyn_hi
    .hpc
    .hsenv
    .cabal-sandbox/
    cabal.sandbox.config
    *.prof
    *.aux
    *.hp
    *.eventlog
    .stack-work/

Editing the Cabal file

First we need to fix up our cabal file a bit.
Mine is named bassbull.cabal and is in the top-level directory of the project. Here’s what I changed my cabal file to:

    name:                bassbull
    version:             0.1.0.0
    synopsis:            Processing some csv data
    description:         Baseball data analysis
    homepage:            bitemyapp.com
    license:             BSD3
    license-file:        LICENSE
    author:              Chris Allen
    maintainer:
    copyright:           2016, Chris Allen
    category:            Data
    build-type:          Simple
    cabal-version:       >=1.10

    executable bassbull
      ghc-options:       -Wall
      hs-source-dirs:    src
      main-is:           Main.hs
      build-depends:     base >= 4.7 && ...

One thing to note is that for a module to work as a main-is target for GHC, it must have a function named main and itself be named Main.
Most people make little wrapper Main modules to satisfy this, sometimes with argument parsing and handling done via a library. For now, we’ve left Main very simple, making it just a putStrLn of the string "Hello". To validate that everything is working, let’s build and run this program.
Then we install our dependencies by building our project. This can take some time on the first run, but Stack will cache and share dependencies across your projects automatically.

Next, start a REPL with stack ghci. If you do, you should see a bunch of stuff about loading packages installed for the project and then a Prelude or Main prompt.

    [1 of 1] Compiling Main ( Main.hs, interpreted )
    Ok, modules loaded: Main.
    *Main>

Now we can load our src/Main.hs in the REPL.

    $ stack ghci
    Preprocessing executable 'bassbull' for bassbull-0.1.0.0.
    GHCi, version 7.8.3  :? for help
    Loading package ghc-prim.
    Loading package integer-gmp.
    Loading package base.
    Loading package array-0.5.0.0.
    Loading package deepseq-1.3.0.2.
    Loading package bytestring-0.10.4.0.
    Loading package containers-0.5.5.1.
    Loading package text-1.2.0.0.
    Loading package hashable-1.2.2.0.
    Loading package scientific-0.3.3.2.
    Loading package attoparsec-0.12.1.2.
    Loading package blaze-builder-0.3.3.4.
    Loading package unordered-containers-0.2.5.1.
    Loading package primitive-0.5.4.0.
    Loading package vector-0.10.12.1.
    Loading package cassava-0.4.2.0.
    [1 of 1] Compiling Main ( src/Main.hs, interpreted )

    src/Main.hs:3:1: Warning:
        Top-level binding with no type signature: main :: IO ()
    Ok, modules loaded: Main.
    *Main> :load src/Main.hs
    [1 of 1] Compiling Main ( src/Main.hs, interpreted )

    src/Main.hs:3:1: Warning:
        Top-level binding with no type signature: main :: IO ()
    Ok, modules loaded: Main.
    *Main>

Becoming comfortable with the REPL can be a serious boon to productivity. There is editor integration for those that want it as well.

Now we’re going to update our src/Main.hs. Our goal is to read a CSV file into a ByteString (basically a byte vector), parse the ByteString into a Vector of tuples, and sum up the “at bats” column.
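As a rough sketch of that goal, the code below reads a lazy ByteString, decodes it with Cassava into a Vector of 4-tuples, and folds over the last column. It assumes cassava, vector, and bytestring are listed in build-depends, and that "at bats" is the fourth column; the column meanings and the helper names sumAtBats and reportAtBats are illustrative assumptions, not part of the original program.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.Vector as V
-- Data.Csv comes from the cassava package
import Data.Csv (HasHeader (NoHeader), decode)

-- One CSV row. The column meanings (name, year, team, at bats) are an
-- assumption about how batting.csv is laid out.
type BaseballStats = (BL.ByteString, Int, BL.ByteString, Int)

-- Decode the whole input into a Vector of rows, then sum the fourth column.
-- A parse failure is reported as Left with an error message.
sumAtBats :: BL.ByteString -> Either String Int
sumAtBats csvData =
  let v = decode NoHeader csvData :: Either String (V.Vector BaseballStats)
      summed = fmap (V.foldr summer 0) v
  in summed
  where
    summer (_name, _year, _team, atBats) n = n + atBats

-- Hypothetical helper: read a file and print the total.
reportAtBats :: FilePath -> IO ()
reportAtBats path = do
  csvData <- BL.readFile path
  putStrLn ("Total atBats was: " ++ show (sumAtBats csvData))
```

In a Main module this would be wired up as main = reportAtBats "batting.csv". Keeping the parsing and summing in a pure function makes it easy to try from the REPL with a small in-memory input.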
First, we’re importing our dependencies. Qualified imports let us give names to the namespaces we’re importing and use those names as a prefix, such as BL.ByteString. The prefix is used to refer to values and type constructors alike. In the case of import Data.Csv, where we didn’t qualify the import (with qualified), we’re bringing everything from that module into scope. This should be done only with modules whose exported names won’t conflict with anything else. Other modules like Data.ByteString and Data.Vector have a bunch of functions that are named identically to functions in the Prelude and should be qualified.
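Here is a small sketch of why the qualifier matters. It uses Data.ByteString.Lazy.Char8 (from the bytestring package) rather than Data.ByteString.Lazy so that pack accepts a String; that swap and the names below are choices made for this illustration.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL

-- The BL prefix names a type constructor...
label :: BL.ByteString
-- ...and a value (the pack function) alike.
label = BL.pack "at bats"

-- BL.length and the Prelude's length share a name; the prefix keeps them
-- apart. BL.length returns an Int64, so we convert it to Int.
byteCount :: Int
byteCount = fromIntegral (BL.length label)

-- The unqualified name still refers to the Prelude function.
charCount :: Int
charCount = length "at bats"
```

Had the module been imported unqualified, any use of length would have been ambiguous between the two definitions.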
Here we’re creating a type alias named BaseballStats. I made it a type alias for a few reasons. One is so I could put off talking about algebraic data types! I made it a type alias of the 4-tuple specifically because the Cassava library already understands how to translate CSV rows into tuples, and our type here will “just work” as long as the columns that we say are Int actually are parseable as integral numbers. Haskell tuples are allowed to have heterogeneous types and are defined primarily by their length.
The parentheses and commas are used to signify them. For example, (a, b) would be both a valid value and type constructor for referring to 2-tuples, (a, b, c) for 3-tuples, and so forth.
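A minimal sketch of tuples in use follows. It is a simplified stand-in for the alias above, with String in place of ByteString so it needs nothing beyond the Prelude; the row contents are made up for illustration.

```haskell
-- Simplified stand-in for the article's alias (String instead of ByteString).
type BaseballStats = (String, Int, String, Int)

-- A made-up row; the values are placeholders, not real data.
exampleRow :: BaseballStats
exampleRow = ("player01", 2000, "TEAM", 10)

-- Pattern matching on the 4-tuple picks out the last field.
atBats :: BaseballStats -> Int
atBats (_, _, _, n) = n

-- (Int, String) is the type; (1, "one") is a value of that type.
pair :: (Int, String)
pair = (1, "one")
```

The same parentheses-and-commas syntax appears at the type level and the value level, which is what the paragraph above means by a constructor for both.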
We need to read in a file so we can parse our CSV data. We called the lazy ByteString namespace BL using the qualified keyword in the import.
From that namespace we used BL.readFile, which has type FilePath -> IO ByteString. You can read this in English as “I take a FilePath as an argument and I return a ByteString after performing some side effects.” We’re binding over the IO ByteString that BL.readFile "batting.csv" returns. csvData has type ByteString due to binding over IO. Remember our tuples that we signified with parentheses earlier? Well, () is a sort of tuple too, but it’s the 0-tuple!
In Haskell we usually call it unit. It can’t contain anything; it’s a type that has a single value, (), and that’s it. It’s often used to signify that we don’t return anything.
Since there’s usually no point in executing functions that don’t return anything, () is often wrapped in IO. Printing strings is a good example of the result type IO (), as such actions do their work and return nothing. In Haskell you can’t actually “return nothing”; the concept doesn’t even make sense. Thus we use () as the idiomatic “I got nothin’ for ya” type and value.
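To make unit concrete, here is a small sketch using only the Prelude. The names nothingMuch, announce, and announceTwice are hypothetical.

```haskell
-- () is both the name of the type and its only value.
nothingMuch :: ()
nothingMuch = ()

-- An action that does its work (printing) and hands back ().
announce :: String -> IO ()
announce msg = putStrLn ("Now batting: " ++ msg)

-- Binding the () result is legal, though there is nothing useful in it.
announceTwice :: String -> IO ()
announceTwice msg = do
  result <- announce msg -- result :: ()
  announce msg           -- the usual style: run it and ignore the ()
  return result
```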
Usually if something returns () you won’t even bother to bind it to a name; you’ll just ignore it.

Here we’re using a let expression to bind the expression fmap (V.foldr summer 0) v to the name summed so that the expressions that follow it can refer to summed without repeating all the same code. First we fmap over the Either String (V.Vector BaseballStats). This lets us apply (V.foldr summer 0) to the V.Vector BaseballStats inside. We partially applied the Vector folding function foldr to the summing function and the number 0.
The number 0 here is our “start” value for the fold. Generally in Haskell we don’t use recursion directly. Instead we use higher-order functions and abstractions, giving names to common things programmers do in a way that lets us be more productive. One of those very common things is folding data. You’re going to see examples of folding and the use of fmap from Functor in a bit. We say V.foldr is partially applied because we haven’t applied all of the arguments yet. Haskell has something called currying built into all functions by default, which lets us avoid some tedious work that would require a “Builder” pattern in languages like Java. This lets us apply some, but not all, of the arguments to a function and pass around the result as a function expecting the rest of the arguments.
Unlike previous code samples, the examples here come from my interactive GHCi REPL.
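A short sketch of partial application, written as definitions you can load into the REPL and poke at with :t. The function names are made up for the example.

```haskell
-- A two-argument function. Because of currying, its type is really
-- Int -> (Int -> Int).
add :: Int -> Int -> Int
add x y = x + y

-- Supplying only the first argument gives back a function that is still
-- waiting for the second.
addTen :: Int -> Int
addTen = add 10

-- foldr applied to its function and start value, but not yet to a list.
-- This has the same shape as (V.foldr summer 0) in the text above.
sumList :: [Int] -> Int
sumList = foldr (+) 0
```

For instance, addTen 5 evaluates to 15 and sumList [1, 2, 3] evaluates to 6.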
Fully explaining the fmap in let summed = fmap (V.foldr summer 0) v would require explaining Functor. I don’t want to belabor specific concepts too much, but I think a quick demonstration of fmap and foldr would help here. This is also a transcript from my interactive ghci REPL. I’ll explain Either, Right, and Left after the REPL sample.
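A quick sketch of fmap and foldr working together on Either, written as top-level definitions rather than a REPL transcript. The value names are hypothetical.

```haskell
-- A successful value and a failed one, both of type Either String Int.
rightValue :: Either String Int
rightValue = Right 1

leftValue :: Either String Int
leftValue = Left "could not parse"

-- fmap applies the function inside a Right...
bumped :: Either String Int
bumped = fmap (+ 1) rightValue -- Right 2

-- ...and leaves a Left untouched.
untouched :: Either String Int
untouched = fmap (+ 1) leftValue -- Left "could not parse"

-- The same shape as the program: fmap a partially applied foldr over an
-- Either that holds a collection.
total :: Either String Int
total = fmap (foldr (+) 0) (Right [1, 2, 3]) -- Right 6
```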
The :type or :t command is a command to my GHCi REPL, not part of the Haskell language. It’s a way to request the type of an expression. Either in Haskell is used to signify cases where we might get values of one of two possible types. Either String Int is a way of saying, “you’ll get either a String or an Int”. This is an example of sum types.
You can think of them as a way to say or in your type, where a struct or class would let you say and. Either has two constructors, Right and Left. Culturally in Haskell Left signifies an “error” case.
This is partly why the Functor instance for Either maps over the Right constructor but not the Left. If you have an error value, you can’t keep applying your happy path functions. In the case of Either String Int, String would be our error value in a Left constructor and Int would be the happy-path “yep, we’re good” value in the Right constructor. Also, Haskell has type inference. You don’t have to declare types explicitly like I did in the example from my REPL transcript - I did so for the sake of explicitness. Either isn’t the only type we can map over.
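As an illustration, the sketch below maps the same function over a list and a Maybe, and then writes out by hand what fmap does for Either specialised to Int. The name incrementEither is chosen for this example.

```haskell
addOne :: Int -> Int
addOne x = x + 1

-- Lists are Functors: fmap applies the function to every element.
overList :: [Int]
overList = fmap addOne [1, 2, 3] -- [2,3,4]

-- Maybe is a Functor: Just is mapped over, Nothing is left alone.
overJust :: Maybe Int
overJust = fmap addOne (Just 1) -- Just 2

overNothing :: Maybe Int
overNothing = fmap addOne Nothing -- Nothing

-- fmap addOne for Either, spelled out with pattern matching.
incrementEither :: Either e Int -> Either e Int
incrementEither (Right numberWeWanted) = Right (addOne numberWeWanted)
incrementEither (Left errorValue) = Left errorValue
```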
We use parens on the left-hand side here to pattern match at the function declaration level on whether our Either e Int is Right or Left. Parentheses wrap (addOne numberWeWanted) so we don’t try to erroneously pass two arguments to Right when we mean to pass the result of applying addOne to numberWeWanted, to Right. If our value is Right 1 this is returning Right (addOne 1) which reduces to Right 2. As we process the CSV data we’re going to be doing so by folding the data. This is a general model for understanding how you process data that extends beyond specific programming languages. You might have seen fold called reduce. Here are some examples of folds and list/string concatenation in Haskell.
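A few folds over lists and strings, sketched as definitions that can be loaded into the REPL. The names are invented for the example.

```haskell
-- Summing: start at 0 and combine each element with (+).
total :: Int
total = foldr (+) 0 [1, 2, 3] -- 6

-- Concatenating strings is a fold too: start with "" and combine with (++).
joined :: String
joined = foldr (++) "" ["base", "ball"] -- "baseball"

-- concat packages up the same idea for lists of lists.
flattened :: [Int]
flattened = concat [[1, 2], [3]] -- [1,2,3]

-- Folding with (:) and [] rebuilds the original list.
copied :: [Int]
copied = foldr (:) [] [1, 2, 3] -- [1,2,3]

-- foldr nests to the right: 1 - (2 - (3 - 0)) = 2
rightNested :: Int
rightNested = foldr (-) 0 [1, 2, 3]

-- foldl nests to the left: ((0 - 1) - 2) - 3 = -6
leftNested :: Int
leftNested = foldl (-) 0 [1, 2, 3]
```

The last two show that the direction of the fold matters whenever the combining function is not associative.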
We’re switching back to REPL demonstration again.

Streaming

We can improve upon what we have here. Currently we’re going to use as much memory as it takes to store the entirety of the CSV file, but we don’t really have to do that to sum up the records! Since we’re just adding the current record’s “at bats” to the sum we’ve accumulated so far, we only really need to read one record into memory at a time.
By default Cassava will load the CSV into a Vector for convenience, but fortunately it has a streaming module so we can stream the data incrementally and fold our result without loading the entire dataset at once. First, we’re going to swap Cassava’s default module, Data.Csv, for the streaming module, Data.Csv.Streaming. The core here is the Records datatype Cassava gives us via the Streaming module.
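A sketch of what the streaming version might look like is below. It assumes cassava and bytestring are in build-depends and that “at bats” is the fourth column; sumAtBatsStreaming and reportAtBats are hypothetical names. How rows that fail to parse are treated is left to Cassava’s Foldable instance and is not something this sketch relies on.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.Foldable as F
-- Data.Csv.Streaming comes from the cassava package
import Data.Csv.Streaming (HasHeader (NoHeader), Records, decode)

-- Same row shape as before; the column meanings are an assumption.
type BaseballStats = (BL.ByteString, Int, BL.ByteString, Int)

-- Pull the fourth field out of any 4-tuple.
fourth :: (a, b, c, d) -> d
fourth (_, _, _, d) = d

-- decode now yields a lazy Records stream instead of Either String (Vector a),
-- so we fold over it directly with foldr from Data.Foldable.
sumAtBatsStreaming :: BL.ByteString -> Int
sumAtBatsStreaming csvData = F.foldr summer 0 records
  where
    records = decode NoHeader csvData :: Records BaseballStats
    summer row n = n + fourth row

-- Hypothetical helper: read a file lazily and print the total.
reportAtBats :: FilePath -> IO ()
reportAtBats path = do
  csvData <- BL.readFile path
  putStrLn ("Total atBats was: " ++ show (sumAtBatsStreaming csvData))
```

Note that the result is a plain Int here rather than an Either, because parse errors now live inside the stream itself.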
You can read more about the Records datatype in Cassava’s documentation. Records is a sum type, which you could read out in English like so:

- data Records a: Records is a datatype that takes a type variable a.
- Cons (...) | Nil (...): It is a sum type of two possible constructors, Cons or Nil (note the list-like nomenclature). This is a way of saying a Records a is always either Cons or Nil.
- Cons (Either String a) (Records a): the Cons data constructor is a product of Either String a and Records a. We’re saying Cons always holds an Either String a and a Records a. Also, this Cons resembles the cons-cells in Lisp, Haskell, ML, etc. The library has the following comment about it: “A record or an error message, followed by more records.”
- Nil (Maybe String) BL.ByteString: the Nil data constructor is a product of Maybe String and BL.ByteString. The library has the following comment: “End of stream, potentially due to a parse error. If a parse error occured, the first field contains the error message. The second field contains any unconsumed input.”

What the Records type is doing for us is letting us process the records like a lazy list, but with a little extra context in the Nil case. Because Haskell has abstractions like the Foldable typeclass, we can talk about folding a dataset without caring about the underlying implementation! We could’ve used the foldr from Foldable on our Vector, a List, a Tree, a Map, and so on, not just Cassava’s streaming API. foldr from Foldable has the type Foldable t => (a -> b -> b) -> b -> t a -> b. Note the similarity with the foldr for the list type, (a -> b -> b) -> b -> [a] -> b. What we’ve done is abstract the specific type out and make it into a generic interface.
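To make the shape concrete, here is a local stand-in for Records with one possible Foldable instance. It is not Cassava’s own code: String replaces BL.ByteString so that only the Prelude is needed, and skipping failed records is a choice made for this sketch, so the real instance may behave differently.

```haskell
-- A local stand-in mirroring the shape described above (NOT cassava's
-- definition; String replaces BL.ByteString for simplicity).
data Records a
  = Cons (Either String a) (Records a)
  | Nil (Maybe String) String

-- One possible Foldable instance. Skipping records that failed to parse is
-- an assumption of this sketch.
instance Foldable Records where
  foldr f z (Cons (Right x) rest) = f x (foldr f z rest)
  foldr f z (Cons (Left _) rest) = foldr f z rest
  foldr _ z (Nil _ _) = z

-- A made-up stream: two good records with one bad row in between.
sampleRecords :: Records Int
sampleRecords =
  Cons (Right 1) (Cons (Left "bad row") (Cons (Right 2) (Nil Nothing "")))

-- Written once against the Foldable interface, usable on any instance.
total :: Foldable t => t Int -> Int
total = foldr (+) 0
```

The same total function works on sampleRecords, on an ordinary list such as [1, 2, 3], and on Just 4, which is the point of abstracting over the container.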
In case you’re wondering what the Foldable instance is doing under the hood, it is a fold written directly against the Cons and Nil constructors.

Adding tests

Now we’re going to add tests to our package.
First we are going to add a test suite to our bassbull.cabal file. The name of our test suite will just be tests.

    test-suite tests
      ghc-options:       -Wall
      type:              exitcode-stdio-1.0
      main-is:           Tests.hs
      hs-source-dirs:    tests
      build-depends:     base, bassbull, hspec
      default-language:  Haskell2010

We’re also going to add a library and shift over some code so that our package is exposed as a proper library rather than only working as an executable. We’re exposing a single module named Bassbull. With an hs-source-dirs of src and an exposed module named Bassbull, Cabal will expect a file to exist at src/Bassbull.hs.

    library
      ghc-options:       -Wall
      exposed-modules:   Bassbull
      build-depends:     base >= 4.7 && ...

Loading the tests target in a REPL will then give you a session which can see anything the build target named tests in your Cabal file can see.
You can then run the main function or individual test suites, if you bother to split them out. Tests are useful and important in Haskell, although I often find I need far fewer of them. Often my process for working on an existing Haskell project will involve working on the code I’m changing with Emacs and a REPL instantiated via stack ghci.
As my code starts passing the type-checker, I start running the tests as another layer of assurance that I’m doing the right thing. I like having a lot of feedback and help from my computer when writing code!

Making your Haskell packages available to the Haskell community

Hackage is the main community repository of Haskell packages and will usually be where you look to find libraries you need. Mostly you’ll find libraries and the occasional executable utility, but utilities should also expose library APIs that make their functionality accessible via Haskell code. This is not only more useful to other people but also enforces good practices and more modular projects. Haskell users are accustomed to documentation that is accessible via the Hackage website directly.
The tool that builds this documentation is called Haddock. I strongly recommend you look at well-established libraries for examples of how to write and build documentation for your Haskell projects. For more information on building a package for uploading to Hackage, see the Hackage website.

How I work

When I’m working with Haskell code, I interact with my code in a few ways. One is that I’m writing the code itself in Emacs. I’ll also have a terminal with a REPL open, usually via stack ghci, as I am almost always working on a specific project.
My Emacs config is pretty sundry; it’s just haskell-mode and flycheck. My basic happy-path event loop for writing Haskell is:

1. Import the module I’m working on in the REPL before I’ve changed anything.
2. Change/add/delete code.
3. :reload in the REPL. Flycheck will give me type errors, but I sometimes like to see them in the REPL too.
4. Sometimes I’ll use eta-reduction to refactor code. Making code point-free makes the most sense when it’s primarily about composing functions rather than about applying them.
5. If code still type-checks after some cleaning, I’ll run the tests.
6. If tests pass, I move on unless I’m suspicious about test coverage.
7. If tests break or I want more coverage, I write more tests until I’m satisfied.
8. When that’s done, I return to step #1 in this loop for the next unit of work I want to perform.

My diagnosis process when something isn’t working:
- If I can’t get something to type-check, I’ll break it down into sub-expressions, query the types of those sub-expressions, and make certain they are what I expected.
- If I have expressions I am trying to combine and I am trying to make their types make sense, but I haven’t implemented them yet, I will use undefined and work with only application, composition, and monadic variations thereof to figure out how to get to where I’m going before I’ve implemented anything. A good example of this is the solution I wrote that @ifesdjeen displays in his final comment.
- If I have a function expecting arguments I can’t figure out how to satisfy, I will sometimes use a typed hole or a similar trick with implicit parameters to see what type I need to provide.
- Since Haskell functions are pure and lazy, I can replace references to functions with their contents with a high degree of confidence that it will not change the semantics of my program. To that end, sometimes it’s easier to understand what’s going on by inlining the code by hand and seeing what your code turns into.
- If something type-checks but doesn’t work, I’ll run the tests. If the existing tests aren’t catching the problem, I add one that does. This is less common for me in Haskell than you’d think.
- If I can frame the test as an assertion about some property the code should satisfy, as with QuickCheck, I will do so.

Editor integration: Emacs, vim, Sublime Text 2/3, and my personal dotfiles.

Wrapping up

This is the end of our little journey in playing around with Haskell to process CSV data. Learning how to use abstractions like Foldable and Functor, or techniques like eta reduction, takes practice! I have a guide for learning Haskell which has been compiled based on my experiences learning and teaching Haskell with many people over the last year or so.
If you are curious and want to learn more, I strongly recommend you do a course of basic exercises first and then explore the way Haskell enables you to think about your programs in terms of abstractions. Once you have the basics down, this can be done in a variety of ways. Some people like to attack practical problems, some like to follow along with white papers, and some like to hammer out abstractions from scratch in focused exercises & examples.

Things to do after finishing this article: look at the many Haskell packages available.

More than anything else, my greatest wish would be that you develop a richer and more rewarding relationship with learning. Haskell has been a big part of this in my life. Special thanks to everyone who helped me test & edit this article. I couldn’t have gotten it together without their help.