Defining Types for a Simple HTTP Server

Nov 3

In the last several months, we’ve gone through solutions for a multitude of LeetCode problems in Haskell. Practicing problems like these is a great step towards learning a new language. However, you’ll only get so far solving contrived problems with no extra programming context.

Another great step you can take to level up your programming skills is to write common tools from scratch. This forces you to tackle a larger context than simply the inputs and outputs of a single function. You’ll also get more familiar with techniques that are entirely absent from LeetCode problems, like filesystem operations and network mechanics. This is beneficial whether you’re learning more with your primary language or getting started with a new language.

We’re going to spend the rest of the year writing a couple of small projects like this in Haskell. We’ll start by writing a simple HTTP Server in these first few weeks. Then we’ll try something more complicated.

What you’ll find with projects like these is that parsing is extremely important. In a LeetCode problem, you’re typically receiving pre-structured input data. When you’re writing a tool from scratch, your input is more often a stream of unstructured data from a file or the network, and one of your main jobs is making sense of that data! To learn some great techniques for parsing in Haskell, you should sign up for our course, Solve.hs! In Module 4, you’ll learn all about the Megaparsec library that we’ll use in this series!

Outlining Our Server

Before we dive into any code, let’s outline the basic expectations for our server - what do we expect it to do? We’re going to keep things very simple.

Our program should start a server listening on port 3000.
When a user pings our server with a valid HTTP request, we should reply with a valid HTTP response using the code “200 OK”. This response should have a simple body like “This is the response body!”
If we receive an invalid HTTP request, we should reply with a valid HTTP response using the code “400 Bad Request”. This 400 response should give an error message in the body.

Now there are many libraries out there for writing HTTP Servers. In fact, if you take our Practical Haskell course, you’ll learn about Servant, which uses some really cool type-level mechanics that are unique to Haskell! By using a server library, you could get all this functionality in about 10-20 lines of code (if that).

But when you’re writing something “from scratch”, you want to limit which libraries you use, so that you can focus on learning some of the lower level details. In our case, we want to focus on the details of the HTTP Protocol itself. Our objective will be to improve our understanding of the message format behind HTTP requests and responses.

This means we’re going to write our own parsing code for HTTP requests, and our own serialization code for responses. We’ll follow this guide for HTTP version 1.1. We’ll use this to help structure our data, but we won’t get too complicated. We’ll aim to correctly parse (almost) all valid requests. But as we’ll explain below, there are a lot of rules we won’t enforce, so our server will “accept” a wide variety of “invalid” requests.

Defining Types

The first thing we want to do when writing a parser is define the types of our system. This is especially true in Haskell, where it’s easy for us to define the structure of new types, and to combine our elements using sum and product types.

If you’re using open-source documentation, coming up with types is usually easy! The docs will often lay out the structure for you. For example, the doc linked above defines an HTTP Message like so:

HTTP-message = Request | Response; HTTP/1.1 messages

We could translate this into Haskell types:

data HttpRequest = HttpRequest

data HttpResponse = HttpResponse

data HttpMessage =
  RequestMessage HttpRequest |
  ResponseMessage HttpResponse

Of course our request and response types are incomplete, and we’ll fill them in next. If we wanted, we could define each field as we parse it. When you’re writing an entirely new system, you might take this approach. Once again though, good documentation can give us an overview of the entire type. Let’s see how we can use the documentation to produce a complete “request” type.

HTTP Request

For the request, we can read the following definition in the docs:

Request       = Request-Line;
                *(( general-header;
                  | request-header;
                  | entity-header)
                 CRLF);
                CRLF
                [ message-body ];

Note that the CRLF items refer to the consecutive characters \r\n, a “carriage return” and “line feed” (AKA “new line character”). We read the full definition as 4 parts.

The request line (we’ll see below what information this has)
0 or more headers, each terminated by CRLF. There are 3 types of headers, but they all have the same structure, as we’ll see.
A mandatory CRLF separating the headers from the body
An optional message body
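
To make these four parts concrete, here is a small sketch of what a raw request might look like on the wire. The request contents (`/index.html`, the `Host` and `Accept` headers) and the helper `splitHeadAndBody` are hypothetical names made up for illustration. The sketch also uses a strict ByteString for convenience, and it is not the Megaparsec-based parser this series builds.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B

-- A hypothetical raw request, annotated with the four parts listed above.
sampleRequest :: B.ByteString
sampleRequest = B.concat
  [ "GET /index.html HTTP/1.1\r\n"  -- 1. the request line
  , "Host: localhost:3000\r\n"      -- 2. headers, each terminated by CRLF
  , "Accept: text/plain\r\n"
  , "\r\n"                          -- 3. the mandatory CRLF before the body
  , ""                              -- 4. the optional body (absent here)
  ]

-- Split raw bytes at the first blank line. The first result holds the
-- request line and headers; the second holds whatever follows as the body.
-- If there is no blank line, the whole input is returned as the head.
splitHeadAndBody :: B.ByteString -> (B.ByteString, B.ByteString)
splitHeadAndBody raw =
  let (headPart, rest) = B.breakSubstring "\r\n\r\n" raw
  in (headPart, B.drop 4 rest)
```

Applying `splitHeadAndBody` to `sampleRequest` gives back the request line and headers as the first component and an empty body as the second.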

The Request Line

This still isn’t specific enough to write our types. Let’s examine the “request line” for more details.

Request-Line = Method SP Request-URI SP HTTP-Version CRLF

The request line has 3 components and 3 separators (SP means a single space character, ' '). The first component is the “method” of the request (e.g. “GET”, “POST”). The protocol defines a series of valid methods for a request.

Method = "OPTIONS"
       | "GET"
       | "HEAD"
       | "POST"
       | "PUT"
       | "DELETE"
       | "TRACE"
       | "CONNECT"
       | extension-method
extension-method = token

If we ignore the “extension” method, we can make a simple enumerated type for the different methods, and add this as the first field in our request!

data HttpMethod =
    HttpOptions | HttpGet | HttpHead | HttpPost | HttpPut |
    HttpDelete | HttpTrace | HttpConnect
    deriving (Show, Eq)

data HttpRequest = HttpRequest
    { requestMethod :: HttpMethod
    ...
    } deriving (Show, Eq)
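
As a rough sketch of how this enumerated type lines up with the spec, here is one way to map a method token onto it. The function name `parseMethod` is an assumption for illustration, not the parser from this series. It treats extension methods as unrecognized, and it matches case-sensitively because the HTTP/1.1 spec defines methods as case-sensitive.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString.Lazy (ByteString)

data HttpMethod =
    HttpOptions | HttpGet | HttpHead | HttpPost | HttpPut |
    HttpDelete | HttpTrace | HttpConnect
    deriving (Show, Eq)

-- Map a method token to our enumerated type (hypothetical helper).
-- Anything outside the eight listed methods, including extension
-- methods and lowercase spellings, yields Nothing.
parseMethod :: ByteString -> Maybe HttpMethod
parseMethod "OPTIONS" = Just HttpOptions
parseMethod "GET"     = Just HttpGet
parseMethod "HEAD"    = Just HttpHead
parseMethod "POST"    = Just HttpPost
parseMethod "PUT"     = Just HttpPut
parseMethod "DELETE"  = Just HttpDelete
parseMethod "TRACE"   = Just HttpTrace
parseMethod "CONNECT" = Just HttpConnect
parseMethod _         = Nothing
```

With this, `parseMethod "GET"` gives `Just HttpGet`, while `parseMethod "get"` and `parseMethod "PATCH"` both give `Nothing`.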

The “request URI” has a few different options as well.

Request-URI    = "*" | absoluteURI | abs_path | authority

Each of these has a particular structure and rules, but we’re going to simplify it considerably. We’ll just treat the URI as a ByteString, with the only restriction being that it can’t have any “space” characters, since the space is the separator.

One of the biggest gains you’ll get from a good HTTP library is breaking down request URIs into component parts, like path components and query parameters. The Servant library does this very well.

data HttpRequest = HttpRequest
    { requestMethod :: HttpMethod
    , requestUri :: ByteString
    ...
    } deriving (Show, Eq)
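
The single restriction we place on the URI is easy to state as a predicate. The sketch below uses a hypothetical name, `isValidRequestUri`, and adds one assumption of its own: it also rejects an empty URI, since the request line needs something between the two spaces.

```haskell
import Data.ByteString.Lazy (ByteString)
import qualified Data.ByteString.Lazy.Char8 as BLC

-- A URI is acceptable for our simplified server if it is non-empty
-- (our assumption) and contains no space character, since the space
-- is the separator within the request line.
isValidRequestUri :: ByteString -> Bool
isValidRequestUri uri = not (BLC.null uri) && BLC.notElem ' ' uri
```

So `/index.html` and `*` pass this check, while `/my page` does not.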

The last item in the request line is the “HTTP Version”. Here’s the spec from the documentation:

HTTP-Version   = "HTTP" "/" 1*DIGIT "." 1*DIGIT

The two values we care about are the major and minor version numbers. For example, HTTP/1.0 gives the major version 1 and the minor version 0. As a practical matter, we only care about very small integers (<256), so each of these fits in a Word8. So we can represent the version of the request with a tuple (Word8, Word8).

import Data.Word (Word8)

data HttpRequest = HttpRequest
    { requestMethod :: HttpMethod
    , requestUri :: ByteString
    , requestHttpVersion :: (Word8, Word8)
    ...
    } deriving (Show, Eq)
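
One thing to watch with Word8 is that fromIntegral silently wraps values above 255, so a digit string like “256” would become 0. Here is a minimal sketch of a checked conversion, plus a function that turns the tuple back into its text form. Both names (`toVersionPart`, `renderVersion`) are hypothetical helpers for illustration.

```haskell
import Data.ByteString.Lazy (ByteString)
import qualified Data.ByteString.Lazy.Char8 as BLC
import Data.Word (Word8)

-- Convert a parsed number into a version component, rejecting anything
-- that does not fit in a Word8 instead of letting it wrap around.
toVersionPart :: Integer -> Maybe Word8
toVersionPart n
  | n >= 0 && n <= 255 = Just (fromIntegral n)
  | otherwise          = Nothing

-- Render a (major, minor) pair in the HTTP-Version format from the spec.
renderVersion :: (Word8, Word8) -> ByteString
renderVersion (major, minor) =
  BLC.pack ("HTTP/" ++ show major ++ "." ++ show minor)
```

For example, `renderVersion (1, 1)` produces the bytes `HTTP/1.1`, and `toVersionPart 256` gives `Nothing`.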

So now our type is representing all the parts of the request line. Let’s move on to the rest of the request.

Headers & Body

Now let’s tackle headers. As we mentioned before, there are several types of headers (general, request, response, entity), but they all have the same basic structure. Here is that structure:

message-header = field-name ":" [ field-value ]
 field-name     = token
 field-value    = *( field-content | LWS )
 field-content  = <the OCTETs making up the field-value
                  and consisting of either *TEXT or combinations
                  of token, separators, and quoted-string>

There are references to LWS, which is “linear white space”. But at a basic level, a header consists of a “name” and a “value”, separated by a colon. We’ll treat both the name and value as bytestrings. Then we want to use some kind of a map to match the names with the values. So we’ll add this field to our type:

import qualified Data.HashMap.Lazy as HM

newtype HttpHeaders = HttpHeaders
    (HM.HashMap ByteString ByteString)
    deriving (Show, Eq)

data HttpRequest = HttpRequest
    { requestMethod :: HttpMethod
    , requestUri :: ByteString
    , requestHttpVersion :: (Word8, Word8)
    , requestHeaders :: HttpHeaders
    ...
    } deriving (Show, Eq)

We use a newtype to package this map away in a type-safe manner.
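
Here is a rough sketch of how this wrapper might be built and queried. It needs the unordered-containers package for Data.HashMap.Lazy. The helpers `mkHeaders` and `lookupHeader` are hypothetical, and lowercasing the names is our own assumption, motivated by the fact that HTTP header field names are case-insensitive.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString.Lazy (ByteString)
import qualified Data.ByteString.Lazy.Char8 as BLC
import Data.Char (toLower)
import qualified Data.HashMap.Lazy as HM

newtype HttpHeaders = HttpHeaders
    (HM.HashMap ByteString ByteString)
    deriving (Show, Eq)

-- Lowercase a header name so "Host" and "HOST" share one key
-- (an assumption of this sketch).
normalizeName :: ByteString -> ByteString
normalizeName = BLC.map toLower

-- Build the header map from name/value pairs. If a name repeats,
-- HM.fromList keeps the last value.
mkHeaders :: [(ByteString, ByteString)] -> HttpHeaders
mkHeaders pairs =
  HttpHeaders (HM.fromList [(normalizeName k, v) | (k, v) <- pairs])

-- Look up a header value by name, ignoring the case of the name.
lookupHeader :: ByteString -> HttpHeaders -> Maybe ByteString
lookupHeader name (HttpHeaders m) = HM.lookup (normalizeName name) m
```

Because the map is hidden behind the newtype, callers go through functions like these, which gives us one place to enforce the normalization.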

Finally, we have the “Body” of the request. In general, this is simply a ByteString. We could represent empty request bodies with an empty bytestring. But since there’s a meaningful semantic difference between a request that has a body and one that doesn’t, we can also use a Maybe value.

data HttpRequest = HttpRequest
    { requestMethod :: HttpMethod
    , requestUri :: ByteString
    , requestHttpVersion :: (Word8, Word8)
    , requestHeaders :: HttpHeaders
    , requestBody :: Maybe ByteString
    }
    deriving (Show, Eq)

This completes our request type!

The Response Type

Any server that receives a request should be able to produce a valid response, so we need to define that type as well. The good news is that the documentation shows that a response is very similar to a request in its structure:

Response = Status-Line;
           *(( general-header;
           | response-header;
           | entity-header ) CRLF);
           CRLF
           [ message-body ]

There are only two differences. First a response has a “status line” instead of a “request line”. Second, it has “response-header” as an option instead of “request-header”. This second difference doesn’t affect our type, so we’ll go ahead and start outlining the response like this:

data HttpResponse = HttpResponse
    { ...
    , responseHeaders :: HttpHeaders
    , responseBody :: Maybe ByteString
    }
    deriving (Show, Eq)

Now we just have to understand the status line. Here is its specification:

Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF

This has a similar structure to the request line, but different data in a different order. The HTTP version comes first. Then comes a status code (e.g. 200 = OK, 400 = client error, etc.). Finally, we have a “reason” for the response code (e.g. “OK”, “Bad Request”, “Forbidden”).

We’re already representing the version as (Word8, Word8). The status code is a straightforward Int, and the reason is just going to be a ByteString. So it’s easy to fill out the rest of this response type:

data HttpResponse = HttpResponse
    { responseHttpVersion :: (Word8, Word8)
    , responseStatusCode :: Int
    , responseReason :: ByteString
    , responseHeaders :: HttpHeaders
    , responseBody :: Maybe ByteString
    }
    deriving (Show, Eq)

Now we have our fundamental types! Here’s the complete code for our request and response types, including imports and subtypes:

import Data.Word (Word8)
import qualified Data.HashMap.Lazy as HM
import Data.ByteString.Lazy (ByteString)

data HttpMethod =
    HttpOptions | HttpGet | HttpHead | HttpPost | HttpPut |
    HttpDelete | HttpTrace | HttpConnect
    deriving (Show, Eq)

newtype HttpHeaders = HttpHeaders
    (HM.HashMap ByteString ByteString)
    deriving (Show, Eq)

data HttpRequest = HttpRequest
    { requestMethod :: HttpMethod
    , requestUri :: ByteString
    , requestHttpVersion :: (Word8, Word8)
    , requestHeaders :: HttpHeaders
    , requestBody :: Maybe ByteString
    }
    deriving (Show, Eq)

data HttpResponse = HttpResponse
    { responseHttpVersion :: (Word8, Word8)
    , responseStatusCode :: Int
    , responseReason :: ByteString
    , responseHeaders :: HttpHeaders
    , responseBody :: Maybe ByteString
    }
    deriving (Show, Eq)
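
To see the response type in action, here is a minimal sketch of the two responses from our outline, along with a function that renders the status line. It repeats the response type so that it compiles on its own (with the unordered-containers package installed). The helper names (`simpleResponse`, `okResponse`, `badRequestResponse`, `renderStatusLine`), the 400 error message, and the choice to include a Content-Length header are all assumptions for illustration, not the serialization code from later in this series.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString.Lazy (ByteString)
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC
import qualified Data.HashMap.Lazy as HM
import Data.Word (Word8)

newtype HttpHeaders = HttpHeaders
    (HM.HashMap ByteString ByteString)
    deriving (Show, Eq)

data HttpResponse = HttpResponse
    { responseHttpVersion :: (Word8, Word8)
    , responseStatusCode :: Int
    , responseReason :: ByteString
    , responseHeaders :: HttpHeaders
    , responseBody :: Maybe ByteString
    }
    deriving (Show, Eq)

-- Build an HTTP/1.1 response with a body and a matching Content-Length
-- header (the header choice is an assumption of this sketch).
simpleResponse :: Int -> ByteString -> ByteString -> HttpResponse
simpleResponse code reason body = HttpResponse
    { responseHttpVersion = (1, 1)
    , responseStatusCode = code
    , responseReason = reason
    , responseHeaders = HttpHeaders
        (HM.fromList [("Content-Length", BLC.pack (show (BL.length body)))])
    , responseBody = Just body
    }

okResponse :: HttpResponse
okResponse = simpleResponse 200 "OK" "This is the response body!"

-- The error text here is a placeholder.
badRequestResponse :: HttpResponse
badRequestResponse =
    simpleResponse 400 "Bad Request" "Could not parse the request."

-- Render: HTTP-Version SP Status-Code SP Reason-Phrase CRLF
renderStatusLine :: HttpResponse -> ByteString
renderStatusLine r = mconcat
    [ "HTTP/", BLC.pack (show major), ".", BLC.pack (show minor)
    , " ", BLC.pack (show (responseStatusCode r))
    , " ", responseReason r, "\r\n"
    ]
  where
    (major, minor) = responseHttpVersion r
```

Rendering the status line of `okResponse` yields `HTTP/1.1 200 OK` followed by CRLF, which matches the Status-Line definition from the spec.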

Conclusion

That’s all for the first part of this series. Next week in Part 2, we’ll write code to parse a request on our server using Megaparsec. For an in-depth tutorial on parsing in Haskell, including using this powerful library, you should sign up for Solve.hs, our Haskell problem solving course! Module 4 goes into a lot of detail on parsing, and allows you to build your own parser from scratch!

James Bowen