How to Structure a Clojure Web App 101

From a purely mechanical perspective there is a lot to teach about integrant. It uses multimethods to register lifecycle hooks, idiomatic use demands namespaced keywords, and in testing we've needed to incorporate special libraries.

None of that is fundamentally a problem though. All the libraries which do this sort of thing use some weirder part of Clojure's arsenal. For component it is records and protocols. For clip it is namespaced symbols and dynamic lookup. For donut it's a secret, more complex third thing.

What has been a challenge is explaining what exactly it is that these libraries do. Doing that - really doing that - requires a mountain of shared context that folks simply do not have.

This article is an attempt to convey some of that shared context. Apologies if it gets a bit ranty.

Ring

Basically the entire Clojure world has agreed to a specification called "ring" which says how HTTP requests and responses translate to data structures in Clojure.

Clojure web servers are functions that take "ring requests" which look like the following

{:uri            "/echo"
 :request-method :get
 :headers        {}
 :body           ...
 :protocol       "HTTP/1.1"
 :remote-addr    "127.0.0.1"
 :server-port    80
 :content-length nil
 :query-string   nil
 :scheme         :http}

and return "ring responses" which look like the following

{:status  200
 :headers {"Content-Type" "application/json"}
 :body    "{\"success\":\"true\"}"}

A basic web server built on this might look like the following

(ns example
   (:require [ring.adapter.jetty :as jetty]))

(defn handler 
  [request]
  (cond 
    (= (:uri request) "/hello")
    {:status 200 
     :body   "Hello, World"}
    
    :else
    {:status 404}))

(defn start-server 
  [] 
  (jetty/run-jetty handler {:port 1234}))

So this code, as written, will run a Jetty server which responds to all requests to /hello with Hello, World and all other requests with a 404.
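
Because a handler is just a function from a request map to a response map, it can be exercised without starting Jetty at all. Here is a minimal sketch that repeats the handler from above and calls it with hand-written request maps. Real ring requests carry more keys than the ones used here.

```clojure
;; Same handler as above, repeated so this snippet stands alone.
(defn handler
  [request]
  (cond
    (= (:uri request) "/hello")
    {:status 200
     :body   "Hello, World"}

    :else
    {:status 404}))

;; Calling the handler directly with plain maps, no server involved.
(def hello-response   (handler {:uri "/hello" :request-method :get}))
(def unknown-response (handler {:uri "/nope" :request-method :get}))

(println hello-response)
(println unknown-response)
```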

The REPL

One issue that is already relevant with the preceding example, and will be a common theme going forward, is "REPL Friendliness."

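Clojure and other Lisps have the unique property that the "unit" of code isn't a file, but instead an individual "form."

In Python the whole file is parsed before any of it runs, so a file like this fails with a syntax error without ever printing Start.

print("Start")
    
3di92d93209032

A Clojure file with the same shape behaves differently.

(println "Start")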

903f903jf939cn34f934fj9j39f4

Unlike with the Python example, the very first println will actually run before a crash.

The reason for this is that Clojure reads and evaluates each "form" one at a time. There is no full pass of the file before running code.
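
One way to see this without creating a file is load-string, which reads and evaluates the forms in a string one after another. The sketch below is only an illustration: it captures printed output so the ordering is visible, and the string contents mirror the example above.

```clojure
;; The same two "lines" as the file above, held in a string.
(def file-contents
  "(println \"Start\")\n903f903jf939cn34f934fj9j39f4")

(def output
  (with-out-str
    (try
      (load-string file-contents)
      (catch Exception e
        ;; The second "form" is not readable, so loading fails here,
        ;; but only after the first form has already run.
        (println "Crashed while loading")))))

(print output)
```

The captured output begins with Start, and the failure only shows up afterwards.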

This enables a workflow where a developer has a file open in one window with the full contents of their code and another window open at the same time with their "live" program - the "REPL".

Through editor magic, a developer can then load new code one form at a time into the live program. If in doing so a function is redefined, then the new definition of the function will start to be used.

So with that context, what is "not REPL friendly" about the example server code?

(defn handler
   [request]
   (cond
      (= (:uri request) "/hello")
      {:status 200
       :body   "Hello, World"}

      :else
      {:status 404}))

Assume that we first load the handler function, then load the start-server function and call it.

(defn start-server
   []
   (jetty/run-jetty handler {:port  1234
                             :join? false}))

(start-server)

At this point, a developer might want to modify the handler function to respond to requests on the /marco route.

(defn handler
   [request]
   (cond
      (= (:uri request) "/hello")
      {:status 200
       :body   "Hello, World"}

      (= (:uri request) "/marco")
      {:status 200
       :body   "POLO!"}

      :else
      {:status 404}))

If they did this and tried making a request to /marco, the server would still respond with a 404.

The reason for this is that when start-server is called, the server is passed the current "value" backing the handler function. Future updates won't be picked up unless the server is stopped and restarted. One way around this is to pass the var that holds the handler instead of its current value.

(defn start-server
   []
   (jetty/run-jetty #'handler {:port  1234
                               :join? false}))

In this case, putting the #' in front of handler makes it so that whenever it is called the current value of the handler function will be used. If a developer were to re-load a new definition of handler into the REPL it would be immediately picked up and used.
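
The difference can be seen without a server. In this sketch the two defs stand in for whatever a server holds on to at startup. The names captured-value and captured-var are made up for the illustration.

```clojure
(defn handler
  [request]
  {:status 404})

;; What a server would hold if given the function itself.
(def captured-value handler)

;; What a server would hold if given the var.
(def captured-var #'handler)

;; Simulates re-loading a new definition from the editor.
(defn handler
  [request]
  {:status 200
   :body   "POLO!"})

;; The captured function is still the old definition.
(println (captured-value {:uri "/marco"}))

;; Calling a var looks up its current value every time.
(println (captured-var {:uri "/marco"}))
```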

This is what REPL friendly code looks like. It makes it easier for a developer to have changes picked up on the fly in a running program and rapidly experiment with new things.

There are other associated techniques like leaving a comment at the bottom of a file with code only intended to be used with the REPL.

(ns example
   (:require [ring.adapter.jetty :as jetty]))

(defn handler 
  [request]
  ...)

(defn start-server 
  [] 
  ...)

;; The Server will not start automatically, but a dev
;; can conveniently start it by putting their cursor in
;; the comment and loading the call into the repl
(comment
   (start-server))

Global Stateful Resources

Of course, most web apps are not written entirely in a single function. The most natural point at which to split out logic tends to be at handlers for different paths.

(ns example
   (:require [ring.adapter.jetty :as jetty]))

(defn hello-handler 
   [request]
   {:status 200
    :body   "Hello, World"})

(defn marco-handler
   [request]
   {:status 200
    :body   "POLO!"})

(defn handler
   [request]
   (cond
      (= (:uri request) "/hello")
      (hello-handler request)

      (= (:uri request) "/marco")
      (marco-handler request)

      :else
      {:status 404}))

(defn start-server
   []
   (jetty/run-jetty #'handler {:port  1234
                               :join? false}))

(comment
   (start-server))

And of course the actual declarations of the routes can be separated from the code that starts the server, but that would get hard to follow here.

At this point most of the code is fairly easy to test. You just make fake requests, pass them to the handlers, and check that the responses are what you expect.

(ns example-test
   (:require [clojure.test :as t]
             [example]))
   
(t/deftest handler-test
  (t/testing "Request to /hello gets Hello, World"
     (let [response (example/handler {:uri "/hello"})]
        (t/is (= (:status response) 200))
        (t/is (= (:body response) "Hello, World"))))
        
  (t/testing "Request to /marco gets POLO!"
     (let [response (example/handler {:uri "/marco"})]
        (t/is (= (:status response) 200))
        (t/is (= (:body response) "POLO!"))))
        
  (t/testing "Request to unknown path gets 404"
     (let [response (example/handler {:uri "/jdkdawdoaddwadad"})]
        (t/is (= (:status response) 404)))))

No real program stays a collection of easy to test pure functions forever. Handling a request often implies the need for dependence on some "stateful resources" such as external services and connection pools.

External Services

As an example, let's say when you make a request to /marco we still want to respond with POLO!, but if the user specifies that they are not in a pool with a query string /marco?nopool then we want to respond with the entire Wikipedia page for Marco Polo.

(defn marco-handler
   [request]
   (if (= (:query-string request) "nopool")
      {:status 200 
       :body   (slurp "https://en.wikipedia.org/wiki/Marco_Polo")}
      {:status 200
       :body   "POLO!"}))

While we can still test this conveniently, the test will have an implicit dependence on Wikipedia being online. It also makes our tests slower than they need to be since we are making an actual http call.

(ns example-test
   (:require [clojure.string :as string]
             [clojure.test :as t]
             [example]))
             
(t/deftest marco-handler-test        
  (t/testing "Request to /marco gets POLO!"
     (let [response (example/marco-handler {:uri "/marco"})]
        (t/is (= (:status response) 200))
        (t/is (= (:body response) "POLO!"))))
        
  (t/testing "Request to /marco with no pool gets info"
     (let [response (example/marco-handler {:uri          "/marco"
                                            :query-string "nopool"})]
        (t/is (= (:status response) 200))
        (t/is (string/includes?
                (:body response) 
                "The Travels of Marco Polo")))))

This isn't ideal, but it could be worse. Imagine if you wanted to alert an admin every time the /hello route was called. A bit of a silly example, but calls to APIs like Sendgrid aren't unreasonable to do in response to some requests.

(defn hello-handler
   [request]
   (sendgrid/send-email "admin@website.com" "You got a user!")
   {:status 200
    :body   "Hello, World"})

In tests you can stub that call out with something like with-redefs. The problem with that solution, even though it does mechanically solve the issue, is that you need to know what external services a piece of code will use. Since our handlers are just taking a request, there is not enough information at call-sites or in the function header to say for sure.

(defn hello-handler
   [request]
   ;; Have to read every function this calls
   ;; to see what stateful stuff is going on...
   (some-other-code request))

So tests end up looking like the following, with pretty low confidence that everything has been stubbed out.

(with-redefs [sendgrid/send-email (constantly nil)]
   (t/testing ... ACTUAL TEST ...))
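
For a fuller picture, here is a runnable sketch of that stubbing approach. The send-email function below is a hypothetical stand-in defined locally, not the real Sendgrid client, and it throws so that an un-stubbed call would be obvious.

```clojure
;; Hypothetical stand-in for a real email client call.
(defn send-email
  [to message]
  (throw (ex-info "Would have made a real network call" {:to to})))

(defn hello-handler
  [request]
  (send-email "admin@website.com" "You got a user!")
  {:status 200
   :body   "Hello, World"})

;; Records what the stub was asked to send.
(def sent-emails (atom []))

(def response
  (with-redefs [send-email (fn [to message]
                             (swap! sent-emails conj [to message])
                             nil)]
    (hello-handler {:uri "/hello"})))

(println response)
(println @sent-emails)
```

Note that with-redefs temporarily replaces the root value of the var for every thread, so it does not mix well with tests running in parallel.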

Connection Pools

Handlers also very often need to talk to a database. It is wasteful to make a new database connection on every request, so a really common technique is to keep a certain number of connections alive in a "pool" and re-use them over and over again.

What is common, and saddening, to find is a connection pool stored in a top-level constant and referenced by a large part of the codebase.

(ns example.db
   (:import (com.zaxxer.hikari
              HikariConfig
              HikariDataSource)))

(def pool (HikariDataSource. 
            (doto (HikariConfig.)
              (.setJdbcUrl "..."))))

(defn hello-handler
   [request]
   ;; Information like this can come from middleware.
   (let [user-id   (:user-id request)
         user-name (jdbc/execute-one! 
                      db/pool 
                      ["SELECT name FROM user 
                        WHERE user.user_id = ?"
                       user-id])]
     {:status 200
      :body   (str "Hello, " user-name)}))

Even assuming that, like DHH, you are fine with your tests hitting a real database this still creates some practical problems.

For one, if you edit the file where the connection is defined you might accidentally reload the constant and leak a bunch of connections. This isn't the most likely on a large project where you aren't touching this code that often, but over the course of a long lived REPL session it can be annoying.

But also it is annoying logistically that the connection pool is established immediately when the code is loaded. If you Ahead-of-Time compile your Clojure code then you will pretty immediately want that to not be the case, since compiling a namespace also loads it and would open connections during the build.

You can sidestep that last issue by putting the connection pool behind a "delay", which lazily starts the connection pool when it is needed.

(ns example.db)

(def pool (delay 
             (HikariDataSource. 
               (doto (HikariConfig.)
                 (.setJdbcUrl "...")))))

But now this detail changes how users have to access the actual pool. Usage sites have to add an @ to make sure the pool has been started and to retrieve it.

(defn hello-handler
   [request]
   (let [user-id   (:user-id request)
         user-name (jdbc/execute-one! 
                      @db/pool 
                      ["SELECT name FROM user 
                        WHERE user.user_id = ?"
                       user-id])]
     {:status 200
      :body   (str "Hello, " user-name)}))
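
The behavior of delay is easy to check in isolation. In this sketch a plain map stands in for the HikariDataSource and a counter (a made-up name, pools-created) tracks how many times the body runs.

```clojure
;; Counts how many times the stand-in pool gets created.
(def pools-created (atom 0))

(def pool
  (delay
    (swap! pools-created inc)
    {:name "stand-in for a HikariDataSource"}))

;; Nothing has been created just by loading the def.
(def created-before-use @pools-created)

;; The first deref runs the body, later derefs reuse the result.
(def first-lookup  @pool)
(def second-lookup @pool)

(def created-after-use @pools-created)

(println created-before-use created-after-use)
```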

Annoying, but that's not all. If you want to swap out the pool in a test fixture and maybe run tests in parallel then the whole pool needs to be dynamically re-bindable as well.

(def ^:dynamic 
   *pool* 
   (delay 
      (HikariDataSource. 
         (doto (HikariConfig.)
            (.setJdbcUrl "...")))))

(defn hello-handler
   [request]
   (let [user-id   (:user-id request)
         user-name (jdbc/execute-one! 
                      @db/*pool*
                      ["SELECT name FROM user 
                        WHERE user.user_id = ?"
                       user-id])]
     {:status 200
      :body   (str "Hello, " user-name)}))

(binding [db/*pool* (delay (make-test-pool))]
   (insert-user 123 "bob")
   (let [response (hello-handler {:user-id 123})]
      (t/is (= (:body response)
               "Hello, bob"))))

All of that is workable - you can use macros and helper functions to alleviate the syntax ugliness and generally speaking your app will just have one database.
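
As a rough sketch of what such helpers could look like, with plain maps standing in for real pools: current-pool and call-with-pool are hypothetical names, not something from a library.

```clojure
(def ^:dynamic *pool*
  (delay {:name "real pool"}))

;; Helper so call sites do not need to remember the @.
(defn current-pool
  []
  @*pool*)

;; Helper so tests can swap in another pool without
;; wrapping it in a delay themselves.
(defn call-with-pool
  [pool f]
  (binding [*pool* (delay pool)]
    (f)))

(defn pool-name
  []
  (:name (current-pool)))

(def outside-name (pool-name))
(def inside-name  (call-with-pool {:name "test pool"} pool-name))
(def after-name   (pool-name))

(println outside-name inside-name after-name)
```

The rebinding only applies on the thread that made it and only for the duration of the call, which is what makes it usable from parallel tests.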

But it also is not that uncommon for an app to have two databases. Usually one SQL and one Redis-like. And while it's not as hard as for arbitrary external services - you still don't really know from a call-site whether you need to establish a test database before calling it in a test.

Inversion of Control

The general shape of the solution to those problems is to not have "global" stateful resources.

For external services, this means making an actual object to pass as the first argument to calls.

If the service is like Sendgrid, this could be a convenient place to put information like your API key or make a persistent http client.

(defn make-sendgrid-client 
   [api-key]
   {:api-key api-key
    :client  (hato/build-http-client {:connect-timeout 10000
                                      :redirect-policy :always})})

(defn send-email 
   [sendgrid-client]
   (hato/post (:client sendgrid-client) "/send-email"))

But even if the service is "stupid" and requires no authentication or special treatment like Wikipedia, there is still value.

(defn make-wikipedia-client
   []
   ;; Nothing really to put...
   {:name "Wikipedia Client"})

(defn get-marco-polo-info
  [wikipedia-client]
  (slurp "https://en.wikipedia.org/wiki/Marco_Polo"))

The value is that having something as a first argument means that later on you have the ability to refactor calls to be behind some dispatch mechanism like a protocol.

(defprotocol WikipediaClient
   (get-marco-polo-info [_]))

(defn make-wikipedia-client
   []
   (reify WikipediaClient
      (get-marco-polo-info [_]
         (slurp "https://en.wikipedia.org/wiki/Marco_Polo"))))

(def fake-wikipedia
   (reify WikipediaClient
      (get-marco-polo-info [_]
         "was a dude, i guess?")))

For connection pools, there is already an actual object to pass so that isn't an issue. The same "maybe make it a protocol later" strategy is applicable to that sort of resource as well.

Then in all the code that wants these dependencies, just expect them to be given as arguments.

(defn marco-handler
   [wikipedia-client request]
   (if (= (:query-string request) "nopool")
      {:status 200 
       :body   (wikipedia/get-marco-polo-info wikipedia-client)}
      {:status 200
       :body   "POLO!"}))

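(ns example-test
   (:require [clojure.test :as t]
             [example]
             ;; Assumes the WikipediaClient protocol lives in a
             ;; namespace that can be aliased as wikipedia.
             [example.wikipedia :as wikipedia]))
             
(t/deftest marco-handler-test 
   (let [mock-wikipedia (reify wikipedia/WikipediaClient
                           (get-marco-polo-info [_]
                              "INFO"))]       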
     (t/testing "Request to /marco gets POLO!"
        (let [response (example/marco-handler 
                         mock-wikipedia
                         {:uri "/marco"})]
           (t/is (= (:status response) 200))
           (t/is (= (:body response) "POLO!"))))
           
     (t/testing "Request to /marco with no pool gets info"
        (let [response (example/marco-handler 
                        mock-wikipedia
                        {:uri          "/marco"
                         :query-string "nopool"})]
           (t/is (= (:status response) 200))
           (t/is (= (:body response) "INFO")))))

This technique - where we get dependencies as arguments instead of making them locally or getting them from some global place - is commonly called "Inversion of Control."

Dependency Injection and "The System"

While this is a concrete improvement - we can directly see what the dependencies of a process are in the argument list - there are still some unresolved issues.

Let's say our hello-handler wants to use a sendgrid-service and the database pool and our marco-handler wants to use a wikipedia-service and the database pool.

(defn hello-handler
   [sendgrid-service pool request]
   ...)

(defn marco-handler
   [wikipedia-service pool request]
   ...)

This implies that the root handler function will have access to all of these things and pass them down as needed.

(defn handler
   [sendgrid-service wikipedia-service pool request]
   (cond
      (= (:uri request) "/hello")
      (hello-handler sendgrid-service pool request)

      (= (:uri request) "/marco")
      (marco-handler wikipedia-service pool request)

      :else
      {:status 404}))

With just three stateful components and two handlers this is manageable, but beyond that, positional arguments become overly burdensome and error-prone.

(defn handler
   [sendgrid-service 
    wikipedia-service 
    pool
    some-service
    other-thing
    oh-no
    request]
   (cond
      (= (:uri request) "/hello")
      (hello-handler sendgrid-service pool request)

      (= (:uri request) "/marco")
      (marco-handler wikipedia-service pool request)

      (= (:uri request) "/thing")
      (some-handler some-service sendgrid-service request)

      (= (:uri request) "/thing2")
      (some-handler some-service 
                    oh-no 
                    sendgrid-service 
                    other-thing  
                    request)
      
      (= (:uri request) "/thing3")
      (some-handler some-service 
                    oh-no
                    other-thing  
                    request)
      
      ;; ... * 100
      
      :else
      {:status 404}))

The solution is to put all stateful components into a single map, popularly called the "system."

{:sendgrid-service  sendgrid-service
 :wikipedia-service wikipedia-service
 :pool              pool}

(defn handler
   [system request]
   (cond
      (= (:uri request) "/hello")
      (hello-handler system request)

      (= (:uri request) "/marco")
      (marco-handler system request)

      :else
      {:status 404}))

and individual handlers "declare" which of these components they are interested in by only pulling those keys out of the map.

(defn hello-handler
   [{:keys [sendgrid-service pool]} request]
   ...)

(defn marco-handler
   [{:keys [wikipedia-service pool]} request]
   ...)

This way it is still declared up front what stateful components some bit of code needs to do its work, but the "wiring" code for each entry-point can stay uniform.

This technique, where all a piece of code needs to do to get access to a resource is "declare" that it wants it, is usually called "Dependency Injection."
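
To make that concrete, here is a runnable sketch with a fake system. Each "service" is just a map holding a function, which is an assumption made for the example and not how a real Sendgrid or Wikipedia client would be shaped.

```clojure
;; Hypothetical stand-ins for real services and a real pool.
(def fake-system
  {:sendgrid-service  {:send-email (fn [to message] nil)}
   :wikipedia-service {:get-info   (fn [] "INFO")}
   :pool              {:name "fake pool"}})

;; Only pulls out the one component it needs.
(defn marco-handler
  [{:keys [wikipedia-service]} request]
  (if (= (:query-string request) "nopool")
    {:status 200
     :body   ((:get-info wikipedia-service))}
    {:status 200
     :body   "POLO!"}))

(defn hello-handler
  [{:keys [sendgrid-service]} request]
  ((:send-email sendgrid-service) "admin@website.com" "You got a user!")
  {:status 200
   :body   "Hello, World"})

;; The wiring is the same for every route: pass the system along.
(defn handler
  [system request]
  (cond
    (= (:uri request) "/hello") (hello-handler system request)
    (= (:uri request) "/marco") (marco-handler system request)
    :else                       {:status 404}))

(def info-response
  (handler fake-system {:uri "/marco" :query-string "nopool"}))

(println info-response)
```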

It is also important to note that code below these entry-points should generally pass things down explicitly. Passing the whole system is a hand-gun pointed at a foot-foot.

(defn marco-handler
   [{:keys [wikipedia-service pool]
     :as system} request]
   ...
   ;; Back to not knowing what this could be doing deep down...
   (some-code system)
   ...)

Starting and Stopping the System

There needs to be some code that actually starts up all the components of the system.

(defn start-system 
  []
  (let [config            (load-config)
        sendgrid-service  (make-sendgrid-service config)
        wikipedia-service (make-wikipedia-service)
        pool              (make-pool config)]
     {:config            config
      :sendgrid-service  sendgrid-service
      :wikipedia-service wikipedia-service
      :pool              pool}))

Some stateful bits might depend on other stateful bits to get started. In the above example the hypothetical Sendgrid service and database connection pool depend on some config object which is loaded earlier.

The clearest example of that is the server instance itself. If it is to be put into the system, then it will need all the things started before it.

(defn start-system 
  []
  (let [config            (load-config)
        sendgrid-service  (make-sendgrid-service config)
        wikipedia-service (make-wikipedia-service)
        pool              (make-pool config)
        system-so-far     {:config            config
                           :sendgrid-service  sendgrid-service
                           :wikipedia-service wikipedia-service
                           :pool              pool}
        server            (start-server system-so-far)]
     (assoc system-so-far :server server)))

(defn hello-handler
   [{:keys [sendgrid-service pool]} request]
   ...)

(defn marco-handler
   [{:keys [wikipedia-service pool]} request]
   ...)

(defn handler
   [system request]
   (cond
      (= (:uri request) "/hello")
      (hello-handler system request)

      (= (:uri request) "/marco")
      (marco-handler system request)

      :else
      {:status 404}))

(defn start-server
   [system]
   (jetty/run-jetty 
      (partial #'handler system) 
      {:port  1234
       :join? false}))

The reason you would want the server to be part of the system ties back to the REPL workflow. If you change or add some stateful component you might want to stop an old running system and start up a new one. The running http server is likely to be one of these things you would want to restart.

To properly do this, every stateful resource which might have shutdown logic needs to provide a function which shuts it down.

(defn stop-server 
   [server]
   (.stop server))

And then some larger function needs to be able to stop each component of the system, ideally in the reverse of the order in which they were started.

(defn stop-system 
   [system]
   (stop-server (:server system))
   (stop-connection-pool (:pool system))
   ;; In this hypothetical the sendgrid service
   ;; has shutdown logic, but the wikipedia service does not.
   (stop-sendgrid-service (:sendgrid-service system)))
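
The ordering itself can be sketched with stand-in components that only record when they start and stop. Everything here is hypothetical: the components vector, the :id and :start keys, and the events atom exist only to make the order visible.

```clojure
;; Records lifecycle events so the ordering is visible.
(def events (atom []))

;; Stand-in components, listed in the order they must start.
;; Each :start function receives the system built so far.
(def components
  [{:id :config :start (fn [_system] {:port 1234})}
   {:id :pool   :start (fn [system] {:config (:config system)})}
   {:id :server :start (fn [system] {:pool (:pool system)})}])

(defn start-system
  []
  (reduce (fn [system {:keys [id start]}]
            (swap! events conj [:start id])
            (assoc system id (start system)))
          {}
          components))

(defn stop-system
  [system]
  (doseq [{:keys [id]} (reverse components)
          :when (contains? system id)]
    ;; A real component would release its resource here.
    (swap! events conj [:stop id])))

(stop-system (start-system))

(println @events)
```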

Then to facilitate working with the "current system" in the REPL it does need to be bound to some global value.

(ns example.repl
   (:require [example.system :as system]))

(def system nil)

(defn start-system!
   []
   (alter-var-root #'system (constantly (system/start-system))))

(defn stop-system!
   []
   (system/stop-system system)
   (alter-var-root #'system (constantly nil)))

(comment
   (start-system!)

   (stop-system!))

A developer can then reference example.repl/system in their REPL session to see the currently running system and pull out values to test calls to functions they are playing with.

(some-db-function 
   (:pool example.repl/system) 
   123 
   "abc")

And while this does give birth to a global stateful thing, the problems that come with that are largely mitigated.

For one, it can reasonably exist only in development. In the code above there is a distinct namespace just for giving a start-system! and stop-system! to be used in development. On the tooling side you can even make sure this file isn't included in production builds with something like deps.edn aliases.

;; Assuming example/repl.clj is under dev-src
{:paths ["src"]
 :aliases {:dev {:extra-paths ["dev-src"]}}}

So what is integrant for?

As I mentioned before, you need to start all of your stateful components in the right order and stop them all in the reverse of that order.

(defn start-system
   []
   (let [config            (load-config)
         sendgrid-service  (make-sendgrid-service config)
         wikipedia-service (make-wikipedia-service)
         pool              (make-pool config)
         system-so-far     {:config            config
                            :sendgrid-service  sendgrid-service
                            :wikipedia-service wikipedia-service
                            :pool              pool}
         server            (start-server system-so-far)]
      (assoc system-so-far :server server)))

(defn stop-system 
   [system]
   (stop-server (:server system))
   (stop-connection-pool (:pool system))
   (stop-sendgrid-service (:sendgrid-service system)))

A workable metaphor for this is that each component "depends on" the components that need to start before it and that these dependencies form a graph.
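
As a rough, library-free sketch of that idea, the graph can be written as a map from each component to the set of components it depends on, and a start order derived from it. The dependencies map and start-order function are made up for this illustration and say nothing about how any particular library does it.

```clojure
;; Each component mapped to the components it needs started first.
(def dependencies
  {:config            #{}
   :sendgrid-service  #{:config}
   :wikipedia-service #{}
   :pool              #{:config}
   :server            #{:config :sendgrid-service :wikipedia-service :pool}})

(defn start-order
  "Returns component keys ordered so that each one comes
   after everything it depends on."
  [deps]
  (loop [remaining deps
         ordered   []]
    (if (empty? remaining)
      ordered
      (let [started (set ordered)
            ;; Components whose dependencies have all been started.
            ready   (sort (for [[k needs] remaining
                                :when (every? started needs)]
                            k))]
        (when (empty? ready)
          (throw (ex-info "Circular or missing dependency"
                          {:remaining (keys remaining)})))
        (recur (apply dissoc remaining ready)
               (into ordered ready))))))

(def order (start-order dependencies))

(println "start:" order)
(println "stop: " (reverse order))
```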

Integrant, and libraries like it, provide ways to explicitly model that graph of dependencies.

This reduces the boilerplate, and the room for error, in the start-system and stop-system functions that logically need to exist.

{:config            {}
 :sendgrid-service  {:config (ig/ref :config)}
 :wikipedia-service {}
 :pool              {:config (ig/ref :config)}
 :server            {:config            (ig/ref :config)
                     :sendgrid-service  (ig/ref :sendgrid-service)
                     :wikipedia-service (ig/ref :wikipedia-service)
                     :pool              (ig/ref :pool)}}

and the information about how each thing is started and stopped is registered with the ig/init-key and ig/halt-key! multimethods.

(defmethod ig/init-key
  :pool
  [_ {:keys [config]}]
  (HikariDataSource.
    (doto (HikariConfig.)
      (.setJdbcUrl (config/lookup config :JDBC_URL)))))

(defmethod ig/halt-key!
  :pool
  [_ pool]
  (.close pool))

Starting the system now means calling ig/init-key on everything in dependency order, and stopping it means calling ig/halt-key! on everything in the reverse order.
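
The mechanism underneath is ordinary multimethod dispatch on the component key. The sketch below uses its own init-key and halt-key! multimethods as stand-ins. These are not integrant's functions, the components are plain maps, and the start order is written by hand where integrant would derive it from the refs.

```clojure
;; Library-free stand-ins for the idea behind ig/init-key and ig/halt-key!.
(defmulti init-key  (fn [k _opts] k))
(defmulti halt-key! (fn [k _value] k))

;; Components with nothing to clean up fall through to this.
(defmethod halt-key! :default [_ _] nil)

;; Records what got halted so the effect is visible.
(def halted (atom []))

(defmethod init-key ::config [_ _opts]
  {:jdbc-url "..."})

(defmethod init-key ::pool [_ {:keys [config]}]
  ;; A plain map standing in for a real connection pool.
  {:url (:jdbc-url config)})

(defmethod halt-key! ::pool [_ pool]
  (swap! halted conj pool))

;; Start in dependency order...
(def config (init-key ::config {}))
(def pool   (init-key ::pool {:config config}))

;; ...and stop in the reverse order.
(halt-key! ::pool pool)
(halt-key! ::config config)

(println @halted)
```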

Partially because multimethod registration is global - and partially because it's good practice regardless - the keys for different integrant components are generally made namespaced.

(ns example.system
  (:require [integrant.core :as ig]))

(def system-map
    {::config            {}
     ::sendgrid-service  {::config (ig/ref ::config)}
     ::wikipedia-service {}
     ::pool              {::config (ig/ref ::config)}
     ::server            
     {::config            (ig/ref ::config)
      ::sendgrid-service  (ig/ref ::sendgrid-service)
      ::wikipedia-service (ig/ref ::wikipedia-service)
      ::pool              (ig/ref ::pool)}})

This helps avoid conflicts with multimethod registration, but also can be used in conjunction with features like as-alias to add some semantic and syntactic distinction to pulling components out of the system.

(ns example.handlers
  ;; Without as-alias it would be really easy
  ;; to get circular dependencies doing this.
  (:require [example.system :as-alias system]))

(defn some-handler
  [{::system/keys [pool server]} request]
  ...)

Again, I find it important to note that integrant is just one of many libraries that do this "automatic wiring."

Many have sprung up over the years, and it seems like there are more yet to come. There are tradeoffs and quirks to all of them.

The important idea is just to pass things down as arguments and to start with the system maps at entry-points.

Tying it all together

(defn do-thing
  [name]
  (slurp (str "https://website.com/get-info/" name)))

(def pool (make-db-pool))

(defn lookup-chair 
  [chair-id]
  (jdbc/execute! 
    pool 
    ["SELECT * FROM chair
      WHERE chair.chair_id = ?"
     chair-id]))

(defn root-handler
  [request]
  ...)

(defn start-server 
  []
  (jetty/run-jetty #'root-handler {:port 1234}))

(defn root-handler
  [request]
  ...)

(defn start-server 
  []
  ...)

(comment
  (start-server))

(defn lookup-chair 
  [pool chair-id]
  (jdbc/execute! 
    pool 
    ["SELECT * FROM chair
      WHERE chair.chair_id = ?"
     chair-id]))

(defn start-system
  []
  (let [config            (load-config)
        sendgrid-service  (make-sendgrid-service config)
        wikipedia-service (make-wikipedia-service)
        pool              (make-pool config)
        system-so-far     {:config            config
                           :sendgrid-service  sendgrid-service
                           :wikipedia-service wikipedia-service
                           :pool              pool}
        server            (start-server system-so-far)]
    (assoc system-so-far :server server)))

(def system nil)

(defn start-system! 
  []
  (alter-var-root #'system ...))

(defn stop-system!
  []
  (alter-var-root #'system ...))

(comment
  (start-system!)

  (stop-system!))

(defn hello-handler 
  [{:keys [pool]} request]
  ...)

(ns example.system)

(defn start-system
  []
  (let [config            (load-config)
        sendgrid-service  (make-sendgrid-service config)
        wikipedia-service (make-wikipedia-service)
        pool              (make-pool config)
        system-so-far     {::config            config
                           ::sendgrid-service  sendgrid-service
                           ::wikipedia-service wikipedia-service
                           ::pool              pool}
        server            (start-server system-so-far)]
    (assoc system-so-far ::server server)))

(ns example.handlers
  (:require [example.system :as-alias system]))

(defn hello-handler 
  [{::system/keys [pool]} request]
  ...)

