Applied Clojure¶
Collections¶
Lists (addition at the head), vectors (addition at the end) and queues (FIFO).
(def new-orders clojure.lang.PersistentQueue/EMPTY)
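As a minimal sketch of the FIFO behaviour, the queue can be filled with conj and consumed with peek and pop. The order maps and the name orders are hypothetical; only the queue operations matter.

```clojure
;; Start from the empty persistent queue.
(def new-orders clojure.lang.PersistentQueue/EMPTY)

;; conj adds at the tail of the queue (hypothetical order data).
(def orders (-> new-orders
                (conj {:id 1})
                (conj {:id 2})
                (conj {:id 3})))

(peek orders)       ; oldest element => {:id 1}
(seq (pop orders))  ; the rest, in arrival order => ({:id 2} {:id 3})
```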
Use transient and persistent! if the transformation is local. The medley library from weavejester provides useful additional functions.
Collection accessing: put the keyword first, (:key m), or use (m :key) if it is certain that m is a map; otherwise, if either might be nil, use (get m k). If possible, avoid a stack of left parentheses such as ((f) x). Use select-keys liberally to subset a map. If performance is required, we can create a custom collection by defining a type and implementing the required protocols or interfaces; custom printing is also possible.
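The points on transients and map access can be sketched as follows. The function squares and the planet map are hypothetical names introduced only for illustration.

```clojure
;; Local transformation: the transient never escapes the function,
;; so callers only ever see the persistent vector.
(defn squares [n]
  (persistent!
   (reduce (fn [acc i] (conj! acc (* i i)))
           (transient [])
           (range n))))

(squares 5) ; => [0 1 4 9 16]

(def planet {:name "Earth" :moons 1 :mass 5.97e24}) ; hypothetical entity

(:name planet)                       ; keyword first => "Earth"
(planet :moons)                      ; map first, fine when planet is known to be a map => 1
(get nil :moons)                     ; get tolerates a nil map => nil
(select-keys planet [:name :moons])  ; => {:name "Earth", :moons 1}
```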
Use records and maps to describe your entities. Maps should be the default choice, unless you decide to use protocols and need performance for dispatch. Protocols and multimethods are the two ways for dispatching. Protocols are faster in Clojure, but multimethods are more flexible.
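A small sketch of the two dispatch mechanisms, assuming a hypothetical Planet record, a Describable protocol and a moon-class multimethod (none of these names come from the book):

```clojure
;; Protocol: dispatch on the type of the first argument only.
(defprotocol Describable
  (describe [entity]))

(defrecord Planet [name moons]
  Describable
  (describe [_] (str name " has " moons " moon(s)")))

;; Multimethod: dispatch on an arbitrary function of the arguments,
;; here a computed category rather than a type.
(defmulti moon-class (fn [entity] (if (> (:moons entity) 10) :many :few)))
(defmethod moon-class :many [_] "many moons")
(defmethod moon-class :few  [_] "few moons")

(describe (->Planet "Earth" 1)) ; => "Earth has 1 moon(s)"
(moon-class {:moons 20})        ; => "many moons"
```

The multimethod works on plain maps as well as records, which is one reason maps can remain the default representation.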
Processing Sequential Data.¶
We can use map, filter and reduce to process a sequence of values, but it might not be efficient. Transducers were created to express these transformations without tying them to a concrete data structure.
A transducer (usually denoted by xf or xform) is a function that transforms a reducing function into another reducing function. That is, xf: f -> g, where f and g have the signature whatever, input -> whatever. Concrete examples of reducing functions are conj (with whatever being a collection such as a list) and + (with whatever being a number, and input a number). See here for more details.
The trick is that you can define map, filter and other operations as reducing functions (reducing functions are usually used in reduce operations). It is important to note that a reducing function can actually grow the whatever (see conj).
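To make the xf: f -> g idea concrete, here is a hand-written sketch of a mapping transducer. The name my-map is hypothetical; it is meant to mirror the shape of what (map f) returns, not to reproduce the core implementation.

```clojure
;; my-map takes f and returns a transducer: a function from one
;; reducing function (rf) to another reducing function.
(defn my-map [f]
  (fn [rf]
    (fn
      ([] (rf))                   ; init arity
      ([result] (rf result))      ; completion arity
      ([result input]             ; step arity: whatever, input -> whatever
       (rf result (f input))))))

;; The same transformation works with different reducing functions.
(reduce ((my-map inc) conj) [] [1 2 3]) ; => [2 3 4]
(transduce (my-map inc) + 0 [1 2 3])    ; => 9
```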
We create them by omitting the coll argument in the typical sequence functions, e.g. (map f) yields a transducer. Use the sequence function to apply the transducer lazily to a collection. The following two expressions are equal:
(= (map #(* 2 %) (range 10))
   (sequence (map #(* 2 %)) (range 10)))
If we need eagerness we could use into
(into [] (map #(* 2 %)) (range 10))
The benefit of transducers is that intermediate values are not allocated and the transformations are decoupled from the reducer (reducing function and reducible collection). They are also polymorphic. We can compose transducers and run the reduction with transduce:
(def moons-transform
  (comp (filter planet?) (map :moons)))
(defn total-moons [entities]
  (transduce moons-transform + 0 entities))
See understanding transducers for more details.
Remove duplicates with distinct or dedupe (dedupe only removes consecutive duplicates and is safer for large inputs). Use mapcat instead of (-> map flatten).
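A short sketch of the difference between these functions, using made-up data:

```clojure
(distinct [1 1 2 1 3 3]) ; remembers everything seen so far => (1 2 3)
(dedupe   [1 1 2 1 3 3]) ; only compares with the previous item => (1 2 1 3)

;; mapcat concatenates one level; flatten flattens every nested level.
(def nested [[1 [2]] [3]])
(mapcat identity nested)        ; => (1 [2] 3)
(flatten (map identity nested)) ; => (1 2 3)

;; Both also exist as transducers.
(into [] (comp (mapcat :moons) (distinct))
      [{:moons [:a :b]} {:moons [:b :c]}]) ; => [:a :b :c]
```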
Reference, State and Mutation¶
Identity and state are two distinct notions. An identity is a series of immutable values over time, and the state is the value of an identity at a certain point in time. The challenge is to always present a single valid value to all observers at the same time. There are two types of succession (mutation): atomic and transactional. An atomic succession only cares about the change happening to the identity itself and not about coordination with other identities. A transactional succession ensures that either all changes or none are performed.
There are two kinds of state: program state and runtime state. Program state is concerned with mutation in the problem domain, whereas runtime state facilitates the software's execution (e.g. connections to databases or the network, config files). Runtime state is often unavoidable, whereas program state should be minimized and accessed through an API with curated functions rather than directly.
For managing change, we should build just enough to ensure the application's needs are met. Every side effect and mutable reference slows you down.
We should take responsibility for our functions (make them pure) and make choices about what needs to be managed. State is a series of snapshots of values (data), which allows us to act responsibly when considering the presence of observers in other processes. In Clojure, observers see a consistent set of values as of a particular instant thanks to Clojure's managed references to immutable values.
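The two kinds of succession can be sketched with an atom (atomic, uncoordinated) and refs (transactional, coordinated). The account names and the transfer! function are hypothetical.

```clojure
;; Atomic succession: one identity, no coordination with others.
(def counter (atom 0))
(swap! counter inc)
@counter ; => 1

;; Transactional succession: two identities change together or not at all.
(def checking (ref 100))
(def savings  (ref 0))

(defn transfer! [from to amount]
  (dosync
   (alter from - amount)
   (alter to + amount)))

(transfer! checking savings 30)
[@checking @savings] ; => [70 30]
```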
Use your cores¶
One problem is sending a task off the main thread to be completed asynchronously and then retrieving the result (future and promise).
Tasks and workers suit long-lived, task-oriented concurrency. We can also use reducers and core.async (with channels and go blocks).
For agents, use send for computational updates that won't block on I/O, and send-off for updates that might block for an arbitrary time (its thread pool grows accordingly). The advantage of an agent over a future is that it can maintain state.
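A minimal agent sketch, with a hypothetical stats agent; the Thread/sleep stands in for a blocking I/O call.

```clojure
(def stats (agent {:processed 0}))

;; CPU-bound update: runs on the fixed-size pool.
(send stats update :processed inc)

;; Potentially blocking update: runs on the growing pool.
(send-off stats (fn [state]
                  (Thread/sleep 10) ; stand-in for I/O
                  (assoc state :saved? true)))

;; Wait until the actions sent from this thread have been applied.
(await stats)
@stats ; => {:processed 1, :saved? true}
```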
Use deref or @ to retrieve the value of a future or promise. Promises can be used to return several values from a future block. Use realized? to check whether a promise has been delivered; otherwise deref will block. Calling deref with two additional arguments (a timeout in milliseconds and a fallback value) enforces a timeout.
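These operations can be sketched as follows; the names answer, p and min-max are hypothetical.

```clojure
;; A future runs its body on another thread; deref blocks until it is done.
(def answer (future (Thread/sleep 50) 42))
@answer ; => 42

;; A promise is delivered explicitly, once.
(def p (promise))
(realized? p)            ; => false
(deref p 10 :timed-out)  ; waits at most 10 ms => :timed-out
(deliver p :done)
@p                       ; => :done

;; Several values out of one future block, one promise per value.
(def min-max
  (let [lo (promise)
        hi (promise)]
    (future (let [xs [3 1 2]]
              (deliver lo (apply min xs))
              (deliver hi (apply max xs))))
    [@lo @hi]))
min-max ; => [1 3]
```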
Use Java queues and workers for task-oriented programs. This is for coarse-grained task parallelism. Queues, threads and executors are the tools from Java for processing a queue of incoming work or requests.
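A rough sketch of this style using java.util.concurrent directly from Clojure. The pool size, the squaring task and the results queue are arbitrary choices made for illustration.

```clojure
(import '(java.util.concurrent Executors LinkedBlockingQueue TimeUnit))

(def pool    (Executors/newFixedThreadPool 2)) ; two worker threads
(def results (LinkedBlockingQueue.))           ; thread-safe output queue

;; Clojure functions implement Runnable, so they can be handed to execute.
(doseq [n (range 5)]
  (.execute pool (fn [] (.put results (* n n)))))

;; Stop accepting work and wait for the submitted tasks to finish.
(.shutdown pool)
(.awaitTermination pool 5 TimeUnit/SECONDS)

(sort (seq results)) ; => (0 1 4 9 16)
```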
For fine-grained parallelism, the pmap function can be used for easy parallelism, but the overhead might be significant. The clojure.core.reducers library is the solution for parallelism [fine-grained operations and memory efficient]. A reducer is a reducible collection combined with a reducing function. fold is used to perform the reduction [only vectors and maps can be folded in parallel; other collections fall back to a serial reduce, which can still be faster thanks to avoiding intermediate values]. fold splits the data into partitions, reduces the elements of each partition and then combines the results. The reduce and the combine functions can hence be different.
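A small sketch of fold on a vector. The names numbers, sum-of-even-squares and element-count are hypothetical; the second example uses different reduce and combine functions.

```clojure
(require '[clojure.core.reducers :as r])

(def numbers (vec (range 100000)))

;; r/map and r/filter build a reducer without intermediate collections;
;; r/fold partitions the vector, reduces each partition, then combines.
;; With a single function, + is used both to reduce and to combine.
(def sum-of-even-squares
  (r/fold + (r/filter even? (r/map #(* % %) numbers))))

;; Different functions: count within a partition, then add the counts.
;; Calling the combine function with no arguments, (+), seeds each partition with 0.
(def element-count
  (r/fold + (fn [acc _] (inc acc)) numbers))

element-count ; => 100000
```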
For concurrency (designing the program as a set of concurrent threads of execution), we can use core.async.
Channels come in unbuffered, fixed-buffer, dropping (discard new data) and sliding-buffer (discard old data) variants. A channel is created with the chan function. nil cannot be put on a channel (as it is the value signalling that the channel is closed). The important operations are put and take. Once the buffer is full, a put blocks the calling thread until another process at the other end of the channel takes a value. Backpressure is the effect that fixed-size buffers create by making producers block when trying to add to a full queue. Traditionally channels are used in go blocks.
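The buffer policies can be sketched as below, assuming the org.clojure/core.async library is on the classpath and aliased as a. The channel names are hypothetical.

```clojure
(require '[clojure.core.async :as a])

(def unbuffered (a/chan))                        ; a put waits for a taker
(def fixed      (a/chan 2))                      ; puts wait once 2 values are queued
(def dropping   (a/chan (a/dropping-buffer 2)))  ; discards new data when full
(def sliding    (a/chan (a/sliding-buffer 2)))   ; discards old data when full

;; Puts on dropping and sliding buffers never block.
(doseq [ch [dropping sliding]
        v  [1 2 3]]
  (a/>!! ch v))
(a/close! dropping)
(a/close! sliding)

(a/<!! (a/into [] dropping)) ; => [1 2]
(a/<!! (a/into [] sliding))  ; => [2 3]

;; offer! is a non-blocking put: it shows the fixed buffer filling up.
[(a/offer! fixed :a) (a/offer! fixed :b) (a/offer! fixed :c)] ; => [true true nil]
```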
In the Communicating Sequential Processes (CSP) model as implemented by core.async, go-block processes run on a thread pool and are parked (rather than blocking a thread) when they have to wait on a channel operation (>! or <!). Go blocks are great for building pipelines of data transformation.
core.async/pipeline gives up the raw performance of fine-grained data parallelism but yields a more flexible architecture. The function moves values from an input channel to an output channel with parallel transducer execution.
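A minimal pipeline sketch, assuming core.async is available; the channel names, the parallelism of 4 and the transducer are arbitrary illustration choices.

```clojure
(require '[clojure.core.async :as a])

(def in  (a/chan 10))
(def out (a/chan 10))

;; Four parallel workers apply the transducer to values moving from in to out.
;; The argument order is: parallelism, output channel, transducer, input channel.
(a/pipeline 4 out (comp (filter even?) (map #(* % %))) in)

(doseq [n (range 10)]
  (a/>!! in n))
(a/close! in) ; closing the input closes the output once it is drained

;; Outputs keep the order of the inputs.
(a/<!! (a/into [] out)) ; => [0 4 16 36 64]
```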
The next step is to break a growing system into pieces using concurrency.
Designing components¶
Use channels: it is better for a component's interface to receive and provide channels. In core.async, a go block runs its body once asynchronously, while go-loop is intended for looping until the channel is closed. go blocks return a channel, which can be used with Pedestal.
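A sketch of a go-loop that consumes a channel until it is closed, assuming core.async is available. The names in and done are hypothetical.

```clojure
(require '[clojure.core.async :as a])

(def in (a/chan))

;; go-loop returns a channel that receives the value of its body.
(def done
  (a/go-loop [total 0]
    (if-some [v (a/<! in)]   ; parks until a value or a close arrives
      (recur (+ total v))
      total)))               ; nil means closed: finish and yield the total

(doseq [v [1 2 3]]
  (a/>!! in v))
(a/close! in)

(a/<!! done) ; => 6
```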
A good design is to split an API layer and implementation layer with a record.
As for core.async, there are three additional concepts for channels that are useful: pipeline, fan-in and fan-out. In a system, pipelines link an input channel to an output channel (acting like a conveyor belt) and can optionally transform the values with a transducer (with sync, blocking and async variants: pipeline, pipeline-blocking and pipeline-async). The pipe function should be used when no transformation is needed.
Fan-in gathers the input of several channels and provides a single output channel. merge is a simple way to merge all the incoming channels into a single output channel, but the set of inputs can't be modified after creation.
The mix function (as in an audio mixer), together with admix/unmix, allows channels to join or leave the mix. Users can toggle options for each input channel: :pause (no consumption nor inclusion in the output channel), :mute (consumption but no inclusion), :solo (if true, only solo-ed channels appear in the output channel mix; :pause and :mute are ignored if this is the case).
Fan-out comes in three forms: mult, pub/sub and split.
The mult abstraction multiplies traffic from the input channel onto multiple output channels. Output channels (with different buffering policies) can join or leave the connection with tap/untap (if a tap is closed, it is removed from the mult). All the receiving channels must accept a value from the mult before the mult can move on to the next value. This is where alternative buffering strategies are useful.
pub/sub distributes the traffic through a partition function, and subscribers declare which partition value they want to listen to.
split divides the traffic into two channels based on the truthiness of a predicate. split behaves like a pub/sub whose partition function yields only truthy/falsy.
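The three fan-out forms can be sketched as follows, assuming core.async is available. All channel names and the event maps are hypothetical.

```clojure
(require '[clojure.core.async :as a])

;; mult: every tap receives every value.
(def source (a/chan))
(def m      (a/mult source))
(def tap-a  (a/tap m (a/chan 10)))
(def tap-b  (a/tap m (a/chan 10)))
(a/>!! source :hello)
[(a/<!! tap-a) (a/<!! tap-b)] ; => [:hello :hello]

;; pub/sub: the partition function (:type) routes values to subscribers.
(def events  (a/chan))
(def by-type (a/pub events :type))
(def orders  (a/chan 10))
(a/sub by-type :order orders)
(a/>!! events {:type :ping})          ; no subscriber for :ping, so it is dropped
(a/>!! events {:type :order :id 1})
(a/<!! orders) ; => {:type :order, :id 1}

;; split: two output channels, chosen by the truthiness of a predicate.
(def numbers (a/chan 10))
(def evens-and-odds (a/split even? numbers 10 10))
(doseq [n (range 6)] (a/>!! numbers n))
(a/close! numbers)
(a/<!! (a/into [] (first evens-and-odds)))  ; => [0 2 4]
(a/<!! (a/into [] (second evens-and-odds))) ; => [1 3 5]
```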
Compose Your Application¶
Taking things apart: usually some portions of the code work on the same data, or the data has a common scope or lifetime, or the likelihood of change from external requirements is similar, as are the resources needed. If code is reusable when configured differently in more than one context, then it is a component.
Components should communicate through channels, but in order to set up the system correctly, we need something to orchestrate it. Several libraries exist; the book recommends Component, but it has been superseded by integrant. An example can be found here: reitit/integrant.
Environment variables should also exist with different settings. The solutions in the book are a bit old. Environ still seems to be good on Clojure (JVM).
Testing¶
There are three ways to create tests in Clojure: REPL, example-based, and generative testing (property checks).
With REPL-driven development, the examples used for development are stored in a file (these are candidates for example-based tests).
For example-based testing, there is the expectations library and the following snippet:
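(require '[clojure.test :refer [deftest testing are]])
(deftest test-range-are
  (testing "Testing range(endIndex)"
    ;; each row below supplies one [expected endIndex] pair
    (are [expected endIndex]
         (= (range endIndex) expected)
      '(0 1 2 3 4) 5
      '() 0)))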
Generative testing uses test.check:
(ns generative-testing.core
  (:require [clojure.test.check :as tc]
            [clojure.test.check.generators :as gen]
            [clojure.test.check.properties :as prop]))
I think nowadays we would use spec for it.
We are looking for invariants – properties that are always true. […] mathematical laws, relationships between inputs and outputs, round-trip or complementing functions, and comparing action effects.
Properties like identity, associativity, commutativity and idempotency are an excellent place to start.
Invariants are important because they reduce the number of cases your code must consider.
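As a sketch of what such properties can look like with test.check (assumed to be on the classpath), here are a round-trip property and an idempotency property. The property names are hypothetical.

```clojure
(require '[clojure.test.check :as tc]
         '[clojure.test.check.generators :as gen]
         '[clojure.test.check.properties :as prop])

;; Round trip: reversing twice gives back the original elements.
(def reverse-round-trip
  (prop/for-all [v (gen/vector gen/nat)]
    (= v (reverse (reverse v)))))

;; Idempotency: sorting an already sorted collection changes nothing.
(def sort-idempotent
  (prop/for-all [v (gen/vector gen/nat)]
    (= (sort v) (sort (sort v)))))

;; Run each property against 100 generated inputs.
(:result (tc/quick-check 100 reverse-round-trip)) ; => true
(:result (tc/quick-check 100 sort-idempotent))    ; => true
```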
Thinking in Clojure¶
- Make Reasoned Choices: always compare the trade-offs of solutions (benefits and costs). Think first, then do. Treat decisions carefully and weigh the trade-offs to fully understand the consequences.
- Be Reasonable: code with clearly expressed intent, limited side effects, neatly separated concerns, and unambiguous naming. Simple.
- Keep It Simple: keep distinct concerns as distinct as possible and avoid entangling concepts with other concepts.
  The code can be reasoned about, tested and implemented without any incidental complexity.
  Entities are simplest when distinct and composable.
  Domain functions avoid complexity by avoiding side effects and concerning themselves only with entities in their domain.
- Build Just Enough: keep complexity at bay and avoid overengineering.
- Compose: compose components, and evaluate your code's composability by using it from another component.
  The result will be a set of tidy interfaces to distinct independent subsystems with clear communication channels. With a stable interface, a component can grow easily and adapt quickly.
- Be Precise: avoid ambiguity and communicate clearly with others and your future self. Entities typify one concept. Functions effect a single transformation. Queries ask simple questions and return unambiguous results.
- Use What Works: look for working libraries, solutions in other languages, or papers.
Link¶
- tags :: clj core-async design