Pitfalls and bumps in Clojure’s Extensible Data Notation (EDN)

Posted: more than one year ago
Company: Nitor
Location: Helsinki

Let me go through some points we've bumped into at Nitor.

Note: most of these problems don’t apply when using EDN as a configuration language, where it’s written by a human. However, if you try to do programmatic transformations of EDN config files, you might get bitten by these.

1. There's no way to generate correct EDN

There is a safe EDN reading library that implements the EDN spec: clojure.edn.

But how would I generate EDN? The internet tells me to just use pr-str. It's that simple!

1A. Namespaced maps

Starting with Clojure 1.9, namespaced maps are printed using special syntax:

    user=> (pr-str {:foo/bar 1 :foo/quux 2})
    "#:foo{:bar 1, :quux 2}"

However, this syntax isn't part of the EDN spec. Weirdly enough, clojure.edn has been updated to support this syntax but the spec hasn't. Of course, older versions of clojure.edn (e.g. the one shipped with Clojure 1.8) do not support it either:

    user=> (clojure.edn/read-string " #:foo{:bar 1, :quux 2}")
    RuntimeException No dispatch macro for: :  clojure.lang.Util.runtimeException (Util.java:221)

You can work around this problem by binding *print-namespace-maps* to false when generating EDN.
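
As a minimal sketch of that workaround, the wrapper below switches the namespaced-map syntax off for a single call. The name pr-edn-str is made up for this illustration, and the var *print-namespace-maps* only exists in Clojure 1.9 and later.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical helper, not part of clojure.core or clojure.edn.
(defn pr-edn-str
  "Like pr-str, but never uses the #:ns{...} namespaced map syntax."
  [x]
  (binding [*print-namespace-maps* false]
    (pr-str x)))

(def m {:foo/bar 1 :foo/quux 2})

(pr-edn-str m)
;; => "{:foo/bar 1, :foo/quux 2}"

;; The plain form reads back to the same map.
(= m (edn/read-string (pr-edn-str m)))
;; => true
```

Binding the var at the call site, instead of relying on whatever the REPL or build tool has set, keeps the output the same no matter where the code runs.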

(Sidenote: this caused a bug where newer versions of Leiningen were unable to produce jars that worked with older versions of Clojure)

1B. Print-length

Many Clojure IDEs, for example CIDER for Emacs, use the *print-length* feature in Clojure to truncate printing of big objects. If you run your tests from such an IDE, any EDN generated in those tests will be silently truncated, and the trailing ... marker reads back as an extra symbol:

=> (pr-str (repeat 1000 1)) "(1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...)" => (count (read-string (pr-str (repeat 1000 1)))) 101 => *print-length* 100

(Other similar variables that affect printing are *print-level*, *print-meta* and *print-dup*.)
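
One way to defend against ambient printer settings is to pin them explicitly around pr-str. This is only a sketch: pr-str-full is a hypothetical name, and the list of vars is the one mentioned above, which may not cover everything a given Clojure version consults when printing.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical helper: prints x with the printer vars forced to known values,
;; regardless of what the IDE or test runner has bound.
(defn pr-str-full
  [x]
  (binding [*print-length* nil    ; no truncation of long collections
            *print-level*  nil    ; no truncation of deep nesting
            *print-meta*   false  ; no ^{...} metadata prefixes
            *print-dup*    false] ; no #=(...) constructor forms
    (pr-str x)))

;; Simulate an IDE that has set *print-length* to 100.
(binding [*print-length* 100]
  (count (edn/read-string (pr-str-full (repeat 1000 1)))))
;; => 1000
```

The inner binding wins over the outer one, so the helper behaves the same inside and outside the IDE.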

1C. NaN

Some valid Clojure values don’t round-trip as EDN, that is, they get read back differently from what was written.

The Not-a-Number floating point value specified by IEEE754 has been an endless source of fun. One area where it keeps cropping up is the fact that JSON can't represent NaN even though it can represent other floating point values. See for example Stack Overflow.

The situation here mirrors the handling of namespaced maps. Since Clojure 1.9, NaNs are printed as ##NaN, and clojure.edn parses these correctly. However, this syntax is not part of the EDN spec, and is not backwards compatible:

    clojure1.8=> (clojure.edn/read-string "##NaN")
    RuntimeException No dispatch macro for: #  clojure.lang.Util.runtimeException (Util.java:221)

(Previously NaN got encoded as NaN, which is valid EDN, but a symbol instead of a number.)
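
If whoever reads your EDN might run an older reader, one defensive option is to refuse to print NaN at all. The sketch below assumes that rejecting such values is acceptable for your data. non-finite-numbers and pr-str-finite are hypothetical names, not library functions, and the check also covers the infinities, which share the same post-1.9 ## syntax.

```clojure
;; Hypothetical helper: walks nested collections and collects every
;; floating point value that is NaN or infinite.
(defn non-finite-numbers
  [data]
  (->> (tree-seq coll? seq data)
       (filter float?)
       (filter (fn [x]
                 (let [d (double x)]
                   (or (Double/isNaN d) (Double/isInfinite d)))))))

;; Hypothetical helper: like pr-str, but throws instead of emitting
;; ##NaN, ##Inf or ##-Inf.
(defn pr-str-finite
  [data]
  (if-let [bad (seq (non-finite-numbers data))]
    (throw (ex-info "Refusing to print non-finite numbers"
                    {:count (count bad)}))
    (pr-str data)))

(pr-str-finite {:a 1.5})
;; => "{:a 1.5}"
```

Note that the check uses Double/isNaN instead of comparing a value with itself, because (= x x) can return true for a boxed NaN.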

1D. Random objects

pr-str falls back to calling the toString method for Java Objects it encounters. This means that you silently get invalid EDN for things like arrays:

    => (pr-str (int-array 3)) ;; arrays
    "#object[\"[I\" 0x7dc04b81 \"[I@7dc04b81\"]"

On the ClojureScript side of the fence you sometimes get nice things like #js {:a 1, :b 2} and sometimes things like #object[HTMLCollection [object HTMLCollection]]
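
A rough way to catch such values on the JVM before they reach pr-str is to walk the data and accept only types you know print as readable EDN. The predicate below is a sketch under that assumption: edn-value? is a hypothetical name, and the whitelist is deliberately conservative and certainly incomplete.

```clojure
;; Hypothetical predicate: true only if x consists entirely of types that
;; pr-str is known to print as readable EDN. It checks types only, so it
;; does not catch problems such as NaN or odd keyword names.
(defn edn-value?
  [x]
  (cond
    (or (nil? x) (true? x) (false? x)
        (string? x) (char? x) (keyword? x) (symbol? x)
        (integer? x) (float? x) (decimal? x)
        (instance? java.util.Date x)   ; printed as #inst "..."
        (instance? java.util.UUID x))  ; printed as #uuid "..."
    true

    ;; Records print as #ns.Name{...}, which a plain EDN reader rejects.
    (record? x) false

    (map? x) (every? (fn [[k v]] (and (edn-value? k) (edn-value? v))) x)

    (or (vector? x) (set? x) (seq? x)) (every? edn-value? x)

    :else false))

(edn-value? {:a [1 2.0 "x" #{:b}]})
;; => true
(edn-value? {:a (int-array 3)})
;; => false
```

Failing loudly on an unknown type is the same trade-off the Transit writer makes.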

1E. Printing while printing

pr-str is implemented by rebinding *out*, which means that if you pr-str a lazy sequence, and your lazy sequence happens to print as a side-effect, you get broken EDN:

    => (pr-str (for [i [1 2]] (do (println "handling" i) (+ i 1))))
    "(handling 1\nhandling 2\n2 3)"

Everybody knows mixing laziness and side-effects is bad, but it’s easy to mix e.g. laziness and logging in Clojure and not notice until your EDN API breaks.
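
One possible mitigation is to force all laziness before pr-str rebinds *out*, so that any side-effecting output goes wherever it would normally go. The sketch below uses clojure.walk/postwalk with identity to realize nested sequences. realize-all and pr-str-realized are hypothetical names, and the approach assumes the data is finite.

```clojure
(require '[clojure.walk :as walk])

;; Hypothetical helper: walks the whole structure, which forces every
;; nested lazy sequence as a side effect.
(defn realize-all
  [data]
  (walk/postwalk identity data))

;; Hypothetical helper: realizes first, prints second.
(defn pr-str-realized
  [data]
  (let [realized (realize-all data)] ; println calls run here, on the real *out*
    (pr-str realized)))

(pr-str-realized (for [i [1 2]] (do (println "handling" i) (+ i 1))))
;; => "(2 3)"
```

The "handling" lines still get printed, but to the original *out* and not into the EDN string.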

1F. Keywords with spaces

Another, rarer, example of values that don’t round-trip is keywords with spaces in them:

    user=> (def hello-world (keyword "hello world"))
    #'user/hello-world
    user=> hello-world
    :hello world
    user=> (pr-str [hello-world])
    "[:hello world]"
    user=> (first (clojure.edn/read-string (pr-str [hello-world])))
    :hello
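
A cheap, if blunt, way to detect this class of problem is to read the printed string back and compare. The helper below is a sketch: round-trips? is a hypothetical name, and it reports false for any data containing NaN, since freshly read NaN values do not compare equal to the originals.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical helper: true if printing x and reading it back with
;; clojure.edn yields an equal value, false on a mismatch or reader error.
(defn round-trips?
  [x]
  (try
    (= x (edn/read-string (pr-str x)))
    (catch Exception _ false)))

(round-trips? [:hello-world])
;; => true
(round-trips? [(keyword "hello world")])
;; => false
```

This doubles the cost of serialization, so it may fit better in tests or at trust boundaries than on every write.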


2. Why are we here?

The problem is that EDN is supposed to be a serialization format. Serialization formats are serious things, and you should pay serious attention to reliability and backwards compatibility. However, Clojure's print-method (the thing behind pr-str) is a developer convenience feature - it’s meant for interactive debugging and human consumption. So it gets nice features that break when you try to use it for serialization.

For reading we have clojure.edn since Clojure’s read (the counterpart of print-method) is so unsafe you couldn't trust it with network input. What we need is a clojure.edn/generate-string because pr-str is so fragile we can't trust it with network output.

The current situation in EDN is the antithesis of Postel's law:

Be conservative in what you send, be liberal in what you accept

If you want a nice full-stack Clojure experience, use Transit. It has a spec, and print and read functions come from the same library. Transit writing also fails if it encounters something it doesn't know about, instead of just generating #object["[I" 0x23c76497 "[I@23c76497"] nonsense. Also, Transit is designed to be fast in the browser, so you avoid the performance problems associated with parsing EDN in CLJS.

If you want a robust serialization format, consider sticking with JSON. It's widely supported and fast, you just need to add a bit of code to coerce Clojure values to/from JSON. Plumatic Schema can help with these coercions. It seems similar tooling is also available for Clojure spec.

If you're persisting Clojure data to disk for internal purposes, you might get along fine with print-dup, but you need to know its pitfalls, too.


PS. swap! yourself assoc :employer Nitor :position :Clojurist

Author

Joel Kaasinen prefers to sail, ski or climb, but when he programs he prefers it functional.