Another Way Not to Sign JSON

By David Buchanan, 27th December 2023

Signing JSON sucks. Don't do it. But if you must, you have two main options:

  1. Serialize your JSON to bytes, sign the bytes, and transmit the bytes+signature.

  2. Serialize your JSON to a canonical byte representation, sign those bytes, and transmit your JSON+signature.

Option 1 is the most straightforward, but now you need to ship those bytes around verbatim. You can't transmit your message as plain JSON, because in the general case you can't rely on the receiver being able to re-serialize it to the exact bytes that you signed. Depending on the context, this might mean you now have to base64-encode your serialized bytes, or escape them as a string, incurring size overheads. Arguably this isn't signing JSON anymore, merely signing bytes, but that's great news because, like I said, signing JSON sucks.
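To make option 1 concrete, here's a minimal sketch using PyNaCl's Ed25519 signing API. The envelope layout (base64 payload plus base64 signature) is just an illustrative assumption, not a standard format:

import base64
import json
import nacl.signing

signing_key = nacl.signing.SigningKey.generate()

message = {"hello": "a", "world": "b"}
payload = json.dumps(message).encode("utf-8")  # these exact bytes are what gets signed
signed = signing_key.sign(payload)

# Ship the signed bytes verbatim, e.g. base64-encoded alongside the signature.
envelope = {
    "payload": base64.b64encode(signed.message).decode(),
    "signature": base64.b64encode(signed.signature).decode(),
}

# The receiver verifies against the decoded bytes, and only then parses them as JSON.
raw = base64.b64decode(envelope["payload"])
signing_key.verify_key.verify(raw, base64.b64decode(envelope["signature"]))
received = json.loads(raw)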

Option 2 grants you more flexibility because, as long as your canonicalization rules are watertight and correctly implemented at both ends, you can still transmit your message as JSON. This works in theory, but in practice canonicalization is fiddly, and the JSON parsers and serializers you'll find in standard libraries will fight your attempts to do it correctly.

These approaches are both detailed in "How (not) to sign a JSON object," along with a third option which amounts to "use something other than JSON," which I'm discounting here because it's really just a way to avoid the problem—which you should absolutely do if you can, but this article assumes you've decided you really do need to be signing JSON.

If option 1 works for your use case, you should probably take it. This is more or less what JWT does.
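For reference, a compact JWT can be hand-rolled in a few lines (HS256 here, for illustration only; in practice you'd use a maintained JWT library). The point to notice is that the signature covers the base64url-encoded byte serialization, not the JSON values themselves:

import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> bytes:
    return base64.urlsafe_b64encode(data).rstrip(b"=")

key = b"shared-secret"  # hypothetical shared HMAC key
header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
payload = b64url(json.dumps({"hello": "a", "world": "b"}).encode())
signing_input = header + b"." + payload
signature = b64url(hmac.new(key, signing_input, hashlib.sha256).digest())
token = b".".join([header, payload, signature]).decode()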

If you decide to go with option 2, good luck. Consider hiring a team of spiritual consultants to pray for your codebase.

But what if there was another way?

A Secret Third Way (Not) to Sign JSON

Here's the idea: Rather than signing a specific serialization, we want to sign the information itself.

What do I mean by that? Consider these two JSON objects:

{
	"hello": "a",
	"world": "b"
}

{
	"world": "b",
	"hello": "a"
}

As JSON objects, these both convey the same information. In most sensible programming languages, they'll compare as equal objects. That's because JSON doesn't care about the order of object keys (strictly speaking, an implementation is allowed to expose order information to programmers, but generally they don't).

Most (all?) canonicalization schemes will sort the object keys, so you'll end up signing the same bytes in either case. But as I mentioned earlier, canonicalization is really hard to get right, so I want to avoid it.
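As a rough sketch of what key-sorting canonicalization looks like (real schemes such as RFC 8785 JCS also pin down number formatting, string escaping, and so on, which json.dumps alone does not guarantee):

import json

def naive_canonical_bytes(obj) -> bytes:
    # Sorted keys plus fixed separators; NOT a complete canonicalization scheme.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

a = json.loads('{"hello": "a", "world": "b"}')
b = json.loads('{"world": "b", "hello": "a"}')
assert a == b                                                # equal as objects
assert naive_canonical_bytes(a) == naive_canonical_bytes(b)  # and as canonical bytes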

Sorting is also annoying because it realistically means holding the whole object in memory and performing an O(n log n) sort algorithm on the keys. In most contexts this is no trouble at all, but it might open you up to resource-exhaustion DoS attacks unless you apply sensible limits.

Order-Agnostic Hashing

What if we could define a hash function that, without needing to pre-sort the data, resulted in the same hash regardless of order? This is, in fact, a solved problem: Incremental Multiset Hash Functions and Their Application to Memory Integrity Checking, Clarke et al., ASIACRYPT 2003.

We're uninterested in memory integrity checking, but the rest is exactly what we want. To quote from the abstract:

We introduce a new cryptographic tool: multiset hash functions. Unlike standard hash functions which take strings as input, multiset hash functions operate on multisets (or sets). They map multisets of arbitrary finite size to strings (hashes) of fixed length. They are incremental in that, when new members are added to the multiset, the hash can be updated in time proportional to the change.

A multiset is just a regular set where each member can occur more than once. If we consider a JSON object to be a set of (k,v) pairs, this solves our problem too. For an approachable explanation of how multiset hashing works, along with a concrete implementation as a Rust library, I highly recommend checking out this article.
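As a toy illustration of the idea (this is not Clarke et al.'s construction, and a plain additive hash over the integers is not collision-resistant on its own, which is exactly why the real thing works in a carefully chosen group):

import hashlib

MOD = 2 ** 256  # toy "group": integers mod 2^256

def element_hash(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "little")

def toy_multiset_hash(elements) -> int:
    acc = 0
    for elem in elements:
        acc = (acc + element_hash(elem)) % MOD  # addition commutes, so order is irrelevant
    return acc

assert toy_multiset_hash([b"a", b"b", b"c"]) == toy_multiset_hash([b"c", b"a", b"b"])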

Hashing the rest of the JSON

Now, JSON is more than just Objects; there are other types too. I'll informally define the whole hashing scheme recursively, like so:

  • The hash of a String is the hash of its UTF-8 byte serialization.

  • The hash of a Number is the hash of its little-endian IEEE 754 binary64 representation.

  • The hashes of true, false and null are arbitrary fixed values.

  • The hash of an Array is the hash of the concatenation of the hashes of each entry.

  • The hash of an Object is the multiset hash of each (k,v) pair. Each pair is hashed as H(H(k)H(v)) (there's probably room for optimization here, but I'd rather play it safe.)

Hash domain separation is used for each type, to ensure that there are no collisions between, for example, certain numbers and corresponding strings of length 8.
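Putting that together, here's a rough Python sketch of the recursive definition. This is not the gist implementation linked below; the specific domain-separation tag bytes are placeholders of my own, and the additive multiset combiner is the insecure toy version from earlier rather than a proper group:

import hashlib
import struct

MOD = 2 ** 256

def h(tag: bytes, data: bytes) -> bytes:
    return hashlib.sha256(tag + data).digest()  # tag byte provides domain separation

def json_hash(value) -> bytes:
    if isinstance(value, bool):  # must come before int: bool is an int subclass in Python
        return h(b"\x02", b"\x01" if value else b"\x00")
    if isinstance(value, str):
        return h(b"\x01", value.encode("utf-8"))
    if isinstance(value, (int, float)):
        return h(b"\x03", struct.pack("<d", float(value)))  # little-endian IEEE 754 binary64
    if value is None:
        return h(b"\x04", b"")
    if isinstance(value, list):
        return h(b"\x05", b"".join(json_hash(item) for item in value))
    if isinstance(value, dict):
        acc = 0  # toy additive multiset combiner; the real scheme adds curve points
        for k, v in value.items():
            pair = h(b"\x06", json_hash(k) + json_hash(v))  # H(H(k)H(v))
            acc = (acc + int.from_bytes(pair, "little")) % MOD
        return h(b"\x07", acc.to_bytes(32, "little"))
    raise TypeError(f"can't hash {type(value)!r}")

assert json_hash({"hello": "a", "world": "b"}) == json_hash({"world": "b", "hello": "a"})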

Due to the recursive nature of this definition, it logically forms a Merkle Tree when you parse nested objects.

Recall that our original goal was to sign JSON. We've got a hash (i.e. the merkle root), so all that remains is to sign that hash, and voila!

Concrete Implementation

A prototype-quality Python implementation can be found in this gist. It's concise enough that I could easily paste it into this article, but I figured I might want to update it in the future, and I don't want to leave stale code hanging around.

For my implementation of multiset hashing I used the ed25519 curve as the Group (Edit: this was a bad choice! See addendum), due to its relative efficiency, and because hash-to-curve functionality is conveniently exposed in the pynacl bindings (although at time of writing, this feature hasn't made it into a point release, so you need to install from git). I used sha256 as the hash function because its output size corresponds to what pynacl's hash-to-curve implementation expects.
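For a sense of what the multiset-combining step looks like over ed25519, here's a sketch assuming PyNaCl's libsodium bindings for point addition and hash-to-curve. The binding names follow libsodium's, and as noted above the hash-to-curve binding may only be available in a from-git install of PyNaCl:

import hashlib
from nacl.bindings import crypto_core_ed25519_add, crypto_core_ed25519_from_uniform

def hash_to_point(data: bytes) -> bytes:
    # sha256 produces the 32 uniform bytes that the hash-to-curve function expects
    return crypto_core_ed25519_from_uniform(hashlib.sha256(data).digest())

def combine(points):
    # Point addition is commutative, so the accumulated result is order-independent.
    acc = None
    for p in points:
        acc = p if acc is None else crypto_core_ed25519_add(acc, p)
    return acc

p1, p2 = hash_to_point(b"pair one"), hash_to_point(b"pair two")
assert crypto_core_ed25519_add(p1, p2) == crypto_core_ed25519_add(p2, p1)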

I'd like to emphasize that I'm not a cryptographer, and in particular, I feel out of my depth here. This is well into "rolling your own crypto" territory. Consider this to merely be an exploration of an idea, and not a serious suggestion for use in the real world.

Thoughts

The key advantages of this approach are:

  • There are no explicit canonicalization rules (with the exception of those implicitly present in UTF-8 encoding), which means in theory, fewer edge-cases and fewer implementation bugs.

  • Concise implementation.

  • It's O(n), compared to the O(n log n) required to sort Object keys in a typical canonicalization scheme.

  • Due to the merkle-tree construction, you could theoretically parallelize it. You could also parallelize the multiset hashing process itself, by arranging the curve point additions in a tree structure, e.g. ((a+b)+(c+d)).

  • It can theoretically be implemented as a "streamed" algorithm consuming O(1) memory (aka an Online Algorithm), but only if maximum object nesting depth is bounded (otherwise you need O(n) memory, where n is nesting depth).

But, there are several caveats:

  • There are probably some edge cases relating to IEEE 754 binary64 encoding that I haven't fully considered (for instance, 0.0 and -0.0 have distinct bit patterns, and integers beyond 2^53 can't all be represented exactly). You could perhaps constrain inputs to integers within some range.

  • Although it wins in terms of big-O, it probably loses in wall-time for most real-world inputs (I haven't done any benchmarks yet!)

  • Hash-to-curve is a relatively exotic primitive, and isn't present in many cryptography libraries yet.

One big question is what to do about duplicate Object keys. In my implementation I forbid them, but this approach could still work if duplicate keys were present (due to the multiset construction). The problem is, many environments don't have an easy way to parse or even detect duplicate object keys, and silently ignore all but the last. Ignoring some portion of the input is extremely undesirable for a signature scheme!

For example, in JavaScript, JSON.parse('{"a":"","a":"x"}') evaluates to { a: "x" }, and that behavior (among several alternatives) is allowed by "the" JSON spec. The only way to fix that would be to bring your own stricter JSON parser—which is totally possible, but it goes against my initial goal of trying to avoid complexity.
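For what it's worth, Python's standard library parser can be made to reject duplicates with an object_pairs_hook, assuming you control parsing at both ends; many other environments aren't so accommodating:

import json

def reject_duplicates(pairs):
    # pairs is the ordered list of (key, value) tuples for one object
    keys = [k for k, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate object key")
    return dict(pairs)

json.loads('{"a": "", "a": "x"}', object_pairs_hook=reject_duplicates)  # raises ValueError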

I say "the" JSON spec because there's more than one, and not all implementations conform to any given spec, which is part of what makes signing JSON such a scary idea in the first place—it's not great to sign something if you can't get everyone to agree on how that something should be interpreted. Anyone wanting to make a real JSON signing scheme should also bring their own more tightly constrained JSON spec. At that point though, you're probably better off using something designed to be signable in the first place, like DAG-CBOR.

Addendum

In the article I linked earlier, Lúcás picked the ristretto255 (prime order) group, and notes:

I also think that having a group that isn’t of prime order might lead to potential issues with collisions, but I haven’t thought all that much about it. Regardless, better safe than sorry.

The group I chose, ed25519, is not of prime order, which means I am potentially at risk of collisions. At the very least, I can't confidently say that I'm not—I need to do some more research. In the meantime, consider my implementation to be insecure! Thank you to str4d for pointing this out.