Hijacking Bluesky Identities with a Malleable Deputy

By David Buchanan, 28th September 2023

If you don't live under a rock, you might've heard of Bluesky, a decentralised social microblogging app built on top of the AT Protocol. In early June 2023, I identified a vulnerability in Bluesky's core user identity mechanism, did:plc, which allowed me to modify the identity information associated with any* account. I tested my hypothesis by changing the handle of the official @bsky.app account.

A screenshot showing the impact, via Klearsky, a 3rd party client: https://github.com/mimonelu/klearsky

I reported the issue to security@bsky.app (per security.txt) on June 1st, at 04:51 (UTC+0). The issue was acknowledged 34 minutes later, and a patch was deployed by 17:03 the same day (I suppose that's the next day, really, for those who sleep at American times), at which point I was given the green light to discuss it publicly. I wrote an off-the-cuff summary thread on it at the time, but it's taken me until now to get around to writing it up properly.

I've tried to keep this article accessible for people who might be new to the AT Protocol ecosystem, which means there's a lot to explain. Hopefully this will serve as an introduction to atproto internals as much as it is a description of the bug itself, which I think represents a fun cryptographic pitfall.

AT Protocol TL;DR

A concept at the core of atproto is the Repo(sitory). Each user has their own Repo, which is a store of Records. Every time you make a post on Bluesky, it's stored in your repo as a Record of type app.bsky.feed.post. It might look something like this:

{
	"$type": "app.bsky.feed.post",
	"text": "Hello, world!",
	"langs": ["en"],
	"createdAt": "2023-09-25T08:01:43.247Z"
}

Every time you "like" someone else's post, that action is recorded as an app.bsky.feed.like Record in your own repo, alongside a reference to the record that you're liking. Similar records are created for just about any other kind of interaction you can think of. Lexicon schemas (schemata?) define what Record types exist, and their respective structures.

Storing data in your own personal repo is all well and good, but you also want to tell the rest of the world about it (and if you don't want that, fair enough, but atproto is perhaps not the tool you're looking for). This is where the AT comes in: Authenticated Transfer.

Every record in a repo is part of a Merkle Tree—specifically, a Merkle Search Tree [PDF], which has some neat properties that I won't be going into here. Every time a record is created, removed, or updated, the MST is updated and has a new root node. This root node is referenced by hash in a cryptographically signed Commit object. Any individual record can be efficiently authenticated by following the merkle path back to the root node, then the commit object, and verifying the signature of the commit object. If that's all sounding too complicated to follow, the important takeaway is that all the data is signed.
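
To make the "follow the merkle path back to the root" idea concrete, here is a minimal sketch using a plain binary Merkle tree. This is not a Merkle Search Tree and not atproto's actual data format; the helper names (h, merkle_root, verify_path) are made up for illustration, and the final step of checking the commit signature over the root is left out.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Root hash of a simple binary Merkle tree (odd levels duplicate the last node)."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def verify_path(leaf: bytes, path, root: bytes) -> bool:
    """Recompute the root from one leaf plus its sibling hashes.

    Each path entry is (sibling_hash, sibling_is_left).
    """
    node = h(leaf)
    for sibling, sibling_is_left in path:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

# Hypothetical record contents, purely for illustration.
records = [b"post-1", b"post-2", b"like-1", b"post-3"]
root = merkle_root(records)

# Path for records[2]: its right-hand sibling leaf, then the left-hand subtree.
path = [(h(records[3]), False), (h(h(records[0]) + h(records[1])), True)]
proof_ok = verify_path(records[2], path, root)
```

The point is that the verifier only needs one record and a handful of sibling hashes, not the whole repo.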

Signing, combined with the structural properties of the MST, means that authenticated data can be efficiently relayed or synced between nodes of a decentralised system. Bluesky PBC's Federation Architecture Overview blog post goes into more detail about how this works in practice.

A proposed federation architecture (via https://blueskyweb.xyz/blog/5-5-2023-federation-architecture)

To give a very brief summary of the key components here:

  • A "Personal Data Server" (PDS) acts as the canonical host of a user's repo data, for one or more users.

  • A "Big Graph Service" consumes repo update events from many PDSes (perhaps ~all of them) in near real-time. It consolidates all these events into a single stream, which it makes available to other services. This stream is affectionately known as "the firehose".

  • An AppView consumes firehose events, indexes them, applies application-level semantics, and makes them available via an API (e.g., it can tell you who liked or replied to a given post). In theory, only the AppView has to know what "bluesky" is (for the most part, the PDS and BGS only need to know the base atproto spec).

Identity

So, we've got signed data being relayed between nodes. But in a vacuum, a signature is worthless. We need to know who signed something. In other words, we need a way to link a user identity back to a cryptographic keypair.

In many decentralised protocols, your public key is your identity. For example, a Bitcoin wallet address is just a public key (actually, a hash of the public key), an Onion V3 service address is just a public key, and a Nostr "npub" address is also, you guessed it, a public key. I could go on.

Using a public key as the root of user identity is very simple, but it has some big drawbacks. The biggest drawback of all is that securely and resiliently maintaining custody of a cryptographic key over a long period of time is hard. If you want your service to have truly mainstream adoption, you can't expect end-users to be able (or willing) to do so.

If your private key leaks to an adversary, your identity is irrevocably compromised.

If you lose your private key, access to your account is gone forever.

When you have a Hard Problem, often the smartest thing to do is to delegate it to someone else.

As such, atproto adopts the W3C's Decentralised Identifiers spec (aka DID) for handling user identity. DID specifies a generalised mechanism for special URIs (did:*) to use a "DID method" to associate a "DID subject" (e.g., a user ID) with a "DID document" (which can enumerate, among other things, signing keys for use with particular protocols). Each DID method specifies its own mechanism for actually making that happen. Many have been specified so far, and if you have a look through them, you'll first notice that there's a lot of them, and then that there's a distinctly blockchain-y theme.

Fortunately, they're not all like that, and atproto has "blessed" only two methods so far: did:web and did:plc.

did:web is a W3C standard, using DNS+HTTPS in a fairly straightforward manner. As a concrete example, did:web:retr0.id maps to a DID document hosted at https://retr0.id/.well-known/did.json. If I want to update the document, I just log in to my web server and edit the file.
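
As a rough sketch, the did:web mapping can be written as a small function. The function name is my own, and the path and port handling follows my reading of the did:web spec (colon-separated path segments, %3A for a port); atproto itself may only accept the bare-hostname form.

```python
from urllib.parse import unquote

def did_web_to_url(did: str) -> str:
    """Map a did:web identifier to the HTTPS URL of its DID document."""
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError("not a did:web identifier")
    # Colons separate the domain from optional path segments;
    # a port, if present, is percent-encoded as %3A in the domain part.
    domain, *path = did[len(prefix):].split(":")
    domain = unquote(domain)
    if path:
        return f"https://{domain}/{'/'.join(path)}/did.json"
    return f"https://{domain}/.well-known/did.json"

url = did_web_to_url("did:web:retr0.id")
```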

did:plc is a bit more interesting, and really it's the focus of this article—I sure took my time getting to it!

did:plc

I'll start by quoting from the README:

DID PLC is a self-authenticating DID which is strongly-consistent, recoverable, and allows for key rotation.

An example DID is: did:plc:ewvi7nxzyoun6zhxrhs64oiz

Control over a did:plc identity rests in a set of reconfigurable rotation keys pairs. These keys can sign update operations to mutate the identity (including key rotations), with each operation referencing a prior version of the identity state by hash. Each identity starts from an initial genesis operation, and the hash of this initial object is what defines the DID itself (that is, the DID URI identifier string). A central directory server collects and validates operations, and maintains a transparent log of operations for each DID.

Bluesky PBC developed DID PLC when designing the AT Protocol (atproto) because we were not satisfied with any of the existing DID methods. [...]

We originally titled the method "placeholder", because we didn't want it to stick around forever in its current form. We are actively hoping to replace it with or evolve it into something less centralized - likely a permissioned DID consortium.

There are some blockchain-adjacent ideas in here—sequences of signed operations, each referencing the prior by hash—but saliently, rather than relying on mechanisms like PoW or PoS for distributed consensus, there's just one central server keeping a ledger.

This is perhaps the least satisfying component of the whole atproto ecosystem. But what it loses in Decentralisation Points, it makes up for in Convenience Points—and did:web is still there for those who prefer it.

did:plc Internals

Here's a Python script I wrote to generate a fresh did:plc identity (as part of my picopds project). The script is only 85 lines of generously spaced-out code, which I'll now explain.

from cryptography.hazmat.primitives.asymmetric import ec

privkey = ec.generate_private_key(ec.SECP256K1())
pubkey = privkey.public_key()

The first step is to generate an asymmetric keypair, which will be used to sign the PLC operation. By the way, atproto currently supports two elliptic curves: SECP256K1 ("Bitcoin flavour") and SECP256R1 ("NIST flavour").

genesis = {
	"type": "plc_operation",
	"rotationKeys": [
		encode_did_pubkey(pubkey),
	],
	"verificationMethods": {
		"atproto": encode_did_pubkey(pubkey), #XXX should really be separate from rotationKeys
	},
	"alsoKnownAs": [
		"at://" + HANDLE
	],
	"services": {
		"atproto_pds": {
			"type": "atprotoPersonalDataServer",
			"endpoint": "https://" + PDS_SERVER
		}
	},
	"prev": None,
}

We assemble a "genesis" PLC operation, which includes:

  • rotationKeys - an array of public keys that are authorised to issue new updates to the DID document (in my case, there's only one).

  • verificationMethods includes the atproto repo signing key (i.e., the key that signs Commits). Really, this should be a separate key, but I reuse the rotation key for simplicity (laziness).

  • The alsoKnownAs array includes an at:// handle - more on these later.

  • The services map advertises the URL of the PDS that hosts the user's atproto repo.

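# Serialise the unsigned operation and sign it (raw_sign is a helper not shown in this excerpt)
genesis_bytes = dag_cbor.encode(genesis)
signature = base64.urlsafe_b64encode(raw_sign(privkey, genesis_bytes)).decode().strip("=")
# Attach the unpadded base64url signature, then re-serialise the signed operation
signed_genesis = genesis | {"sig": signature}
signed_genesis_bytes = dag_cbor.encode(signed_genesis)

# The DID is the base32 SHA-256 hash of the signed operation, truncated to 24 characters
plc = "did:plc:" + base64.b32encode(hashlib.sha256(signed_genesis_bytes).digest())[:24].lower().decode()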

The operation is then serialised to bytes using DAG-CBOR, so that it can be signed. The signature is added back to the original object, which is re-serialised, hashed, base-32 encoded, and finally truncated to 24 characters to form the did:plc identifier string.
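
The hash-and-truncate step can be tried in isolation with only the standard library. In this sketch the input bytes are a placeholder rather than a real DAG-CBOR-encoded operation, and plc_identifier is a name I've made up; the length parameter just makes the truncation explicit.

```python
import base64
import hashlib

def plc_identifier(signed_genesis_bytes: bytes, length: int = 24) -> str:
    """SHA-256 the signed genesis bytes, base32-encode, lowercase, truncate."""
    digest = hashlib.sha256(signed_genesis_bytes).digest()
    b32 = base64.b32encode(digest).decode().lower().rstrip("=")
    return "did:plc:" + b32[:length]

# Placeholder bytes standing in for a DAG-CBOR-encoded signed genesis operation.
example_op = b"placeholder signed genesis operation"
did = plc_identifier(example_op)
```

A SHA-256 digest is 52 base32 characters long without padding, so the 24-character identifier keeps 120 bits of the hash.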

plc_url = "https://" + PLC_SERVER + "/" + plc
r = requests.post(plc_url, json=signed_genesis)

Finally, the JSON representation of the signed operation is POST'ed to the PLC server. The "production" plc server is live at plc.directory, but there's now also a "sandbox" instance over at plc.bsky-sandbox.dev.

Resolving a did:plc is a simple matter of making an HTTP GET request to the PLC server. For example, https://plc.directory/did:plc:vwzwgnygau7ed7b7wt5ux7y2 returns a DID document, much like the one I linked earlier for did:web:retr0.id.

One can also GET https://plc.directory/did:plc:vwzwgnygau7ed7b7wt5ux7y2/log/audit, which shows the historic sequence of individually signed PLC operations that led up to the current state of the DID document.
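
In Python, those two lookups might be sketched like this, using only the standard library. The function names are my own, and fetch_json does no error handling or retries, so treat it as a starting point rather than a proper client.

```python
import json
import urllib.request

PLC_SERVER = "https://plc.directory"

def plc_document_url(did: str) -> str:
    """URL of the current DID document for a did:plc."""
    return f"{PLC_SERVER}/{did}"

def plc_audit_log_url(did: str) -> str:
    """URL of the full history of signed operations for a did:plc."""
    return f"{PLC_SERVER}/{did}/log/audit"

def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and parse the response body as JSON (makes a network request)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

did = "did:plc:vwzwgnygau7ed7b7wt5ux7y2"
doc_url = plc_document_url(did)
log_url = plc_audit_log_url(did)
```

Calling fetch_json(doc_url) or fetch_json(log_url) would then perform the actual requests.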

Handles

Unfortunately, these did:plc: identifiers aren't very human-readable or memorable; such is the curse of Zooko's Triangle. To ameliorate this, atproto facilitates binding a DNS domain name to a given DID. For example, I own the domain retr0.id, which I link to my did, did:plc:vwzwgnygau7ed7b7wt5ux7y2, allowing me to reference it (or be referenced) by the handle @retr0.id within the atproto ecosystem.

This linkage is bidirectional. For it to be considered valid:

  • The DID document has to "point at" the domain, via the alsoKnownAs field I mentioned earlier.

  • The domain has to "point at" the DID, via one of two methods: a DNS TXT record, or a file served over HTTPS at /.well-known/atproto-did.

Anyone wanting to verify a DID<->domain mapping has to check both directions, since in isolation either one could be spoofed.
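
The two-way check can be sketched as a pure function over data that has already been fetched. Everything here is illustrative: the function names are mine, the DID document is cut down to the one field that matters, and the DNS or HTTPS lookup is assumed to have happened elsewhere and produced a TXT-style value of the form did=<did>. Real implementations also need to care about things like case normalisation, which this ignores.

```python
def parse_atproto_txt(txt_value: str):
    """Extract the DID from a TXT record value of the form 'did=<did>'."""
    prefix = "did="
    return txt_value[len(prefix):] if txt_value.startswith(prefix) else None

def handle_is_valid(handle, did, did_document, did_claimed_by_domain) -> bool:
    """Both directions must agree for the handle<->DID binding to count."""
    doc_points_at_domain = ("at://" + handle) in did_document.get("alsoKnownAs", [])
    domain_points_at_did = did_claimed_by_domain == did
    return doc_points_at_domain and domain_points_at_did

my_did = "did:plc:vwzwgnygau7ed7b7wt5ux7y2"
did_document = {"alsoKnownAs": ["at://retr0.id"]}  # trimmed-down DID document
claimed = parse_atproto_txt("did=" + my_did)
ok = handle_is_valid("retr0.id", my_did, did_document, claimed)
```

If either side disagrees (the document names a different handle, or the domain names a different DID), the result is False.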

The Vulnerability

I've finally explained enough background information to get to the actual vulnerability! Earlier I described truncating the DID identifier to 24 characters. This wasn't always a hard requirement! At the time, you could truncate to any length you liked, as long as it was greater than or equal to 24. There was an open GitHub issue discussing whether this was a good idea or not and what the implications might be. The prevailing sentiment was that they should fix the lengths to 24, but not for any concrete reasons (having unnecessary flexibility in a component like this is generally Bad Vibes).

Upon learning this, I had a bit of a poke around and confirmed that one could indeed create longer did:plcs, but more importantly, you can create lengthened "duplicates" of existing DIDs. These duplicate DIDs could be used to sign up for a fresh Bluesky account on bsky.social (the "official" Bluesky PDS) - which I pointed out in a reply to the GitHub issue.

At this point in time, I considered it to be a mere curiosity. The duplicate DIDs share a prefix, but they are ultimately non-equal strings, which might break certain developer assumptions but otherwise ought not cause any direct security issues.

However, there's one other factor that raises this from "a curiosity" to "a big problem": bsky.social uses the same rotationKeys for every account. This is an eyebrow-raising decision on its own; apparently the cloud HSM product they use does billing per key, so it would be prohibitively expensive to give each user their own. (I hear they're planning on transitioning from "cloud" to on-premise hosting, so maybe they'll get the chance to give each user their own keypair then?)

But why is this a problem?

If I "clone" a DID from an existing bsky.social account by repeating its genesis operation but with a differently-truncated identifier, sign up for a new account on bsky.social using that cloned DID, and then change the handle of this new account, the PDS will generate a new signed PLC update operation and publish it to the PLC directory on my behalf. Because both DIDs have identical genesis operations, a signed update to one is also a valid signed update to the other.

I can grab the signed update from the /log/audit endpoint of the PLC directory server and "replay" it back, in relation to the other DID, thus updating a DID record which is not my own.
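
The replay can be demonstrated with a toy model. To be clear about the shortcuts: this is not the real PLC directory's code, JSON stands in for DAG-CBOR, a SHA-256 hex digest stands in for a CID, and an HMAC with a shared secret stands in for ECDSA signatures from the PDS's shared rotation key. All names are invented for the sketch, and it skips the handle-availability catch described next.

```python
import base64
import hashlib
import hmac
import json

# Stand-in for the rotation key that the PDS shares across all of its accounts.
PDS_ROTATION_SECRET = b"toy-shared-rotation-key"

def encode(op: dict) -> bytes:
    """Deterministic serialisation (JSON here; the real thing uses DAG-CBOR)."""
    return json.dumps(op, sort_keys=True).encode()

def toy_cid(signed_op: dict) -> str:
    """Hash of a signed operation (a stand-in for a real CID)."""
    return hashlib.sha256(encode(signed_op)).hexdigest()

def toy_sig(op: dict) -> str:
    """HMAC over the operation minus its sig field (a stand-in for ECDSA)."""
    unsigned = {k: v for k, v in op.items() if k != "sig"}
    return hmac.new(PDS_ROTATION_SECRET, encode(unsigned), hashlib.sha256).hexdigest()

def pds_sign(handle: str, prev) -> dict:
    """What the PDS does on an account's behalf: build and sign an operation."""
    op = {"type": "plc_operation", "alsoKnownAs": ["at://" + handle], "prev": prev}
    return {**op, "sig": toy_sig(op)}

def genesis_hash(signed_genesis: dict) -> str:
    digest = hashlib.sha256(encode(signed_genesis)).digest()
    return base64.b32encode(digest).decode().lower().rstrip("=")

class ToyDirectory:
    """Pre-patch behaviour: any 24+ character prefix of the genesis hash is accepted."""

    def __init__(self):
        self.logs = {}

    def submit(self, did: str, op: dict) -> None:
        ident = did[len("did:plc:"):]
        if not hmac.compare_digest(op["sig"], toy_sig(op)):
            raise ValueError("bad signature")
        log = self.logs.get(did, [])
        if op["prev"] is None:
            if log or len(ident) < 24 or not genesis_hash(op).startswith(ident):
                raise ValueError("invalid genesis operation")
        elif not log or op["prev"] != toy_cid(log[-1]):
            raise ValueError("prev does not match the latest operation")
        self.logs[did] = log + [op]

directory = ToyDirectory()

# The victim's genesis operation is public, so anyone can resubmit it verbatim.
genesis = pds_sign("old-handle.example", None)
victim_did = "did:plc:" + genesis_hash(genesis)[:24]
clone_did = "did:plc:" + genesis_hash(genesis)[:26]
directory.submit(victim_did, genesis)
directory.submit(clone_did, genesis)

# The PDS signs a handle change for the clone account...
update = pds_sign("attacker-was-here.example", toy_cid(genesis))
directory.submit(clone_did, update)
# ...and the attacker replays the same signed bytes against the victim's DID.
directory.submit(victim_did, update)

victim_handles = directory.logs[victim_did][-1]["alsoKnownAs"]
```

Note that the attacker never touches the signing key: the last submit call only reuses an operation that the PDS already signed and published.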

There's a catch, though, which is that the PDS won't let you sign up for a new account if the handle is already in use, which would normally be the case if you've just cloned the DID of an existing account. Fortunately, I came up with a strategy to avoid this, which I'll now explain.

Exploitation

At the beginning, I said that I could "modify the identity information associated with any* account". I'll fill that asterisk in now:

*any account that has changed its handle, leaving the original handle available for registration.

This might sound like an obscure edge-case, but it's pretty common in practice, and it also happens to be true for the official @bsky.app account, which makes it a good target for demonstration purposes.

The first step is to figure out the account's DID. There are a bunch of ways to do this, but in this instance, we can do it like so:

$ dig +short TXT _atproto.bsky.app
"did=did:plc:z72i7hdynmk6r22z27h6tvur"

We can then inspect the DID's history by visiting https://plc.directory/did:plc:z72i7hdynmk6r22z27h6tvur/log/audit. At the time, it looked like this:

[{
	"did":"did:plc:z72i7hdynmk6r22z27h6tvur",
	"operation":{
		"sig":"9NuYV7AqwHVTc0YuWzNV3CJafsSZWH7qCxHRUIP2xWlB-YexXC1OaYAnUayiCXLVzRQ8WBXIqF-SvZdNalwcjA",
		"prev":null,
		"type":"plc_operation",
		"services":{
			"atproto_pds":{
				"type":"atprotoPersonalDataServer",
				"endpoint":"https://bsky.social"
			}
		},
		"alsoKnownAs":["at://bluesky-team.bsky.social"],
		"rotationKeys":[
			"did:key:zQ3shhCGUqDKjStzuDxPkTxN6ujddP4RkEKJJouJGRRkaLGbg",
			"did:key:zQ3shpKnbdPx3g3CmPf5cRVTPe1HtSwVn5ish3wSnDPQCbLJK"
		],
		"verificationMethods":{
			"atproto":"did:key:zQ3shXjHeiBuRCKmM36cuYnm7YEMzhGnCmCyW92sRJ9pribSF"
		}
	},
	"cid":"bafyreigp6shzy6dlcxuowwoxz7u5nemdrkad2my5zwzpwilcnhih7bw6zm",
	"nullified":false,
	"createdAt":"2023-04-12T04:53:57.057Z"
},{
	"did":"did:plc:z72i7hdynmk6r22z27h6tvur",
	"operation":{
		"sig":"1mEWzRtFOgeRXH-YCSPTxb990JOXxa__n8Qw6BOKl7Ndm6OFFmwYKiiMqMCpAbxpnGjF5abfIsKc7u3a77Cbnw",
		"prev":"bafyreigp6shzy6dlcxuowwoxz7u5nemdrkad2my5zwzpwilcnhih7bw6zm",
		"type":"plc_operation",
		"services":{
			"atproto_pds":{
				"type":"atprotoPersonalDataServer",
				"endpoint":"https://bsky.social"
			}
		},
		"alsoKnownAs":["at://bsky.app"],
		"rotationKeys":[
			"did:key:zQ3shhCGUqDKjStzuDxPkTxN6ujddP4RkEKJJouJGRRkaLGbg",
			"did:key:zQ3shpKnbdPx3g3CmPf5cRVTPe1HtSwVn5ish3wSnDPQCbLJK"
		],
		"verificationMethods":{
			"atproto":"did:key:zQ3shXjHeiBuRCKmM36cuYnm7YEMzhGnCmCyW92sRJ9pribSF"
		}
	},
	"cid":"bafyreihmuvr3frdvd6vmdhucih277prdcfcezf67lasg5oekxoimnunjoq",
	"nullified":false,
	"createdAt":"2023-04-12T17:26:46.468Z"
}]

There are two PLC operations here. In the first, the genesis operation, the alsoKnownAs array contained at://bluesky-team.bsky.social. In the second, it was changed to at://bsky.app.

Note that the prev field of the second operation references the cid of the first. CID stands for Content IDentifier, i.e., a hash of the content.
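
That chaining can be checked mechanically. The sketch below walks an audit log and confirms each operation's prev matches the cid of the entry before it. It trusts the cid values as reported by the server rather than recomputing them (which would need DAG-CBOR and CID encoding), it ignores nullified entries, and check_prev_chain is a made-up name. The log here is trimmed down to the two fields being compared.

```python
GENESIS_CID = "bafyreigp6shzy6dlcxuowwoxz7u5nemdrkad2my5zwzpwilcnhih7bw6zm"
UPDATE_CID = "bafyreihmuvr3frdvd6vmdhucih277prdcfcezf67lasg5oekxoimnunjoq"

# Trimmed-down version of the /log/audit output shown above.
audit_log = [
    {"cid": GENESIS_CID, "operation": {"prev": None}},
    {"cid": UPDATE_CID, "operation": {"prev": GENESIS_CID}},
]

def check_prev_chain(log) -> bool:
    """True if the first op has prev=None and each later op references its predecessor's cid."""
    expected_prev = None
    for entry in log:
        if entry["operation"]["prev"] != expected_prev:
            return False
        expected_prev = entry["cid"]
    return True

chain_ok = check_prev_chain(audit_log)
```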

We have all the information here required to reconstruct the original genesis operation. I won't bore you with the code: it's much the same as the script I described earlier, with the values from /log/audit plugged in where appropriate and, importantly, with the DID identifier truncated to 26 characters rather than the original 24. Why not 25? Well, I tried that first and messed something up later, so I needed a fresh attempt...

With the reconstructed genesis op submitted to plc.directory (but with a less-truncated identifier string), we have a sort of evil-twin DID. Bluesky lets you "bring your own DID" on registration (but not via the web UI yet; you need to poke the API directly), so I can use the cloned DID to create a fresh account, which I did.

What happened next is perhaps best explained with a sequence diagram:

Maybe you can understand why I messed up my first attempt somewhere...

The main idea here is that the bsky.social PDS is the only agent that holds the rotationKeys for the DID that we care about. We jump through all these hoops in order to coerce the PDS into signing a legitimate operation for one purpose (changing our own handle in our own DID document), which we can maliciously repurpose for something else—changing the handle in the victim's DID document.

Note that the intermediate step (7) of updating the "clone" DID's handle to bsky.app (i.e., replaying the second PLC operation) is important; we need to keep the two DIDs "in sync". Without doing so, the final malicious handle-change operation would end up referencing the wrong prev operation when we replay it, and the PLC directory would reject it.

I'm not sure how to classify this vulnerability exactly. The identifier-extension element is related to the concept of malleability, and the way we manipulate the PDS is related to a Confused Deputy attack, but it's not quite either. I'm going to call it a Malleable Deputy attack, unless someone can come up with a better term, because I like making up terms for things.

Impact

So, I changed the DID document of the official Bluesky account to reference the handle @retr0id-was-here.bsky.app. Somewhat anticlimactically, this change was near-invisible to casual users. Because Bluesky's AppView server didn't know about the change, the official Bluesky app was still showing the old @bsky.app handle in the UI. However, Klearsky has a feature for inspecting handle change history, by querying the PLC directory server, which is what I was showing off in the image at the top of this article.

This is perhaps the ideal outcome—it was only supposed to be a harmless proof-of-concept demonstration, after all.

But it could have been much worse. In the future, Bluesky will support migrating an account from one PDS to another (which is one of atproto's big selling points). Rather than doing a handle change, I could've initiated a PDS transfer (involving adding a new rotationKey of my own choosing), which would have given me total control over the hijacked account. This feature isn't implemented yet, so it's a good thing the bug was caught now!

Aftermath

As I mentioned initially, the bug was patched promptly. The immediate fix was obvious: reject DID PLC identifiers longer than 24 characters.
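
A strict length check of that kind might look like the following. This is my own sketch of the idea, not the actual patch; the regex assumes the identifier is exactly 24 characters of lowercase base32 (a-z and 2-7), matching how the identifier is derived.

```python
import re

# Exactly 24 lowercase base32 characters after the did:plc: prefix.
PLC_DID_RE = re.compile(r"did:plc:[a-z2-7]{24}")

def is_valid_plc_did(did: str) -> bool:
    return PLC_DID_RE.fullmatch(did) is not None

original = "did:plc:z72i7hdynmk6r22z27h6tvur"
accepted = is_valid_plc_did(original)
# Two extra (arbitrary) characters, as a lengthened clone identifier would have.
lengthened_rejected = not is_valid_plc_did(original + "ab")
```

With identifiers fixed at one length, two distinct DID strings can no longer share the same genesis operation.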

The question of what to do with the now-invalid operations was slightly less obvious, though. As a complicating factor, before I'd discovered the security implications, I'd already announced to the world that it was possible to register longer DIDs. As such, several curious developers had also been playing around with the "feature".

Bluesky developer @dholms.xyz announced the changes in a thread. Depending on how far into the future you're reading this, you might not be able to visit that link without a Bluesky account, so I'll quote it below:

daniel 🫠 (@dholms.xyz) at 2023-06-01T17:16:49.624Z:

Thanks to @retr0.id, we're locking down the length of PLC DIDs to exactly 24.

Besides his accounts, there are 9 DIDs in the network with longer DIDs. We'll be removing these from the official plc registry in one week. The invalidated operations will be kept around in the git repo

None of these DIDs are registered with the bsky.social PDS and I'm not sure if they're in active use.

Please let me know if this will affect you adversely & we can find a plan going forward

@syui.ai @forza7.org @nokotaro.bsky.social

While we treat PLC as fully immutable, we reserve the right - esp in this beta period - to respond to & correct exploits.

For full transparency, we'll be invalidating the operation that took advantage of this exploit.

But similarly, we'll keep a log of it around in the git repo

For bragging rights, you can currently see it here: https://plc.directory/did:plc:z72i7hdynmk6r22z27h6tvur/log/audit

the operation in question has cid bafyreidaxmtdx6pb3up6tznwdbdse53uytfl7laql4cdlig22zhktkhfjy

The referenced "bragging rights" link no longer shows the fruits of my labour, but fortunately I saved an archive snapshot: https://archive.ph/SqRMf.

The rest of the expunged operations are also preserved in git, here: https://github.com/bluesky-social/did-method-plc/blob/bda24ffb3171f8039df348280533be682208c83f/invalidated-op-log.txt

I think this is a perfectly reasonable resolution. It's a little bit jarring to be reminded that, for now, PLC history can be manually rewritten by a Bluesky PBC employee. But it was done for good reason, and with transparency.

That said, there's definitely room for improvement in the transparency department; we shouldn't have to rely on admins self-reporting these sorts of changes. "We", the wider atproto ecosystem, should be able to detect them ourselves. Ideally, we'd have several independent 3rd parties monitoring the PLC directory to ensure consistency and integrity of the reported data. A few people have pointed towards Certificate Transparency as a blueprint for how this could be done. The /export endpoint lays the groundwork for monitoring, but it's not really happening yet.

atscan is one very cool project that indexes DID PLC information (among many other useful things), although it appears to be dormant for now.

In the longer term, there's talk of moving control of DID PLC into some kind of consortium or other independent body.

Closing Thoughts

Through my explanations above, you might come away with the impression that atproto is unnecessarily complicated. It is complicated, but if you dig deeper, which I encourage you to do, you'll find that most things have a good reason to be the way they are.

As a platform, Bluesky has lofty ambitions, and there's still a long journey ahead of it. They've come a long way already, though, recently surpassing 1 million users, and they're consistently making progress in the right directions. Despite trying my hardest to poke holes in it, I'm optimistic, and I'm excited to see where it goes next!