UofTCTF 2025: An AES Timing Side-Channel

By David Buchanan, 13^th January 2025

I'm returning to my blogging roots with another CTF writeup. My first one since 2018!

I love cryptographic side-channels, but I've never had the chance to write about them in detail. This CTF challenge caught my eye, and I think it makes for an accessible introduction to the concept. The challenge didn't have many solves though - evidently there's a need for more writeups.

Without further ado, the challenge description:

Timed AES

I made my own AES implementation. To test its performance, I timed the encrypting and decryption operations. However, I noticed that the decryption is a lot slower than the encryption. Help me find out what is wrong.

Wrap the flag in the uoftctf{} format.

Author: White

(500pts, 4 solves)

You can get a copy of the challenge data here: https://github.com/UofTCTF/uoftctf-2025-chals-public/tree/master/timed-aes

The Challenge

There are a few files in the download zip:

Archive:  /home/david/Downloads/aes.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
 43950712  01-05-2025 03:48   out
    15653  12-12-2024 04:40   aes.c
      283  12-12-2024 04:42   aes.h
     3191  01-04-2025 17:33   chall.c
---------                     -------
 43969839                     4 files

chall.c is a CLI program that lets the user AES encrypt or decrypt messages of their choice, using a key loaded from flag.txt. We don't have flag.txt, but we do have out, which is a log of someone else using the program. Presumably, the flag was loaded as the key, at the time.

The log looks like this:

1. Encrypt
2. Decrypt
3. Exit
> 1
Input the message to encrypt (in hex): 845E4551E164748C339242F4F0D387C0
Result: 4AEDF15DD920135775274A797E8E94A4
Encryption took 5423.000000 ns
> 1
Input the message to encrypt (in hex): 8AEDB8A424E6B54F623335CD0791A202
Result: A486AE81CC81622957840868EAC1D8EE
Encryption took 4026.000000 ns
> 1
Input the message to encrypt (in hex): 395A4A646AF2D2CB6F8D6F52408D4376
Result: 818E6A5071FB8BE8D2EE113E6F9FF4BB
Encryption took 3681.000000 ns
> 2
Input the message to decrypt (in hex): AE3CE1EFE517D36569961249192C84B1
Result: A668A522391B0454AED0FEDAA67283F5
Decryption took 48345.00 ns
[...]

It's a big file, and there are 300,000 total encrypt/decrypt operations logged. It still doesn't give us a key, but we do get to know the plaintext and ciphertext for each operation - and crucially, the time it took to execute, with nanosecond precision.

In the world of side-channel exploits, these sorts of logs are often called "traces" (if this were a hardware attack, a "trace" might be a graph of power consumption over time).

In a secure AES implementation, the timing information wouldn't tell us anything important. But this is a CTF challenge, and (spoiler alert) the implementation is insecure - and the timing information can be used to recover the key, and the flag.

Lies, damned lies

Cryptography engineers love to talk about "constant time" implementations. On a modern computer though, nothing is truly constant-time. If you run a command like

time openssl enc -aes-128-ecb \
-K 00000000000000000000000000000000 \
-in <(echo secret) \
-out /dev/null

you'll probably get a different duration on each run. But that doesn't mean openssl is broken!

When we say "constant time", what we really mean is that the execution time is not dependent on any information that's supposed to stay secret. The wall-time of the openssl command's execution is variable, but none of those variations have anything to do with the value of the key, or the plaintext (except for its length, which could be mitigated by padding it to a fixed length). Or at least, that's the idea.

Clever tools like TIMECOP exist to help audit software for timing side-channels, but for this CTF we can just use our eyes.

Code Review

Back to the challenge. We're looking for code that might take a different amount of time to execute, depending on the value of the key (since that's the secret we're trying to extract). The challenge description hints that decryption is probably buggy, because you'd expect a correct implementation to have similar performance in both encrypt and decrypt modes. AES decryption is "just" the inverse of the encryption process, so any structural differences between the two ought to stick out.

If you're new to AES internals, check out the "high level description" section on Wikipedia too (you don't need to understand $G F (2^{8})$ arithmetic for this, btw).

This is your last chance to review the code to see if you can spot the bug for yourself, because I'm about to tell you.

Ok, here's the bug:

u_int8_t doSbox(u_int8_t num){
    return sbox[num];
}

u_int8_t doSboxInv(u_int8_t num){
    for (size_t i = 0; i < 256; i++)
    {
        if (sbox[i] == num) return i;
    }
}

The doSbox() function (used during encryption) is typical*, but doSboxInv() not so much.

The number of loop iterations will depend on the output value, i.e. the sbox-inverse of the input. The time differences will be on the order of nanoseconds, and we don't get to time the function individually (it gets called many times on each decryption), but that's fine because

a) we have nanosecond-precision timing information from the logs (all 300k of them)

b) we have the power of statistics on our side

it's the meme kid who has the power of god and anime on his side

*But it would not pass secure code review! Table lookups are not-constant-time on modern hardware, due to caching etc., but exploiting that is much more nuanced and not practical in this context.

A less-insecure implementation would use a separate inverse-sbox lookup table, and a security-hardened implementation wouldn't use lookup tables at all.

What information do we learn from timing?

Here's a closer look at the AES decryption function from the challenge, with some added comments.

void addRoundKey(u_int8_t *roundKey, u_int8_t *block){
    for (size_t i = 0; i < 16; i++)
    {
        block[i] ^= roundKey[i];
    }
}

void subBytesInv(u_int8_t *block){
    for (size_t i = 0; i < 16; i++)
    {
        block[i] = doSboxInv(block[i]); // <- the vulnerable function
    }
}

void aesBlockDecrypt(u_int8_t *key, const u_int8_t *enc, u_int8_t *msg){
    u_int8_t expandedKey[16 * 11];
    keyExpand(key, expandedKey);
    memcpy(msg, enc, 16);

    addRoundKey(expandedKey + 10*16, msg);

    for (size_t i = 9; i > 0; i--)
    {
        u_int8_t *roundKey  = expandedKey + 16 * i;
        shiftRowsInv(msg);
        subBytesInv(msg);  // <- timing leaks happen in here
        addRoundKey(roundKey, msg);
        mixColumnsInv(msg);
    }
    shiftRowsInv(msg);
    subBytesInv(msg);  // <- and also here
    addRoundKey(expandedKey, msg);
}

AES-128 has 10 rounds, where the last round is slightly different (and thus happens outside of the loop). There are 10 invocations of subBytesInv(), each calling doSboxInv() 16 times - so 160 invocations of the leaky function overall.

Recall that the execution logs also tell us the plaintext and the ciphertext values.

When subBytesInv() is called for the first time, its input buffer will be shiftRowsInv(ciphertext ^ expandedKey[10]) (treating expandedKey as an array of 11x 16-byte buffers). shiftRowsInv() is just a byte-shuffle and we can ignore it for our purposes.

When doSboxInv() is called for the first time, its input will be ciphertext[0] ^ expandedKey[10][0].

The timing leak means that its execution time will be proportional to the integer magnitude of inverse_sbox[ciphertext[0] ^ expandedKey[10][0]]. Our immediate goal is to determine the value of expandedKey[10][0], and there are 256 possible values.

We can't know the execution time of one function invocation individually, though - the time information we have is effectively the sum of all 160 invocations (plus the rest of the code, but we can assume that's constant).

And, Statistics

If we guessed the correct value of expandedKey[10][0], we'd expect the execution of the whole decryption to be statistically correlated to the integer magnitude of inverse_sbox[ciphertext ^ expandedKey[10][0]], over all 300000 traces. The 159 other invocations are essentially just random noise, which the correlation function will "ignore".

We'll test all 256 possible values of expandedKey[10][0] and see which one produces the strongest correlation. Then, we do the same thing for the other 15 expandedKey[10] bytes.

Here's my solution script:

import numpy as np

SBOX_INV = [
	0x52,0x09,0x6A,0xD5,0x30,0x36,0xA5,0x38,0xBF,0x40,0xA3,0x9E,0x81,0xF3,0xD7,0xFB,
	0x7C,0xE3,0x39,0x82,0x9B,0x2F,0xFF,0x87,0x34,0x8E,0x43,0x44,0xC4,0xDE,0xE9,0xCB,
	0x54,0x7B,0x94,0x32,0xA6,0xC2,0x23,0x3D,0xEE,0x4C,0x95,0x0B,0x42,0xFA,0xC3,0x4E,
	0x08,0x2E,0xA1,0x66,0x28,0xD9,0x24,0xB2,0x76,0x5B,0xA2,0x49,0x6D,0x8B,0xD1,0x25,
	0x72,0xF8,0xF6,0x64,0x86,0x68,0x98,0x16,0xD4,0xA4,0x5C,0xCC,0x5D,0x65,0xB6,0x92,
	0x6C,0x70,0x48,0x50,0xFD,0xED,0xB9,0xDA,0x5E,0x15,0x46,0x57,0xA7,0x8D,0x9D,0x84,
	0x90,0xD8,0xAB,0x00,0x8C,0xBC,0xD3,0x0A,0xF7,0xE4,0x58,0x05,0xB8,0xB3,0x45,0x06,
	0xD0,0x2C,0x1E,0x8F,0xCA,0x3F,0x0F,0x02,0xC1,0xAF,0xBD,0x03,0x01,0x13,0x8A,0x6B,
	0x3A,0x91,0x11,0x41,0x4F,0x67,0xDC,0xEA,0x97,0xF2,0xCF,0xCE,0xF0,0xB4,0xE6,0x73,
	0x96,0xAC,0x74,0x22,0xE7,0xAD,0x35,0x85,0xE2,0xF9,0x37,0xE8,0x1C,0x75,0xDF,0x6E,
	0x47,0xF1,0x1A,0x71,0x1D,0x29,0xC5,0x89,0x6F,0xB7,0x62,0x0E,0xAA,0x18,0xBE,0x1B,
	0xFC,0x56,0x3E,0x4B,0xC6,0xD2,0x79,0x20,0x9A,0xDB,0xC0,0xFE,0x78,0xCD,0x5A,0xF4,
	0x1F,0xDD,0xA8,0x33,0x88,0x07,0xC7,0x31,0xB1,0x12,0x10,0x59,0x27,0x80,0xEC,0x5F,
	0x60,0x51,0x7F,0xA9,0x19,0xB5,0x4A,0x0D,0x2D,0xE5,0x7A,0x9F,0x93,0xC9,0x9C,0xEF,
	0xA0,0xE0,0x3B,0x4D,0xAE,0x2A,0xF5,0xB0,0xC8,0xEB,0xBB,0x3C,0x83,0x53,0x99,0x61,
	0x17,0x2B,0x04,0x7E,0xBA,0x77,0xD6,0x26,0xE1,0x69,0x14,0x63,0x55,0x21,0x0C,0x7D
]

# parse traces
traces = []
for trace in open("out").read().split("Input the message to decrypt (in hex): ")[1:]:
	inline, outline, timeline = trace.split("\n")[:3]
	ciphertext = bytes.fromhex(inline)
	plaintext = bytes.fromhex(outline.removeprefix("Result: "))
	duration = float(timeline.split(" ")[2])
	traces.append((ciphertext, plaintext, duration))

times = [trace[2] for trace in traces]

round10key = []
for keyi in range(16):
	candidates = []
	for candidate_key_byte in range(0x100):
		expected_times = [
			SBOX_INV[ct[keyi] ^ candidate_key_byte]
			for ct, pt, _ in traces
		]

		correlation = np.corrcoef(times, expected_times)[0,1]
		candidates.append((correlation, candidate_key_byte))

	candidates.sort(reverse=True)
	print("=======")
	print(keyi)
	# print top candidates just to check they look sensible
	for corr, k in candidates[:4]:
		print(corr, hex(k))
	round10key.append(candidates[0][1])

print(bytes(round10key).hex())

I use numpy.corrcoef() to measure the statistical correlation. Internally this uses "Pearson product-moment correlation" - I have no idea if that's the best option, but it works.

Running this spits out the value 085c118eefe65786e3b2640d9b4ac16c, but that's still not a flag.

We've just leaked expandedKey[10], i.e. the last round key. What we actually want is expandedKey[0], or "the" AES key. Fortunately this is trivial, because the AES key schedule is invertible. The SideChannelMarvels/Stark repo contains a tool to invert the key schedule, like so:

./aes_keyschedule 085c118eefe65786e3b2640d9b4ac16c 10
K00: 74316D316E675F6C33346B735F613373
K01: 9AF2E2FEF495BD92C7A1D6E198C0E592
K02: 222BADB8D6BE102A111FC6CB89DF2359
K03: B80D661F6EB376357FACB0FEF67393A7
K04: 3FD13A5D51624C682ECEFC96D8BD6F31
K05: 5579FD3C041BB1542AD54DC2F26822F3
K06: 30EAF0B534F141E11E240C23EC4C2ED0
K07: 59DB807B6D2AC19A730ECDB99F42E369
K08: F5CA79A098E0B83AEBEE758374AC96EA
K09: 7F5AFE32E7BA46080C54338B78F8A561
K10: 085C118EEFE65786E3B2640D9B4AC16C

All we have to do now is decode the hex value of K00, and we have the flag:

1 2	>>> bytes.fromhex("74316D316E675F6C33346B735F613373") b't1m1ng_l34ks_a3s'

Why not leak `expandedKey[0]` directly?

While I was writing this up, I thought "huh, wouldn't it be simpler to just leak expandedKey[0] directly?" - that way we'd skip the need to invert the key schedule.

Unfortunately, this doesn't work. Well, it almost works. Modifying my script to do this (replacing SBOX_INV[ct[keyi] ^ candidate_key_byte] with pt[keyi] ^ candidate_key_byte) produces the following result:

1 2	>>> bytes.fromhex("6e3b6e206c624d6d313f6f755b633973") b'n;n lbMm1?ou[c9s'

Not the flag, but they're all ASCII characters at least. The high-order bits are mostly-correct while the low-order bits are mostly-incorrect. What's going on?

Let's say we're evaluating a particular candidate_key_byte value, but the LSB is wrong. pt[keyi] ^ candidate_key_byte will also be off-by-one, and the time it takes the leaky doSboxInv() function to execute will be different by only a single loop iteration. Thus, the expected correlation score for an incorrect guess may be very close to that of a correct guess. The timing leak is still there, but we'd need extremely high-resolution timing to tell the difference between a nearly-correct key guess and an actually-correct key guess. As-is, the signal gets lost in the noise.

Whereas, with the original leak strategy, getting the LSB off-by-one would completely change the result of SBOX_INV[]. Yes, there are a couple of values of ct[keyi] for which an off-by-one key guess will produce off-by-one loop timing, but ct[keyi] is ~random over the thousands of samples that we have, so most of the time a wrong guess will produce a weak correlation score.

CTF Meta

As I said, this was my first CTF writeup since 2018. I've still been playing CTFs in the meantime, just wayyyy less consistently (and I even started some writeups but never finished them...). I used to be super competitive about CTFing, and I kinda got burnt out (along with being busier with life™). These days I try to treat CTFs more like a social event than a competitive grind, and I have more fun that way.

Thanks to the UofTCTF organizers for a fun CTF!