CTF Writeup: Hackvent 2017 Day 23 - "Only Perl"

By David Buchanan, 6th of January 2018

This challenge was a highly obfuscated perl script. So much so, that it contained a lot of non-ASCII printable characters. (You can download the original file from here). When run with perl, it prompts the user for a password, prints some gibberish, and then says "Decryption done, are you happy now?".

Layer 1: Perl

My first step was to "deparse" the perl source, using the command perl -MO=Deparse onlyperl.pl. This results in something much more readble, relatively speaking:

1
2
3
4
5
6
7
8
'???';
'???';
$, = "\273\000\000\200\267\cZ\cAMC\201\373\cN\cC|\364\363\202L\364^M\245\333M\367\202L\371D\200l\375G\371A\367]N\213K]N\205\200l\367\201L\371D\200l\315s\\NE8\"\363\257L\364CM\245'M\367\257L\371D\200l\375G\371A\367BI\213KBI\205\200l\367\201L\371D\200l\315s]IH8\cN|\226\363MM\307\301\275L\304\275\376E\273\276\305\256O\302_N\315\244\303\305\206\307\302CO\304\275\376H\273\276\305\256\177\302\\I\305\301\275L\cK\314\263SM1\235\367\275L\371D\200l\367\201L\371D\200l\371\cA\200l\366MM\304\225HgM}M\cNt\2061\271\216MMM\@GiYXA\cP\cF\cP\cQ\\\cO\037\n\cK\f\cU\a\cPMVUYXA\$,1]\037\cP\cZ\034B[^\257\326T\e\cU\351\cBg\cW\302\cT\311\345\363\035X\265i\231\cK2\261\256\252\cB\325\cR\360)\210\000\331\eJ\224\251\263v\342\cD\334SR\216\320\254Y\352\374\330L\327\376\cC\205\@%\241\326X\305\cK\266-\267\272\354\221\362\350\"\240U\203\244\333\340\256\cO!\347m>\264\270W\262\366n\343\037/\225]OM\332\265\376e\321}6\233\cVgDa\373\cN\261^\375F\210\337\034 f\222\363#'3\252\253\cY\322\257\cT\215\214\201\351\cQ\314\a\302B\234\247,\276o\275b\226.\335\211\336\310\2320\325\301\361\212\217dQu\3571~\260\r\274\3565\243\cXH\f_\324\365\245\242w\317`\cH\220\$\200<\\)P9\202\300(:\323\312|qi\367\3537\206\2358\277r\t\316\313\341\236\371\250C&\cZy=\355\255\344j\cW\cSV\271[\177pE*\306Z?4\304\364\cAGA\246\360\231\345;\204KT\307c\213\cE\303\223\cU\346\230z\n2\227\237\cB\207x\cP+k\315h{\370\cFs\372\cRN\035\036Il\311t\273";
$d = q[Myp@I!rU,j2NbO&[DRQaXq4t)?%=1_kK6;/*(T0hd$.f 9}"^coSE8>|zBYn\\\\Pw5Fu+`AZgCJWi#{<s]7':\\-x~e3LvVGmHl];
$X = qq/\223&\305>}"\372BF~\370\cH\311~\364\363?\223\360FZf\345H\374\177A\362BAi\370F4>\354]\cO\202F_(Zyj\253\213G\3764\207h\330l\rN=i*\232\267N\361H\375\345 F\374c\@\346Bpp\345R\0007\243"~\221>\cZ\347cu_9\204F\cA>\205\037\320cj5\cA;\cKlF6I9\314\301n<\303\362\cE\354\302\301\312ZD\cHXC\206\243\302\300\215\267\205?[\312}\354A\371\201\274\254\205\277\cV\343/;
$_ = 'A"&j[7DO]3nn^R&Hr\\[O8A#3;a["3Q>DOgCO(I(8A#,;a["3Q>DOgCO(I48A#Q;a["3Q>DOgCO(nQ3~3&LmV8A"&j[7DQG&DDI,/I=osI3/I=obIQ/I=:UobFt<FF8.Ft``88LKR&DFJJIk,8A"&j[7LO\\[0vQ&!"7jR[LHR[v(L3&vL!RaLG3""!L[R^9\\[OA';
$__ =~ s//"y\374 -~\374$d\374;";/ee;
s//$_;/ee;

I don't know any perl, so I started by adding some "debug" print statements. By doing so, I realised that the variable $_ contained some unpacked code (newlines added for readability):

1
2
3
4
5
6
;print("Password:\n");
@a=unpack("C*",$,);
@b=unpack("C*",$X);
@c=unpack("C*",scalar <>);
print(chr(($b[$_]-$a[$_]+$c[$_%8]+0x100)&0xFF)) for(0..$#b);
print "\nDecryption done, are you happy now?\n";

After a bit of thinking, I realised what was going on - It's a basic keyed cipher, which adds and subtracts two keys from the corresponding ciphertext ASCII values. The fact that there are two keys doesn't really matter, one just gets subtracted from the other. I dumped the key data from the @a and @b variables into files for analysis. I wrote the following python script to perform frequency analysis and extract the plaintext and key:

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
KEYLEN = 8
MODULUS = 0x100


def decrypt_func(char, key):
    return (char + key) % MODULUS


def decipher(cipher, key):
    plaintext = []
    for i in range(len(cipher)):
        plaintext.append(decrypt_func(cipher[i], key[i%len(key)]))
    return plaintext


def count_freqs(data):
    freqs = [0]*MODULUS
    for char in data:
        freqs[char] += 1
    # normalise (probably unnecessary)
    freqs = [x/sum(freqs) for x in freqs]
    return freqs


def best_key(target_freqs, actual_freqs):
    bestdiff = float("inf")
    bestkey = None
    for key in range(MODULUS):
        diff = 0
        for i in range(MODULUS):
            diff += (target_freqs[decrypt_func(i, key)] - actual_freqs[i])**2
        if diff < bestdiff:
            bestdiff = diff
            bestkey = key
    return bestkey


english_freqs = [
    0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
    0.000000, 0.000000, 0.022298, 0.000000, 0.000000, 0.022298, 0.000000, 0.000000,
    0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
    0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
    0.166921, 0.002692, 0.000806, 0.000006, 0.000012, 0.000006, 0.000000, 0.017219,
    0.000454, 0.000454, 0.000525, 0.000000, 0.015315, 0.004441, 0.007198, 0.000143,
    0.000131, 0.000406, 0.000072, 0.000078, 0.000060, 0.000078, 0.000042, 0.000036,
    0.000060, 0.000066, 0.001528, 0.001158, 0.000000, 0.000000, 0.000000, 0.001206,
    0.000012, 0.004309, 0.000746, 0.001116, 0.001444, 0.001868, 0.000800, 0.001152,
    0.001844, 0.004972, 0.000078, 0.000525, 0.000943, 0.001325, 0.001086, 0.001397,
    0.001027, 0.000507, 0.001265, 0.001725, 0.003408, 0.000663, 0.000310, 0.001528,
    0.000036, 0.000848, 0.000006, 0.000024, 0.000000, 0.000024, 0.000000, 0.000024,
    0.000000, 0.054212, 0.009675, 0.016813, 0.031203, 0.090035, 0.013417, 0.016419,
    0.045247, 0.046572, 0.001325, 0.007174, 0.030159, 0.013399, 0.046978, 0.055173,
    0.010719, 0.000806, 0.038198, 0.041666, 0.069420, 0.023080, 0.005437, 0.016091,
    0.001015, 0.014575, 0.000472, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
    0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
    0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
    0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
    0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
    0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
    0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
    0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
    0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
    0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
    0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
    0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
    0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
    0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
    0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
    0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
    0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000
]

keya = open("keya", "rb").read()
keyb = open("keyb", "rb").read()

cipher = []

for i in range(len(keya)):
    cipher.append((keya[i] - keyb[i] + 0x100) & 0xFF)

print(bytes(cipher))

# cipher_segments[i] will contain the bytes affected by key[i]
cipher_segments = [
    [cipher[i] for i in range(key_index, len(cipher), KEYLEN)]
    for key_index in range(KEYLEN)
]

# generate a character frequency table for each segment
segment_freqs = [count_freqs(segment) for segment in cipher_segments]

# what key would best match the natural letter frequency?
best_keys = [best_key(english_freqs, freqs) for freqs in segment_freqs]

print("Key: " + repr(bytes(best_keys)))
print("Message:")
print(bytes(decipher(cipher, best_keys)).decode())

"""
Key: p0lyglot
Message:
HV17-this-is-not-what-you-are-looking-for
Are you sure that only perl can parse Perl?
Microsoft's ye old shell does not even know /usr/bin/perl.
"""

The frequency table came from my own Github repo: DavidBuchanan314/english-letter-freqs. I actually reused most of this code from a previous CTF solution.

Unfortunately, this alone is not the solution! The key itself and the hidden message hint that this file might be executable in other ways. Given the lack of any kind of header or "magic numbers", I eventually guessed that it was a DOS .COM executable file.

Layer 2: DOS

When executed in DOSBox, it prompts the user for the "perl password" (presumably "p0lyglot"), and then for the "DOS code". Time to load this up in IDA!

The .COM file format is ridiculously simple, and as far as I can tell it just gets loaded into memory at 0x100 and execution starts from the beginning, with the CPU running in 16-bit real mode.

The bytes between offset 0x104 and 0x109 are presumably only there to make the polyglot work.

The bytes from 0x11A onwards are all meaningless instructions, and the small loop starting at address 0x10E explains why - 0x30 bytes get XORed with the value 0x4D, starting at address 0x11A. Self modifying code is awesome!

Using an interactive debugger like GDB would be an ideal way to approach this, but I didn't have time to figure out how to do something like that with DOS executables. Instead, I wrote a simple python script to patch the .COM file, such that the code is already XORed.

1
2
3
4
5
6
7
8
9
data = open("onlyperl.com", "rb").read()

patched = data[:0x1A]

patched += bytes([x^0x4D for x in data[0x1A:]])

f = open("patched.com", "wb")
f.write(patched)
f.close()

Now, when "patched.com" is loaded up in IDA, I can see all the relevant code. In summary, the program takes in a 5 character password, does something to a buffer based on that password, and then prints out the modified buffer. I'll save some time, and zoom in on the relevant "something" section, a loop:

In C-like pseudocode, this effectively does the following:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
char * flag = (char *) 0x1F0;
char * perl_key = "p0lyglot"; // known user input
char * sbox = (char *) 0x20E;
char * dos_key = "?????"; // unknown user input

for (int i=0; i<0x1E; i++) {
    char cl = flag[i];
    cl += perl_key[i%8];
    cl -= 0x8E;
    cl = sbox[cl];
    cl ^= dos_key[i%5];
    flag[i] = cl;
}

puts(flag);

Given that we already have the "perl key", the only remaining step is bascically just a 5-byte repeating-key XOR cipher. We already know the first 5 bytes of the plaintext, "HV17-", so the rest is easy!

Here's my final solution script:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
data = open("patched.com", "rb").read()

encrypted = [x for x in data[0xF0:0xF0+0x1E]]

keya = b"p0lyglot"
keyb = b"S4n7A"
lut = data[0x10E:]

print(encrypted)

output = []

for i in range(0x1E):
    foo = encrypted[i]
    foo += keya[i%8]
    foo -= 0x8E
    foo &= 0xFF
    foo = lut[foo]
    foo ^= keyb[i%5]
    output.append(foo)

crib = b"HV17-"
real_key = [crib[i]^output[i]^keyb[i] for i in range(5)]
print(bytes(real_key))

print(bytes(output))

This script is a bit hacky - on first run, "keyb" is unknown and can be any 5 bytes. print(bytes(real_key)) will print out the correct key regardless of the correctness of "keyb". Then, I just copied the output key into the keyb variable and re-ran the script.

So, the password was S4n7A, revealing the flag of HV17-Ovze-IUGF-W2xs-x2uE-pVRU.

That was a very cool challenge! If I remember correctly, I solved it in around 75 minutes, making me the first person to solve it :P