Hello, ARM!

Introduction

With the steadily rising interest in the ARM architecture, I decided to briefly board the bandwagon and see what all the fuss is about. As with all new experiences in computing, I started at the logical beginning with the ARM-compiled version of “Hello World!”. This blog post details my steps compiling some simple C source code and analysing the resulting binary.

Getting a toolchain

Before you can build ARM binaries, you need a cross-compilation toolchain: ARM builds of the tools you would normally use to compile programs for your native architecture. At the very least you need an ARM-targeting gcc and an ARM-compiled libc. The easiest way I have found to get this is to use Fedora and install its pre-built ARM packages. Instructions for how to do this can be found here [1].

Once you have installed your toolchain (or built one from scratch), you can use it to compile your source code. Remember to use the ARM version of gcc, or you’ll end up with an ordinary binary. To double check, try running the program. It should fail because you’re trying to run it on a non-ARM processor. If it succeeds, you’ve either done something very clever or very stupid.
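A less dramatic way to double check is to look at the binary's header instead of running it. The sketch below assumes the toolchain produces an ELF file (typical for Linux targets, but an assumption here) and checks the e_machine field for EM_ARM (40). The function name is my own, not part of any toolchain.

```c
#include <stddef.h>

/*
 * Returns 1 if the buffer starts with an ELF header whose e_machine
 * field is EM_ARM (40), and 0 otherwise.
 */
int is_arm_elf(const unsigned char *hdr, size_t len)
{
    unsigned int machine;

    /* e_machine occupies bytes 18 and 19 of the header */
    if (len < 20)
        return 0;

    /* ELF magic: 0x7f 'E' 'L' 'F' */
    if (hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F')
        return 0;

    /* e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian */
    if (hdr[5] == 1)
        machine = hdr[18] | (hdr[19] << 8);
    else if (hdr[5] == 2)
        machine = (hdr[18] << 8) | hdr[19];
    else
        return 0;

    return machine == 40; /* EM_ARM */
}
```

Read the first 20 bytes of the compiled program into a buffer and pass them in. A result of 0 means you probably used your native gcc by mistake.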

Taking a look around

Loading the program in IDA and examining the “main” function reveals a concise and slightly confusing set of instructions (GeSHi plugin for syntax highlighting can be found here [2]):

MOV R12, SP
STMFD SP!, {R11,R12,LR,PC}
SUB R11, R12, #4
LDR R0, =aHelloWorld ; "Hello World!"
BL puts
LDMFD SP, {R11,SP,PC}

Only 6 lines, but if you’ve never seen ARM assembly before (like me when I first did this), it will probably seem confusing. Let’s take this line by line and see what’s happening here.

1) MOV R12, SP

ARM uses different names for its registers than x86, and there are a lot more of them in ARM. There are 15 general purpose registers (R0 – R14), as well as R15 (Program Counter) for each ARM processor mode. Of the general purpose registers, R0 – R12 are non-specific, and the remaining registers have pre-defined roles (though these may be changed, as instructions are permitted to manipulate these register values in most processor modes):

  • R13 (Stack Pointer)
  • R14 (Link Register – The return address)

So this instruction simply stores the value of R13 (IDA renames this to SP automatically) into the R12 register.

2) STMFD SP!, {R11,R12,LR,PC}

The ARM architecture allows for operating on multiple registers at once. This instruction, STore Multiple Full Descending, stores the values of the registers listed in curly braces to the location indicated by the first operand. The order in which the registers are written in the list does not matter: the lowest-numbered register is always stored at the lowest address and the highest-numbered at the highest. Here R11 ends up at the new top of the stack and PC in the word just below the old stack pointer.

Full in this context means that SP points to the last item pushed to the stack (a ‘Full’ slot), as opposed to the first ‘Empty’ slot. Descending specifies the direction in which the stack grows as items are pushed: towards lower addresses (this is the conventional choice on ARM). The last thing to note is the “!” after SP, which sets the ‘write-back’ bit and causes the stack pointer to be updated (in this case, decremented by 16: four registers of 4 bytes each).
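To make the addressing concrete, here is a small C model of a full-descending store multiple with write-back. It is only a sketch of the semantics on a word-addressed array, not how any real emulator does it, and the function name and memory layout are my own inventions.

```c
#include <stdint.h>

/*
 * Model of "STMFD base!, {list}" on a word-addressed memory array.
 * regs[n] holds Rn; bit n of mask is set if Rn is in the list.
 * sp is a byte address. Returns the written-back stack pointer.
 */
uint32_t stmfd_writeback(uint32_t *mem, uint32_t sp,
                         const uint32_t regs[16], uint16_t mask)
{
    uint32_t count = 0;
    uint32_t addr;
    int r;

    for (r = 0; r < 16; r++)
        if (mask & (1u << r))
            count++;

    /* Descending: the stack grows towards lower addresses. */
    sp -= 4 * count;

    /* The lowest-numbered register lands at the lowest address. */
    addr = sp;
    for (r = 0; r < 16; r++) {
        if (mask & (1u << r)) {
            mem[addr / 4] = regs[r];
            addr += 4;
        }
    }

    /* Full: the returned sp points at an occupied slot, not a free one. */
    return sp;
}
```

With the register list {R11,R12,LR,PC} the mask has bits 11, 12, 14 and 15 set, and the returned stack pointer is 16 bytes below the one passed in.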

3) SUB R11, R12, #4

Constants in ARM are prefixed with ‘#’, so this instruction subtracts 4 from R12 and stores the result in R11. Note that this is a little more verbose than the equivalent x86 assembly (sub eax, 4).

4) LDR R0, =aHelloWorld

This instruction loads a pointer to the string “Hello World!” into R0. ARM function calls take arguments in registers R0 – R3, and any additional arguments are passed on the stack.

5) BL puts

The BL instruction (Branch with Link) is the equivalent of the “call” instruction from x86. A regular B (Branch) instruction on the other hand is closer to a “jmp”. Both can be given conditionals for the branch. Branch with Link stores the return address (the address of the instruction following the BL) into R14 (Link Register) before branching, allowing control flow to return to that instruction once the subroutine has completed. This is a simple method for implementing subroutines.

6) LDMFD SP, {R11,SP,PC}

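Very similar to #2, except that values are loaded instead of stored. There is no “!” here, so no write-back takes place, but SP is itself in the register list: it is loaded with the saved R12 value, which is the stack pointer as it was on entry to the function. PC is loaded with the saved LR, which returns control to the caller.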
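The load can be modelled the same way. The sketch below assumes the usual load-multiple semantics (the lowest-numbered register is read from the lowest address, and the base is not written back without “!”) and shows which stack slot each listed register receives. The function name is hypothetical.

```c
#include <stdint.h>

/*
 * Model of "LDMFD base, {list}" (no write-back) on a word-addressed
 * memory array. regs[n] holds Rn; bit n of mask is set if Rn is in
 * the list. base is a byte address.
 */
void ldmfd_no_writeback(const uint32_t *mem, uint32_t base,
                        uint32_t regs[16], uint16_t mask)
{
    uint32_t addr = base;
    int r;

    /* Registers are filled in ascending order from ascending addresses. */
    for (r = 0; r < 16; r++) {
        if (mask & (1u << r)) {
            regs[r] = mem[addr / 4];
            addr += 4;
        }
    }
}
```

For the list {R11,SP,PC}, the first word goes to R11, the second to R13 (SP) and the third to R15 (PC). The fourth word pushed by the prologue is simply left behind.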

Further Work

This is obviously a very basic look at the binary, and a lot more is going on under the hood. As with x86 binaries, even though execution of user-written code may begin at the main() function, the compiler often inserts code that runs before main(). This code usually sets up the environment, parses command line arguments, and sets up the initial call to main(). IDA calls this “start” by default, and it is usually labelled automatically after analysis.

Analysing the “start” routine can offer more in-depth insights into how a platform operates internally, but is usually overkill for most binary analysis projects.

References

[1] http://fedoraproject.org/wiki/Architectures/ARM/CrossToolchain

[2] https://geshi.svn.sourceforge.net/svnroot/geshi/trunk/geshi-1.0.X/src/geshi/arm.php

http://computing.unn.ac.uk/staff/cgmb3/teaching/CM506/ARM_Assembler/AssemblerSummary/ARM_Instructions_Index.html

http://simplemachines.it/doc/

Posted in ARM

Daemon Enterprises Wargame 0x01 Solution

**** WARNING: SPOILER ALERT! ****

A few weeks ago, @m0n0sapiens posted a link to a wargame he had designed. The first chapter involved analysing some functions of a .dll, and calculating the result that would be produced when giving it a certain value. While it’s possible to solve this challenge dynamically (debugging / writing a harness for the .dll), it turns out that it’s a lot easier to just analyse it statically and figure out what the functions are doing.

When you first load the .dll into IDA, checking the exports table makes it pretty obvious where to start looking.

The RingCommunication() function is quite small, with only a few calls and branches. There are some calls to send() and two other functions. The results of these 2 functions are tested and conditional jumps are made based on them, so it is obvious they are important for solving this challenge. Examining them shows that they are very similar (but not identical!). Here is the first one:

And the second one:

Looking at these functions, they essentially take 2 parameters and perform some arithmetic based on the input and a 4 byte hardcoded key. Aside from having different keys, the functions perform slightly different arithmetic to each other. It took me a little while to notice this, but the change makes a huge difference on the function’s return value. Once I had worked out what these 2 functions were doing, I named and typed them in IDA and took another look at the RingCommunication() function.
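To see why a small change in the arithmetic matters so much, here is a sketch with made-up values. It assumes the difference between the two functions is the order in which a bitwise AND and an XOR with the key are applied, which is how I read the disassembly. The function names and the numbers in the note below are illustrative only and are not taken from the .dll.

```c
#include <stdint.h>

/* XOR with the key first, then mask: the result never exceeds the mask. */
uint32_t xor_then_and(uint32_t value, uint32_t key, uint32_t mask)
{
    return (value ^ key) & mask;
}

/* Mask first, then XOR with the key: the key's bits survive in full. */
uint32_t and_then_xor(uint32_t value, uint32_t key, uint32_t mask)
{
    return (value & mask) ^ key;
}
```

With value 0x1234, key 0xFF00FF00 and mask 0x0FF0, the first gives 0x0D30 and the second 0xFF00FD30: same inputs, same operators, very different results.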

A static value, which I’ve called keySelector, is assigned the value 116. Initially I thought this might be the iteration number, and that it was hard coded in to require binary patching to get the correct result. This made the arithmetic in the key functions return unlikely results however, so I eventually settled on the idea that the current iteration was one of the 3 parameters passed to RingCommunication(). From looking at how they are used, you can see that the other 2 values are the previous iteration’s key and the next iteration’s key.

Once I had these elements in place, I decided to go the extra mile and actually reverse the program into (sorta) C. A few disclaimers though: I’m not a big C coder, and there are likely to be some missing *s and what not. Also, I added some extra local variables for readability. The point for me is that this is much more readable than disassembly, and still conveys the operation of the functions.

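#include <stdio.h>    /* sprintf */
#include <string.h>   /* strlen */
#include <winsock2.h> /* SOCKET, send */

/* Key Functions */
int keyFunc1(int keySelector, int iteration) {
	int key = 0x31982853;
	/* bitwise OR (not logical ||) combines the two shifted copies */
	int iTemp = (keySelector << 8) | (keySelector >> 8);
	/* XOR with the key first, then mask with iTemp */
	return (iteration ^ key) & iTemp;
}

int keyFunc2(int keySelector, int iteration) {
	int key = 0x41165328;
	int iTemp = (keySelector << 8) | (keySelector >> 8);
	/* mask with iTemp first, then XOR with the key */
	return (iteration & iTemp) ^ key;
}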
 
/* Main function */
int RingCommunication(SOCKET *s, int prevKey, int nextKey, int iteration) {
	char buf[32];	/* large enough for any unsigned long in decimal */
	char buf2[32];
	int keySelector = 116;

	/*
	* Calculates the previous iteration key,
	* and checks it against the supplied prevKey
	*/
	int prev = keyFunc1((keySelector - 1), iteration);

	if (prev == prevKey) {
		int res = keyFunc2(keySelector, iteration);
		sprintf(buf, "%lu", (unsigned long)res);
		/* flags argument assumed to be 0 */
		send(*s, buf, (int)strlen(buf), 0);
	}

	/*
	* Calculates the next iteration key,
	* and checks it against the supplied nextKey
	*/
	int next = keyFunc2((keySelector + 1), iteration);

	if (next == nextKey) {
		int res2 = keyFunc1(keySelector, iteration);
		sprintf(buf2, "%lu", (unsigned long)res2);
		/* flags argument assumed to be 0 */
		send(*s, buf2, (int)strlen(buf2), 0);
	}

	return 0;
}

The challenge is to calculate the key for iteration #828. I did this with trusty calc.exe :) I won’t spoil the whole thing by posting the answer, but if you’ve understood up to here then you’re home and dry already.

Posted in Solved

COM Objects in Memory

Background

My recent research into browser-based memory initialisation bugs has caused me to bump up against COM objects quite frequently. At the time I glossed over them without really thinking, but as they form the basis of every underlying element in the DOM, it became clear that eventually I’d have to learn about them in some detail. Most aspects of COM objects are well documented in various books, MSDN articles and papers across the internet, but one angle that seemed to be unexplored was how COM objects are represented when they are allocated in memory.

Design of COM

COM was designed primarily for interoperability between programs, allowing interaction with objects regardless of the program that created them. This is done by specifying a standardised object layout (much like the vtable + data layout of most objects in C++) and providing a set of functions to interact with these objects.

One of the key differences in COM objects is that there is a greater level of abstraction between the developer and the object. Access to attributes of the object is done entirely through methods exposed by the Windows API. This is emulated by OLE, which resolves variable names to the variant data structures that hold the values.

Perhaps more interestingly from our point of view, COM objects are also responsible for maintaining a count of the number of existing active references to the object. COM has no garbage collector as such: when this counter reaches 0, the object is expected to destroy itself and release its memory.
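The reference counting idea can be sketched in plain C. This is a deliberately stripped-down, hypothetical object and not the real IUnknown interface (which also has QueryInterface as its first method); the struct layout, names and the destroyed flag are all assumptions made for illustration.

```c
#include <stdlib.h>

typedef struct Obj Obj;

/* Method table, shared by every instance (the "vtable"). */
typedef struct ObjVtbl {
    unsigned long (*AddRef)(Obj *self);
    unsigned long (*Release)(Obj *self);
} ObjVtbl;

struct Obj {
    const ObjVtbl *vtbl;  /* first field: pointer to the method table */
    unsigned long refs;   /* number of active references */
    int *destroyed;       /* set to 1 on destruction (for this demo only) */
};

static unsigned long obj_addref(Obj *self)
{
    return ++self->refs;
}

static unsigned long obj_release(Obj *self)
{
    unsigned long left = --self->refs;

    /* The object frees itself when the last reference goes away. */
    if (left == 0) {
        if (self->destroyed)
            *self->destroyed = 1;
        free(self);
    }
    return left;
}

static const ObjVtbl obj_vtbl = { obj_addref, obj_release };

/* Creates an object holding one reference; returns NULL on failure. */
Obj *obj_create(int *destroyed)
{
    Obj *o = malloc(sizeof *o);

    if (o == NULL)
        return NULL;
    o->vtbl = &obj_vtbl;
    o->refs = 1;
    o->destroyed = destroyed;
    if (destroyed)
        *destroyed = 0;
    return o;
}
```

Every caller that keeps a pointer calls AddRef through the vtable and Release when done. Nothing outside the object checks that the counter is honest, which is what makes it an interesting target.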

COM Object Memory

COM objects have several features that make them particularly juicy from an exploitation perspective. By design, the object and its data are trusted, and no integrity checks are performed when accessing or modifying memory within the object.

After observing the creation and use of COM objects in memory, it became clear that the reference counter value is stored 5 dwords back from the end of the instance’s allocated memory space. As this is a negative offset, and interfaces are not of a static length, it is necessary to know the base address of the interface and its size in order to accurately locate the counter in memory.
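As a worked example of that arithmetic, the helper below computes the counter’s address from a base address and allocation size, taking the “5 dwords back from the end” observation at face value. The function name is hypothetical, and the offset is only what I observed, not a documented layout.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Address of the reference counter, assuming it sits 5 dwords
 * (20 bytes) before the end of the allocation. Returns 0 if the
 * allocation is too small for that to make sense.
 */
uintptr_t refcount_address(uintptr_t base, size_t size)
{
    const size_t back = 5 * sizeof(uint32_t);

    if (size < back)
        return 0;
    return base + size - back;
}
```

For an instance at 0x1000 with a 0x40 byte allocation, this gives 0x102C.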

Contrived exploitation example a.k.a “exception proves the rule”

While most examples of applying COM object manipulation to memory corruption I can think of involve a write4 as a prerequisite (at which point it’s almost certainly game over anyway), there are some contrived examples where this might be applicable. One such example might be when you know the base address of an instantiated COM interface in memory (through a memory leak), and you can decrement an arbitrary value, for example a crash on ‘dec [ecx]’ when ecx is attacker controlled (what might normally be considered a ‘non-exploitable’ memory corruption).

In such an example, you would know the size of the object (providing you take the time to find out what interface is being instantiated), so by providing the address of the reference counter value in memory to the arbitrary decrement, you can cause the interface to be freed while active references remain. Of course if you knew the type of the interface you were freeing and its base address, you could also redirect execution to the ‘Release’ method, which would decrement the counter in a similar fashion.

All in all it’s a pretty obscure technique, but who knows, in some limited sets of circumstances this might be used to turn the non-exploitable into… well, something else :)

Posted in Exploit Development

Python Bytecode Fuzzing

After reading Haifei Li’s excellent presentation on Understanding and Exploiting Flash ActionScript Vulnerabilities at CanSecWest this year, I was interested to discover that most crashes in ActionScript were found by fuzzing each byte of the precompiled bytecode. While this is normally frowned upon as ‘dumb fuzzing’, in interpreted languages it actually makes a lot of sense. A small change in the bytecode can result in a drastically different outcome when it is interpreted, and it requires zero knowledge of how the bytecode is used.

To illustrate how easy it is to discover these kinds of bugs, I turned my attention to the Python interpreter and created a small bytecode fuzzing toolset, comprising 2 components:

  • generate.py – Converts a pre-compiled python file into a collection of test cases
  • fuzz.py – Loops through the test cases with the python interpreter running in PyDbg to record any crashes
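The test case generation step is simple enough to sketch in a few lines. This is not the code from generate.py (which is Python); it is a minimal C illustration of the same idea, with a hypothetical function name: copy the original file contents and overwrite exactly one byte.

```c
#include <stddef.h>
#include <string.h>

/*
 * Writes a copy of src into dst with the byte at offset replaced by
 * value. dst must have room for len bytes. Returns 0 on success and
 * -1 if offset is out of range.
 */
int make_test_case(const unsigned char *src, size_t len,
                   size_t offset, unsigned char value,
                   unsigned char *dst)
{
    if (offset >= len)
        return -1;

    memcpy(dst, src, len);
    dst[offset] = value;
    return 0;
}
```

Looping offset over every position in the file and value over 0 to 255 yields the full set of single-byte mutations, each of which is then written out and fed to the interpreter.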

You can download these scripts here. At the moment these scripts are very primitive, and are written to work with .pyc files only, although this could easily be changed. They also require PyDbg to run (make sure you have PyDbg’s utils directory, as this has the crashbin functionality).

The initial run of the test file (included) looks promising, with lots of unexpected crashes and access violations. The fuzzer tends to bail on particularly severe crashes; future releases are likely to iron this out a bit more and at least let the fuzzing run continue.

DISCLAIMER: These tools were written entirely in my own time, and are not guaranteed to find you 0days, make you breakfast, help you overthrow a dictatorship etc… Use at your own risk. Released under the MIT license.

EDIT – Originally I stated it was Aaron Portnoy and Logan Brown’s presentation that sparked this work. I got my presentations mixed up, though their work was extremely impressive and I recommend you check it out.

Posted in Fuzzing

Long time no see!

It’s been even longer than usual since I last updated this thing, but hopefully I’ll have some actual content again soon. I’ve been doing a fair bit of research into (amongst other things) the handling of COM objects by the Windows memory management functions, and I’ve managed to pin down and reverse engineer several key components. Left intentionally vague until I finish the research and negotiate release terms with my boss.

Stay tuned.

Posted in General

SANS 660 – Advanced Penetration Testing, Exploits and Ethical Hacking

Well, SANS London 2010 just finished today, and everyone (myself included) is looking pretty knackered. It was a good week, with plenty of technical content and a 1-day CTF contest at the end, Defcon style. Sadly I didn’t take home any prizes, but I didn’t do too badly either.

The course itself was reasonably well structured, covering the fundamentals (x86 architecture, assembly, memory management) as well as more complex topics such as evading hardware DEP, ASLR and stack cookies. There were lots of practical examples throughout the course, both PoC programs and real-world vulnerable applications.

I’d thoroughly recommend the course to anyone who wants to learn exploits beyond vanilla stack overflows. My only criticism is that not enough was covered regarding heap overflows. Specifically, I would have liked to be introduced to a basic heap overflow as well as some content on the low fragmentation heap introduced in Vista and Windows 7.

Posted in Exams

Certified Waste of Time…

Well I had my CISSP exam yesterday, and as expected it followed the ‘senseless memorisation of otherwise googleable (verb?) facts’ approach. If you’ve done any kind of academic exam you’ll know what to expect. It wouldn’t be so bad if it wasn’t 6 hours long either. I went in with all the will in the world to check and double check every answer, but in the end I just did the ones I could do easily, then went back through and pondered the ones I didn’t get first round. It seemed to work alright, I was the first one out (infer from that what you will).

All in all I don’t think I would have done it if I were paying, as it doesn’t really seem to prove anything except the ability to retain otherwise useless facts. Still, it’s more letters on my business card and more CV fodder so bring on the reprints!

Posted in Exams

pvefindaddr – A plugin for Immunity Debugger

I have mixed feelings about Immunity Debugger. On the one hand, it looks like OllyDbg but with job ads in the top right and a signup process to get hold of a copy. On the other hand, there are some pretty neat features built in that extend Olly’s functionality, most notably the support for Python scripting. This seemed like something that wasn’t immediately useful, until I stumbled across pvefindaddr. Essentially it is a bunch of convenience tools for exploit development, which make the ‘boring’ parts of exploit development (locating a ‘pop pop ret’ sequence for example) much faster.

Most of the commands are relatively general purpose, but many are specifically geared towards SEH chain exploitation, and commands like suggest can take a lot of the guesswork away from buffer sizes and the like. There is also integration with some neat Metasploit tools, like the pattern creation and injection scripts. A full list of functions is available here.

You can grab the latest stable here, or check the wiki page for a svn link.

Posted in Exploit Development

[Solved] RTL8187L on Snow Leopard

I’ve seen a few threads around on the internet about this, but I haven’t found one that successfully solves the problem, so I had a go and got it working. It is a pretty simple fix: the permissions of the kext that is installed just need to be changed a bit. Here are the steps I took to get it working:

1) Install the 10.5 driver from the Realtek site and reboot. Ignore the warning about the corrupt kext.

2) After reboot, open a terminal and type the following command:

sudo kextutil /System/Library/Extensions/RTL8187l.kext

This should produce something similar to the following output:

Notice: /System/Library/Extensions/RTL8187l.kext has debug properties set.
/System/Library/Extensions/RTL8187l.kext has problems:
Authentication Failures:
File owner/permissions are incorrect (must be root:wheel, nonwritable by group/other):
/System/Library/Extensions/RTL8187l.kext/Contents/Resources
/System/Library/Extensions/RTL8187l.kext/Contents/Resources/English.lproj
/System/Library/Extensions/RTL8187l.kext/Contents/Resources/English.lproj/InfoPlist87l.strings

3) For each of the files listed, type the following command:

sudo chmod go-w [The File]

4) After typing those commands, the network card should be recognised by the system and you can configure it with the Realtek utility.

Posted in Solved