Introduction
With the steadily rising interest in the ARM architecture, I decided to briefly board the bandwagon and see what all the fuss is about. As with all new experiences in computing, I started at the logical beginning with the ARM-compiled version of “Hello World!”. This blog post details my steps compiling some simple C source code and analysing the resulting binary.
Getting a toolchain
Before you can build ARM binaries, you need a cross-compilation toolchain: ARM-targeted versions of the tools you would normally use to compile programs for your native architecture. At the very least, you need an ARM-compiled version of libc and an ARM version of gcc. The easiest way I have found to get these is to use Fedora and install their pre-built ARM packages. Instructions for how to do this can be found here [1].
Once you have installed your toolchain (or built one from scratch), you can use it to compile your source code. Remember to use the ARM version of gcc, or you’ll end up with an ordinary binary. To double check, try running the program. It should fail because you’re trying to run it on a non-ARM processor. If it succeeds, you’ve either done something very clever or very stupid.
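As a complement to the "try running it" check, the compiler can also report which target it is building for. The following is a minimal sketch, assuming a GCC-compatible compiler; the function name target_arch is made up for illustration, while __arm__, __aarch64__, __x86_64__ and __i386__ are macros that GCC predefines for the respective targets.

```c
/* Report which architecture this code was compiled for, based on
 * macros that GCC-compatible compilers predefine for the target.
 * target_arch is a hypothetical helper name, not a standard function. */
const char *target_arch(void)
{
#if defined(__arm__)
    return "arm (32-bit)";
#elif defined(__aarch64__)
    return "aarch64 (64-bit ARM)";
#elif defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "x86 (32-bit)";
#else
    return "unknown";
#endif
}
```

Built with the ARM version of gcc, the first branch should be the one that gets compiled in; built with the native compiler on an ordinary PC, one of the x86 branches should be selected instead.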
Taking a look around
Loading the program in IDA and examining the “main” function reveals a concise and slightly confusing set of instructions (GeSHi plugin for syntax highlighting can be found here [2]):
    MOV R12, SP
    STMFD SP!, {R11,R12,LR,PC}
    SUB R11, R12, #4
    LDR R0, =aHelloWorld ; "Hello World!"
    BL puts
    LDMFD SP, {R11,SP,PC}
Only 6 lines, but if you’ve never seen ARM assembly before (like me when I first did this), it will probably seem confusing. Let’s take this line by line and see what’s happening here.
1) MOV R12, SP
ARM uses different names for its registers than x86, and there are a lot more of them. There are 15 general purpose registers (R0 – R14), as well as R15 (Program Counter), for each ARM processor mode. Of the general purpose registers, R0 – R12 are non-specific, and the remaining registers have pre-defined roles (though these may be changed, as instructions are permitted to manipulate these register values in most processor modes):
- R13 (Stack Pointer)
- R14 (Link Register – The return address)
So this instruction simply stores the value of R13 (IDA renames this to SP automatically) into the R12 register.
2) STMFD SP!, {R11,R12,LR,PC}
The ARM architecture allows for operating on multiple registers at once. This instruction, STore Multiple Full Descending, stores the values of the registers listed in curly braces to the location indicated by the first operand. Regardless of the order in which the registers are listed, the lowest-numbered register is always stored at the lowest memory address and the highest-numbered register at the highest (R0 → R15).
Full in this context means that SP points to the last item pushed to the stack (the ‘Full’ slot), as opposed to the first ‘Empty’ slot. Descending specifies the direction in which the stack grows as items are pushed: towards lower addresses, which is the conventional arrangement for ARM stacks. The last thing to note is the “!” after SP, which sets the ‘write-back’ bit and causes the stack pointer to be updated (in this case, decremented by 16, because four 4-byte registers are stored on a descending stack).
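To make the addressing concrete, here is a toy model of the store in C. It is only a sketch under simplifying assumptions: memory is a plain array of 32-bit words indexed by byte address / 4, the function name stmfd_writeback is invented for this post, and the value stored for PC is simply whatever the caller puts in regs[15] (real hardware stores an offset value for PC).

```c
#include <stdint.h>

/* Toy model of "STMFD base!, {reglist}".
 * mem     : simulated memory, indexed by byte address / 4
 * sp      : byte address held in the base register before the store
 * regs    : current values of R0..R15
 * reglist : bit n set means Rn appears in the register list
 * Returns the written-back base value (the new SP). */
uint32_t stmfd_writeback(uint32_t *mem, uint32_t sp,
                         const uint32_t regs[16], uint16_t reglist)
{
    unsigned count = 0;
    for (int n = 0; n < 16; n++)
        if (reglist & (1u << n))
            count++;

    /* Descending: the stored block ends just below the old SP. */
    uint32_t new_sp = sp - 4u * count;

    /* The lowest-numbered register goes to the lowest address. */
    uint32_t addr = new_sp;
    for (int n = 0; n < 16; n++) {
        if (reglist & (1u << n)) {
            mem[addr / 4] = regs[n];
            addr += 4;
        }
    }

    /* Full: the new SP points at a stored word, not at an empty slot. */
    return new_sp;
}
```

With SP at 0x40 and the list {R11,R12,LR,PC}, this model returns 0x30 and leaves R11 at 0x30, R12 at 0x34, LR at 0x38 and PC at 0x3C.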
3) SUB R11, R12, #4
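Constants in ARM are prefixed with ‘#’, so this instruction subtracts 4 from R12 and stores the result in R11. Note that ARM’s three-operand form names the destination separately from the sources, which makes it a little more verbose than the equivalent x86 assembly (sub eax, 4), where the destination doubles as a source.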
4) LDR R0, =aHelloWorld
This instruction loads a pointer to the string “Hello World!” into R0. ARM function calls take arguments in registers R0 – R3, and any additional arguments are passed on the stack.
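Seen from the C side, the register assignment might look like the sketch below. The function names sum6 and count_chars are made up for illustration, and the comments describe where simple integer and pointer arguments are expected to go under the ARM procedure call standard; the exact code a given compiler emits may differ.

```c
/* Six integer arguments: the first four (a..d) are expected in R0..R3,
 * while e and f do not fit in registers and are passed on the stack. */
int sum6(int a, int b, int c, int d, int e, int f)
{
    return a + b + c + d + e + f;
}

/* A single pointer argument, like the string handed to puts,
 * fits in R0 on its own. */
unsigned count_chars(const char *s)
{
    unsigned n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

Compiling functions like these with the ARM toolchain and loading the result in IDA is a quick way to see the extra stack traffic that appears once a fifth argument is added.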
5) BL puts
The BL instruction (Branch with Link) is the equivalent of the “call” instruction from x86. A regular B (Branch) instruction on the other hand is closer to a “jmp”. Both can be given conditionals for the branch. Branch with Link stores the return address (the address of the instruction immediately following the BL) into R14 (Link Register) before branching, allowing control flow to resume there once the subroutine returns. This is a simple method for implementing subroutines.
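The difference between B and BL can be sketched with a tiny model in C. This is an illustration only: the toy_cpu type and the toy_* functions are invented names, the pc field holds the address of the instruction currently executing (a real ARM core reads R15 as a value further ahead), and 4-byte ARM-state instructions are assumed.

```c
#include <stdint.h>

typedef struct {
    uint32_t pc;  /* address of the instruction being executed (toy model) */
    uint32_t lr;  /* R14, the link register */
} toy_cpu;

/* BL target: remember where to come back to, then jump. */
void toy_bl(toy_cpu *cpu, uint32_t target)
{
    cpu->lr = cpu->pc + 4;  /* the instruction right after the BL */
    cpu->pc = target;
}

/* B target: plain jump, LR is left untouched. */
void toy_b(toy_cpu *cpu, uint32_t target)
{
    cpu->pc = target;
}

/* Return from a subroutine by copying LR back into PC. */
void toy_return(toy_cpu *cpu)
{
    cpu->pc = cpu->lr;
}
```

A BL executed at address 0x8010 leaves 0x8014 in the link register, so a later return resumes at 0x8014.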
6) LDMFD SP, {R11,SP,PC}
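Very similar to #2, except that values are loaded instead of stored. There is no “!” here, so no write-back takes place, but SP itself appears in the register list: it is loaded with the R12 value saved in #2, which is the stack pointer as it was on entry to the function. R11 gets its old value back, and loading the saved LR into PC returns control to the caller.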
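The load can be modelled in the same spirit. Again this is a sketch rather than a faithful emulator: the name ldmfd_no_writeback is hypothetical, and memory is a plain array of 32-bit words indexed by byte address / 4.

```c
#include <stdint.h>

/* Toy model of "LDMFD base, {reglist}" (no "!", so no write-back).
 * Words are read from ascending addresses starting at base, and the
 * lowest-numbered register in the list receives the lowest address.
 * base is passed by value, so loading into regs[13] (SP) part-way
 * through does not disturb the addresses of the remaining loads. */
void ldmfd_no_writeback(const uint32_t *mem, uint32_t base,
                        uint32_t regs[16], uint16_t reglist)
{
    uint32_t addr = base;
    for (int n = 0; n < 16; n++) {
        if (reglist & (1u << n)) {
            regs[n] = mem[addr / 4];
            addr += 4;
        }
    }
}
```

If the stack holds the old R11 at 0x30, the entry SP (0x40) at 0x34 and the return address at 0x38, then loading {R11,SP,PC} from base 0x30 restores R11, puts SP back at 0x40 and sets PC to the return address in one instruction.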
Further Work
This is obviously a very basic look at the binary; a lot more is going on under the hood. As with x86 binaries, even though execution of user-written code may begin at the main() function, the compiler often inserts code that runs before main(). This code usually sets up the environment, parses command line arguments, and sets up the initial call to main(). IDA calls this “start” by default, and it is usually labelled automatically after analysis.
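Conceptually, the job of that startup code can be sketched in a few lines of C. This is a deliberately simplified illustration, not what the real routine looks like: toy_start, main_fn and sample_main are hypothetical names, and an actual start routine is typically written in assembly and hands control to the C library rather than calling main() directly.

```c
/* Signature of a main-like entry point. */
typedef int (*main_fn)(int argc, char **argv);

/* Hypothetical stand-in for the startup code: receive the prepared
 * arguments, call main, and hand back its return value. */
int toy_start(main_fn user_main, int argc, char **argv)
{
    /* Real startup code would first set up the stack and the C library. */
    int status = user_main(argc, argv);

    /* A real start routine would pass this value on as the exit status. */
    return status;
}

/* Example entry point: returns the number of arguments after the
 * program name. */
int sample_main(int argc, char **argv)
{
    (void)argv;  /* unused in this sketch */
    return argc - 1;
}
```

Calling toy_start(sample_main, 1, argv) with only the program name in argv yields 0, mirroring a program that exits successfully when given no arguments.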
Analysing the “start” routine can offer more in-depth insights into how a platform operates internally, but is usually overkill for most binary analysis projects.
References
[1] http://fedoraproject.org/wiki/Architectures/ARM/CrossToolchain
[2] https://geshi.svn.sourceforge.net/svnroot/geshi/trunk/geshi-1.0.X/src/geshi/arm.php



