Program Execution Internals - Part1
Hello Fellow pwners!
In the previous post, we discussed what a process is and the memory layout of a running process. It gave a general picture of what types of variables are present in what part of memory.
In this post, we will go a little deep. We will take different C Coding constructs like if , for , struct etc., and see how they are implemented at the assembly level. Let us look at how memory is allocated for different types of variables and more.
To do this, we will use the famous Debugger - gdb (GNU Debugger) . A Debugger will help us dissect any program the way we want which help in understanding the program better. Along with gdb, we will use an add-on python script called PEDA (Python Exploit Development Assistance) . This script will help us use the GNU Debugger in convenient manner.
Also, create a directory named post_5 in the rev_eng_series directory and store all the stuff we explore in the post_5 directory.
1. Installing gdb and peda:
-
gdb is installed by default in any gnu/linux system. As we are using Ubuntu, it is already present.
-
Installing peda:
$ git clone https://github.com/longld/peda.git ~/peda $ echo "source ~/peda/peda.py" >> ~/.gdbinit
-
Now when you start gdb, you have to get something like this:
$ gdb GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1 Copyright (C) 2016 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word". gdb-peda$
With this, you are ready to go!
This post will be fully hands-on and there won’t be much theory to learn. It is all what we have discussed in previous articles. Let us take good number of examples and examine how C language is converted into assembly language by the compiler.
In the first set of examples, let us discuss about how variables of different datatypes are stored in memory. In the second set, we will have a look into different C constructs.
2. Practicals
Program1
~/rev_eng_series/post_5$ cat code1.c
#include<stdio.h>
int gv = 2000;
int main() {
char c = 'a';
short int s = 1234;
int i = 4837538;
long int l = 2348027502937;
static int si = 394738;
return 0;
}
~/rev_eng_series/post_5$ gcc code1.c -o code1 -g
a. This is code1.c . In this program, we will look at the very basics. We have been telling that stack is used to store local variables, but never saw how. This program is all about that. There are local variables of different datatypes. Let us see how they are stored in stack .
- Compile the code1.c sourcefile with the -g option. It stores Debugging Symbols in the executable which is used by the Debugger and make our analysis easy.
Analysis:
-
To load an executable to gdb,
~/rev_eng_series/post_5$ gdb code1 GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1 Copyright (C) 2016 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from code1...done. gdb-peda$
-
Observe the line Reading symbols from code1…done. . These are the symbols which are added to the executable(here code1) when compiled with -g option.
-
Everytime when you open an ELF File with gdb, gdb will print all of it’s version details, License details. To suppress it from print all those details, you can do this:
~/rev_eng_series/post_5$ gdb -q code1 Reading symbols from code1...done. gdb-peda$
-
We have not yet run our executable. At this stage, gdb is fully aware of our executable, it’s symbols, functions etc.,
-
If we run the program like the way we run normally on the terminal, then we are not making use of the amazing facilities offered by gdb. gdb offers a special facility called BreakPoints . That is, if you want to stop your execution at a particular instruction, you can set a BreakPoint there, you can inspect the instruction, memory at that moment and then go to the next instruction.
-
We want to analyze the whole main function. So, let us set a BreakPoint at main function.
gdb-peda$ break main Breakpoint 1 at 0x4004da: file code1.c, line 7. gdb-peda$
NOTE: The addresses like 0x4004da might change on your computer. It might be different. Follow the concept properly.
-
Note that this is BreakPoint 1 and is set at the address 0x4004da . This means, this address is where the main function begins. gdb also know in which line of the C sourcefile we are breaking. If you want print your sourcecode, you can use the list intruction.
gdb-peda$ list 1 #include<stdio.h> 2 3 int gv = 2000; 4 5 int main() { 6 7 char c = 'a'; 8 short int s = 1234; 9 int i = 4837538; 10 long int l = 2348027502937; gdb-peda$ 11 12 static int si = 394738; 13 14 return 0; 15 } gdb-peda$
-
To understand the internal working, it is best if we have the assembly code for the main function. To get the assembly code, you have to disassemble the executable.
gdb-peda$ disass main Dump of assembler code for function main: 0x00000000004004d6 <+0>: push rbp 0x00000000004004d7 <+1>: mov rbp,rsp 0x00000000004004da <+4>: mov BYTE PTR [rbp-0xf],0x61 0x00000000004004de <+8>: mov WORD PTR [rbp-0xe],0x4d2 0x00000000004004e4 <+14>: mov DWORD PTR [rbp-0xc],0x49d0a2 0x00000000004004eb <+21>: movabs rax,0x222b1586159 0x00000000004004f5 <+31>: mov QWORD PTR [rbp-0x8],rax 0x00000000004004f9 <+35>: mov eax,0x0 0x00000000004004fe <+40>: pop rbp 0x00000000004004ff <+41>: ret End of assembler dump. gdb-peda$
-
Note that the BreakPoint 1 is set at 0x4004da which is <+4> . Let us set a BreakPoint at 0x4004d6 .
gdb-peda$ b *0x4004d6 Breakpoint 1 at 0x4004d6: file code1.c, line 5. gdb-peda$
-
We had just mentioned about rsp and rbp being Stack Pointer and Base Pointer in one of the previous articles. Now, let us understand what exactly these pointers are, what they do and why they are important.
-
We know that a function’s Local Variables are stored in stack. Let us understand how exactly these variables are stored.
-
When a function is called, a part of the stack is allocated to the function. This part of the stack which is allocated to the function is known as StackFrame . So, to be more precise, a function’s local variables are stored in it’s private stack space called StackFrame.
-
The size of StackFrame solely depends on the number of local variables in the function. If there are more number of variables, size of StackFrame is big.
-
The Base Address(Address of Bottom) of the StackFrame is stored in rbp . The Top Address(Address of Top) of the StackFrame is stored in rsp .
-
Let us run the program to properly understand what a StackFrame is.
gdb-peda$ run Starting program: /home/adwi/rev_eng_series/post_5/code1
-
This is what you get.
NOTE: The addresses shown in this post and addresses in your executable might be different. But the concepts are the same.
[----------------------------------registers-----------------------------------]
RAX: 0x4004d6 (<main>: push rbp)
RBX: 0x0
RCX: 0x0
RDX: 0x7fffffffdbc8 --> 0x7fffffffdfdf ("XDG_VTNR=7")
RSI: 0x7fffffffdbb8 --> 0x7fffffffdfb8 ("/home/adwi/rev_eng_series/post_5/code1")
RDI: 0x1
RBP: 0x400500 (<__libc_csu_init>: push r15)
RSP: 0x7fffffffdad8 --> 0x7ffff7a2d830 (<__libc_start_main+240>: mov edi,eax)
RIP: 0x4004d6 (<main>: push rbp)
R8 : 0x400570 (<__libc_csu_fini>: repz ret)
R9 : 0x7ffff7de7ab0 (<_dl_fini>: push rbp)
R10: 0x846
R11: 0x7ffff7a2d740 (<__libc_start_main>: push r14)
R12: 0x4003e0 (<_start>: xor ebp,ebp)
R13: 0x7fffffffdbb0 --> 0x1
R14: 0x0
R15: 0x0
EFLAGS: 0x246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x4004ce <frame_dummy+30>: call rax
0x4004d0 <frame_dummy+32>: pop rbp
0x4004d1 <frame_dummy+33>: jmp 0x400450 <register_tm_clones>
=> 0x4004d6 <main>: push rbp
0x4004d7 <main+1>: mov rbp,rsp
0x4004da <main+4>: mov BYTE PTR [rbp-0xf],0x61
0x4004de <main+8>: mov WORD PTR [rbp-0xe],0x4d2
0x4004e4 <main+14>: mov DWORD PTR [rbp-0xc],0x49d0a2
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffdad8 --> 0x7ffff7a2d830 (<__libc_start_main+240>: mov edi,eax)
0008| 0x7fffffffdae0 --> 0x0
0016| 0x7fffffffdae8 --> 0x7fffffffdbb8 --> 0x7fffffffdfb8 ("/home/adwi/rev_eng_series/post_5/code1")
0024| 0x7fffffffdaf0 --> 0x100000000
0032| 0x7fffffffdaf8 --> 0x4004d6 (<main>: push rbp)
0040| 0x7fffffffdb00 --> 0x0
0048| 0x7fffffffdb08 --> 0x5bf888248267542a
0056| 0x7fffffffdb10 --> 0x4003e0 (<_start>: xor ebp,ebp)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Breakpoint 2, main () at code1.c:5
5 int main() {
gdb-peda$
-
This is how your terminal should look like. Important things to note are:
-
=> 0x4004d6 <main>: push rbp</main> : This is the first instruction of main function. The => mark is pointing to this instruction. => always points to the instruction which gets executed next. So, this instruction is still not executed as of now, but will be executed next.
-
Note the rsp and rbp values as of now. rsp = 0x7fffffffdad8 and rbp = 0x400500 .
-
-
Let us consider the next 2 instructions:
0x4004d6 <main>: push rbp 0x4004d7 <main+1>: mov rbp,rsp
-
The push rbp will push the present rbp value onto the stack. Currently, rbp has the Base Address of the Caller function and it is being pushed. Let us discuss why it is being pushed a bit later.
-
Lets execute <main></main> using the ni command. ni stands for next instruction . This executes 1 instruction at a time.
gdb-peda$ ni
-
This is what the state is after ni command is executed.
[-------------------------------------code-------------------------------------] 0x4004d0 <frame_dummy+32>: pop rbp 0x4004d1 <frame_dummy+33>: jmp 0x400450 <register_tm_clones> 0x4004d6 <main>: push rbp => 0x4004d7 <main+1>: mov rbp,rsp 0x4004da <main+4>: mov BYTE PTR [rbp-0xf],0x61 0x4004de <main+8>: mov WORD PTR [rbp-0xe],0x4d2 0x4004e4 <main+14>: mov DWORD PTR [rbp-0xc],0x49d0a2 0x4004eb <main+21>: movabs rax,0x222b1586159 [------------------------------------stack-------------------------------------] 0000| 0x7fffffffdad0 --> 0x400500 (<__libc_csu_init>: push r15) 0008| 0x7fffffffdad8 --> 0x7ffff7a2d830 (<__libc_start_main+240>: mov edi,eax) 0016| 0x7fffffffdae0 --> 0x0 0024| 0x7fffffffdae8 --> 0x7fffffffdbb8 --> 0x7fffffffdfb8 ("/home/adwi/rev_eng_series/post_5/code1") 0032| 0x7fffffffdaf0 --> 0x100000000 0040| 0x7fffffffdaf8 --> 0x4004d6 (<main>: push rbp) 0048| 0x7fffffffdb00 --> 0x0 0056| 0x7fffffffdb08 --> 0x5bf888248267542a [------------------------------------------------------------------------------] Legend: code, data, rodata, value 0x00000000004004d7 5 int main() { gdb-peda$
-
Now the => is pointing to mov rbp, rsp which means the push rbp is executed.
-
Note the top of the stack: **0000 0x7fffffffdad0 –> 0x400500** . Note that rsp is now 0x7fffffffdad0 and Content at the top of the stack is 0x400500 which is the Old rbp value. So, everything has happened as expected. -
Note that current values of rbp = 0x400500 and rsp = 0x7fffffffdad0 .
-
Let us execute the next instruction.
gdb-peda$ ni
-
This is how the terminal looks like.
[----------------------------------registers-----------------------------------] RAX: 0x4004d6 (<main>: push rbp) RBX: 0x0 RCX: 0x0 RDX: 0x7fffffffdbc8 --> 0x7fffffffdfdf ("XDG_VTNR=7") RSI: 0x7fffffffdbb8 --> 0x7fffffffdfb8 ("/home/adwi/rev_eng_series/post_5/code1") RDI: 0x1 RBP: 0x7fffffffdad0 --> 0x400500 (<__libc_csu_init>: push r15) RSP: 0x7fffffffdad0 --> 0x400500 (<__libc_csu_init>: push r15) RIP: 0x4004da (<main+4>: mov BYTE PTR [rbp-0xf],0x61) R8 : 0x400570 (<__libc_csu_fini>: repz ret) R9 : 0x7ffff7de7ab0 (<_dl_fini>: push rbp) R10: 0x846 R11: 0x7ffff7a2d740 (<__libc_start_main>: push r14) R12: 0x4003e0 (<_start>: xor ebp,ebp) R13: 0x7fffffffdbb0 --> 0x1 R14: 0x0 R15: 0x0 EFLAGS: 0x246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow) [-------------------------------------code-------------------------------------] 0x4004d1 <frame_dummy+33>: jmp 0x400450 <register_tm_clones> 0x4004d6 <main>: push rbp 0x4004d7 <main+1>: mov rbp,rsp => 0x4004da <main+4>: mov BYTE PTR [rbp-0xf],0x61 0x4004de <main+8>: mov WORD PTR [rbp-0xe],0x4d2 0x4004e4 <main+14>: mov DWORD PTR [rbp-0xc],0x49d0a2 0x4004eb <main+21>: movabs rax,0x222b1586159 0x4004f5 <main+31>: mov QWORD PTR [rbp-0x8],rax [------------------------------------stack-------------------------------------] 0000| 0x7fffffffdad0 --> 0x400500 (<__libc_csu_init>: push r15) 0008| 0x7fffffffdad8 --> 0x7ffff7a2d830 (<__libc_start_main+240>: mov edi,eax) 0016| 0x7fffffffdae0 --> 0x0 0024| 0x7fffffffdae8 --> 0x7fffffffdbb8 --> 0x7fffffffdfb8 ("/home/adwi/rev_eng_series/post_5/code1") 0032| 0x7fffffffdaf0 --> 0x100000000 0040| 0x7fffffffdaf8 --> 0x4004d6 (<main>: push rbp) 0048| 0x7fffffffdb00 --> 0x0 0056| 0x7fffffffdb08 --> 0x5bf888248267542a [------------------------------------------------------------------------------] Legend: code, data, rodata, value Breakpoint 1, main () at code1.c:7 7 char c = 'a'; gdb-peda$
-
mov rbp, rsp was executed. Value of rsp should have got copied into rbp. And that has happened. Value of rsp = 0x7fffffffdad0 and rbp = 0x7fffffffdad0. There are 2 important points to understand at this point:
-
The value rbp had previously was overwritten by the value of rsp. If we had not pushed the old rbp value onto the stack, then we would have permanently lost that value. That is why, whenever a function is called, the first job done is to push the old rbp value onto the stack so that we won’t lose it.
-
Second point is that, now, rbp = rsp. Base pointer = Stack Pointer which means this is a StackFrame with size = 0 . So, After mov rbp, rsp is executed, a StackFrame with size 0 is constructed.
-
-
From now, let us focus more on Code and the Stack part.
-
Let us execute the next instruction - mov BYTE PTR [rbp-0xf],0x61 . After execution, this is the state.
[-------------------------------------code-------------------------------------] 0x4004d6 <main>: push rbp 0x4004d7 <main+1>: mov rbp,rsp 0x4004da <main+4>: mov BYTE PTR [rbp-0xf],0x61 => 0x4004de <main+8>: mov WORD PTR [rbp-0xe],0x4d2 0x4004e4 <main+14>: mov DWORD PTR [rbp-0xc],0x49d0a2 0x4004eb <main+21>: movabs rax,0x222b1586159 0x4004f5 <main+31>: mov QWORD PTR [rbp-0x8],rax 0x4004f9 <main+35>: mov eax,0x0 [------------------------------------stack-------------------------------------] 0000| 0x7fffffffdad0 --> 0x400500 (<__libc_csu_init>: push r15) 0008| 0x7fffffffdad8 --> 0x7ffff7a2d830 (<__libc_start_main+240>: mov edi,eax) 0016| 0x7fffffffdae0 --> 0x0 0024| 0x7fffffffdae8 --> 0x7fffffffdbb8 --> 0x7fffffffdfb8 ("/home/adwi/rev_eng_series/post_5/code1") 0032| 0x7fffffffdaf0 --> 0x100000000 0040| 0x7fffffffdaf8 --> 0x4004d6 (<main>: push rbp) 0048| 0x7fffffffdb00 --> 0x0 0056| 0x7fffffffdb08 --> 0x5bf888248267542a [------------------------------------------------------------------------------] Legend: code, data, rodata, value 8 short int s = 1234; gdb-peda$
-
Note that some value 0x61 is stored at the address rbp - 0xf . 0x61 is the ascii value of a . Let us take a deeper look at how the C code char c = ‘a’ is converted to mov BYTE PTR [rbp-0xf],0x61 .
-
char c = ‘a’ ==> char c = 0x61 ==> char [rbp - 0xf] = 0x61 ==> BYTE PTR [rbp - 0xf] = 0x61 ==> mov BYTE PTR[rbp - 0xf], 0x61 .
-
Notice that c is no more present in the executable and it is replaced directly by a memory address rbp - 0xf . BYTE PTR signifies that it is a char because sizeof(char) is 1 byte .
-
-
Let us now inspect the memory location rbp - 0xf using the x/ command.
gdb-peda$ x/10xb $rbp - 0xf 0x7fffffffdac1: 0x61 0xff 0xff 0xff 0x7f 0x00 0x00 0x00 0x7fffffffdac9: 0x00 0x00
-
Let me explain the above command used. The x/ command is used to check out any portion of the memory. The 10xb stands for show 10 bytes of data from the specified address in hexadecimal form and separate every byte of data. Look at the above output. 0x61 is 1 byte, 0xff is 2nd byte, 0c7f is the 5th byte and so on.
-
You can also use the command like this: x/10xw $rbp - 0xf .
gdb-peda$ x/10xw $rbp - 0xf 0x7fffffffdac1: 0xffffff61 0x0000007f 0x00000000 0x00000000 0x7fffffffdad1: 0x00004005 0x30000000 0xfff7a2d8 0x0000007f 0x7fffffffdae1: 0x00000000 0xb8000000 gdb-peda$
-
Here 10xw stands for 10 words(each word here = 4 bytes) and all data in hexadecimal form(x).
-
Look at the first 4 bytes. It is 0xffffff61 . But the first byte should have been 0x61 right? Just know that even now, 0x61 is the first byte, but there is difference. There is a very important concept behind this. Let us talk about it at the end of the article.
-
-
So, at the end of this instruction, rbp - 0xf address has the value 0x61 which is ‘a’, the character variable which we had defined in our C program.
-
Let us move to the next instruction - mov WORD PTR [rbp-0xe],0x4d2 . Checkout the value of decimal equivalent of 0x4d2 . It is 1234 . While doing analysis, it is good to have python interpreter opened in another terminal. You can do all these conversions very easily.
-
So, let us checkout the memory location specified by rbp - 0xe.
gdb-peda$ x/10xb $rbp - 0xe 0x7fffffffdac2: 0xd2 0x04 0xff 0x7f 0x00 0x00 0x00 0x00 0x7fffffffdaca: 0x00 0x00 gdb-peda$
-
So, it is stored as 0xd2 0x04 , in the reverse order of 0x04d2. Let us talk about this reversing later, but for now know that the number is stored in the memory location.
-
Now, I think you will be able to make sense of the next 3 instructions. So, I will skip them.
-
Directly coming to 0x4004f9 <main+35>: mov eax,0x0 . It is important to note that the return value of the main function is 0 . So, always at the end of any function, the ReturnValue is loaded into rax / eax .
-
Onto the next instruction: 0x4004fe <main+40>: pop rbp .
-
Before executing this instruction, rsp = 0x7fffffffdad0 and rbp = 0x7fffffffdad0 and top of the had this value - 0x400500 .
-
In general, pop Reg1 will do 2 things.
mov Reg1, QWORD PTR [rsp] add rsp, 0x08
-
So, pop rbp will load the Value at the top of the stack into Reg1 and add 0x08 to rsp.
-
Let us execute it and see what happens.
RBP: 0x400500 (<__libc_csu_init>: push r15) RSP: 0x7fffffffdad8 --> 0x7ffff7a2d830
-
Observe the old and new values of rsp and rbp. It has changed as expected.
-
- ret is executed and control goes back the caller function.
With this, we have successfully analyzed a simple C program’s execution. We basically dealt with
- What rsp and rbp are.
- Why we have to store old rbp in the stack - (push rbp)
- How a StackFrame is constructed dynamically.
- How are local variables stored in stack.
- How is return value conveyed to the caller function - (mov eax, 0x0)
- How is control returned back to the caller function - (ret)
Program2
~/rev_eng_series/post_5$ cat code2.c
#include<stdio.h>
int main() {
int a = 10, b = 15;
if(a > b)
printf("a is greater than b\n");
return 0;
}
~/rev_eng_series/post_5$ gcc code2.c -o code2
Let us get straight to analysis!
Analysis:
~/rev_eng_series/post_5$ gdb -q code2
Reading symbols from code2...(no debugging symbols found)...done.
gdb-peda$ disass main
Dump of assembler code for function main:
0x0000000000400526 <+0>: push rbp
0x0000000000400527 <+1>: mov rbp,rsp
0x000000000040052a <+4>: sub rsp,0x10
0x000000000040052e <+8>: mov DWORD PTR [rbp-0x8],0xa
0x0000000000400535 <+15>: mov DWORD PTR [rbp-0x4],0xf
0x000000000040053c <+22>: mov eax,DWORD PTR [rbp-0x8]
0x000000000040053f <+25>: cmp eax,DWORD PTR [rbp-0x4]
0x0000000000400542 <+28>: jle 0x40054e <main+40>
0x0000000000400544 <+30>: mov edi,0x4005e4
0x0000000000400549 <+35>: call 0x400400 <puts@plt>
0x000000000040054e <+40>: mov eax,0x0
0x0000000000400553 <+45>: leave
0x0000000000400554 <+46>: ret
End of assembler dump.
gdb-peda$
Let us understand the assembly code first.
a. Construction of the StackFrame:
- First, mov rbp, rsp means a StackFrame of size 0 is constructed. b. The next instruction is sub rsp, 0x10 => Subtract 0x10 - 16 from current rsp value. This means, the difference between rsp and rbp is 16 bytes. So, a StackFrame of size 16 bytes is constructed for main function.
b. Body of the function:
-
mov DWORD PTR [rbp-0x8],0xa : A value 0xa (10) is being copied into memory specified by address rbp - 0x08 . It is 4 bytes in size. It is the local variable a .
-
mov DWORD PTR [rbp-0x4],0xf : A value 0xf (15) is being copied into memory specified by address rbp - 0x4 . It is 4 bytes in size. It is the local variable b .
-
Notice that names of local variables are no longer present in the program. Only the respective memory locations are present and that is the only way to access those variables.
-
mov eax,DWORD PTR [rbp-0x8] : Value at rbp-0x8 is being copied into eax . So, eax will have 0xa after the execution of this instruction.
-
cmp eax,DWORD PTR [rbp-0x4] : Value in eax is being compared with value at rbp - 0x4 .
-
jle 0x40054e <main+40> : jle is Jump if Less than or Equal to some location. It means, If Value in eax <= Value at rbp - 0x4 , then jump to 0x40054e . 0x40054e is the end of the program.
-
If Value in eax > Value at rbp - 0x4, then jle doesn’t get executed. So, the instructions after jle get executed.
-
mov edi,0x4005e4 ; call 0x400400 puts@plt : puts() is executed.
c. Destruction of the StackFrame:
-
We can see a new instruction leave . This is nothing but the following 2 instructions:
mov rsp, rbp pop rbp
-
First the StackFrame is destructed by that mov instruction.
-
Then, the Old rbp value is loaded into the rbp register.
-
ret => Return back to the caller function.
Now that we have a good overview of the main function, let us execute it by instruction by instruction.
d. Break at main function and run!
gdb-peda$ b main
Breakpoint 1 at 0x40052a
gdb-peda$ run
-
Let us focus on the code and stack part a few required registers.
[-------------------------------------code-------------------------------------] 0x400521 <frame_dummy+33>: jmp 0x4004a0 <register_tm_clones> 0x400526 <main>: push rbp 0x400527 <main+1>: mov rbp,rsp => 0x40052a <main+4>: sub rsp,0x10 0x40052e <main+8>: mov DWORD PTR [rbp-0x8],0xa 0x400535 <main+15>: mov DWORD PTR [rbp-0x4],0xf 0x40053c <main+22>: mov eax,DWORD PTR [rbp-0x8] 0x40053f <main+25>: cmp eax,DWORD PTR [rbp-0x4] [------------------------------------stack-------------------------------------] 0000| 0x7fffffffdad0 --> 0x400560 (<__libc_csu_init>: push r15) 0008| 0x7fffffffdad8 --> 0x7ffff7a2d830 (<__libc_start_main+240>: mov edi,eax) 0016| 0x7fffffffdae0 --> 0x0 0024| 0x7fffffffdae8 --> 0x7fffffffdbb8 --> 0x7fffffffdfbb ("/home/adwi/rev_eng_series/post_5/code2") 0032| 0x7fffffffdaf0 --> 0x100000000 0040| 0x7fffffffdaf8 --> 0x400526 (<main>: push rbp) 0048| 0x7fffffffdb00 --> 0x0 0056| 0x7fffffffdb08 --> 0x864ee2b3356e8653 [------------------------------------------------------------------------------] Legend: code, data, rodata, value Breakpoint 1, 0x000000000040052a in main () gdb-peda$
-
At this point, rsp = 0x7fffffffdad0 and rbp = 0x7fffffffdad0 . Size of StackFrame = 0.
-
Execute the sub rsp, 0x10 using the ni command.
gdb-peda$ ni
-
This is the current state:
[----------------------------------registers-----------------------------------] RAX: 0x400526 (<main>: push rbp) RBX: 0x0 RCX: 0x0 RDX: 0x7fffffffdbc8 --> 0x7fffffffdfe2 ("XDG_VTNR=7") RSI: 0x7fffffffdbb8 --> 0x7fffffffdfbb ("/home/adwi/rev_eng_series/post_5/code2") RDI: 0x1 RBP: 0x7fffffffdad0 --> 0x400560 (<__libc_csu_init>: push r15) RSP: 0x7fffffffdac0 --> 0x7fffffffdbb0 --> 0x1 RIP: 0x40052e (<main+8>: mov DWORD PTR [rbp-0x8],0xa) R8 : 0x4005d0 (<__libc_csu_fini>: repz ret) R9 : 0x7ffff7de7ab0 (<_dl_fini>: push rbp) R10: 0x846 R11: 0x7ffff7a2d740 (<__libc_start_main>: push r14) R12: 0x400430 (<_start>: xor ebp,ebp) R13: 0x7fffffffdbb0 --> 0x1 R14: 0x0 R15: 0x0 EFLAGS: 0x206 (carry PARITY adjust zero sign trap INTERRUPT direction overflow) [-------------------------------------code-------------------------------------] 0x400526 <main>: push rbp 0x400527 <main+1>: mov rbp,rsp 0x40052a <main+4>: sub rsp,0x10 => 0x40052e <main+8>: mov DWORD PTR [rbp-0x8],0xa 0x400535 <main+15>: mov DWORD PTR [rbp-0x4],0xf 0x40053c <main+22>: mov eax,DWORD PTR [rbp-0x8] 0x40053f <main+25>: cmp eax,DWORD PTR [rbp-0x4] 0x400542 <main+28>: jle 0x40054e <main+40> [------------------------------------stack-------------------------------------] 0000| 0x7fffffffdac0 --> 0x7fffffffdbb0 --> 0x1 0008| 0x7fffffffdac8 --> 0x0 0016| 0x7fffffffdad0 --> 0x400560 (<__libc_csu_init>: push r15) 0024| 0x7fffffffdad8 --> 0x7ffff7a2d830 (<__libc_start_main+240>: mov edi,eax) 0032| 0x7fffffffdae0 --> 0x0 0040| 0x7fffffffdae8 --> 0x7fffffffdbb8 --> 0x7fffffffdfbb ("/home/adwi/rev_eng_series/post_5/code2") 0048| 0x7fffffffdaf0 --> 0x100000000 0056| 0x7fffffffdaf8 --> 0x400526 (<main>: push rbp) [------------------------------------------------------------------------------] Legend: code, data, rodata, value 0x000000000040052e in main () gdb-peda$
-
Here, rsp = 0x7fffffffdac0 and rbp = 0x7fffffffdad0 . There is an important thing to understand at this point. In the previous article, we had just discussed that Stack grows Downward in the Memory Layout of a process. Notice that 0x10 was subtracted from rsp and not added. So, this is Stack growing downward in action. For this reason, Value of rsp is always less than or equal to Value of rbp .
-
Let us check what our StackFrame has:
gdb-peda$ x/16xb $rsp 0x7fffffffdac0: 0xb0 0xdb 0xff 0xff 0xff 0x7f 0x00 0x00 0x7fffffffdac8: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
-
You can obviously check out the top of the stack with the gdb-peda output. You can see the top 8 bytes(**0000 0x7fffffffdac0 –> 0x7fffffffdbb0 –> 0x1**) is some address and the next 8 bytes is just zeros. -
At this stage, we have a proper Private StackFrame ready for main function.
-
Now, let us go ahead and execute the next instruction: (Showing only the code and stack part)
[-------------------------------------code-------------------------------------] 0x400527 <main+1>: mov rbp,rsp 0x40052a <main+4>: sub rsp,0x10 0x40052e <main+8>: mov DWORD PTR [rbp-0x8],0xa => 0x400535 <main+15>: mov DWORD PTR [rbp-0x4],0xf 0x40053c <main+22>: mov eax,DWORD PTR [rbp-0x8] 0x40053f <main+25>: cmp eax,DWORD PTR [rbp-0x4] 0x400542 <main+28>: jle 0x40054e <main+40> 0x400544 <main+30>: mov edi,0x4005e4 [------------------------------------stack-------------------------------------] 0000| 0000| 0x7fffffffdac0 --> 0x7fffffffdbb0 --> 0x1 0008| 0x7fffffffdac8 --> 0xa ('\n') 0016| 0x7fffffffdad0 --> 0x400560 (<\_\_libc_csu_init>: push r15) 0024| 0x7fffffffdad8 --> 0x7ffff7a2d830 (<\_\_libc_start_main+240>: mov edi,eax) 0032| 0x7fffffffdae0 --> 0x0 0040| 0x7fffffffdae8 --> 0x7fffffffdbb8 --> 0x7fffffffdfbb ("/home/adwi/rev_eng_series/post_5/code2") 0048| 0x7fffffffdaf0 --> 0x100000000 0056| 0x7fffffffdaf8 --> 0x400526 (<main>: push rbp) [------------------------------------------------------------------------------] Legend: code, data, rodata, value 0x0000000000400535 in main () gdb-peda$
-
Now, Memory location rbp - 0x8 should have 0xa (10). Check out the the stack. Let us check out using the x command.
0x0000000000400535 in main () gdb-peda$ x/8xb $rbp - 0x8 0x7fffffffdac8: 0x0a 0x00 0x00 0x00 0x00 0x00 0x00 0x00 gdb-peda$
-
Let us execute the next instruction - mov DWORD PTR [rbp-0x4],0xf
[-------------------------------------code-------------------------------------] 0x40052a <main+4>: sub rsp,0x10 0x40052e <main+8>: mov DWORD PTR [rbp-0x8],0xa 0x400535 <main+15>: mov DWORD PTR [rbp-0x4],0xf => 0x40053c <main+22>: mov eax,DWORD PTR [rbp-0x8] 0x40053f <main+25>: cmp eax,DWORD PTR [rbp-0x4] 0x400542 <main+28>: jle 0x40054e <main+40> 0x400544 <main+30>: mov edi,0x4005e4 0x400549 <main+35>: call 0x400400 <puts@plt> [------------------------------------stack-------------------------------------] 0000| 0x7fffffffdac0 --> 0x7fffffffdbb0 --> 0x1 0008| 0x7fffffffdac8 --> 0xf0000000a 0016| 0x7fffffffdad0 --> 0x400560 (<__libc_csu_init>: push r15) 0024| 0x7fffffffdad8 --> 0x7ffff7a2d830 (<__libc_start_main+240>: mov edi,eax) 0032| 0x7fffffffdae0 --> 0x0 0040| 0x7fffffffdae8 --> 0x7fffffffdbb8 --> 0x7fffffffdfbb ("/home/adwi/rev_eng_series/post_5/code2") 0048| 0x7fffffffdaf0 --> 0x100000000 0056| 0x7fffffffdaf8 --> 0x400526 (<main>: push rbp) [------------------------------------------------------------------------------] Legend: code, data, rodata, value 0x000000000040053c in main () gdb-peda$
-
Now, Memory location rbp - 0x4 should have 0xf . Check the stack: **0008 0x7fffffffdac8 –> 0xf0000000a** . This is the variable: gdb-peda$ x/4xb $rbp - 0x4 0x7fffffffdacc: 0x0f 0x00 0x00 0x00 gdb-peda$
-
Next instruction is straight forward: mov eax,DWORD PTR [rbp-0x8] - So skipping this.
-
Executing cmp eax,DWORD PTR [rbp-0x4] : Compares Value in eax with Value at rbp - 0x04 and sets appropriate flags in the EFLAGS register.
-
Executing jle 0x40054e <main+40> : The jle instruction checks the EFLAGS register. If certain flags are set, only then the jump will happen. If not, the jump is not going to happen. Here, Value in eax < Value at rbp - 0x04 (10 < 15) . So, the jump happens . The puts() is not executed.
-
After executing jle, we are here:
[-------------------------------------code-------------------------------------] 0x400542 <main+28>: jle 0x40054e <main+40> 0x400544 <main+30>: mov edi,0x4005e4 0x400549 <main+35>: call 0x400400 <puts@plt> => 0x40054e <main+40>: mov eax,0x0 0x400553 <main+45>: leave 0x400554 <main+46>: ret 0x400555: nop WORD PTR cs:[rax+rax*1+0x0] 0x40055f: nop [------------------------------------stack-------------------------------------] 0000| 0x7fffffffdac0 --> 0x7fffffffdbb0 --> 0x1 0008| 0x7fffffffdac8 --> 0xf0000000a 0016| 0x7fffffffdad0 --> 0x400560 (<__libc_csu_init>: push r15) 0024| 0x7fffffffdad8 --> 0x7ffff7a2d830 (<__libc_start_main+240>: mov edi,eax) 0032| 0x7fffffffdae0 --> 0x0 0040| 0x7fffffffdae8 --> 0x7fffffffdbb8 --> 0x7fffffffdfbb ("/home/adwi/rev_eng_series/post_5/code2") 0048| 0x7fffffffdaf0 --> 0x100000000 0056| 0x7fffffffdaf8 --> 0x400526 (<main>: push rbp) [------------------------------------------------------------------------------] Legend: code, data, rodata, value 0x000000000040054e in main () gdb-peda$
-
We bypassed the puts(). mov eax, 0x0 is return 0 ;
-
Let us understand what leave does as it is a new instruction for us.
-
Current state: rsp = 0x7fffffffdac0 , rbp = 0x7fffffffdad0 .
- After leave: rsp = 0x7fffffffdad8 , rbp = 0x400560 .
Analysis:
- leave = mov rsp, rbp ; pop rbp .
- Current state: rsp = 0x7fffffffdac0 , rbp = 0x7fffffffdad0
- After mov rsp, rbp, the state is: rsp = 0x7fffffffdad0 , rbp = 0x7fffffffdad0 . The point to understand is, the StackFrameis destructed and that stack space of 16 bytes is freed.
-
pop rbp : The top of the stack has the value 0x400560 . So, rsp = 0x7fffffffdad0 points to 0x400560 . When pop rbp is executed, the value at the top of the stack(0x400560) is loaded into rbp AND rsp is incremented by 8 bytes. So, rsp = 0x7fffffffdad0 becomes rsp = 0x7fffffffdad0 + 8 => rsp = 0x7fffffffdad8 .
- The ret will return control to the caller function.
That ends the analysis of a simple C program. So, the 2 examples taken above were to understand 2 things:
a. How variables are stored in stack and how to see the stack using gdb. b. How Assembly instructions are executed.
Now, let us give gdb a break and look at reading only assembly instructions and see what it does.
Program3
~/rev_eng_series/post_5$ cat code3.c
#include<stdio.h>
int main() {
int i = 0;
while(i < 10) {
printf("%d\n", i);
i = i + 1;
}
return 0;
}
~/rev_eng_series/post_5$ gcc code3.c -o code3
-
This is a simple program with a while loop.
-
We want only the Assembly code of this program. Let us not get into running this. Let us understand the program just by reading code.
-
If we want only Assembly code, we don’t have to use gdb and then do disass main to get the Disassembly . You can use this tool called objdump . Let us see how.
~/rev_eng_series/post_5$ objdump -Mintel -D code3 > code3.objdump
-
-M stands for machine / processor. We want it in the Intel Syntax. If you don’t specify it, you will the Disassembly in AT&T Syntax.
-
-D stands for Full Disassembly. Meaning, all parts of the file is disassembled . If -d is used, only selected parts are disassembled. Either ways, we will get the disassembly of the main function which is what we want.
-
> simply puts the output of objdump into a file named code3.objdump .
-
Let’s open that file and see what it has:
-
It has a lot of sections. Let us focus on the main function for now.
0000000000400526 <main>: 400526: 55 push rbp 400527: 48 89 e5 mov rbp,rsp 40052a: 48 83 ec 10 sub rsp,0x10 40052e: c7 45 fc 00 00 00 00 mov DWORD PTR [rbp-0x4],0x0 400535: eb 18 jmp 40054f <main+0x29> 400537: 8b 45 fc mov eax,DWORD PTR [rbp-0x4] 40053a: 89 c6 mov esi,eax 40053c: bf e4 05 40 00 mov edi,0x4005e4 400541: b8 00 00 00 00 mov eax,0x0 400546: e8 b5 fe ff ff call 400400 <printf@plt> 40054b: 83 45 fc 01 add DWORD PTR [rbp-0x4],0x1 40054f: 83 7d fc 09 cmp DWORD PTR [rbp-0x4],0x9 400553: 7e e2 jle 400537 <main+0x11> 400555: b8 00 00 00 00 mov eax,0x0 40055a: c9 leave 40055b: c3 ret 40055c: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
Analysis:
a. Construction of StackFrame:
400526: 55 push rbp
400527: 48 89 e5 mov rbp,rsp
40052a: 48 83 ec 10 sub rsp,0x10
- A StackFrame of size 0x10(16) bytes is constructed.
b. 40052e :
40052e: c7 45 fc 00 00 00 00 mov DWORD PTR [rbp-0x4],0x0
- 4 bytes specified by rbp - 0x4 is set to 0. It is 4 bytes because of DWORD PTR .
c. 400535 :
400535: eb 18 jmp 40054f <main+0x29>
- This is a simple Jump to instruction at 0x40054f. This jmp is like C’s goto construct. It will just jump without any condition - Unconditional Jump.
d. 40054f :
40054f: 83 7d fc 09 cmp DWORD PTR [rbp-0x4],0x9
400553: 7e e2 jle 400537 <main+0x11>
- It is comparing Value at rbp - 0x4 with 9(Comparing i with 9). If i is less than equal to 9, then it will jump to instruction at 0x400537 . At this point, Value at rbp - 0x4 is 0. So, We jump to instruction at 0x400537. It is important to note that our condition was i < 10 but here, i <= 9 is being done. Both are obviously the same.
** There is important thing to understand: When we describe a while() loop, what we say is First the condition inside is checked, only then the body of the while is executed if the condition satisfies. Here, we just did the same thing. First executed the cmp instruction which is the while condition. The jle instruction decides if we have to jump into body of while loop or just move out of it.
** I hope you are able to appreciate the amount of clarity you get regarding day-to-day code constructs by understanding it’s assembly equivalent code.
e. Body of while() :
400537: 8b 45 fc mov eax,DWORD PTR [rbp-0x4]
40053a: 89 c6 mov esi,eax
40053c: bf e4 05 40 00 mov edi,0x4005e4
400541: b8 00 00 00 00 mov eax,0x0
400546: e8 b5 fe ff ff call 400400 <printf@plt>
40054b: 83 45 fc 01 add DWORD PTR [rbp-0x4],0x1
-
The printf() is executed. Then add DWORD PTR [rbp-0x4],0x1 adds 1 to i. Then, the following 2 instructions will be executed which are the while() condition check and whether or not to jump into body of while() or out of it.
40054f: 83 7d fc 09 cmp DWORD PTR [rbp-0x4],0x9 400553: 7e e2 jle 400537 <main+0x11>
f. End:
400555: b8 00 00 00 00 mov eax,0x0
40055a: c9 leave
40055b: c3 ret
- When DWORD PTR[rbp - 0x4] becomes 10, jle doesn’t get executed. So, the above instructions are executed.
I hope you have understood how a while loop works in detail. You will have to write programs to understand for and do while loops on your own.
Program4
In this example, we will see how a structure is translated into assembly level.
~/rev_eng_series/post_5$ cat code4.c
#include<stdio.h>
int main() {
struct details {
char name[30];
int rollno;
};
return 0;
}
~/rev_eng_series/post_5$ gcc code4.c -o code4
a. Let us checkout it’s disassembly quickly.
00000000004004d6 <main>:
4004d6: 55 push rbp
4004d7: 48 89 e5 mov rbp,rsp
4004da: b8 00 00 00 00 mov eax,0x0
4004df: 5d pop rbp
4004e0: c3 ret
b. Wow! There is no sign of structure inside the Assembly code. Why is this?
This will take us back to C programming basics. You would have probably studied that Definition of a structure won’t take memory. But, if you declare a struct of that type, that will take memory.
We saw the same thing. We had the definiton of the structure, but there were no instances of that structure. So, Assembly code is pretty simple.
c. To this C program, let us declare a struct details student1 and see what the disassembly looks like.
0000000000400546 <main>:
400546: 55 push rbp
400547: 48 89 e5 mov rbp,rsp
40054a: 48 83 ec 30 sub rsp,0x30
40054e: 64 48 8b 04 25 28 00 mov rax,QWORD PTR fs:0x28
400555: 00 00
400557: 48 89 45 f8 mov QWORD PTR [rbp-0x8],rax
40055b: 31 c0 xor eax,eax
40055d: b8 00 00 00 00 mov eax,0x0
400562: 48 8b 55 f8 mov rdx,QWORD PTR [rbp-0x8]
400566: 64 48 33 14 25 28 00 xor rdx,QWORD PTR fs:0x28
40056d: 00 00
40056f: 74 05 je 400576 <main+0x30>
400571: e8 aa fe ff ff call 400420 <__stack_chk_fail@plt>
400576: c9 leave
400577: c3 ret
d. Let us analyze it to understand better. For now, ignore instructions at addresses 0x40054e, 0x400557 and 0x400571 . These require a bit of explanation which I will cover in the next post.
Analysis:
i. Construction of StackFrame:
400546: 55 push rbp
400547: 48 89 e5 mov rbp,rsp
40054a: 48 83 ec 30 sub rsp,0x30
- A StackFrame of size 0x30(48) bytes is constructed.
ii. Next:
40055b: 31 c0 xor eax,eax
40055d: b8 00 00 00 00 mov eax,0x0
- xor eax, eax set eax to 0 and so does mov eax, 0x0 . I don’t know why the compiler has compiled these 2 instructions together. Something to think about.
iii. End:
400576: c9 leave
400577: c3 ret
- This will end the main function and return control to the caller function.
In this example, we saw Stack space being allocated for the structure student1. If we had given student1.name a string and a number to student1.rollno, then we would have seen that code.
That is an exercise left to you .
With the 4 examples, I hope you have got some clarity on how programs are executed, how the compiler converts a C program into equivalent assembly code, how different types of variables are stored.
With this article, we know how C programs are converted to executables, what those executables contain, how does it look like in the memory when an executable is run and how our C programs are converted to assembly code and then executed.
Things you can do to understand the concepts better:
-
You can write your own programs, disassemble it and understand the internals. As you also got introduced to gdb, you can run it and see all the registers and memory locations. There are many code constructs which we did not cover like if-else, if-else-if, switch, for(), do-while, union etc., Write programs and understand.
-
We just used 4-5 gdb commands in this article. But, gdb is a Debugger which is very powerful. We will slowly unleash it’s power as we move ahead.
And
A few things which were not covered properly:
-
In every example, there were instructions related to printf() but I did not give much emphasis on them. This was done deliberately because it requires a few new concepts to understand function calls.
-
Why bytes are stored in reverse order? (When we checked out the stack using x/ command).
-
In the structure example, I asked you to ignore a few instructions. Even these require a few concepts to understand.
All 3 are important concepts and will be discussed in depth in the next article.
With this, I will end this article. I hope you enjoyed the article and learnt something out of it. Thank you!
Go to next post: Program Execution Internals - Part2
Go to previous post: Memory Layout of a Process