Exploitation using Code Injection - Part1
Hey fellow pwners!
In the previous article, we looked at Control Flow Hijacking in detail along with a couple of examples. We saw how a buffer overflow (BOF) can be exploited to execute an exploit function present in the software.
It was a good example to understand Control Flow Hijacking. But a problem with it was that no software would have such evil exploit functions in it. No developer would add malicious code along with the original software.
So, is there a more realistic way to write exploits to abuse vulnerable software? A Big Yes!
In this article, we will look at one of the most famous exploit methods, known as the Code Injection method. This method doesn’t expect any exploit function to be present in the vulnerable software. It is a standalone method which does a great job.
Before we start, let us disable the security feature which randomizes addresses (refer to the previous article). This makes our experimentation easier.
~/rev_eng_series/post_8$ cat /proc/sys/kernel/randomize_va_space
2
The value in the file randomize_va_space is 2, which means the Operating System is instructed to randomize addresses. If it is set to 0, we are telling the Operating System to stop all the randomization. So, let us do that.
~/rev_eng_series/post_8$ echo 0 > /proc/sys/kernel/randomize_va_space
bash: /proc/sys/kernel/randomize_va_space: Permission denied
~/rev_eng_series/post_8$ sudo echo 0 > /proc/sys/kernel/randomize_va_space
bash: /proc/sys/kernel/randomize_va_space: Permission denied
~/rev_eng_series/post_8$ sudo su root
[sudo] password for adwi:
/home/adwi/rev_eng_series/post_8# sudo echo 0 > /proc/sys/kernel/randomize_va_space
/home/adwi/rev_eng_series/post_8#
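Note that sudo echo 0 > /proc/sys/kernel/randomize_va_space fails as well. The redirection (>) is performed by our own unprivileged shell before sudo even runs, so only echo runs as root and the write to the file is still denied. That is why we switched to a root shell first; piping the value into sudo tee would work too. The kernel lets only root change this setting, which signifies how important a security feature this is to the Operating System. We will talk about this in detail in one of the future posts.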
These are the contents in this article:
- A quick recap!
- What does executing machine code mean? - Explained with an example
- What is a System Call? - An important Operating System concept to be understood before we dive into actual code injection.
- Code Injection - Introduction
With this done, let us start!
A quick recap!
Let us quickly recap what we had done in the previous article. Let us take the following example:
~/rev_eng_series/post_9$ cat code1.c
#include<stdio.h>
#include<string.h>
void vuln_func() {
char buffer[100];
gets(buffer);
printf("%s\n", buffer);
return;
}
int main() {
printf("Before vuln_func. \n");
vuln_func();
printf("After vuln_func. \n");
return 0;
}
-
The example in the previous article had another function called exploit_func. The objective was to exploit the BOF and execute that function. The key was to overwrite the Return Address of vuln_func with the starting address of exploit_func. So, instead of returning control to main, vuln_func hands control over to exploit_func.
-
Our point is that no such evil exploit function would be present in any real software, and I feel this is a valid argument. So, what do we do?
-
Check this out. In the examples we have seen so far, the buffer was filled with junk and we only cared about the value the Return Address was overwritten with. In this article, we will see how we can write the exploit code we want to execute into the 100-byte buffer, and then overwrite the Return Address with the starting address of buffer. So, when the ret instruction is executed, control is transferred to the code in the buffer. As the buffer contains our exploit code, it gets executed.
-
This is what we did in the previous article:
main ----------> vuln_func ----------> exploit_func (unrealistic function)
 ||                                         |
 ||                                         |
 |--------<----------------<-----------------
 |
 |-------------SegFault
-
This is roughly what we will try to achieve in this article:
main -----------> vuln_func -----------> buffer(present in vuln_func)----------> run something cool!
But in what language do we write code into the buffer? Do we use C or assembly or what?
We have written C code, we have written assembly code. We are going one more level lower this time, to the lowest level we can get. We will be writing / injecting machine code directly into the buffer. Yes. We will be dealing with 0s and 1s from here.
Let us keep the BOF aside for a while and understand how we inject and execute raw machine code.
Executing machine code?
Let us take the following program:
~/rev_eng_series/post_9$ cat code2.c
#include<stdio.h>
#include<string.h>
#include<unistd.h>
int main() {
char buffer[100];
read(0, buffer, sizeof(buffer));
void (*executeme)();
executeme = buffer;
executeme();
return 0;
}
-
The program is simple. It takes in whatever we type and executes it. It typecasts a char pointer into a function pointer. It executes whatever is present in the buffer.
-
Compile the program in the following manner:
~/rev_eng_series/post_9$ gcc code2.c -o code2 -zexecstack
code2.c: In function ‘main’:
code2.c:11:12: warning: assignment from incompatible pointer type [-Wincompatible-pointer-types]
  executeme = buffer;
            ^
~/rev_eng_series/post_9$
-
The first point to note is that we compiled code2.c with the option -zexecstack. This allows execution of code present on the stack. We will talk about this in detail in a future post.
Execution: Round1
In this round of execution, we will just give some random strings and see how it gets executed or how the processor tries to execute it.
~/rev_eng_series/post_9$ ./code2
asd
Illegal instruction (core dumped)
~/rev_eng_series/post_9$ ./code2
qwe
Segmentation fault (core dumped)
~/rev_eng_series/post_9$ ./code2
owihowiehfowiehf
Segmentation fault (core dumped)
~/rev_eng_series/post_9$ ./code2
qwerty
Segmentation fault (core dumped)
~/rev_eng_series/post_9$
-
Look at this. They are all some random strings. The program crashed in each one of the cases.
-
It is important to understand what the processor is doing with these strings. The aim of this program is that whatever we input, that gets executed. But we still don’t know what it means. What does it mean when we say that the whole input is getting executed?
Analysis: Round1
Let us open up the executable with gdb and analyse what is going on.
~/rev_eng_series/post_9$ gdb -q code2
Reading symbols from code2...(no debugging symbols found)...done.
gdb-peda$ b main
Breakpoint 1 at 0x40059a
gdb-peda$
- Analysis of main function:
-
This is the disassembly of the main function.
gdb-peda$ disass main
Dump of assembler code for function main:
   0x0000000000400596 <+0>: push rbp
   0x0000000000400597 <+1>: mov rbp,rsp
   0x000000000040059a <+4>: add rsp,0xffffffffffffff80
   0x000000000040059e <+8>: mov rax,QWORD PTR fs:0x28
   0x00000000004005a7 <+17>: mov QWORD PTR [rbp-0x8],rax
   0x00000000004005ab <+21>: xor eax,eax
   0x00000000004005ad <+23>: lea rax,[rbp-0x70]
   0x00000000004005b1 <+27>: mov edx,0x64
   0x00000000004005b6 <+32>: mov rsi,rax
   0x00000000004005b9 <+35>: mov edi,0x0
   0x00000000004005be <+40>: call 0x400470 <read@plt>
   0x00000000004005c3 <+45>: lea rax,[rbp-0x70]
   0x00000000004005c7 <+49>: mov QWORD PTR [rbp-0x78],rax
   0x00000000004005cb <+53>: mov rdx,QWORD PTR [rbp-0x78]
   0x00000000004005cf <+57>: mov eax,0x0
   0x00000000004005d4 <+62>: call rdx
   0x00000000004005d6 <+64>: mov eax,0x0
   0x00000000004005db <+69>: mov rcx,QWORD PTR [rbp-0x8]
   0x00000000004005df <+73>: xor rcx,QWORD PTR fs:0x28
   0x00000000004005e8 <+82>: je 0x4005ef <main+89>
   0x00000000004005ea <+84>: call 0x400460 <__stack_chk_fail@plt>
   0x00000000004005ef <+89>: leave
   0x00000000004005f0 <+90>: ret
End of assembler dump.
gdb-peda$
-
Construction of StackFrame:
   0x0000000000400596 <+0>: push rbp
   0x0000000000400597 <+1>: mov rbp,rsp
   0x000000000040059a <+4>: add rsp,0xffffffffffffff80
- Here, instead of sub rsp, < ImmValue >, there is add rsp, < ImmValue >. If you add a negative number to rsp, it is the same as subtracting a positive number from it. You can see that the number being added is 0xffffffffffffff80 which is a negative number.
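To see why adding 0xffffffffffffff80 shrinks rsp, it helps to reinterpret the constant as a signed 64-bit number. The following is a small Python sketch of that arithmetic. The helper name to_signed64 is made up for this illustration, and the starting value is the rbp seen later in the gdb session (after mov rbp,rsp, rsp holds the same value).

```python
MASK64 = (1 << 64) - 1

def to_signed64(value):
    """Interpret a 64-bit unsigned integer as a two's complement signed value."""
    value &= MASK64                 # keep only the low 64 bits
    if value & (1 << 63):           # sign bit set means the number is negative
        value -= 1 << 64
    return value

imm = 0xffffffffffffff80
print(to_signed64(imm))             # -128, i.e. -0x80

# Adding the constant wraps around at 64 bits, which is the same as
# subtracting 0x80 (128 bytes) from rsp.
rsp = 0x7fffffffda90
new_rsp = (rsp + imm) & MASK64
print(hex(new_rsp))                 # 0x7fffffffda10
```

So the instruction reserves 128 bytes on the stack, and the result matches the rsp value that gdb shows just before call rdx.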
-
Calling the read function:
   0x00000000004005ab <+21>: xor eax,eax
   0x00000000004005ad <+23>: lea rax,[rbp-0x70]
   0x00000000004005b1 <+27>: mov edx,0x64
   0x00000000004005b6 <+32>: mov rsi,rax
   0x00000000004005b9 <+35>: mov edi,0x0
   0x00000000004005be <+40>: call 0x400470 <read@plt>
-
This function takes input from any given file. This is what the manpage has to say:
NAME
read - read from a file descriptor
SYNOPSIS
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
-
fd is the File Descriptor. This is a unique number used to identify an open file. By default, 0 is the File Descriptor of Standard Input, 1 is the File Descriptor of Standard Output, 2 is the File Descriptor of Standard Error.
-
Here, I want to read from Standard Input (or take in the User Input). So, the first argument is 0. Register rdi has this value.
-
Second Argument is the Address of the Buffer. The Address of buffer is rbp-0x70. Register rsi has this value.
-
Third Argument is the number of bytes to take in. Obviously, we want to restrict it to the size of the buffer. Size of buffer is 100 bytes. So, 0x64 / 100 is loaded into Register rdx.
-
Then read function is called.
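The three arguments of read can also be tried out from Python, whose os.read is a thin wrapper around the same call. This is only a sketch: it reads from a pipe instead of Standard Input so that it does not wait for the keyboard, and the input string is just the one used later in this article.

```python
import os

# A pipe gives us a file descriptor we can fill ourselves.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"qwerty1234\n")
os.close(write_fd)

# os.read(fd, count) corresponds to read(fd, buf, count) and returns the bytes read.
data = os.read(read_fd, 100)        # at most 100 bytes, like sizeof(buffer)
os.close(read_fd)

print(data)                         # b'qwerty1234\n'
print(len(data))                    # 11
```

Passing 0 as the descriptor instead of read_fd would read from Standard Input, exactly like the call in code2.c.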
-
-
Calling the Function using the Function Pointer executeme:
   0x00000000004005c3 <+45>: lea rax,[rbp-0x70]
   0x00000000004005c7 <+49>: mov QWORD PTR [rbp-0x78],rax
   0x00000000004005cb <+53>: mov rdx,QWORD PTR [rbp-0x78]
   0x00000000004005cf <+57>: mov eax,0x0
   0x00000000004005d4 <+62>: call rdx
-
We know that rbp-0x70 is the Address of buffer.
-
After instruction <+53> gets executed, rdx will have the Address of the buffer. rbp-0x78 is the function pointer executeme. We had done this: executeme = buffer. That is what these 3 instructions are about.
-
executeme() is equivalent to call rdx - the fifth instruction.
-
Notice that typecasting is just the compiler doing everything. The processor doesn’t know what typecasting is.
-
-
Now that we have a thorough understanding of what the main function does, let us go ahead and execute the program.
- Run the program!
-
Let us break at 0x00000000004005d4 <+62>: call rdx. This is just before executing what we have entered.
gdb-peda$ b *0x00000000004005d4
Breakpoint 2 at 0x4005d4
-
Let us run!
Breakpoint 1, 0x000000000040059a in main ()
gdb-peda$ continue
Continuing.
qwerty1234
-
I entered qwerty1234. The processor tries to execute it. We should understand what that means. Execute q? Execute w? Execute what?
-
This is the state just before call rdx :
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7b04260 (<__read_nocancel+7>: cmp rax,0xfffffffffffff001)
RDX: 0x7fffffffda20 ("qwerty1234\n")
RSI: 0x7fffffffda20 ("qwerty1234\n")
RDI: 0x0
RBP: 0x7fffffffda90 --> 0x400600 (<__libc_csu_init>: push r15)
RSP: 0x7fffffffda10 --> 0x0
RIP: 0x4005d4 (<main+62>: call rdx)
R8 : 0x400670 (<__libc_csu_fini>: repz ret)
R9 : 0x7ffff7de7ab0 (<_dl_fini>: push rbp)
R10: 0x37b
R11: 0x246
R12: 0x4004a0 (<_start>: xor ebp,ebp)
R13: 0x7fffffffdb70 --> 0x1
R14: 0x0
R15: 0x0
EFLAGS: 0x207 (CARRY PARITY adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x4005c7 <main+49>: mov QWORD PTR [rbp-0x78],rax
   0x4005cb <main+53>: mov rdx,QWORD PTR [rbp-0x78]
   0x4005cf <main+57>: mov eax,0x0
=> 0x4005d4 <main+62>: call rdx
   0x4005d6 <main+64>: mov eax,0x0
   0x4005db <main+69>: mov rcx,QWORD PTR [rbp-0x8]
   0x4005df <main+73>: xor rcx,QWORD PTR fs:0x28
   0x4005e8 <main+82>: je 0x4005ef <main+89>
No argument
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffda10 --> 0x0
0008| 0x7fffffffda18 --> 0x7fffffffda20 ("qwerty1234\n")
0016| 0x7fffffffda20 ("qwerty1234\n")
0024| 0x7fffffffda28 --> 0xa3433 ('34\n')
0032| 0x7fffffffda30 --> 0x0
0040| 0x7fffffffda38 --> 0x0
0048| 0x7fffffffda40 --> 0x0
0056| 0x7fffffffda48 --> 0xff00
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Breakpoint 2, 0x00000000004005d4 in main ()
gdb-peda$
-
rdx has the Stack Address 0x7fffffffda20 which is the Address of buffer(Check the stack).
-
Let’s go ahead and execute call rdx. This is like any other function call.
gdb-peda$ si
- This is si / Step Instruction. Not Next Instruction. You will see the difference.
-
This is the current state:
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7b04260 (<__read_nocancel+7>: cmp rax,0xfffffffffffff001)
RDX: 0x7fffffffda20 ("qwerty1234\n")
RSI: 0x7fffffffda20 ("qwerty1234\n")
RDI: 0x0
RBP: 0x7fffffffda90 --> 0x400600 (<__libc_csu_init>: push r15)
RSP: 0x7fffffffda08 --> 0x4005d6 (<main+64>: mov eax,0x0)
RIP: 0x7fffffffda20 ("qwerty1234\n")
R8 : 0x400670 (<__libc_csu_fini>: repz ret)
R9 : 0x7ffff7de7ab0 (<_dl_fini>: push rbp)
R10: 0x37b
R11: 0x246
R12: 0x4004a0 (<_start>: xor ebp,ebp)
R13: 0x7fffffffdb70 --> 0x1
R14: 0x0
R15: 0x0
EFLAGS: 0x207 (CARRY PARITY adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
=> 0x7fffffffda20: jno 0x7fffffffda99
 | 0x7fffffffda22: gs jb 0x7fffffffda99
 | 0x7fffffffda25: jns 0x7fffffffda58
 | 0x7fffffffda27: xor dh,BYTE PTR [rbx]
 |-> 0x7fffffffda99: fsub DWORD PTR [rdx+0x7ffff7]
     0x7fffffffda9f: add BYTE PTR [rax],al
     0x7fffffffdaa1: add BYTE PTR [rax],al
     0x7fffffffdaa3: add BYTE PTR [rax],al
JUMP is taken
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffda08 --> 0x4005d6 (<main+64>: mov eax,0x0)
0008| 0x7fffffffda10 --> 0x0
0016| 0x7fffffffda18 --> 0x7fffffffda20 ("qwerty1234\n")
0024| 0x7fffffffda20 ("qwerty1234\n")
0032| 0x7fffffffda28 --> 0xa3433 ('34\n')
0040| 0x7fffffffda30 --> 0x0
0048| 0x7fffffffda38 --> 0x0
0056| 0x7fffffffda40 --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x00007fffffffda20 in ?? ()
gdb-peda$
-
Look at the code part. We had qwerty1234\n in the buffer. But here, we see some instructions. What are these? This requires some explanation.
-
Every assembly instruction has its own representation at the machine code level. We have seen many instructions like push rbp, mov rax, rbx and so on. Each one has its own representation at the machine code level. This is what the assembler does. It converts assembly instructions to the corresponding machine code - 0s and 1s.
-
So, qwerty1234\n is an array of ascii characters. Each ascii character has a corresponding ascii value. Let’s have a look at this table:
~/rev_eng_series/post_9$ ascii -x
 0 NUL   10 DLE   20       30 0   40 @   50 P   60 `   70 p
 1 SOH   11 DC1   21 !     31 1   41 A   51 Q   61 a   71 q
 2 STX   12 DC2   22 "     32 2   42 B   52 R   62 b   72 r
 3 ETX   13 DC3   23 #     33 3   43 C   53 S   63 c   73 s
 4 EOT   14 DC4   24 $     34 4   44 D   54 T   64 d   74 t
 5 ENQ   15 NAK   25 %     35 5   45 E   55 U   65 e   75 u
 6 ACK   16 SYN   26 &     36 6   46 F   56 V   66 f   76 v
 7 BEL   17 ETB   27 '     37 7   47 G   57 W   67 g   77 w
 8 BS    18 CAN   28 (     38 8   48 H   58 X   68 h   78 x
 9 HT    19 EM    29 )     39 9   49 I   59 Y   69 i   79 y
 A LF    1A SUB   2A *     3A :   4A J   5A Z   6A j   7A z
 B VT    1B ESC   2B +     3B ;   4B K   5B [   6B k   7B {
 C FF    1C FS    2C ,     3C <   4C L   5C \   6C l   7C |
 D CR    1D GS    2D -     3D =   4D M   5D ]   6D m   7D }
 E SO    1E RS    2E .     3E >   4E N   5E ^   6E n   7E ~
 F SI    1F US    2F /     3F ?   4F O   5F _   6F o   7F DEL
-
The ascii value of q is 0x71, w is 0x77, and so on.
-
The processor doesn’t care if it is the ascii character q or w or a or anything. What it cares about is what they are stored as in memory. The ascii value of each character is stored in the main memory. Let us use the x/ command and confirm this:
gdb-peda$ x/12xb 0x7fffffffda20
0x7fffffffda20: 0x71 0x77 0x65 0x72 0x74 0x79 0x31 0x32
0x7fffffffda28: 0x33 0x34 0x0a 0x00
gdb-peda$
-
Processor sees it like this: \x71\x77\x65\x72\x74\x79\x31\x32\x33\x34\x0a .
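The same byte view can be reproduced outside gdb. This short Python sketch converts the input string into its ascii values and prints them in both notations used above:

```python
text = "qwerty1234\n"
raw = text.encode("ascii")          # one byte per ascii character

# Same layout as the x/12xb output in gdb (without the trailing 0x00).
print(" ".join(f"0x{b:02x}" for b in raw))

# Same layout as the \x.. notation.
escaped = "".join(f"\\x{b:02x}" for b in raw)
print(escaped)

print(len(raw))                     # 11 bytes in total
```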
-
Let us look at what instructions these 11 bytes represent:
gdb-peda$ x/6i 0x7fffffffda20
=> 0x7fffffffda20: jno 0x7fffffffda99
   0x7fffffffda22: gs jb 0x7fffffffda99
   0x7fffffffda25: jns 0x7fffffffda58
   0x7fffffffda27: xor dh,BYTE PTR [rbx]
   0x7fffffffda29: xor al,0xa
   0x7fffffffda2b: add BYTE PTR [rax],al
gdb-peda$
-
The i in x/6i stands for instruction. I am asking gdb to display 6 instructions at the address 0x7fffffffda20.
-
Look at these instructions. gdb has done the work of taking these 11 bytes, disassembling it and displaying it.
-
There are 11 bytes. So, we have to consider bytes from 0x7fffffffda20 to 0x7fffffffda2a. Let us ignore the instruction at 0x7fffffffda2b.
-
qwerty1234\n is disassembled into 5 x64 instructions. And all these 5 instructions seem to be proper valid instructions. Honestly I did not know that qwerty1234\n represented valid instructions.
-
-
Let us take a closer look at these instructions and their machine code.
gdb-peda$ x/i 0x7fffffffda20
   0x7fffffffda20: jno 0x7fffffffda99
gdb-peda$ x/2i 0x7fffffffda20
   0x7fffffffda20: jno 0x7fffffffda99
   0x7fffffffda22: gs jb 0x7fffffffda99
gdb-peda$ x/3i 0x7fffffffda20
   0x7fffffffda20: jno 0x7fffffffda99
   0x7fffffffda22: gs jb 0x7fffffffda99
   0x7fffffffda25: jns 0x7fffffffda58
gdb-peda$ x/4i 0x7fffffffda20
   0x7fffffffda20: jno 0x7fffffffda99
   0x7fffffffda22: gs jb 0x7fffffffda99
   0x7fffffffda25: jns 0x7fffffffda58
   0x7fffffffda27: xor dh,BYTE PTR [rbx]
gdb-peda$ x/5i 0x7fffffffda20
   0x7fffffffda20: jno 0x7fffffffda99
   0x7fffffffda22: gs jb 0x7fffffffda99
   0x7fffffffda25: jns 0x7fffffffda58
   0x7fffffffda27: xor dh,BYTE PTR [rbx]
   0x7fffffffda29: xor al,0xa
gdb-peda$ x/11b 0x7fffffffda20
0x7fffffffda20: 0x71 0x77 0x65 0x72 0x74 0x79 0x31 0x32
0x7fffffffda28: 0x33 0x34 0x0a
- Machine code of jno 0x7fffffffda99 is \x71\x77.
- gs jb 0x7fffffffda99 => \x65\x72\x74.
- jns 0x7fffffffda58 => \x79\x31.
- xor dh,BYTE PTR [rbx] => \x32\x33.
- xor al,0xa => \x34\x0a.
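The jump targets gdb printed can be recomputed by hand from these bytes. The sketch below assumes the usual encoding of x86 short conditional jumps: one opcode byte followed by a signed 8-bit displacement, counted from the address of the next instruction. In gs jb, the leading 0x65 is the gs prefix, so that instruction is 3 bytes long. The function name rel8_target is made up for this illustration.

```python
def rel8_target(insn_addr, insn_len, rel8):
    """Target of a short jump: next instruction address plus a signed 8-bit offset."""
    if rel8 >= 0x80:                # displacement bytes 0x80..0xff are negative
        rel8 -= 0x100
    return insn_addr + insn_len + rel8

base = 0x7fffffffda20               # address of buffer in the gdb session

jno_target = rel8_target(base, 2, 0x77)        # bytes 71 77
jb_target = rel8_target(base + 2, 3, 0x74)     # bytes 65 72 74
jns_target = rel8_target(base + 5, 2, 0x31)    # bytes 79 31

print(hex(jno_target))              # 0x7fffffffda99
print(hex(jb_target))               # 0x7fffffffda99
print(hex(jns_target))              # 0x7fffffffda58
```

The characters w, t and 1 from our input end up acting as jump distances, which is why the disassembly points at addresses a little beyond the buffer.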
-
I hope you have understood what executing a stream of bytes means. For that matter, the text segment in an executable is just a stream of such bytes which gets executed.
-
Let us continue with our execution: jno 0x7fffffffda99 was about to get executed.
-
jno stands for Jump if Not Overflow (Signed). It jumps only when the overflow flag is clear. In the EFLAGS line above, overflow is printed in lowercase, which is how peda shows a flag that is not set, so the jump is taken. This is the state after the jump:
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7b04260 (<__read_nocancel+7>: cmp rax,0xfffffffffffff001)
RDX: 0x7fffffffda20 ("qwerty1234\n")
RSI: 0x7fffffffda20 ("qwerty1234\n")
RDI: 0x0
RBP: 0x7fffffffda90 --> 0x400600 (<__libc_csu_init>: push r15)
RSP: 0x7fffffffda08 --> 0x4005d6 (<main+64>: mov eax,0x0)
RIP: 0x7fffffffda99 --> 0x7ffff7a2d8
R8 : 0x400670 (<__libc_csu_fini>: repz ret)
R9 : 0x7ffff7de7ab0 (<_dl_fini>: push rbp)
R10: 0x37b
R11: 0x246
R12: 0x4004a0 (<_start>: xor ebp,ebp)
R13: 0x7fffffffdb70 --> 0x1
R14: 0x0
R15: 0x0
EFLAGS: 0x207 (CARRY PARITY adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x7fffffffda92: add BYTE PTR [rax],al
   0x7fffffffda95: add BYTE PTR [rax],al
   0x7fffffffda97: add BYTE PTR [rax],dh
=> 0x7fffffffda99: fsub DWORD PTR [rdx+0x7ffff7]
   0x7fffffffda9f: add BYTE PTR [rax],al
   0x7fffffffdaa1: add BYTE PTR [rax],al
   0x7fffffffdaa3: add BYTE PTR [rax],al
   0x7fffffffdaa5: add BYTE PTR [rax],al
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffda08 --> 0x4005d6 (<main+64>: mov eax,0x0)
0008| 0x7fffffffda10 --> 0x0
0016| 0x7fffffffda18 --> 0x7fffffffda20 ("qwerty1234\n")
0024| 0x7fffffffda20 ("qwerty1234\n")
0032| 0x7fffffffda28 --> 0xa3433 ('34\n')
0040| 0x7fffffffda30 --> 0x0
0048| 0x7fffffffda38 --> 0x0
0056| 0x7fffffffda40 --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x00007fffffffda99 in ?? ()
gdb-peda$
-
We are looking at bytes stored at 0x7fffffffda99. fsub is a valid Intel instruction. It is a floating-point instruction. Let us go ahead and execute it.
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7b04260 (<__read_nocancel+7>: cmp rax,0xfffffffffffff001)
RDX: 0x7fffffffda20 ("qwerty1234\n")
RSI: 0x7fffffffda20 ("qwerty1234\n")
RDI: 0x0
RBP: 0x7fffffffda90 --> 0x400600 (<__libc_csu_init>: push r15)
RSP: 0x7fffffffda08 --> 0x4005d6 (<main+64>: mov eax,0x0)
RIP: 0x7fffffffda99 --> 0x7ffff7a2d8
R8 : 0x400670 (<__libc_csu_fini>: repz ret)
R9 : 0x7ffff7de7ab0 (<_dl_fini>: push rbp)
R10: 0x37b
R11: 0x246
R12: 0x4004a0 (<_start>: xor ebp,ebp)
R13: 0x7fffffffdb70 --> 0x1
R14: 0x0
R15: 0x0
EFLAGS: 0x10207 (CARRY PARITY adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x7fffffffda92: add BYTE PTR [rax],al
   0x7fffffffda95: add BYTE PTR [rax],al
   0x7fffffffda97: add BYTE PTR [rax],dh
=> 0x7fffffffda99: fsub DWORD PTR [rdx+0x7ffff7]
   0x7fffffffda9f: add BYTE PTR [rax],al
   0x7fffffffdaa1: add BYTE PTR [rax],al
   0x7fffffffdaa3: add BYTE PTR [rax],al
   0x7fffffffdaa5: add BYTE PTR [rax],al
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffda08 --> 0x4005d6 (<main+64>: mov eax,0x0)
0008| 0x7fffffffda10 --> 0x0
0016| 0x7fffffffda18 --> 0x7fffffffda20 ("qwerty1234\n")
0024| 0x7fffffffda20 ("qwerty1234\n")
0032| 0x7fffffffda28 --> 0xa3433 ('34\n')
0040| 0x7fffffffda30 --> 0x0
0048| 0x7fffffffda38 --> 0x0
0056| 0x7fffffffda40 --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x00007fffffffda99 in ?? ()
gdb-peda$
-
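And we hit a dead end. The fsub instruction tries to read memory at rdx+0x7ffff7, which is not a valid address for our process, so the program crashes with a Segmentation Fault.
The whole point of this analysis is to understand what executing a bunch of bytes means. I hope you have got some clarity over it.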
Do the same analysis with different random inputs and reinforce your understanding.
We have to understand some concepts before we actually get into code injection.
What is a system call?
Before we get into examples, we will understand a very important concept. We will understand what a System Call is.
Consider this. The only way to communicate with the processor and get work done is by using its Instruction Set. It is a set of well-defined instructions which can be understood, interpreted and executed by the processor.
Now let us come to what an Operating System is. Operating System is a software which is in-between the machine / hardware and the user. It is a middleman which takes care of everything. Take a look at this diagram:
------------- -------------
| | | |
| Processor | | OS |
| | | (Kernel) |
------------- -------------
|
Instruction Set
You can think of an Operating System as a Black Box which does a lot of Resource Management. There is a well-defined way to talk to a processor(Through the Instruction Set only). In the same way, there should be a well-defined way to talk to the Operating System also.
You might ask, why do you even need such a method to talk to the OS? A lot of reasons. As we mentioned above, an Operating System does all the Resource Management. What Resources are we talking about?
-
Processor is itself a resource. There would be different processes running on one Operating System. Each one could belong to different users also. So, if there is no one to manage, the machine would go bonkers!
-
RAM / Main Memory: In any machine, there is limited amount of main memory. My machine has 4GB of RAM. I should be able to manage everything in those 4GB. The OS does the management for us. We don’t have to do anything.
-
HardDisk: The Operating System has what are known as FileSystems which help in managing HardDisk.
-
Peripheral Devices: This is what we use and see whenever we are on a computer. The Input Output devices. I am writing this article. And every time I press a key on my keyboard, the Operating System is coming in, taking in what I type and sending it to the Monitor where I can see what I just typed. This is just one example. Suppose you are playing a multiplayer game. You are using your monitor to see the game, you are using the keyboard and mouse to control your character, your friends are all using joysticks. Look at this. Managing everything at once is definitely not an easy job.
Because of all this, the Operating System does all the Resource Management for us. To be able to use those resources, we have to talk in a very specific manner similar to a processor’s Instruction Set.
The Operating System provides an Interface which we can use to request resources. The Interface is a huge collection of System Calls. A System Call is nothing but a function call into the OS.
Consider this example. You want to display something on the monitor: you use the System Call designed for writing. You want to read something from a file: you use the System Call designed for reading. You want to use the speaker: you use a different System Call. So, this is what the previous diagram looks like for an OS:
-------------
| |
| OS |
| |
-------------
|
Set of System calls
Why did we even have to take a detour from our code injection and understand about System Calls? The reason is, injecting code to execute System Calls is easy and System Calls are very very powerful. They talk directly to the OS. Injecting code to execute some Library functions is hard, so we go for injecting machine code of System Calls.
The following is the standard way to execute System Calls on 32-bit Intel systems running Linux.
- Every System Call is identified by a unique number called the System Call Number.
- A System call is a function call inside the OS. It is natural to have arguments.
-
Calling the System Call from userspace is a little different than a normal function call. We do not use the call instruction because it is not a normal function call. Instead, the System Call Number and the arguments are placed in these registers:
-
SystemCall Number: eax
-
Argument 0: ebx
-
Argument 1: ecx
-
Argument 2: edx
-
Argument 3: esi
-
Argument 4: edi
-
Argument 5: ebp
-
Return Value: eax
- To execute the System Call, you use this instruction: int 0x80.
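The register convention above can be summed up in a few lines of Python that print the instruction sequence for a given call. This is only an illustration of the ordering: the function name int80_sequence is made up, and the number 4, the label msg and the length 13 are sample values, not taken from a real program.

```python
# Argument registers, in order, for a 32-bit int 0x80 System Call.
ARG_REGISTERS = ("ebx", "ecx", "edx", "esi", "edi", "ebp")

def int80_sequence(number, *args):
    """Return the assembly lines that set up and trigger one System Call."""
    if len(args) > len(ARG_REGISTERS):
        raise ValueError("an int 0x80 System Call takes at most 6 arguments")
    lines = [f"mov eax, {number}"]              # System Call Number goes in eax
    for reg, arg in zip(ARG_REGISTERS, args):   # Argument 0 -> ebx, 1 -> ecx, ...
        lines.append(f"mov {reg}, {arg}")
    lines.append("int 0x80")                    # hand control to the OS
    return lines

for line in int80_sequence(4, 1, "msg", 13):
    print(line)
```

The printed lines have the same shape as the hand-written assembly programs in the rest of this article. After int 0x80 returns, the result of the call is found in eax.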
Do not worry much about the int 0x80 instruction for now. It requires concepts of Operating Systems.
We can use this way to execute System Calls in 64-bit Intel Systems also. But there is a better, faster way designed for 64-bit systems which we will discuss in the next article.
This is where you can find a list of all System Calls for a 32-bit Intel System: /usr/include/asm/unistd_32.h. These are a few of them.
#define __NR_restart_syscall 0
#define __NR_exit 1
#define __NR_fork 2
#define __NR_read 3
#define __NR_write 4
#define __NR_open 5
#define __NR_close 6
#define __NR_waitpid 7
#define __NR_creat 8
#define __NR_link 9
#define __NR_unlink 10
#define __NR_execve 11
#define __NR_chdir 12
#define __NR_time 13
#define __NR_mknod 14
#define __NR_chmod 15
#define __NR_lchown 16
#define __NR_break 17
-
NR simply stands for number. exit’s System Call number is 1, read’s is 3, write’s is 4, and so on.
-
For all the examples, refer to this file for the System Call number.
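If you would rather look numbers up programmatically, the #define lines are easy to parse. The sketch below works on a snippet pasted from the listing above instead of opening the header, since the file's location can differ between distributions. The function name parse_syscall_numbers is made up for this illustration.

```python
import re

# A few lines copied from the header listing above.
HEADER_SNIPPET = """\
#define __NR_restart_syscall 0
#define __NR_exit 1
#define __NR_fork 2
#define __NR_read 3
#define __NR_write 4
#define __NR_execve 11
"""

def parse_syscall_numbers(text):
    """Map System Call names to numbers from '#define __NR_name N' lines."""
    pattern = re.compile(r"^#define\s+__NR_(\w+)\s+(\d+)\s*$", re.MULTILINE)
    return {name: int(number) for name, number in pattern.findall(text)}

table = parse_syscall_numbers(HEADER_SNIPPET)
print(table["exit"], table["write"], table["execve"])   # 1 4 11
```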
Let us start writing programs to understand all this better.
Program 0
To start with, we take an assembly program with no body. Consider this program:
~/rev_eng_series/post_9$ cat none.asm
section .text
global _start
_start:
jmp $
-
That jmp $ will simply jump back to itself. In NASM, $ stands for the address of the current instruction. So, it is an infinite loop. Let us assemble and link it to get the executable and execute it.
~/rev_eng_series/post_9$ nasm none.asm -f elf32
~/rev_eng_series/post_9$ ld none.o -o none -m elf_i386
~/rev_eng_series/post_9$ ./none
-
You get nothing. It is a program stuck in an infinite loop.
-
Let us use a wonderful program called strace and see what System Calls are executed by our program. The s in strace stands for System Call.
-
This is what I got after running with strace.
~/rev_eng_series/post_9$ strace ./none
execve("./none", ["./none"], [/* 81 vars */]) = 0
-
The execve is a System Call executed when you have a new program to run. The shell executes this System Call to run our program none. This System Call is not executed by our program none. So, it can be ignored.
-
This was expected because the body of our program was almost nothing.
Program 1
Let us take up our first System call exit. Consider this equivalent C function for this:
man _exit
_EXIT(2) Linux Programmer's Manual _EXIT(2)
NAME
_exit, _Exit - terminate the calling process
SYNOPSIS
#include <unistd.h>
void _exit(int status);
#include <stdlib.h>
void _Exit(int status);
-
So, it takes 1 argument - the status. Do not confuse the C library’s _exit with the exit System Call. They are different.
-
Let us execute exit(1). Consider this assembly program:
~/rev_eng_series/post_9$ cat exit.asm
section .text
global _start
_start:
mov eax, 0x01
mov ebx, 0x01
int 0x80
-
eax is loaded with 1 - The System Call Number.
-
ebx is the Argument 0 - 1 here.
-
int 0x80 to execute the System Call.
-
Let us assemble and link the program to get the executable.
~/rev_eng_series/post_9$ nasm exit.asm -f elf32
~/rev_eng_series/post_9$ ld exit.o -o exit -m elf_i386
~/rev_eng_series/post_9$ ./exit
~/rev_eng_series/post_9$
-
You observe nothing because the program is simply terminated by the exit System Call.
-
Let us observe using strace.
~/rev_eng_series/post_9$ strace ./exit
execve("./exit", ["exit"], [/* 81 vars */]) = 0
strace: [ Process PID=5586 runs in 32 bit mode. ]
exit(1) = ?
+++ exited with 1 +++
-
As expected. The exit System Call is executed.
Program 2
Let us execute the write System Call. Let us write a Hello World program. Consider this:
~/rev_eng_series/post_9$ cat hello.asm
section .data
str: db "Hello World", 0x0a, 0x00
section .text
global _start
_start:
mov eax, 0x04
mov ebx, 0x01
mov ecx, str
mov edx, 13
int 0x80
mov eax, 0x01
mov ebx, 0x00
int 0x80
-
The program is very simple. Consider the first part:
mov eax, 0x04
mov ebx, 0x01
mov ecx, str
mov edx, 13
int 0x80
-
4 is the System Call number of write.
-
File Descriptor of Standard Output is 1 which is Argument 0.
-
Argument 1 is the Address of buffer. Here it is the string str. It is loaded to ecx.
-
Argument 2 is the number of bytes that should be written. Here, we want all 13 bytes to be written to Standard Output.
-
The second part is the exit System Call with Argument 0.
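Where does the 13 come from? It is the 11 characters of Hello World plus the newline (0x0a) and the trailing zero byte (0x00). The Python sketch below checks the count and issues the same write through os.write, which is a thin wrapper around the System Call:

```python
import os

# Same bytes as the db line: "Hello World", 0x0a, 0x00
message = b"Hello World\n\x00"
print(len(message))                 # 13

# fd 1 is Standard Output; the return value is the number of bytes written.
written = os.write(1, message)
```

If the whole message is accepted, written is 13, the same value strace reports for the assembly version.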
-
Let us get the executable and run it.
~/rev_eng_series/post_9$ nasm hello.asm -f elf32
~/rev_eng_series/post_9$ ld hello.o -o hello -m elf_i386
~/rev_eng_series/post_9$ ./hello
Hello World
~/rev_eng_series/post_9$ strace ./hello
execve("./hello", ["hello"], [/* 81 vars */]) = 0
strace: [ Process PID=5777 runs in 32 bit mode. ]
write(1, "Hello World\n\0", 13Hello World
) = 13
exit(0) = ?
+++ exited with 0 +++
-
First the write System Call is executed. Then exit is executed.
Program 3
Now that we have seen 2 simple System Calls, let us explore a System Call which is very helpful in Code Injection.
It is the execve System Call. Let us look at the manpage.
~/rev_eng_series/post_9$ man execve
NAME
execve - execute program
SYNOPSIS
#include <unistd.h>
int execve(const char *filename, char *const argv[],
char *const envp[]);
-
The execve System Call executes a new program.
-
Let us write a C program first, then the assembly program.
~/rev_eng_series/post_9$ cat execve.c
#include<unistd.h>
int main() {
execve("/bin/sh", NULL, NULL);
return 0;
}
~/rev_eng_series/post_9$ gcc execve.c -o execve -m32
execve.c: In function ‘main’:
execve.c:5:2: warning: null argument where non-null required (argument 2) [-Wnonnull]
  execve("/bin/sh", NULL, NULL);
  ^
~/rev_eng_series/post_9$
-
Ignore the warning.
-
Let us execute it.
~/rev_eng_series/post_9$ ./execve
$
$
$ ls
code1    code2.c   exit      exploit.txt  hello.o   none.o
code1.c  execve    exit.asm  hello        none      peda-session-code2.txt
code2    execve.c  exit.o    hello.asm    none.asm  post.md
$
-
So, we have the /bin/sh executed. You can see it. Try and play around with the new shell. You will find a lot of differences between a normal shell and this.
-
Now that the C program is done, let us jump into the assembly program.
~/rev_eng_series/post_9$ cat execve.asm
section .data
str: db 0x2f, "bin", 0x2f, "sh", 0x00
section .text
global _start
_start:
mov eax, 11
mov ebx, str
mov ecx, 0x00
mov edx, 0x00
int 0x80
mov eax, 0x01
mov ebx, 0x00
int 0x80
-
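The first part is the execve System Call. Its System Call Number is 11. The arguments mirror the C program: ebx holds the address of the string "/bin/sh", while ecx and edx are both 0, just like the two NULL arguments. The second part is the exit System Call, which is reached only if execve fails.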
-
Let us go ahead, assemble it, link it and execute it.
~/rev_eng_series/post_9$ nasm execve.asm -f elf32
~/rev_eng_series/post_9$ ld execve.o -o execve_asm_32 -m elf_i386
~/rev_eng_series/post_9$ ./execve_asm_32
$
$
-
Bingo! It worked.
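One property of execve worth seeing in action is that it does not return when it succeeds: the calling program is replaced by the new one. The following is a minimal Python sketch of that behaviour, assuming a Linux machine where /bin/sh exists. The helper name run_via_execve is made up for this illustration. It forks a child first, so that the process being replaced is not the Python interpreter we are watching.

```python
import os


def run_via_execve(path, argv, env):
    """Fork a child, replace it with `path` using execve, return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: on success execve never returns, the process image is replaced.
        try:
            os.execve(path, argv, env)
        finally:
            # Reached only if execve failed (os.execve raised OSError).
            os._exit(127)
    # Parent: wait for the child and extract its exit code.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    # Unlike the C example, a proper argv list is passed here.
    code = run_via_execve("/bin/sh", ["sh", "-c", "exit 7"], {})
    print("child exited with", code)
```

The exit code seen by the parent comes from the shell, not from any Python code after os.execve, which is the same reason the exit System Call in execve.asm matters only on failure.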
The best way to get a hold on what System Calls do is to write C programs and their corresponding assembly programs.
So, that was a really quick introduction to System Calls and how they are executed.
Back to code injection!
We had to understand what System Calls are because we will be injecting code which executes System Calls. Without knowing what a System Call is, the injected code would make little sense.
Ok. Let us start by injecting very simple machine code.
Injection 1
Let us inject the code in none.asm. As we saw, we cannot inject assembly code directly. We have to inject raw machine code, written as a sequence of hexadecimal byte values.
Let us get the machine code of the instruction jmp $. We can get it using objdump on none.
~/rev_eng_series/post_9$ objdump -Mintel -d none
none: file format elf32-i386
Disassembly of section .text:
08048060 <_start>:
8048060: eb fe jmp 8048060 <_start>
-
The Machine code is \xeb\xfe.
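It helps to see why these two bytes mean "jump to myself" no matter where they are placed. \xeb is the opcode of a short jmp, and \xfe is a signed 8-bit displacement (-2) measured from the address of the next instruction. The sketch below decodes that by hand; short_jmp_target is a hypothetical helper name, and the second address is only an example of a stack address.

```python
def short_jmp_target(address, code):
    """Return the target of a 2-byte short jmp located at `address`."""
    if len(code) != 2 or code[0] != 0xEB:
        raise ValueError("expected a 2-byte short jmp (opcode 0xeb)")
    # The second byte is a signed 8-bit displacement: 0xfe means -2.
    rel8 = int.from_bytes(code[1:2], "little", signed=True)
    # The displacement is relative to the address of the NEXT instruction.
    return address + len(code) + rel8


if __name__ == "__main__":
    # The address in the none executable, and an example stack address.
    for base in (0x08048060, 0x7FFFFFFFDA20):
        target = short_jmp_target(base, b"\xeb\xfe")
        print(hex(base), "->", hex(target))
```

Because the jump is relative, the same bytes keep looping on themselves wherever the buffer happens to be, which makes them a convenient first payload.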
-
Let us get back to this program:
~/rev_eng_series/post_9$ cat code2.c
#include<stdio.h>
#include<string.h>
#include<unistd.h>

int main() {

    char buffer[100];
    read(0, buffer, sizeof(buffer));

    void (*executeme)();
    executeme = buffer;
    executeme();

    return 0;
}
-
This is how we can run it:
~/rev_eng_series/post_9$ python -c "print '\xeb\xfe'" | ./code2
-
And the program hangs, just the way none did. But we should not believe it until we confirm it.
-
Let us run it using gdb and confirm.
-
Put the code to be injected into a file.
~/rev_eng_series/post_9$ python -c "print '\xeb\xfe'" > none.exploit.txt
-
This is the disassembly of the main function of code2:
~/rev_eng_series/post_9$ gdb -q code2
Reading symbols from code2...(no debugging symbols found)...done.
gdb-peda$ disass main
Dump of assembler code for function main:
0x0000000000400596 <+0>: push rbp
0x0000000000400597 <+1>: mov rbp,rsp
0x000000000040059a <+4>: add rsp,0xffffffffffffff80
0x000000000040059e <+8>: mov rax,QWORD PTR fs:0x28
0x00000000004005a7 <+17>: mov QWORD PTR [rbp-0x8],rax
0x00000000004005ab <+21>: xor eax,eax
0x00000000004005ad <+23>: lea rax,[rbp-0x70]
0x00000000004005b1 <+27>: mov edx,0x64
0x00000000004005b6 <+32>: mov rsi,rax
0x00000000004005b9 <+35>: mov edi,0x0
0x00000000004005be <+40>: call 0x400470 <read@plt>
0x00000000004005c3 <+45>: lea rax,[rbp-0x70]
0x00000000004005c7 <+49>: mov QWORD PTR [rbp-0x78],rax
0x00000000004005cb <+53>: mov rdx,QWORD PTR [rbp-0x78]
0x00000000004005cf <+57>: mov eax,0x0
0x00000000004005d4 <+62>: call rdx
0x00000000004005d6 <+64>: mov eax,0x0
0x00000000004005db <+69>: mov rcx,QWORD PTR [rbp-0x8]
0x00000000004005df <+73>: xor rcx,QWORD PTR fs:0x28
0x00000000004005e8 <+82>: je 0x4005ef <main+89>
0x00000000004005ea <+84>: call 0x400460 <__stack_chk_fail@plt>
0x00000000004005ef <+89>: leave
0x00000000004005f0 <+90>: ret
End of assembler dump
-
Let us break at 0x00000000004005d4 <+62>: call rdx.
gdb-peda$ b *0x00000000004005d4
Breakpoint 1 at 0x4005d4
gdb-peda$
-
Let us run it.
gdb-peda$ run < none.exploit.txt
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7b04260 (<__read_nocancel+7>: cmp rax,0xfffffffffffff001)
RDX: 0x7fffffffda20 --> 0xafeeb
RSI: 0x7fffffffda20 --> 0xafeeb
RDI: 0x0
RBP: 0x7fffffffda90 --> 0x400600 (<__libc_csu_init>: push r15)
RSP: 0x7fffffffda10 --> 0x0
RIP: 0x4005d4 (<main+62>: call rdx)
R8 : 0x400670 (<__libc_csu_fini>: repz ret)
R9 : 0x7ffff7de7ab0 (<_dl_fini>: push rbp)
R10: 0x37b
R11: 0x246
R12: 0x4004a0 (<_start>: xor ebp,ebp)
R13: 0x7fffffffdb70 --> 0x1
R14: 0x0
R15: 0x0
EFLAGS: 0x203 (CARRY parity adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x4005c7 <main+49>: mov QWORD PTR [rbp-0x78],rax
   0x4005cb <main+53>: mov rdx,QWORD PTR [rbp-0x78]
   0x4005cf <main+57>: mov eax,0x0
=> 0x4005d4 <main+62>: call rdx
   0x4005d6 <main+64>: mov eax,0x0
   0x4005db <main+69>: mov rcx,QWORD PTR [rbp-0x8]
   0x4005df <main+73>: xor rcx,QWORD PTR fs:0x28
   0x4005e8 <main+82>: je 0x4005ef <main+89>
No argument
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffda10 --> 0x0
0008| 0x7fffffffda18 --> 0x7fffffffda20 --> 0xafeeb
0016| 0x7fffffffda20 --> 0xafeeb
0024| 0x7fffffffda28 --> 0x0
0032| 0x7fffffffda30 --> 0x0
0040| 0x7fffffffda38 --> 0x0
0048| 0x7fffffffda40 --> 0x0
0056| 0x7fffffffda48 --> 0xff00
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Breakpoint 1, 0x00000000004005d4 in main ()
gdb-peda$
-
Let us go ahead and Step Instruction.
gdb-peda$ si
-
Check out the current state: (showing only code part)
[-------------------------------------code-------------------------------------]
=> 0x7fffffffda20: jmp 0x7fffffffda20
 | 0x7fffffffda22: or al,BYTE PTR [rax]
 | 0x7fffffffda24: add BYTE PTR [rax],al
 | 0x7fffffffda26: add BYTE PTR [rax],al
 |->=> 0x7fffffffda20: jmp 0x7fffffffda20
       0x7fffffffda22: or al,BYTE PTR [rax]
       0x7fffffffda24: add BYTE PTR [rax],al
       0x7fffffffda26: add BYTE PTR [rax],al
JUMP is taken
-
Bingo! That is exactly what we wanted. Jump to the current instruction.
-
Congratulate yourself on this!! It is a successful code injection.
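A note on the one-liners used here: print '...' is Python 2 syntax, and it appends a newline to the payload (that is the 0x0a visible in 0xafeeb in the gdb output). With Python 3, print would encode the character '\xeb' as the two UTF-8 bytes \xc3\xab, so the payload has to be written out as raw bytes instead. The following is a minimal sketch of one way to do it; build_payload and emit are hypothetical helper names, and the --raw switch is an assumption of this sketch.

```python
import sys


def build_payload(machine_code, newline=True):
    """Return the exact bytes to feed to the vulnerable program."""
    if not isinstance(machine_code, bytes):
        raise TypeError("machine code must be bytes, not str")
    # Python 2's print appended "\n"; keep that optional for comparison.
    return machine_code + (b"\n" if newline else b"")


def emit(payload, stream):
    """Write raw bytes to a binary stream such as sys.stdout.buffer."""
    stream.write(payload)
    stream.flush()


if __name__ == "__main__":
    payload = build_payload(b"\xeb\xfe")
    if len(sys.argv) > 1 and sys.argv[1] == "--raw":
        # Raw bytes, suitable for piping into the target program.
        emit(payload, sys.stdout.buffer)
    else:
        # Human-readable check of what would be sent.
        print(payload.hex())
```

Saved under a name such as payload.py (the file name is only an example), it could be used as python3 payload.py --raw | ./code2.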
The obvious question is, can we do better?
Injection 2
Let us inject the code for exit(1).
-
First of all, let us get the machine code using objdump.
~/rev_eng_series/post_9$ objdump -Mintel -d exit
exit: file format elf32-i386
Disassembly of section .text:
08048060 <_start>:
8048060: b8 01 00 00 00 mov eax,0x1
8048065: bb 01 00 00 00 mov ebx,0x1
804806a: cd 80 int 0x80
-
So, the machine code is \xb8\x01\x00\x00\x00\xbb\x01\x00\x00\x00\xcd\x80.
-
Let us inject it and see what happens.
~/rev_eng_series/post_9$ python -c "print '\xb8\x01\x00\x00\x00\xbb\x01\x00\x00\x00\xcd\x80'" | ./code2
~/rev_eng_series/post_9$
-
Let us confirm by running it with gdb.
~/rev_eng_series/post_9$ python -c "print '\xb8\x01\x00\x00\x00\xbb\x01\x00\x00\x00\xcd\x80'" > exit.exploit.txt
~/rev_eng_series/post_9$ gdb -q code2
Reading symbols from code2...(no debugging symbols found)...done.
gdb-peda$ b *0x00000000004005d4
Breakpoint 1 at 0x4005d4
gdb-peda$ run < exit.exploit.txt
-
This is what the state is just before code at buffer is executed.
[-------------------------------------code-------------------------------------]
=> 0x7fffffffda20: mov eax,0x1
   0x7fffffffda25: mov ebx,0x1
   0x7fffffffda2a: int 0x80
   0x7fffffffda2c: or al,BYTE PTR [rax]
-
This looks good. We can see the assembly code we have written sitting in the buffer, so the exit System Call is what gets executed when we continue.
There is a very important thing to talk about at this point:
The machine code of the exit System Call has NULL characters in it: it contains many \x00 bytes. In the program code2, we use read to take input, and read does not care about the contents. Suppose instead that we have software where strcpy is used to copy the user input into the buffer. The strcpy function copies everything from the source until it encounters a NULL character. So, instead of \xb8\x01\x00\x00\x00\xbb\x01\x00\x00\x00\xcd\x80 getting copied completely, only \xb8\x01 (followed by the terminating NULL) is copied. You can check this out by writing a program.
So, from now on, we have to write assembly code which gets converted to machine code without any NULL characters. There should not be even a single NULL character in the machine code which we inject. This might be a bit harder, but our exploits become more reliable because they are protected against functions which behave differently when they encounter NULL characters.
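As a quick way to see the truncation without compiling anything, here is a small Python simulation. It does not call the real strcpy; strcpy_sim is a made-up function that only mimics the rule "copy up to and including the first NULL byte".

```python
def strcpy_sim(src):
    """Mimic what C's strcpy copies: bytes up to the first NULL, plus the terminator."""
    end = src.find(b"\x00")
    if end == -1:
        raise ValueError("source is not NULL-terminated")
    return src[:end + 1]


# Machine code of exit(1) as shown by objdump.
EXIT_CODE = b"\xb8\x01\x00\x00\x00\xbb\x01\x00\x00\x00\xcd\x80"

if __name__ == "__main__":
    copied = strcpy_sim(EXIT_CODE)
    print("payload:", EXIT_CODE.hex(), "(%d bytes)" % len(EXIT_CODE))
    print("copied :", copied.hex(), "(%d bytes)" % len(copied))
```

Only 3 of the 12 bytes survive, so the int 0x80 at the end never reaches the buffer.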
With that said, let us try to write assembly code for exit(1) System Call without \x00s. Take this as an exercise and figure this out.
The following is one way to do it:
~/rev_eng_series/post_9$ cat exit_no_null.asm
section .text
global _start
_start:
xor eax, eax
add al, 1
xor ebx, ebx
add bl, 1
int 0x80
~/rev_eng_series/post_9$ nasm exit_no_null.asm -f elf32
~/rev_eng_series/post_9$ ld exit_no_null.o -o exit_no_null -m elf_i386
~/rev_eng_series/post_9$ ./exit_no_null
~/rev_eng_series/post_9$
-
Let us take a look at objdump’s output:
~/rev_eng_series/post_9$ objdump -Mintel -d exit_no_null
exit_no_null: file format elf32-i386
Disassembly of section .text:
08048060 <_start>:
8048060: 31 c0 xor eax,eax
8048062: 04 01 add al,0x1
8048064: 31 db xor ebx,ebx
8048066: 80 c3 01 add bl,0x1
8048069: cd 80 int 0x80
-
So, our machine code would be \x31\xc0\x04\x01\x31\xdb\x80\xc3\x01\xcd\x80.
-
Let us go ahead and test it with code2 program.
~/rev_eng_series/post_9$ python -c "print '\x31\xc0\x04\x01\x31\xdb\x80\xc3\x01\xcd\x80'" | ./code2
~/rev_eng_series/post_9$
-
Confirm it once with gdb as well.
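Spotting \x00 bytes by eye gets harder as the machine code grows, so it can be handy to check a payload mechanically before injecting it. A minimal sketch follows; null_offsets is a hypothetical helper name, and the two byte strings are the ones obtained from objdump in this article.

```python
def null_offsets(payload):
    """Return the offsets of every NULL byte in the payload."""
    return [i for i, byte in enumerate(payload) if byte == 0]


EXIT_WITH_NULLS = b"\xb8\x01\x00\x00\x00\xbb\x01\x00\x00\x00\xcd\x80"
EXIT_NO_NULL = b"\x31\xc0\x04\x01\x31\xdb\x80\xc3\x01\xcd\x80"

if __name__ == "__main__":
    for name, code in (("exit", EXIT_WITH_NULLS), ("exit_no_null", EXIT_NO_NULL)):
        offsets = null_offsets(code)
        verdict = "clean" if not offsets else "NULL bytes at %s" % offsets
        print("%-12s %2d bytes: %s" % (name, len(code), verdict))
```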
Summary
We discussed a lot of things today.
- Our main goal was to understand what Code Injection is.
- We started with a program which executes whatever we input. So, we saw what it means to execute direct machine code.
- We took a small detour and saw what a System Call is. I felt this is very important to understand before going forward.
- Introduction to Code Injection - We saw how raw machine code can be injected into a buffer and executed.
- We saw the importance of not having NULL characters in the machine code we generate.
We wrote assembly code which gave us meaningful machine code. We did inject it, but nothing exciting happened. In this article, we made sure we understood the concepts properly. In the next article, we will see how we can write complex exploits, abuse the vulnerable software to get real work done and more.
A few interesting things!
1. Library Functions and System Calls?
We have written so many programs so far, but we never specifically mentioned the use of System Calls. When we write programs, we mostly use C / C++ Library functions to get the job done. What you have to understand is that the Library functions internally use System Calls to make things happen.
Consider the printf function. It prints a string on the Standard Output. Writing to the Standard Output usually means writing stuff on the Monitor. That is, our program is requesting to use some Hardware. And who manages all the Hardware Resources? The Operating System. So, there should be a very specific System Call to do this. Consider this:
char str[] = "This is a string";
int x = 100;
char c = 'a';
printf("x = %d, c = %c, str = %s\n", x, c, str);
-
There are 3 variables here and there are format specifiers used in the printf statement.
-
Now, let us see the write System Call:
NAME
write - write to a file descriptor
SYNOPSIS
#include <unistd.h>
ssize_t write(int fd, const void *buf, size_t count);
-
This write is also a C function. This is a wrapper to the actual System Call. We will see what a wrapper is in a while.
-
The System Call is simple and straightforward. It takes only 3 arguments, which are the only 3 things required to write something.
-
Argument0: Where to write - File Descriptor of the file.
-
Argument1: What to write - The Address of Buffer which has the contents to be written.
-
Argument2: How much to write - Number of bytes to write from Buffer.
-
It is simple and minimalistic.
-
-
But our printf is fancy. It supports different Data Types: integers, strings, characters etc. The write System Call is not aware of any of that. It just writes bytes to a specific location.
-
What the printf Library function does is take those 4 arguments (in the example) and create a buffer with whatever we want to print on the screen.
-
The Format String = “x = %d, c = %c, str = %s\n”
-
What we actually want to print is this: “x = 100, c = a, str = This is a string\n”;
-
The printf function takes the Format String and the corresponding arguments and gives the above string which we want to print on Standard Output.
-
To confirm this, you can write a program which uses printf and check it using strace. You will see the use of a write System Call.
-
The same holds for scanf. The System Call used to take input is read. But scanf allows us to specify formats, much like printf, and internally it uses the read System Call.
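The two steps that printf performs, formatting in the library and then writing through the System Call wrapper, can be sketched in a few lines of Python. This is only an illustration: printf_like is a made-up name, Python's % operator stands in for the C format engine, and os.write is Python's thin wrapper over write. The real printf also buffers its output through stdio, which this sketch leaves out.

```python
import os


def printf_like(fmt, *args, fd=1):
    """Format in user space, then hand the finished bytes to write()."""
    # Step 1: the library turns the Format String and arguments into a buffer.
    buf = (fmt % args).encode("ascii")
    # Step 2: one write(fd, buf, count) call; it only ever sees plain bytes.
    return os.write(fd, buf)


if __name__ == "__main__":
    x = 100
    c = "a"
    text = "This is a string"
    written = printf_like("x = %d, c = %c, str = %s\n", x, c, text)
```

Running it under strace should show a single write call carrying the already formatted string, which is the same thing the printf experiment suggested above is meant to reveal.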
2. Strange observation while writing assembly code:
-
In one of the examples of generating machine code, we put a constraint to have no NULL characters in it. So, for exit(1) this is the code I have used here:
~/rev_eng_series/post_9$ objdump -Mintel -d exit_no_null
exit_no_null: file format elf32-i386
Disassembly of section .text:
08048060 <_start>:
8048060: 31 c0 xor eax,eax
8048062: 04 01 add al,0x1
8048064: 31 db xor ebx,ebx
8048066: 80 c3 01 add bl,0x1
8048069: cd 80 int 0x80
-
This worked perfectly fine. But this was not what I came up with when we discussed having no NULL characters. I first came up with the following assembly program:
~/rev_eng_series/post_9$ objdump -Mintel -d exit_no_null_faulty
exit_no_null_faulty: file format elf32-i386
Disassembly of section .text:
08048060 <_start>:
8048060: 31 c0 xor eax,eax
8048062: 40 inc eax
8048063: 31 db xor ebx,ebx
8048065: 43 inc ebx
8048066: cd 80 int 0x80
~/rev_eng_series/post_9$ nasm exit_no_null_faulty.asm -f elf32
~/rev_eng_series/post_9$ ld exit_no_null_faulty.o -o exit_no_null_faulty -m elf_i386
~/rev_eng_series/post_9$ ./exit_no_null_faulty
~/rev_eng_series/post_9$
-
Perfect. The program as such is working fine. Then, even if I inject it, it should work fine. Let us inject and see what happens.
~/rev_eng_series/post_9$ python -c "print '\x31\xc0\x40\x31\xdb\x43\xcd\x80'" | ./code2
Segmentation fault (core dumped)
-
That is very very strange. There is no problem with code2. We have checked it several times and have run it with a few other examples also. I was almost sure that something is wrong with the code we have injected. But everything looks fine here. I wanted to see what the problem is.
-
The following is how the processor is interpreting the machine code we injected:
[-------------------------------------code-------------------------------------]
=> 0x7fffffffda20: xor eax,eax
   0x7fffffffda22: rex xor ebx,ebx
   0x7fffffffda25: rex.XB int 0x80
   0x7fffffffda28: or al,BYTE PTR [rax]
-
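This is unbelievable. There are no inc instructions at all. Instead, rex prefixes are attached to xor ebx, ebx and int 0x80. So, this is where the problem is. The reason is that code2 is a 64-bit executable (its disassembly uses rbp, rsp and rax), while exit_no_null_faulty was assembled as 32-bit code. In 64-bit mode, the bytes 0x40 to 0x4f no longer encode inc and dec. They are REX prefixes which modify the instruction that follows. Since eax is never incremented, int 0x80 is not invoked as exit(1), and the processor goes on to execute whatever bytes come next.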
-
This is why I did not give this version of exit(1) when we were discussing because it would disrupt the flow.
-
I was always under the belief that every Assembly Instruction has a unique Machine Code representation. That is, there is a one-to-one mapping between an Assembly Instruction and its Machine Code Representation. But this example made me rethink this.
-
And yes. Assembly Instructions and Machine Code Representation are not in a one-to-one relationship. I also found this amazing DEFCON talk to back this observation: https://www.youtube.com/watch?v=eunYrrcxXfw .
-
It is important to be cautious and expect such things.
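The two readings of the bytes \x40 and \x43 can be reproduced by hand. In 32-bit code, the one-byte opcodes 0x40 to 0x47 are inc r32 and 0x48 to 0x4f are dec r32. In 64-bit mode, the same sixteen values are REX prefixes with the bit layout 0100WRXB. The sketch below covers only these sixteen bytes and is in no way a disassembler; decode_4x is a made-up helper name.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_4x(byte, bits):
    """Describe a byte in the range 0x40..0x4f for 32-bit or 64-bit mode."""
    if not 0x40 <= byte <= 0x4F:
        raise ValueError("only bytes 0x40..0x4f are handled")
    if bits == 32:
        # Low 3 bits select the register, bit 3 selects inc or dec.
        op = "inc" if byte < 0x48 else "dec"
        return "%s %s" % (op, REG32[byte & 7])
    if bits == 64:
        # Low 4 bits are the W, R, X and B flags of a REX prefix.
        flags = "".join(
            name
            for name, mask in (("W", 8), ("R", 4), ("X", 2), ("B", 1))
            if byte & mask
        )
        return "rex." + flags if flags else "rex"
    raise ValueError("bits must be 32 or 64")


if __name__ == "__main__":
    for byte in (0x40, 0x43):
        print(hex(byte), "| 32-bit:", decode_4x(byte, 32),
              "| 64-bit:", decode_4x(byte, 64))
```

The 32-bit column matches what objdump showed for exit_no_null_faulty, and the 64-bit column matches the rex and rex.XB that gdb showed inside code2.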
3. Enabling the randomizing security feature!
We had disabled it. It is important to enable it.
# echo 2 > /proc/sys/kernel/randomize_va_space
#
Now, your machine is safer.
With that, I will end this article. I learnt a hell of a lot while writing it, and I hope you learnt something from it too.
Thank you!
Go to next post: Exploitation using Code Injection - Part2
Go to previous post: Buffer Overflow Vulnerability - Part2