Writing an ELF Parsing Library - Part1 - What is ELF?
Hello Friend!
We will start with the basics needed to start writing this library. First of all, we need to understand what ELF is, what it stands for, where it is used and more.
In this article, we’ll be doing the following.
- Exploring ELF in a superficial manner.
- Skimming through the various sources which we will use later to understand ELF better and to write the library.
Let us get started!
0. What is ELF?
The Executable and Linkable Format (ELF) is the file format used for executable files, shared libraries, object files and core dumps on UNIX-like systems.
The ELF is a very interesting and magical format. It is a combination of tightly knit data structures. It has a lot of interesting structures involved, each doing a particular function. We’ll go over every structure in detail and write code to parse that structure and dump it in human readable form.
Let us look at what elf’s manpage has to say.
ELF(5) Linux Programmer's Manual ELF(5)
NAME
elf - format of Executable and Linking Format (ELF) files
SYNOPSIS
#include <elf.h>
DESCRIPTION
The header file <elf.h> defines the format of ELF executable binary
files. Amongst these files are normal executable files, relocatable
object files, core files, and shared objects.
We’ll look at the file format’s internals in future articles. For now, we’ll discuss the various types of ELF files. Let us take a simple hello program to understand them.
$ cat hello.c
#include <stdio.h>
int main()
{
    printf("Hello World!\n");
    return 0;
}
and compile it in the following manner.
$ gcc hello.c -o hello --save-temps
$ ls
hello hello.c hello.i hello.o hello.s
1. Executable file: A file which can be run by the Operating System. It is generated by linking one or more object files. It can also use dynamic linking to call functions that live in shared object files. What sets an executable apart from the other ELF file types is that it has an entry point, the address where the OS starts running the program. Let us run hello, the executable.
$ ./hello
Hello World!
- This file has an entry point to the program we wrote.
$ readelf -h hello | grep Entry
Entry point address: 0x400430
2. Shared Object file: You might have heard of programs using libraries. Each of these libraries is present in the form of a shared object file. These libraries are also called shared libraries. Let us see what shared libraries our hello program uses.
$ ldd hello
linux-vdso.so.1 => (0x00007ffc320ac000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f975fa8e000)
/lib64/ld-linux-x86-64.so.2 (0x00007f975fe58000)
- The second one is the most familiar: libc, the C library. Our program is in fact using the printf function, which is part of the standard C library.
- Why are they called shared objects though? They are called so because they can be shared among multiple processes. Our hello program is using libc. Some other program might also be using some libc function. The OS generally stores only one copy of libc’s code in main memory and all programs keep using that same code. So, this one copy of libc is a shared resource. This applies to all shared objects.
- This is just a general definition of a shared object. When we get into specifics later, you’ll realize there is a lot more to it than just a simple definition.
3. Object file: Direct machine code equivalent of a C source file. It has just a little metadata (which is part of ELF) to keep the code organized. The hello.o file is an object file.
- It still is not part of an executable or a shared object. The linker and the programmer will decide its fate. Try running it and see what you get.
$ ./hello.o
bash: ./hello.o: cannot execute binary file: Exec format error
4. Core file: You may not have come across this file, but you definitely would have come across this line: Segmentation fault (core dumped). Have you ever wondered what this core dumped means?
- When a program crashes for some reason, the immediate task is to understand why it crashed. Is it some bug, or some internal system error? A core file will help you find out. It is a snapshot of your program’s memory taken at the moment it crashed. In which function and at what address did your program crash? What did the stack have? These details can shed a lot of light on why the program crashed.
- Generally, even though we get the core dumped message, the core may not actually get dumped. On many systems, core dumping is suppressed by default. It needs to be enabled if you want to see what a core file looks like. Take it as an exercise to enable core dumping.
These 4 files are the most used ELF files. We’ll be talking about them in detail in later posts.
In case you didn’t understand what linking is, or what that --save-temps flag did, you can refer to this post which talks about all that in detail.
With that, we know what files are ELF. Let us go a bit deeper into the ELF structure now.
1. A bit about ELF
Before analyzing ELF in detail, let us think about what an ELF file should contain.
Let us consider an executable file for now. What should it have?
- It should have the code we wrote (in machine code form, of course). Generally, this is called the text segment.
- It should have details about any global variables present.
- It should have all the constant data we have used throughout the program. Things like hardcoded strings.
- In the previous section, we saw that the hello executable uses the libc library. The executable should have some info about what libraries it depends on. When we run the executable, these libraries are loaded and our program can use them. If you want to know more about how libraries and programs are loaded to main memory for execution, you can read through this post.
- Along with libraries, we’d also need the functions used. In our program, we used printf.
- It should say which type of machine code it has. Is it x86 (32-bit Intel/AMD) code, x64 (64-bit Intel/AMD), ARM, PowerPC or something else? A file which has Intel code cannot be run on an ARM machine. All machine code is just 0s and 1s, but the encoding differs between processors: mov r1, r2 might be 0010101010 at binary level for an Intel processor, while the same mov instruction might be 101011100 for an ARM processor. So, the machine type is damn important.
These are obvious things which the ELF should contain.
Let us take a look at the strings hello has. It’ll give us insight into what more it has.
$ strings hello > hello.str
- The executable records the compiler used to build it, along with its version.
GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
- Look at this:
$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- It has the library functions’ names being used.
puts@@GLIBC_2.2.5
__libc_start_main@@GLIBC_2.2.5
It has a lot of other stuff, which we’ll discuss later.
Now, we’ll open up hello in a text editor and see its contents.
This doesn’t help at all. Some strings here and there, but we already have strings.
We’ll open it up with hexeditor. Install it in the following manner.
$ sudo apt-get install ncurses-hexedit
Opening our hello program with it looks like this.
$ hexeditor hello
The following are the first 16 bytes:
00000000 7F 45 4C 46 02 01 01 00 00 00 00 00 00 00 00 00 .ELF............
- Notice the first 4 bytes. They are 0x7f 0x45 0x4c 0x46, which is essentially 0x7f 'E' 'L' 'F'. These 4 bytes are known as the magic characters. They are an ELF file’s signature: every ELF file starts with them. You can inspect hello.o or /lib/x86_64-linux-gnu/libc.so.6 and you will find the same first 4 bytes.
- These 4 characters don’t give any info about the contents of an ELF file, but they tell us whether it is an ELF file or not.
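To make the signature check concrete, here is a minimal sketch in C. It only compares the first 4 bytes of a buffer against the magic bytes we saw in the hex dump. The name buf_has_elf_magic is made up for this illustration and is not part of any existing library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The four signature bytes: 0x7f followed by 'E', 'L', 'F'. */
static const unsigned char ELF_MAGIC[4] = { 0x7f, 'E', 'L', 'F' };

/* Returns true if the buffer starts with the ELF magic bytes.
 * buf_has_elf_magic is a hypothetical helper name used only here. */
bool buf_has_elf_magic(const unsigned char *buf, size_t len)
{
    /* A buffer shorter than 4 bytes cannot hold the signature. */
    if (buf == NULL || len < sizeof(ELF_MAGIC))
        return false;
    return memcmp(buf, ELF_MAGIC, sizeof(ELF_MAGIC)) == 0;
}
```

To try it on a real file, one would read the first few bytes with fopen and fread and pass them to this function. A true result only says the file claims to be ELF; it says nothing about whether the rest of the file is well-formed.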
Now, let us go back to ELF’s manpage.
An executable file using the ELF file format consists of an
ELF header, followed by a program header table or a section
header table, or both. The ELF header is always at offset
zero of the file. The program header table and the section
header table's offset in the file are defined in the ELF
header. The two tables describe the rest of the particularities
of the file.
- The file always starts with an ELF Header (which would be a C structure). This is the road-map which helps us navigate through the rest of the file. It tells us where to find the Program Header Table and the Section Header Table (a file may have one or both), which describe the rest of the particularities of the file.
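Since the ELF header sits at offset zero, a first rough sketch of reading it could look like the following. It assumes a 64-bit ELF image that is already in memory, assumes the file’s byte order matches the machine running the code, and uses the Elf64_Ehdr structure from elf.h. The function name read_elf64_header is hypothetical.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copies the ELF header found at offset zero of a 64-bit ELF image into *out.
 * read_elf64_header is a hypothetical name used only for this sketch.
 * Assumption: the file's byte order is the same as the host's. */
bool read_elf64_header(const unsigned char *image, size_t len, Elf64_Ehdr *out)
{
    if (image == NULL || out == NULL || len < sizeof(Elf64_Ehdr))
        return false;
    /* ELFMAG is "\177ELF" and SELFMAG is 4, both defined in elf.h. */
    if (memcmp(image, ELFMAG, SELFMAG) != 0)
        return false;
    /* This sketch handles only the 64-bit layout. */
    if (image[EI_CLASS] != ELFCLASS64)
        return false;
    /* memcpy avoids alignment problems with casting the raw pointer. */
    memcpy(out, image, sizeof(*out));
    return true;
}
```

After a successful call, out->e_phoff and out->e_shoff hold the file offsets of the program header table and the section header table, which is exactly the road-map role the manpage describes.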
As you read through the manpage, you’ll find details about the magic characters we were talking about.
EI_MAG0 The first byte of the magic number. It must be filled with ELFMAG0. (0: 0x7f)
EI_MAG1 The second byte of the magic number. It must be filled with ELFMAG1. (1: 'E')
EI_MAG2 The third byte of the magic number. It must be filled with ELFMAG2. (2: 'L')
EI_MAG3 The fourth byte of the magic number. It must be filled with ELFMAG3. (3: 'F')
- Going through, there is info about what type of ELF file this is.
e_type This member of the structure identifies the object file type:
ET_NONE An unknown type.
ET_REL A relocatable file.
ET_EXEC An executable file.
ET_DYN A shared object.
ET_CORE A core file.
- The processor which the ELF file is targeted at.
e_machine This member specifies the required architecture for an individual file. For example:
EM_NONE An unknown machine.
EM_M32 AT&T WE 32100.
EM_SPARC Sun Microsystems SPARC.
EM_386 Intel 80386.
EM_68K Motorola 68000.
EM_88K Motorola 88000.
EM_860 Intel 80860.
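These two members are just small integers, so turning them into readable text is a lookup. Here is a rough sketch using the ET_* and EM_* macros from elf.h. The function names and the returned strings are my own choices for illustration. EM_X86_64 is not in the excerpt above, but it is defined in elf.h and is the value our hello executable carries.

```c
#include <elf.h>
#include <stdint.h>

/* Maps an e_type value to a short description.
 * elf_type_name is a hypothetical helper name. */
const char *elf_type_name(uint16_t e_type)
{
    switch (e_type) {
    case ET_NONE: return "unknown type";
    case ET_REL:  return "relocatable file";
    case ET_EXEC: return "executable file";
    case ET_DYN:  return "shared object";
    case ET_CORE: return "core file";
    default:      return "unrecognized e_type";
    }
}

/* Maps a few e_machine values to a short description.
 * Only a handful of the many EM_* values are covered in this sketch. */
const char *elf_machine_name(uint16_t e_machine)
{
    switch (e_machine) {
    case EM_NONE:   return "unknown machine";
    case EM_M32:    return "AT&T WE 32100";
    case EM_SPARC:  return "Sun Microsystems SPARC";
    case EM_386:    return "Intel 80386";
    case EM_68K:    return "Motorola 68000";
    case EM_88K:    return "Motorola 88000";
    case EM_860:    return "Intel 80860";
    case EM_X86_64: return "AMD x86-64";
    default:        return "unrecognized e_machine";
    }
}
```

The default branches matter: a parser will meet values it does not know about, and it should report them rather than crash.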
As you move through, you’ll see various types of sections, each holding some part of the program.
- The .text section:
.text This section holds the "text", or executable instructions, of a program. This section is of type SHT_PROGBITS. The attributes used are SHF_ALLOC and SHF_EXECINSTR.
- The .data section:
.data This section holds initialized data that contribute to the program's memory image. This section is of type SHT_PROGBITS. The attribute types are SHF_ALLOC and SHF_WRITE
Don’t worry about what SHT_PROGBITS or any of those macros are; we’ll slowly go through them one by one.
You will come across way more “sections” than these two. We’ll go through each one of them later, and we’ll also discuss what a section is in detail.
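As a small taste of what those attribute macros are, here is a hedged sketch. SHF_WRITE, SHF_ALLOC and SHF_EXECINSTR are bit flags defined in elf.h, so a section’s attributes can be tested with a bitwise AND. The function name and the one-letter-per-flag output are assumptions made for this illustration.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Writes one letter per set flag into out, followed by a terminating NUL:
 *   W for SHF_WRITE, A for SHF_ALLOC, X for SHF_EXECINSTR.
 * section_flags_to_string is a hypothetical helper name.
 * out must have room for at least 4 characters. */
void section_flags_to_string(uint64_t sh_flags, char out[4])
{
    size_t n = 0;

    if (sh_flags & SHF_WRITE)
        out[n++] = 'W';
    if (sh_flags & SHF_ALLOC)
        out[n++] = 'A';
    if (sh_flags & SHF_EXECINSTR)
        out[n++] = 'X';
    out[n] = '\0';
}
```

With the attributes quoted from the manpage, .text (SHF_ALLOC and SHF_EXECINSTR) comes out as AX and .data (SHF_ALLOC and SHF_WRITE) comes out as WA.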
So, the manpage is a really good resource to help us write the library.
The manpage also has a reference to elf.h. Let us check it out. It starts like this.
$ nvim /usr/include/elf.h
/* This file defines standard ELF types, structures, and macros.
Copyright (C) 1995-2016 Free Software Foundation, Inc.
This file is part of the GNU C Library.
- It has literally everything an ELF file has - all the structures, types, macros. You can go through it, but don’t get scared :P
This was a brief introduction to ELF.
2. Can ELF be run on Windows?
ELF is the format UNIX-like systems use. Windows does not load ELF executables natively, so you won’t be able to run them directly on Windows.
Windows instead has its own file formats for executables and object files. It uses Portable Executable (PE) for executables and Common Object File Format (COFF) for object files. In fact, PE is an extension of COFF.
An operating system is capable of understanding only a few specific formats. Linux cannot understand PE or COFF, the same way Windows cannot understand ELF.
macOS has its own object file format called Mach-O.
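Each of these formats announces itself in its first few bytes, which is how a loader can quickly tell what it is looking at. The following is a rough sketch of that idea, not how any real OS loader is written. The enum and function names are made up. Note that the MZ check only says the file starts with a DOS header, which PE executables have, and that Mach-O universal (fat) binaries are left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names used only in this sketch. */
typedef enum { FMT_UNKNOWN, FMT_ELF, FMT_PE, FMT_MACHO } exe_format;

/* Guesses the executable format from the first 4 bytes of a file. */
exe_format guess_exe_format(const unsigned char *buf, size_t len)
{
    static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };
    /* Mach-O magic numbers as they can appear on disk. */
    static const unsigned char macho_magics[4][4] = {
        { 0xfe, 0xed, 0xfa, 0xce },  /* 32-bit, big-endian    */
        { 0xce, 0xfa, 0xed, 0xfe },  /* 32-bit, little-endian */
        { 0xfe, 0xed, 0xfa, 0xcf },  /* 64-bit, big-endian    */
        { 0xcf, 0xfa, 0xed, 0xfe },  /* 64-bit, little-endian */
    };
    size_t i;

    if (buf == NULL || len < 4)
        return FMT_UNKNOWN;
    if (memcmp(buf, elf_magic, 4) == 0)
        return FMT_ELF;
    /* PE executables begin with a DOS header whose first bytes are "MZ". */
    if (buf[0] == 'M' && buf[1] == 'Z')
        return FMT_PE;
    for (i = 0; i < 4; i++) {
        if (memcmp(buf, macho_magics[i], 4) == 0)
            return FMT_MACHO;
    }
    return FMT_UNKNOWN;
}
```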
If you want to understand Windows executables better, then you should write a PE parsing library. I really hope I write it someday.
3. Does an ELF parser already exist?
Of course. The binutils project has a few utilities like objdump and readelf which parse ELF files and give information in human-readable form.
Let us run readelf on our hello program and get ELF Header information.
$ readelf --file-header hello
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x400430
Start of program headers: 64 (bytes into file)
Start of section headers: 6616 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 9
Size of section headers: 64 (bytes)
Number of section headers: 31
Section header string table index: 28
- Note that this header makes up the first few bytes of the ELF file (64 bytes here, going by the size reported above). Open hello in that hexeditor and look at the first few bytes. Can you make out anything? Impossible.
- The tool goes through the ELF file byte by byte, decodes every field as defined in elf.h and described in the manpage, and prints this.
We now know what the ELF Header has. It has details about the data encoding, the OS ABI, the ELF type, the machine and so on. But we don’t know how the structure corresponds to this output, or what each byte means. There could be some encoding like this (a made-up example): 32-bit means 0, 64-bit means 1, 2 means invalid. Getting that level of clarity is the idea: what the C structures are, the members of each structure, each member’s possible values and so on. I personally love getting this type of clarity, rather than just an abstract idea of knowing that the above information is extracted from the ELF Header.
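To show what this kind of clarity looks like, here is a small sketch that decodes two bytes of the header: the class and the data encoding. The real values come from elf.h, where ELFCLASSNONE is 0, ELFCLASS32 is 1 and ELFCLASS64 is 2, and where ELFDATA2LSB is 1 and ELFDATA2MSB is 2. The function names are hypothetical and the strings are modelled on the readelf output above.

```c
#include <elf.h>

/* Decodes e_ident[EI_CLASS], the byte at offset 4 of the file.
 * elf_class_name is a hypothetical helper name. */
const char *elf_class_name(unsigned char ei_class)
{
    switch (ei_class) {
    case ELFCLASS32: return "ELF32";
    case ELFCLASS64: return "ELF64";
    default:         return "invalid class";   /* includes ELFCLASSNONE */
    }
}

/* Decodes e_ident[EI_DATA], the byte at offset 5 of the file.
 * elf_data_name is a hypothetical helper name. */
const char *elf_data_name(unsigned char ei_data)
{
    switch (ei_data) {
    case ELFDATA2LSB: return "2's complement, little endian";
    case ELFDATA2MSB: return "2's complement, big endian";
    default:          return "invalid data encoding";  /* includes ELFDATANONE */
    }
}
```

In the Magic line printed by readelf, the bytes right after 7f 45 4c 46 are 02 and 01. Passing them to these two functions gives ELF64 and little endian, which matches the Class and Data lines of the same output.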
4. Resources to get started with ELF
There are a lot of good resources and blogs online which explain ELF using already existing ELF parsing tools. I don’t want to use them to write our library.
We’ll refer to a bunch of standard resources like
- elf.h: Header file which defines everything about an ELF file.
- ELF’s manpage: A lot of information about ELF is present in plain English here. It will help us understand the structure better.
- AMD64’s Application Binary Interface: Official document which tells what each member in ELF’s structures should hold for a 64-bit AMD (or Intel) file (could be an executable, object, shared object or core file).
These are sufficient to write a kick-ass ELF parsing library!
5. Conclusion
I hope you got an idea of what ELF is. I urge you to read through the manpage.
In the next article, we’ll see what our library should look like, its design, some security measures while writing a library and finally start coding the library.
With that, I’ll end this article.
Thank you for reading!
Go to Home: libelfp - An ELF parsing library
Go to next article: Writing an ELF Library - Part2 - Piloting the Library