Hello fellow pwners!

This is the second article of the Reverse Engineering and Binary Exploitation tutorial series. In the first article, we saw how an executable is generated. In this article, we will see the internal structure of an executable.

Objective:

When I first read about ELF, I realized there is a lot of information about it in the elf manpage (manual page) and the elf.h header file, so explaining the same theory again would be redundant. Instead, we will read the content already there, understand it and write a toy tool, pmelf / parsemyelf, which reads the contents of an executable and displays them in human-readable form. It will be like a mini readelf tool, but you will have written it from scratch.

The ELF format is a very interesting, almost magical format. It is a combination of tightly knit data structures. As there are many concepts to explore, I have planned to divide this article into 2 parts. In this part, Part 1, we will discuss the general concepts required to understand ELF and explore about 40% to 50% of the whole format. In the second part, we will do the rest.

Similar to the previous post, create a directory post_2 in the rev_eng_series directory. If you want, make a copy of /usr/include/elf.h in the post_2 directory because you will have to refer to it often while writing the tool.

Let us keep the elf.h header file and the elf manpage entry as references because they have a lot of details about the different data structures, macros etc. used in an ELF executable. This is how it will go:

  • Read a small portion of elf manpage. This is how you open the manpage.

    $ man elf
    
  • Let us understand that portion in good detail.

  • Implement it - write code. Then repeat the same.

NOTE : We will write the tool to parse 64-bit ELF files. We can easily extend it to parse 32-bit ELF files also.

Let us start!

1. What is ELF?

  1. ELF stands for Executable and Linkable Format. This is the file format of any Executable file, Object file / relocatable file, Shared Object, Core file. This binary file format is used by most *NIX systems.
  • Executable file : A file which can be run by the Operating System. It is generated by linking one or more object files and could be using Dynamic Linking to access functions of other shared object files. This file contains the entry point of a program.

  • Object / Relocatable file : Direct machine code equivalent of a C source file. This is yet to be linked by the linker to generate an executable.

  • Shared Libraries : Also known as Dynamically Linked Object files because they contain machine code which is relocated dynamically and then executed.

  • Core files : When a segmentation fault occurs, you get the message Segmentation fault (core dumped). That core is the memory image of the process at the moment it crashed, written out as an ELF file called a core file. On many systems the dumping of core is suppressed by default, so no core file is written until we turn it on. A core file can be loaded into a debugger to find the exact cause of the crash.

Tool initialization:

  1. Before going to the internal structure of ELF, let us write some code which will initialize the parsing process. Look at the following lines:

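    #include<stdio.h>
    #include<stdlib.h>
    #include<string.h>      //memcpy, strcmp
    #include<fcntl.h>       //open, O_RDONLY
    #include<elf.h>         //Elf64_Ehdr and the ELF macros
    #include<sys/mman.h>
        
    Elf64_Ehdr *pme_elf64_hdr;
        
    //These two functions are written later in this article.
    void pme_parse_elf_header(char *pme_file_ptr);
    void pme_display_elf_header(char *pme_file_ptr);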
        
    void pme_err_exit(const char *errmsg) {
        
        fprintf(stderr, "%s\n", errmsg);
        exit(1);
    }
        
    int main(int argc, char **argv) {
        
        if(argc != 3) {
                fprintf(stderr, "Usage: $ %s ELF_FILE_NAME FILE_SIZE\n", argv[0]);
                exit(1);
        }
        
        int pme_fd, pme_file_size, choice;
        char *pme_file_name = argv[1], *pme_file_ptr;
        pme_file_size = atoi(argv[2]);
        
        
        //Open the specified ELF file and get the file descriptor.
        pme_fd = open(pme_file_name, O_RDONLY);
        if(pme_fd == -1)
                pme_err_exit("Error: Unable to open the specified ELF file");
        
        
        //Copy the whole ELF file from disk to main memory.
        pme_file_ptr = (char *)mmap(NULL, pme_file_size, PROT_READ, MAP_PRIVATE, pme_fd, 0);
        //mmap returns MAP_FAILED (not NULL) on error.
        if(pme_file_ptr == MAP_FAILED)
                pme_err_exit("Error: Unable to copy ELF file on disk onto main memory");
        
        
        pme_parse_elf_header(pme_file_ptr);
        pme_display_elf_header(pme_file_ptr);
        return 0;
    }
    

Explanation:

a. The user specifies the ELF file to be parsed on the command line. Argument 1 is the ELF filename and argument 2 is the size of the ELF file in bytes.

b. The ELF file specified is opened using the open function, which returns a file descriptor. The file descriptor is stored in pme_fd.

c. Now that we have a handle to the file, let us copy the whole file into main memory. Technically, you are mapping the file on disk into the address space of the tool. The mapping is done by the mmap function. If the mapping succeeds, mmap returns an address pointing to the beginning of the mapping, that is, the beginning of the file in main memory. On failure it returns MAP_FAILED. This is the syntax of mmap:

       #include <sys/mman.h>

           void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

d. Why should we map the ELF file into main memory? There are a few features of the C language which we can make use of when the file is in memory. Two such features are pointers and typecasting. Though typecasting is dangerous sometimes, it works like magic when it is done right. We cannot use pointers on disk files because pointers essentially point to locations in main memory. Without pointers, our job of writing the tool becomes very difficult. For more info about mmap, go to its manpage.

e. At the end of the above piece of code, we have a character pointer pme_file_ptr pointing to the beginning of the file in memory.

f. Ignore the pme_parse_elf_header(pme_file_ptr) and pme_display_elf_header(pme_file_ptr) calls for now.
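
The FILE_SIZE argument can be avoided by asking the kernel for the size of the file. The following is only a minimal sketch and not part of the tool above: the helper name pme_map_file is hypothetical. It uses fstat to learn the length and compares the return value of mmap against MAP_FAILED, which is how mmap reports an error.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not in the original tool): map a whole file
 * read-only and report its size. Returns NULL on any failure. */
unsigned char *pme_map_file(const char *path, size_t *size_out) {

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    /* Ask the kernel for the file size instead of trusting the user. */
    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */

    if (p == MAP_FAILED)   /* mmap signals failure with MAP_FAILED, not NULL */
        return NULL;

    *size_out = (size_t)st.st_size;
    return (unsigned char *)p;
}
```

A caller would pass argv[1] and a size_t variable, and release the mapping with munmap when done.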

1. ELF Header:

1. The ELF header is present in every ELF file. It starts from byte 0 of every ELF file. The following is the C-Structure of the 64-bit ELF-header.

    #define EI_NIDENT (16)

    typedef struct
    {
         unsigned char e_ident[EI_NIDENT];     /* Magic number and other info */
         Elf64_Half    e_type;                 /* Object file type */
         Elf64_Half    e_machine;              /* Architecture */
         Elf64_Word    e_version;              /* Object file version */
         Elf64_Addr    e_entry;                /* Entry point virtual address */
         Elf64_Off     e_phoff;                /* Program header table file offset */
         Elf64_Off     e_shoff;                /* Section header table file offset */
         Elf64_Word    e_flags;                /* Processor-specific flags */
         Elf64_Half    e_ehsize;               /* ELF header size in bytes */
         Elf64_Half    e_phentsize;            /* Program header table entry size */
         Elf64_Half    e_phnum;                /* Program header table entry count */
         Elf64_Half    e_shentsize;            /* Section header table entry size */
         Elf64_Half    e_shnum;                /* Section header table entry count */
         Elf64_Half    e_shstrndx;             /* Section header string table index */
    } Elf64_Ehdr;
  1. From this point, it is best to open the elf.h header file and the elf manpage and refer to them because all data structures and other info about ELF are present in them.

  2. Elf64_Ehdr is the C-Structure of a 64-bit ELF header. It is like a map to the whole ELF File because if we have the ELF Header, we can go to any byte of the file with the help of information stored in the header. A few details:

  • It has a character array e_ident which has a lot of critical information about the file: the Magic number, Class (32-bit or 64-bit), Endianness, ELF Version, OS/ABI, ABI Version and a padding of NULL Bytes. The array is EI_NIDENT (16) bytes long. The padding starts at index EI_PAD (9), so it is 7 bytes long.
  1. Other members of the structure tell us about

    • e_machine: Processor / Machine
    • e_entry: Entry Point Address
    • e_phoff: Offset to Program Header Table
    • e_shoff: Offset to Section Header Table
    • e_ehsize: Size of ELF Header
    • e_phentsize: Size of each entry of Program Header Table
    • e_phnum: Number of Program Table Entries
    • e_shentsize: Size of each entry of Section Header Table
    • e_shnum : Number of entries in Section Header table
    • e_shstrndx : Section Header Table Index of entry belonging to Section Name string table.

We will discuss each entry in detail while writing the tool. Regarding Program Header Table and Section Header Table, we will talk about them later in good detail.
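
The size members listed above can be cross-checked against the structure sizes in elf.h. This is a minimal sketch under the assumption that the file is a 64-bit ELF; the helper name pme_header_sizes_ok is hypothetical and not part of the tool.

```c
#include <elf.h>

/* Hypothetical helper: returns 1 if the size members of a 64-bit header
 * agree with the structure sizes in elf.h, else 0. */
int pme_header_sizes_ok(const Elf64_Ehdr *h) {

    if (h->e_ehsize != sizeof(Elf64_Ehdr))
        return 0;

    /* Entry sizes only matter when the corresponding table exists. */
    if (h->e_phnum > 0 && h->e_phentsize != sizeof(Elf64_Phdr))
        return 0;
    if (h->e_shnum > 0 && h->e_shentsize != sizeof(Elf64_Shdr))
        return 0;

    return 1;
}
```

For a typical 64-bit executable these sizes are 64, 56 and 64 bytes, which matches the sample output shown later in this article.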

  1. Now that we have discussed the ELF Header, let us implement a function pme_parse_elf_header which will parse the ELF Header and store all the details safely. Take a look at the following function:

     void pme_parse_elf_header(char *pme_file_ptr) {
        
            pme_elf64_hdr = (Elf64_Ehdr *)malloc(sizeof(Elf64_Ehdr));
            if(pme_elf64_hdr == NULL)
                    pme_err_exit("Error: Unable to allocate memory to store the ELF Header");
        
            Elf64_Ehdr *pme_elf_hdr = (Elf64_Ehdr *)pme_file_ptr;
        
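            //e_ident contains NULL bytes, so copy it with memcpy. strncpy would stop at the first NULL byte.
            memcpy(pme_elf64_hdr->e_ident, pme_elf_hdr->e_ident, EI_NIDENT);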
            pme_elf64_hdr->e_type = pme_elf_hdr->e_type;
            pme_elf64_hdr->e_machine = pme_elf_hdr->e_machine;
            pme_elf64_hdr->e_version = pme_elf_hdr->e_version;
            pme_elf64_hdr->e_entry = pme_elf_hdr->e_entry;
            pme_elf64_hdr->e_phoff = pme_elf_hdr->e_phoff;
            pme_elf64_hdr->e_shoff = pme_elf_hdr->e_shoff;
            pme_elf64_hdr->e_flags = pme_elf_hdr->e_flags;
            pme_elf64_hdr->e_ehsize = pme_elf_hdr->e_ehsize;
            pme_elf64_hdr->e_phentsize = pme_elf_hdr->e_phentsize;
            pme_elf64_hdr->e_phnum = pme_elf_hdr->e_phnum;
            pme_elf64_hdr->e_shentsize = pme_elf_hdr->e_shentsize;
            pme_elf64_hdr->e_shnum = pme_elf_hdr->e_shnum;
            pme_elf64_hdr->e_shstrndx = pme_elf_hdr->e_shstrndx;
        
        
            printf("\n\n");
        
            return ;
    }
    

Explanation:

a. Function: pme_parse_elf_header(char *pme_file_ptr)

b. The function takes the pme_file_ptr (Pointer pointing to beginning of mapping of ELF File in memory) and does the job.

c. pme_elf64_hdr is a pointer to an Elf64_Ehdr structure. Declare it as a global variable just after all the #includes. Using malloc, let us allocate the amount of memory required to store the ELF Header.

d. Consider the following line of code: We are typecasting a character pointer to an Elf64_Ehdr Pointer. After this, we can easily use pme_elf_hdr to extract all the header information.

Elf64_Ehdr *pme_elf_hdr = (Elf64_Ehdr *)pme_file_ptr;

e. All the lines of code following the above line just copy the members of the ELF Header from the file to our global variable pme_elf64_hdr. It was made global because it will be used by several other functions.

f. After the execution of this function, we have our pme_elf64_hdr ready with us. In this function, we really did not go into each element and examine them which we will do next. It is an exercise to get comfortable with typecasting and pointers.
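
Since the header is one contiguous structure, the member-by-member copy can also be written as a single memcpy guarded by a size check. This is a sketch of that alternative, not the approach used above; the name pme_copy_elf_header and its return convention are assumptions made for illustration.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the ELF header out of a mapped file.
 * Returns 0 on success, -1 if the mapping is too small to hold a header. */
int pme_copy_elf_header(const void *map, size_t map_size, Elf64_Ehdr *out) {

    if (map == NULL || out == NULL || map_size < sizeof(Elf64_Ehdr))
        return -1;

    /* One memcpy copies every member, including the NULL bytes in e_ident. */
    memcpy(out, map, sizeof(Elf64_Ehdr));
    return 0;
}
```

The size check matters for tiny or truncated files, where reading a full header through a typecast pointer would run past the end of the mapping.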

  1. Converting ELF Header details into human-readable form and printing them on the terminal. Let us write a function pme_display_elf_header to do this. This turned out to be a huge function, so I have broken it down into pieces to make it easy to understand what is going on!

a. This is the initial piece of code required before we start parsing.

void pme_display_elf_header(char *pme_file_ptr) {

    system("clear");
    printf("##############################ELF Header##############################\n\n");
    Elf64_Ehdr *pme_elf_hdr = (Elf64_Ehdr *)pme_file_ptr;
    char ch, magic_no[5];
    int int_ch, index = 0, pme_arch_bit;

Explanation:

  • These are the variables required to write the function. pme_elf_hdr is an Elf64_Ehdr structure pointer.

  • The array magic_no[5] is used to store magic number of the ELF File.

  • Let us keep elf manpage as reference and parse the header according to it.

  • There are many macros used in this function. You can refer to elf.h to find the actual values of those macros. Make it a rule to use these macros. Do not use raw values.

  • In general, we will take 1 member of ELF Header structure, and convert it into human-readable form. For e_ident, we will go byte by byte because each byte is encoding of different information.

b. Dissection of e_ident array :

  • e_ident stands for ELF Identity. That is because it contains details which define an executable like Architecture, Endianness, Application Binary Interface and more.

MAGIC NUMBER: First 4 bytes :

  • The first 4 bytes of any ELF File are 0x7f, E, L, F in that order. 7fELF is the magic number of an ELF File. A magic number is like a file signature, unique for a given file format (not file).

    //First 4 bytes: Magic numbers. 
    magic_no[0] = pme_elf_hdr->e_ident[0];
    magic_no[1] = pme_elf_hdr->e_ident[1];
    magic_no[2] = pme_elf_hdr->e_ident[2];
    magic_no[3] = pme_elf_hdr->e_ident[3];
    magic_no[4] = '\0';
        
    if(strcmp(magic_no, ELFMAG) != 0)
            pme_err_exit("Error: Magic numbers not matching. Probably not an ELF file");
        
    printf("\n\n%d. Magic number:\t\t\t%x%c%c%c", index, pme_elf_hdr->e_ident[0], pme_elf_hdr->e_ident[1], pme_elf_hdr->e_ident[2], pme_elf_hdr->e_ident[3]);
        
    index++;
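
The same check can be done without a temporary array by comparing the raw bytes directly. A minimal sketch, assuming only the ELFMAG and SELFMAG macros from elf.h; the function name pme_has_elf_magic is hypothetical.

```c
#include <elf.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the 4 magic bytes match, else 0.
 * ELFMAG is "\177ELF" and SELFMAG is 4, both from elf.h. */
int pme_has_elf_magic(const unsigned char *e_ident) {
    return memcmp(e_ident, ELFMAG, SELFMAG) == 0;
}
```

memcmp compares exactly SELFMAG bytes, so it does not depend on a terminating NULL byte the way strcmp does.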
    

CLASS - Byte 5 :

  • Byte 5 specifies the class of the binary. It tells whether the ELF binary is a 32-bit or a 64-bit binary. If it is neither of them (ELFCLASSNONE), the tool will terminate because the class is invalid. A 32-bit binary uses a 32-bit Instruction Set Architecture, which we will talk about in later articles.

  • It is important to know that 32-bit binaries make use of 32-bit registers and 64-bit binaries make use of 64-bit registers.

  • On a 64-bit x86 machine, a 32-bit binary can run because the processor has a compatibility mode which executes 32-bit code natively, provided the 64-bit OS supports it and the required 32-bit libraries are installed. But the other way round is not possible. You cannot run a 64-bit binary on a 32-bit machine.

    //5th Byte: Architecture of the binary. 
    ch = pme_elf_hdr->e_ident[EI_CLASS]; 
    int_ch = (int)ch;
        
    if(int_ch == ELFCLASSNONE) 
            pme_err_exit("Error: Architecture for this binary is invalid.");
        
    printf("\n%d. Architecture:\t\t\t", index);
        
    if(int_ch == ELFCLASS32) { 
            pme_arch_bit = 32; 
            printf("32-bit"); 
    } 
    else if(int_ch == ELFCLASS64) {
            pme_arch_bit = 64; 
            printf("64-bit"); 
    }
        
    index++;
    

DATA ENCODING - Byte 6 :

  • There are 2 ways to encode data: Little-Endian and Big-Endian. In short, this is the order in which the bytes of a multi-byte value are stored in memory. We will discuss Endianness in detail at the end.

    //6th Byte: Endianess.
    ch = pme_elf_hdr->e_ident[EI_DATA];
    int_ch = (int)ch;
    if(int_ch == ELFDATANONE)
            pme_err_exit("Error: Unknown data format");
        
    printf("\n%d. Endianess:\t\t\t\t", index);
        
    if(int_ch == ELFDATA2LSB)
            printf("2's Complement, Little-Endian");
        
    else if(int_ch == ELFDATA2MSB)
            printf("2's Complement, Big-Endian");
        
    index++;
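
The tool reads multi-byte members such as e_type through a typecast, which gives correct values only when the byte order of the file matches the byte order of the machine running the tool. The following sketch shows what the EI_DATA byte means for a 16-bit field by assembling it from raw bytes. The helper name pme_read_u16 is hypothetical and the tool in this article does not use it.

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical helper: assemble a 16-bit field from two raw bytes
 * according to the file's EI_DATA byte, independent of host byte order. */
uint16_t pme_read_u16(const unsigned char *p, int ei_data) {

    if (ei_data == ELFDATA2MSB)
        return (uint16_t)((p[0] << 8) | p[1]);   /* big-endian: high byte first */

    return (uint16_t)((p[1] << 8) | p[0]);       /* little-endian: low byte first */
}
```

For example, the two bytes 0x3e 0x00 in a little-endian file decode to 62, which is the value of EM_X86_64 in elf.h.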
    

ELF VERSION - Byte 7 :

  • Byte 7 is the ELF Version. There are only 2 versions. Current and Invalid. Go through elf.h to get the actual values of EV_CURRENT and EV_NONE.

    //7th Byte: Validity of ELF version.
    ch = pme_elf_hdr->e_ident[EI_VERSION];
    int_ch = (int)ch;
    if(int_ch == EV_NONE)
            pme_err_exit("Error: Invalid ELF Version");
        
    else if(int_ch == EV_CURRENT)
            printf("\n%d. ELF Version:\t\t\t\tValid Version. Version %d", index, EV_CURRENT);
        
    index++;
    

OPERATING SYSTEM AND APPLICATION BINARY INTERFACE - Byte 8 :

  • Application Binary Interface (ABI) is a set of rules followed by the Compiler and Linker when generating an executable. One such rule is the function calling convention: it is different in 32-bit and 64-bit Intel binaries, which means they follow different ABIs.

  • The 8th Byte is the encoding of the OS and ABI used to build this executable. This is how you print the ABI used.

    //8th Byte: Operating System and ABI(Application Binary Interface)
    //Read the byte directly: a plain char could turn 255 (ELFOSABI_STANDALONE) into -1.
    int_ch = (int)pme_elf_hdr->e_ident[EI_OSABI];
        
    printf("\n%d. Application Binary Interface:\t", index);
        
    switch(int_ch) {
        
            case ELFOSABI_NONE:
                    printf("UNIX System V");
                    break;
            case ELFOSABI_HPUX:
                    printf("HP_UX");
                    break;
            case ELFOSABI_NETBSD:
                    printf("NetBSD");
                    break;
            case ELFOSABI_GNU:      //ELFOSABI_LINUX is an alias for ELFOSABI_GNU
                    printf("ELF using GNU Extensions");
                    break;
            case ELFOSABI_SOLARIS:
                    printf("Sun Solaris");
                    break;
            case ELFOSABI_AIX:
                    printf("IBM AIX");
                    break;
            case ELFOSABI_IRIX:
                    printf("SGI Irix");
                    break;
            case ELFOSABI_FREEBSD:
                    printf("FreeBSD");
                    break;
            case ELFOSABI_TRU64:
                    printf("Compaq TRU64 UNIX");
                    break;
            case ELFOSABI_MODESTO:
                    printf("Novell Modesto");
                    break;
            case ELFOSABI_OPENBSD:
                    printf("OpenBSD");
                    break;
            case ELFOSABI_ARM_AEABI:
                    printf("ARM EABI");
                    break;
            case ELFOSABI_ARM:
                    printf("ARM");
                    break;
            case ELFOSABI_STANDALONE:
                    printf("Independent embedded application");
                    break;
            default:
                    pme_err_exit("Error: Unable to recognize ABI");
        
    }
        
    index++;
    
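  • I have considered all the ABIs listed in the man page. As you can see, most of them are different Operating Systems. ARM is a processor architecture widely used in embedded systems and has its own ABI.

NULL BYTE PADDING :

  • The remaining 8 bytes of e_ident (indices 8 to 15) are the ABI Version byte (EI_ABIVERSION, index 8) followed by 7 bytes of NULL Byte padding starting at EI_PAD.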

    //Rest of the 8 bytes:
    printf("\n%d. Rest of 8 bytes:\t\t\t", index);
    for(int i = 8; i <= 15; i++)
            printf("%d ", (int)pme_elf_hdr->e_ident[i]);
        
    index++;
    

Now that the e_ident array is processed, let us go to the other members of the structure.

c. e_type : This member tells us what type of ELF Binary it is. There are 4 types of ELF Binaries as discussed earlier: Relocatable/Object files, Core Files, Executable Files and Shared Libraries/Dynamically linked object files. They are encoded as ET_REL, ET_CORE, ET_EXEC and ET_DYN. Note that position-independent executables (PIE) are also encoded as ET_DYN. The following piece of code identifies the ELF Type.

    //ELF Type: e_type
    //e_type is 16 bits wide, so do not pass it through the 8-bit ch.
    int_ch = (int)pme_elf_hdr->e_type;

    if(int_ch == ET_NONE)
            pme_err_exit("Error: Unknown ELF File Type");


    printf("\n%d. ELF Type:\t\t\t\t", index);

    switch(int_ch) {

            case ET_REL:
                    printf("Relocatable File / Object File");
                    break;
            case ET_EXEC:
                    printf("Executable File");
                    break;
            case ET_DYN:
                    printf("Shared Library / Dynamically Linked Object file");
                    break;
            case ET_CORE:
                    printf("Core File");
                    break;
    }

    index++;
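
e_type is a 16-bit member, and elf.h also reserves two ranges of values for OS-specific and processor-specific types. The following is a sketch of a lookup function which keeps the full 16-bit value and recognizes those ranges. The function name pme_elf_type_name and the strings returned for the reserved ranges are assumptions made for illustration.

```c
#include <elf.h>

/* Hypothetical helper: map e_type to a description without narrowing it. */
const char *pme_elf_type_name(Elf64_Half e_type) {

    switch (e_type) {
        case ET_NONE: return "No file type";
        case ET_REL:  return "Relocatable File / Object File";
        case ET_EXEC: return "Executable File";
        case ET_DYN:  return "Shared Library / Dynamically Linked Object file";
        case ET_CORE: return "Core File";
    }

    /* Reserved ranges defined in elf.h. */
    if (e_type >= ET_LOOS && e_type <= ET_HIOS)
        return "OS-specific";
    if (e_type >= ET_LOPROC)     /* ET_HIPROC is 0xffff, the maximum value */
        return "Processor-specific";

    return "Unknown";
}
```

Returning a string instead of printing inside the switch keeps the display code down to a single printf call.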

By now, you should have got an idea of how to parse ELF header, process it and display it. From now on, try writing the code by yourself for the rest of the entries. I have given the code for reference.

d. e_machine : This member tells us which machine / processor the ELF Binary can run on. Some very common examples of processors are Intel Itanium, AMD x86-64, ARM, Motorola 68000 etc. In this tool, I have considered only the popular processors listed in the manpage. For a bigger, probably exhaustive list, refer to elf.h. This is the code:

    //Machine: e_machine
    //e_machine is 16 bits wide, so do not pass it through the 8-bit ch.
    int_ch = (int)pme_elf_hdr->e_machine;

    if(int_ch == EM_NONE)
            pme_err_exit("Error: Unknown machine");

    printf("\n%d. Machine:\t\t\t\t", index);

    /*
     * I have included cases only for popular machines. The exhaustive list is present in elf.h
     */

    switch(int_ch) {
            case EM_M32:
                    printf("AT&T WE 32100");
                    break;
            case EM_SPARC:
                    printf("Sun Microsystems SPARC");
                    break;
            case EM_386:
                    printf("Intel 80386");
                    break;
            case EM_68K:
                    printf("Motorola 68000");
                    break;
            case EM_88K:
                    printf("Motorola 88000");
                    break;
            case EM_860:
                    printf("Intel 80860");
                    break;
            case EM_MIPS:
                    printf("MIPS RS3000");
                    break;
            case EM_PARISC:
                    printf("HP / PA");
                    break;
            case EM_SPARC32PLUS:
                    printf("SPARC with enhanced instruction set");
                    break;
            case EM_PPC:
                    printf("PowerPC");
                    break;
            case EM_PPC64:
                    printf("PowerPC 64-bit");
                    break;
            case EM_S390:
                    printf("IBM S/390");
                    break;
            case EM_ARM:
                    printf("Advanced RISC Machines");
                    break;
            case EM_SH:
                    printf("Renesas SuperH");
                    break;
            case EM_SPARCV9:
                    printf("SPARC v9 64-bit");
                    break;
            case EM_IA_64:
                    printf("Intel Itanium");
                    break;
            case EM_X86_64:
                    printf("AMD x86-64");
                    break;
            case EM_VAX:
                    printf("DEC Vax");
                    break;
    }

    index++;

e. e_entry : This is a 64-bit Address which specifies the Entry Point Address. It is mainly meaningful for executables because an executable is where code can start running. If the file has no associated entry point, it is set to 0. We discussed the Entry Point in good detail in the previous post.

  • Once the executable is loaded into main memory (and, for dynamically linked programs, after the dynamic linker has done its work), execution starts at address e_entry. That is the first instruction of the executable to run. Printing the Entry Address is pretty straightforward.

    //ELF Entry point Address: e_entry
    printf("\n%d. Entry Point Address:\t\t\t0x%lx", index++, (unsigned long int)pme_elf_hdr->e_entry);
    

f. e_phoff : There is a table known as the Program Header Table. Its entries have information about the various segments that will be loaded into memory when an ELF file is executed.

  • The Operating System can understand the different segments present in the ELF file while loading and these segments will decide the memory layout of the process.

  • e_phoff stores the Offset from beginning of ELF file where the Program Header table is present. The offset is in bytes. This is how you print the e_phoff.

    //Program Header table's offset 
    printf("\n%d. Offset of Program Header Table:\t%lu bytes (From beginning of file)", index++, (unsigned long int)pme_elf_hdr->e_phoff);
    

g. e_shoff : This stores the offset from beginning of ELF File where Section Header Table is present. The offset is in bytes.

  • Section Headers are parts of ELF file which help in debugging and Linking. They are not necessary for executable to properly run. When the executable is run, at load time, all Section Headers and a few Sections are dropped. Dropped simply means they are not loaded into the main memory. Some of the known sections are Symbol Table, Dynamic Symbol Table, .text, .data.

  • When an ELF file is stripped using the strip command, a few Sections like Symbol Table, String Table will be removed. This reduces size of the ELF file on disk and makes debugging harder. When those Sections are removed, their Section Headers are also removed.

     ~/rev_eng_series/post_2$ readelf -l code1
        
     Elf file type is EXEC (Executable file)
     Entry point 0x400430
     There are 9 program headers, starting at offset 64
        
     Program Headers:
     Type           Offset             VirtAddr           PhysAddr
                    FileSiz            MemSiz              Flags  Align
     PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                    0x00000000000001f8 0x00000000000001f8  R E    8
     INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
                    0x000000000000001c 0x000000000000001c  R      1
        [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
     LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                    0x000000000000071c 0x000000000000071c  R E    200000
     LOAD           0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
                    0x000000000000022c 0x0000000000000238  RW     200000
     DYNAMIC        0x0000000000000e28 0x0000000000600e28 0x0000000000600e28
                    0x00000000000001d0 0x00000000000001d0  RW     8
     NOTE           0x0000000000000254 0x0000000000400254 0x0000000000400254
                    0x0000000000000044 0x0000000000000044  R      4
     GNU_EH_FRAME   0x00000000000005f4 0x00000000004005f4 0x00000000004005f4
                    0x0000000000000034 0x0000000000000034  R      4
     GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                    0x0000000000000000 0x0000000000000000  RW     10
     GNU_RELRO      0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
                    0x00000000000001f0 0x00000000000001f0  R      1
        
     Section to Segment mapping:
     Segment Sections...
     00     
     01     .interp 
     02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .plt.got .text .fini .rodata .eh_frame_hdr .eh_frame 
     03     .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss 
     04     .dynamic 
     05     .note.ABI-tag .note.gnu.build-id 
     06     .eh_frame_hdr 
     07     
     08     .init_array .fini_array .jcr .dynamic .got 
     ~/rev_eng_series/post_2$ 
    
  • You can notice that there are 9 Program Headers. Also take a look at the Section to Segment mapping. Each Section may belong to one or more segments and each segment may consist of several sections.

  • We will discuss each Program Header and Section Header while we write functions to parse them.

  • This is how you print e_shoff.

    //Section Header table's offset.
    printf("\n%d. Offset of Section Header Table:\t%lu bytes (From beginning of file)", index++, (unsigned long int)pme_elf_hdr->e_shoff);
    

h. e_ehsize : This member stores the ELF Header Size.

    //ELF Header Size
    printf("\n%d. ELF Header Size:\t\t\t%lu bytes", index++, (unsigned long int)pme_elf_hdr->e_ehsize);

i. e_phentsize : This member stores the size of each Program Header Table entry in bytes.

    //Size of Single Entry in Program Header Table.
    printf("\n%d. Sizeof PH Table's single entry:\t%lu bytes", index++, (unsigned long int)pme_elf_hdr->e_phentsize);

j. e_phnum : This member stores the number of Program Header Table entries.

    //Number of Program Header Table Entries.
    printf("\n%d. Number of PH Table entries:\t\t%lu", index++, (unsigned long int)pme_elf_hdr->e_phnum);

k. e_shentsize : This member stores the size of each Section Header Table entry in bytes.

    //Size of single entry in Section Header Table
    printf("\n%d. Sizeof SH Table's single entry:\t%lu bytes", index++, (unsigned long int)pme_elf_hdr->e_shentsize);

l. e_shnum : This member stores the number of Section Header Table entries.

    //Number of Section Header Table entries.
    printf("\n%d. Number of SH Table entries:\t\t%lu", index++, (unsigned long int)pme_elf_hdr->e_shnum);

m. e_shstrndx: This requires a bit of explanation.

  • Note from the readelf output above that every section has a name like .text, .data, .interp etc. Where are these names stored? They are stored in one of the Sections itself. That section is known as the Section name string Table. It is a table of NULL-terminated strings which are the names of the different sections.

  • shstrndx : sh stands for Section Header. str stands for String. ndx stands for Index.

  • The e_shstrndx member stores the index of the Section name string Table in the Section Header Table. Suppose shdr is a pointer to the first entry of the Section Header Table. Then, shdr[pme_elf64_hdr->e_shstrndx] is the Section Header which describes the Section name string table. We will practically see how this works when we start working on the Section Header Table.
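
As a preview, the lookup described above can be sketched as follows. This is not the implementation this series will use later. The function name pme_section_name is hypothetical, it assumes a well-formed 64-bit file mapped at map, and it performs no bounds checks against the file size. The special case where e_shstrndx holds SHN_XINDEX is also ignored.

```c
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: return the name of section `idx`, or NULL if idx is
 * out of range. `map` points to the start of the mapped ELF file. */
const char *pme_section_name(const unsigned char *map, Elf64_Half idx) {

    const Elf64_Ehdr *ehdr = (const Elf64_Ehdr *)map;
    if (idx >= ehdr->e_shnum)
        return NULL;

    /* Section Header Table: an array of Elf64_Shdr at offset e_shoff. */
    const Elf64_Shdr *shdr = (const Elf64_Shdr *)(map + ehdr->e_shoff);

    /* The header at index e_shstrndx describes the name string table. */
    const Elf64_Shdr *strtab_hdr = &shdr[ehdr->e_shstrndx];
    const char *strtab = (const char *)(map + strtab_hdr->sh_offset);

    /* sh_name is a byte offset into that string table. */
    return strtab + shdr[idx].sh_name;
}
```

Each Section Header stores only an offset in sh_name. The actual characters live in the string table, which is why e_shstrndx is needed to resolve any section name.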

Observation: Just go through the last few entries again.

a. There are 3 entries related to Program Headers - e_phoff, e_phentsize, and e_phnum. With the help of these values, we can address any Program Header table entry we want. pme_file_ptr points to beginning of ELF file. So, pme_file_ptr + pme_elf64_hdr->e_phoff will point to the Program Header table or the First entry of the table. As we know the size of each entry and number of entries, we can iterate over the whole table and parse all the entries. That is the idea. The same is the idea with Section Header Table. This is why an ELF Header is like a map to the whole executable. You can travel to any part of executable if you know the ELF Header details.
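
The iteration idea can be sketched like this. It is a minimal illustration, assuming a well-formed 64-bit file mapped at map. The function name pme_count_load_segments is hypothetical, and counting PT_LOAD entries is just a simple task chosen to show the loop.

```c
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: count PT_LOAD entries in the Program Header Table. */
size_t pme_count_load_segments(const unsigned char *map) {

    const Elf64_Ehdr *ehdr = (const Elf64_Ehdr *)map;
    size_t count = 0;

    for (Elf64_Half i = 0; i < ehdr->e_phnum; i++) {
        /* Entry i starts e_phoff + i * e_phentsize bytes into the file. */
        const Elf64_Phdr *ph = (const Elf64_Phdr *)
            (map + ehdr->e_phoff + (size_t)i * ehdr->e_phentsize);

        if (ph->p_type == PT_LOAD)
            count++;
    }

    return count;
}
```

For the code1 binary in the readelf output above, which lists two LOAD entries, such a function would be expected to return 2.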

Now, the only thing left is to compile the code we have written and run it. To make things easy, store the main function in a file parsemyelf.c. Store the other 2 functions, which parse and display the ELF Header, in parse_elf_header.c. Then compile both of them.

    ~/rev_eng_series/post_2$ gcc parsemyelf.c parse_elf_header.c -o pme

You get an executable pme which is your tool. Try running it with an executable. I tried running it with pme itself. To verify the correctness of our tool, you can use readelf and compare the entries.

   ~/rev_eng_series/post_2$ ls -l pme
   -rwxrwxr-x 1 adwi adwi 17648 Jul  2 20:25 pme
   ~/rev_eng_series/post_2$ ./pme pme 18000

Output:

   ##############################ELF Header##############################



   0. Magic number:         7fELF
   1. Architecture:         64-bit
   2. Endianess:                2's Complement, Little-Endian
   3. ELF Version:              Valid Version. Version 1
   4. Application Binary Interface: UNIX System V
   5. Rest of 8 bytes:          0 0 0 0 0 0 0 0 
   6. ELF Type:             Executable File
   7. Machine:              AMD x86-64
   8. Entry Point Address:          0x4007a0
   9. Offset of Program Header Table:   64 bytes (From beginning of file)
   10. Offset of Section Header Table:  15664 bytes (From beginning of file)
   11. Flags:               0x0
   12. ELF Header Size:         64 bytes
   13. Sizeof PH Table's single entry:  56 bytes
   14. Number of PH Table entries:      9
   15. Sizeof SH Table's single entry:  64 bytes
   16. Number of SH Table entries:      31
   17. Section header string table index:   28

   PH = Program Header, SH = Section Header

With this, we have completed the ELF Header. I hope you have got a general view of what exactly ELF is, what the ELF Header is, its members and their uses. Now, let us go deep into the Program Header Table.

2. Program Header Table

  1. First of all, let us understand why it is called Program Header Table. Each Program Header describes a Segment. Each Segment is an important piece of the program. If even one of the segments is missing, the ELF file won’t run properly.

  2. Why are they important? They are important because they decide the Memory Layout of the program when loaded into the main memory.

  3. This Table is generally present right after the ELF Header. Take a look at your tool’s output of Size of ELF Header and Offset to Program Header Table. They will probably be the same. But to locate the Program Header Table, let us use e_phoff and not this observation we made.

  4. The table is a list of different Program Headers. So, let us look at the structure of a 64-bit Program Header.

    typedef struct
    {
      Elf64_Word    p_type;                 /* Segment type */
      Elf64_Word    p_flags;                /* Segment flags */
      Elf64_Off     p_offset;               /* Segment file offset */
      Elf64_Addr    p_vaddr;                /* Segment virtual address */
      Elf64_Addr    p_paddr;                /* Segment physical address */
      Elf64_Xword   p_filesz;               /* Segment size in file */
      Elf64_Xword   p_memsz;                /* Segment size in memory */
      Elf64_Xword   p_align;                /* Segment alignment */
    } Elf64_Phdr;
    
  • There are 8 members in the above structure. Let us first write a function pme_parse_ph_table which is like an initialization function before actually parsing each Program Header in the Table. Look at the following lines of code:

        void pme_parse_ph_table(char *pme_elf_ptr) {
        
            system("clear");
            printf("##############################Program Header Table##############################\n\n\n");
            Elf64_Phdr *pme_ph_hdr;
        
            Elf64_Off ph_offset;
            Elf64_Half ph_entry_size;
            Elf64_Half ph_entry_count;
        
            ph_offset = pme_elf64_hdr->e_phoff;
            ph_entry_size = pme_elf64_hdr->e_phentsize;
            ph_entry_count = pme_elf64_hdr->e_phnum;
        
            printf("Offset at which Program Header table is found: %lu bytes\n", ph_offset);
            printf("Number of Program Header Entries: %d\n", ph_entry_count);
            printf("Size of each Program Header Entry: %d bytes\n", ph_entry_size);
        
            printf("\nVA: Virtual Address, PA: Physical Address\n\n");
        
        
            printf("Entry_no.|   Type   | Flags |    File_offset     |         VA         |         PA         |     Size(File)     |     size(Memory)    |     Alignment      \n");
           for(int i = 0; i < ph_entry_count; i++) {
        
                    pme_ph_hdr = (Elf64_Phdr *)(pme_elf_ptr + ph_offset + ph_entry_size * i);
                    pme_display_ph_entry(pme_ph_hdr, i);
           }
        
           printf("\n");
        
           return;
        }
    

Explanation:

a. I have stored Program Header Table offset, Size of each Program Header and number of Program Headers in ph_offset, ph_entry_size and ph_entry_count respectively. It is printed on the terminal.

b. For now, do not bother too much what Virtual Address and Physical Address means. It requires some insight on how the main memory is managed(Memory Management ) by an Operating System. For now, think about them as addresses of some memory location.

c. There is a printf statement which prints the Heading for the table we want to print.

d. The for loop is very important. I am basically iterating over all Program Headers. I am starting with i = 0 to i = ph_entry_count-1 . To find the address of each Program Header, I am using the idea discussed earlier. pme_elf_ptr points to beginning of ELF file. pme_elf_ptr + ph_offset should point to first Program Header of Program Header Table. The i * ph_entry_size is present to jump to the required Program Header.

e. Consider this line of code:

    pme_ph_hdr = (Elf64_Phdr *)(pme_elf_ptr + ph_offset + ph_entry_size * i);
  • This typecasts a Character Pointer to an Elf64_Phdr pointer. I cannot stress enough how easy the job becomes with typecasting!
  1. The pme_display_ph_entry function takes pme_ph_hdr and entry number(i) as arguments. It displays details of a specified Program Header in the Table. Let us see how we can implement it. The main goal is, we have to understand what the entries are and how we can convert them into human-readable form. Here we go!

a. Entry number: This is not a member of Elf64_Phdr structure, but for our reference.

        void pme_display_ph_entry(Elf64_Phdr *pme_ph_hdr, int ph_ent_no) {

            //Entry no.     
            printf("    %d    |", ph_ent_no);

b. p_type: This describes the segment whose details are present in this Program Header. The member p_type tells us what type of segment it is. There are different types of segments. A few important segment types are

  • PT_LOAD : This segment is a loadable segment. It means this segment is copied into main memory / loaded when the ELF file is executed. A loadable segment is made up of sections such as .text , .data , .rodata , .interp and more. Its size on disk is specified by the p_filesz member. When it is loaded into memory, its size is p_memsz .

  • PT_DYNAMIC : Segment of this type gives the dynamic linking information.

  • PT_INTERP : Segment of this type gives information about the interpreter used to run the program.

    • Do not confuse this with the interpreters of Interpreted Languages . The Interpreters of python, java etc., interpret and run their respective Byte code . The Interpreter we are talking about here does not interpret machine code, because the processor executes machine code directly. It is the program which is loaded first when the ELF file is executed, and its job is to prepare the executable so that it can run.

    • The interpreter plays a very crucial role in execution of a Binary. It finds the required shared libraries, it loads them and then runs the program. That is why it is also known as Dynamic Linker .

    • Note that ld.so is also a shared object or an ELF of type ET_DYN. Use our tool on ld.so and see what you get. Just hit $ man ld.so to get more info about Dynamic Linker.

  • For other types, you can refer to the manpage.

        //Program Header Table Entry Type.
        switch(pme_ph_hdr->p_type) {
        
            case PT_LOAD:
                    printf("   LOAD   |");
                    break;
            case PT_DYNAMIC:
                    printf("  DYNAMIC |");
                    break;
            case PT_INTERP:
                    printf("  INTERP  |");
                    break;
            case PT_NOTE:
                    printf("   NOTE   |");
                    break;
            case PT_SHLIB:
                    printf("  SHLIB   |");
                    break;
            case PT_PHDR:
                    printf("   PHDR   |");
                    break;
            case PT_TLS:
                    printf("   TLS    |");
                    break;
            case PT_LOOS:
                    printf("   LOOS   |");
                    break;
            case PT_GNU_EH_FRAME:
                    printf("GNUEHFRAME|");
                    break;
            case PT_GNU_STACK:
                    printf(" GNU_STACK|");
                    break;
            case PT_GNU_RELRO:
                    printf(" GNU_RELRO|");
                    break;
            case PT_SUNWBSS:
                    printf(" SUNWBSS  |");
                    break;
            case PT_SUNWSTACK:
                    printf("SUNWSTACK |");
                    break;
            case PT_HIOS:
                    printf("   HIOS   |");
                    break;
            case PT_LOPROC:
                    printf("  LOPROC  |");
                    break;
            case PT_HIPROC:
                    printf("  HIPROC  |");
                    break;
            case PT_NULL:
                    printf("  Unused  |");
                    break;
            default:
                    //Types we do not know by name (newer OS / processor specific ones) are printed as raw hex instead of exiting.
                    printf("0x%08x|", pme_ph_hdr->p_type);
                    break;
        }
    

c. p_offset : This member stores the offset of this segment from the beginning of the file.

    //Segment File Offset.
    printf(" 0x%016lx |", pme_ph_hdr->p_offset);

d. p_vaddr : This is the Virtual Address of the beginning of segment when loaded into memory.

    //Segment Virtual Address
    printf(" 0x%016lx |", pme_ph_hdr->p_vaddr);

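e. p_paddr : On systems where physical addressing is relevant, this is the Physical Address of the beginning of the segment. For ordinary programs on Linux, it is not used.

    //Segment Physical Address
    printf(" 0x%016lx |", pme_ph_hdr->p_paddr);
  • Note: When you run the tool, you will see that the Virtual Address and Physical Address are the same. The linker typically just puts the virtual address in this field, and it is ignored when a normal program is loaded. The real Physical address is decided by the Operating System at runtime, and it is something we as normal users cannot know from here. It is like a secret known only to the Operating System.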

f. p_filesz : This is size of the segment when on disk.

    //Segment size in file
    printf(" 0x%016lx |", pme_ph_hdr->p_filesz);

g. p_memsz : This is size of the segment after loading into memory.

    //Segment size on memory
    printf(" 0x%016lx |", pme_ph_hdr->p_memsz);

h. p_align : This member stores the value to which the segments are aligned in memory and in file. Let us discuss alignment once we are able to run the tool.

    //Segment Alignment
    printf(" 0x%016lx |", pme_ph_hdr->p_align);

i. p_flags : This variable holds permission flags for the segment. There are 3 basic flags.

  • PF_X : Segment is eXecutable.
  • PF_W : Segment is Writable.
  • PF_R : Segment is Readable.

  • Generally, a segment cannot be both Writable and Executable. That is a security feature enforced in all modern OSs which we will talk about later. An executable segment is normally also readable, because the code has to be read before it can be executed.

  • This is the piece of code which identifies the flags:

        //Segment flags, Assuming W^X is always enabled.
        switch(pme_ph_hdr->p_flags) {
        
            case PF_X:
                    printf("  X    |");
                    break;
            case PF_W:
                    printf("  W    |");
                    break;
            case PF_R:
                    printf("  R    |");
                    break;
            case PF_X | PF_R:
                    printf("  RX   |");
                    break;
            case PF_W | PF_R:
                    printf("  RW   |");
                    break;
            default:
                    pme_err_exit("Error: Unknown Flag");
        }
    

We are done with writing the function which parses and displays a PHT Entry. Store pme_parse_ph_table and pme_display_ph_entry in a new C file parse_ph_table.c. Then compile all 3 files together.

    ~/rev_eng_series/post_2$ gcc parsemyelf.c parse_elf_header.c parse_ph_table.c -o pme
    ~/rev_eng_series/post_2$ ./pme pme 18000

Output: Showing only Program Header output.

##############################Program Header Table##############################


Offset at which Program Header table is found: 64 bytes
Number of Program Header Entries: 9
Size of each Program Header Entry: 56 bytes

VA: Virtual Address, PA: Physical Address

Entry_no.|   Type   | Flags |    File_offset     |         VA         |         PA         |     Size(File)     |     size(Memory)    |     Alignment      
    0    |   PHDR   |  RX   | 0x0000000000000040 | 0x0000000000400040 | 0x0000000000400040 | 0x00000000000001f8 | 0x00000000000001f8 | 0x0000000000000008 |
    1    |  INTERP  |  R    | 0x0000000000000238 | 0x0000000000400238 | 0x0000000000400238 | 0x000000000000001c | 0x000000000000001c | 0x0000000000000001 |
    2    |   LOAD   |  RX   | 0x0000000000000000 | 0x0000000000400000 | 0x0000000000400000 | 0x0000000000002774 | 0x0000000000002774 | 0x0000000000200000 |
    3    |   LOAD   |  RW   | 0x0000000000002e10 | 0x0000000000602e10 | 0x0000000000602e10 | 0x0000000000000288 | 0x00000000000002b0 | 0x0000000000200000 |
    4    |  DYNAMIC |  RW   | 0x0000000000002e28 | 0x0000000000602e28 | 0x0000000000602e28 | 0x00000000000001d0 | 0x00000000000001d0 | 0x0000000000000008 |
    5    |   NOTE   |  R    | 0x0000000000000254 | 0x0000000000400254 | 0x0000000000400254 | 0x0000000000000044 | 0x0000000000000044 | 0x0000000000000004 |
    6    |GNUEHFRAME|  R    | 0x0000000000002580 | 0x0000000000402580 | 0x0000000000402580 | 0x000000000000005c | 0x000000000000005c | 0x0000000000000004 |
    7    | GNU_STACK|  RW   | 0x0000000000000000 | 0x0000000000000000 | 0x0000000000000000 | 0x0000000000000000 | 0x0000000000000000 | 0x0000000000000010 |
    8    | GNU_RELRO|  R    | 0x0000000000002e10 | 0x0000000000602e10 | 0x0000000000602e10 | 0x00000000000001f0 | 0x00000000000001f0 | 0x0000000000000001 |
  • Verify the output using readelf tool.

  • Now that we are done implementing PHTable parsing functions, let us discuss a few important observations:

  1. First, let us discuss what PHTable entries mean. To understand it, let us take a toy program code1.c . It is an infinite loop. This loop is important for analysis.

         //code1.c
         int main() {
        
         while(1);
        
         }
    
  • Compile it. This is the Program Header Table Output of our tool on executable of code1.c .

    ##############################Program Header Table##############################
        
        
    Offset at which Program Header table is found: 64 bytes
    Number of Program Header Entries: 9
    Size of each Program Header Entry: 56 bytes
        
    VA: Virtual Address, PA: Physical Address
        
    Entry_no.|   Type   | Flags |    File_offset     |         VA         |         PA         |     Size(File)     |     size(Memory)    |     Alignment      
       0    |   PHDR   |  RX   | 0x0000000000000040 | 0x0000000000400040 | 0x0000000000400040 | 0x00000000000001f8 | 0x00000000000001f8 | 0x0000000000000008 |
       1    |  INTERP  |  R    | 0x0000000000000238 | 0x0000000000400238 | 0x0000000000400238 | 0x000000000000001c | 0x000000000000001c | 0x0000000000000001 |
       2    |   LOAD   |  RX   | 0x0000000000000000 | 0x0000000000400000 | 0x0000000000400000 | 0x000000000000068c | 0x000000000000068c | 0x0000000000200000 |
       3    |   LOAD   |  RW   | 0x0000000000000e10 | 0x0000000000600e10 | 0x0000000000600e10 | 0x0000000000000220 | 0x0000000000000228 | 0x0000000000200000 |
       4    |  DYNAMIC |  RW   | 0x0000000000000e28 | 0x0000000000600e28 | 0x0000000000600e28 | 0x00000000000001d0 | 0x00000000000001d0 | 0x0000000000000008 |
       5    |   NOTE   |  R    | 0x0000000000000254 | 0x0000000000400254 | 0x0000000000400254 | 0x0000000000000044 | 0x0000000000000044 | 0x0000000000000004 |
       6    |GNUEHFRAME|  R    | 0x0000000000000564 | 0x0000000000400564 | 0x0000000000400564 | 0x0000000000000034 | 0x0000000000000034 | 0x0000000000000004 |
       7    | GNU_STACK|  RW   | 0x0000000000000000 | 0x0000000000000000 | 0x0000000000000000 | 0x0000000000000000 | 0x0000000000000000 | 0x0000000000000010 |
       8    | GNU_RELRO|  R    | 0x0000000000000e10 | 0x0000000000600e10 | 0x0000000000600e10 | 0x00000000000001f0 | 0x00000000000001f0 | 0x0000000000000001 |
        
    ~/rev_eng_series/post_2$ 
    
  • Note that the above table was generated from the Executable file which is present on Disk. So, all the entries and values remain the same irrespective of how many times you run the tool. It means the Addresses and Offsets shown here do not depend on any particular execution or on the state of main memory.

  • Now run the executable of code1 in one terminal. Because of the infinite loop, the process keeps running. In another terminal, find the Process ID of this process. Using the ps command with the -e flag, you will get the Process IDs of all the processes. I have shown only the line for code1 here. Your PID could be different.

    ~/rev_eng_series/post_2$ ps -e
    13498 pts/19   00:00:06 code1
    
  • PID = 13498 .

  • With PID in hand, we can go to one of the most interesting directories - the /proc directory and get all details of this process. The proc directory is named so because it mainly contains details of all the processes and some other system data also.

    ~/rev_eng_series/post_2$ cd /proc
    /proc$
    
  • You will be able to see many directories with numbers as their names. Each Number denotes a PID . Details of a particular process are present in the directory named after its PID. In my case, I have to go to the 13498 directory.

    /proc/13498$ ls
    attr             cwd       map_files   oom_adj        schedstat  task
    autogroup        environ   maps        oom_score      sessionid  timers
    auxv             exe       mem         oom_score_adj  setgroups  timerslack_ns
    cgroup           fd        mountinfo   pagemap        smaps      uid_map
    clear_refs       fdinfo    mounts      patch_state    stack      wchan
    cmdline          gid_map   mountstats  personality    stat
    comm             io        net         projid_map     statm
    coredump_filter  limits    ns          root           status
    cpuset           loginuid  numa_maps   sched          syscall
    /proc/13498$ 
    
  • You can see there are many files and directories. Let us focus on the maps file. Its contents can be printed with cat maps .

    00400000-00401000 r-xp 00000000 08:02 4327028                            /home/adwi/rev_eng_series/post_2/src/code1
    00600000-00601000 r--p 00000000 08:02 4327028                            /home/adwi/rev_eng_series/post_2/src/code1
    00601000-00602000 rw-p 00001000 08:02 4327028                            /home/adwi/rev_eng_series/post_2/src/code1
    7fc76869b000-7fc76885b000 r-xp 00000000 08:02 25694643                   /lib/x86_64-linux-gnu/libc-2.23.so
    7fc76885b000-7fc768a5b000 ---p 001c0000 08:02 25694643                   /lib/x86_64-linux-gnu/libc-2.23.so
    7fc768a5b000-7fc768a5f000 r--p 001c0000 08:02 25694643                   /lib/x86_64-linux-gnu/libc-2.23.so
    7fc768a5f000-7fc768a61000 rw-p 001c4000 08:02 25694643                   /lib/x86_64-linux-gnu/libc-2.23.so
    7fc768a61000-7fc768a65000 rw-p 00000000 00:00 0
    7fc768a65000-7fc768a8b000 r-xp 00000000 08:02 25694615                   /lib/x86_64-linux-gnu/ld-2.23.so
    7fc768c70000-7fc768c73000 rw-p 00000000 00:00 0
    7fc768c8a000-7fc768c8b000 r--p 00025000 08:02 25694615                   /lib/x86_64-linux-gnu/ld-2.23.so
    7fc768c8b000-7fc768c8c000 rw-p 00026000 08:02 25694615                   /lib/x86_64-linux-gnu/ld-2.23.so
    7fc768c8c000-7fc768c8d000 rw-p 00000000 00:00 0
    7fff9eb53000-7fff9eb74000 rw-p 00000000 00:00 0                          [stack]
    7fff9ebd1000-7fff9ebd4000 r--p 00000000 00:00 0                          [vvar]
    7fff9ebd4000-7fff9ebd6000 r-xp 00000000 00:00 0                          [vdso]
    ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
    
  • Look at Entry 2 of PHTable : It is a PT_LOAD type segment with Readable/Executable permissions , offset = 0x0 and Address = 0x400000 . This means the Starting Address of this segment is 0x400000 . Have a look at the first entry of the maps file. It starts at exactly the same address and has the same permissions (r-x). Also, look at the alignment. It is 0x200000 . The rule is that p_vaddr and p_offset must leave the same remainder when divided by p_align . Here the offset is 0x0 , so the Starting Address has to be a multiple of 0x200000 . It cannot be just any number.

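  • Let us look at Entry 3 of PHTable : Its Address is 0x600e10 and its offset is 0xe10 , so both leave the same remainder ( 0xe10 ) when divided by 0x200000 . Memory is mapped in whole pages ( 0x1000 bytes here), so the mapping starts at the page boundary just below the Address. You can see this in the second entry of maps , which starts from 0x600000 . This segment shows up as 2 entries in maps . The second entry (r--p) is the part covered by the GNU_RELRO entry (Entry 8), which is made read-only once the Dynamic Linker has finished its work on it. The third entry (rw-p) is the rest of the writable data.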

  • Consider Entry 1 of PHTable : The segment is of type PT_INTERP . The interpreter is the ld.so which we discussed earlier. In the file, this segment contains only a NULL-terminated String, the path of the interpreter. Its size is 0x1c = 28 bytes, which matches /lib64/ld-linux-x86-64.so.2 (27 characters plus the NULL byte), the usual path on x86-64 Linux. On a system like this one, that path is normally a symbolic link to /lib/x86_64-linux-gnu/ld-2.23.so . The address 0x400238 is where this String sits when the program is loaded. The interpreter itself is loaded at a different address range given by the system (refer to the ld-2.23.so entries in maps).

  • The /lib/x86_64-linux-gnu/libc-2.23.so is the Standard C Library. It is present as a shared object . The functions in it are dynamically linked.

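  • Consider Entry 7 of PHTable: It has no address and no offset, only permissions of RW . This entry does not describe any bytes in the file. It only tells the system which permissions the stack should get. The stack memory itself is allocated at loadtime. For this run, the address range 7fff9eb53000-7fff9eb74000 was given (the [stack] entry in maps).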

  • [vdso] and [vsyscall] : These contain routines which let a program running in user mode perform a few frequently used system calls (reading the time, for example) without the cost of switching to kernel mode. They speed up the performance of the whole process. Look into vdso ‘s manpage. There is a very good explanation present there.

I hope you have got an idea of what Program Header Table contains, what segments are, the difference between segments on disk and memory.

A few more interesting things :

  1. You will generally not find a Segment with permissions RWX or WX : There is a very strong reason behind this. During the late 1990s and early 2000s, a lot of new software was written, mainly because of the .com mania, and it had an amazing number of security holes. Attackers could write their malicious code onto the stack and execute it. In other words, they were able to execute what they wrote. It was understood that removing all such security holes is a difficult task. So the idea was to make a Segment either Writable or Executable, but not both. Then, even if an attacker is able to write malicious code, they won’t be able to execute it. This is popularly known as W XOR X or W^X . Microsoft called the technique Data Execution Prevention . People realized it is a very important Attack Mitigation Technique , so support for it was added at the hardware level. Intel processors have a bit called the XD(Execute Disable) bit, which AMD calls the NX(No Execute) bit. But there are techniques to bypass this Mitigation Technique.

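  2. Endianness: This is a very important concept to understand. It is the order in which the bytes of a multi-byte value are stored in memory. There are 2 ways to store them.

Consider a 32-bit architecture. Suppose you enter a string abcdefghij through scanf() and the base address of the string is 0xffff0000. The characters of a string are always stored one after another at increasing addresses. The difference between the 2 endian architectures shows up when the same memory is read as 4-byte words, which is how the dumps here are written.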

  • In Big-Endian arch, this is how the string is stored.

      0xffff0000: 61626364 65666768 696a00ab 3212ffff
    
    • Consider the first 4 bytes - 0x61626364.
    • The Most significant byte (Big End), 0x61, is the one stored at the address 0xffff0000.
    • Just observe the way in which the string is stored. The ascii values are stored in place of the characters.
  • In Little-Endian arch, this is how the string is stored.

      0xffff0000: 64636261 68676665 ab006a69 ffff1232
    
    • Consider the first 4 bytes - 0x64636261.
    • The Least significant byte (Little End), 0x61, is the one stored at the address 0xffff0000.

    I hope this example gives some clarity on what little-endian and big-endian architectures are.

    The Little-Endian arch has its own advantages, that is why that design was chosen for a few architectures.

  3. While explaining ELF Header parsing, I had mentioned that only macros should be used and not their direct values. The reason is simple. Suppose there is a problem with the current values and the Linux Developers change the values of some or all of the macros. If the tool is written using direct values, it will start giving wrong results and become a useless tool. But if we use the macros, there is no such problem. Even if the values change, recompiling the tool against the new elf.h is enough.

Conclusion:

  1. In this article, we discussed the general concepts of ELF , the ELF Header and the Program Header Table, their entries and properties. We implemented a toy tool which displays details of the ELF Header and PHTable entries. We also saw the /proc directory.

  2. Check out this Link . It has the sourcecode of the tool in an organized manner. There are a few modifications I have done just to make the tool more presentable. Other than that, there are no other changes in the functions we wrote.

  3. We will get into Section Header Table, Dynamic Linking and more in the part2 of this article.

I learnt a lot while writing the tool and the post. I hope you also learnt something new.

That is it for now. Thank you for reading :)

