Executable file and its Memory Organization in a Process ~ C in GCC

The file produced after compiling a program is the executable file. It is produced by the four stages of compilation (preprocessing, compilation, assembly and linking), which can be run one at a time or all at once with a single gcc command.

We already know that an executable file, while it is executing, is called a process. This discussion is all about the contents of an executable file and its different sections.

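Linux-based operating systems do not rely on file extensions to identify executables, so an executable file often has no extension at all. Extensions that commonly appear on files in the ELF format include .axf, .bin, .elf, .o, .prx, .puff and .so. Of these, .o is a relocatable object file and .so is a shared library rather than a stand-alone executable.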

When we compile a C program without naming the output, we get the default executable file a.out. The name comes from a.out ("assembler output"), the executable file format used by early UNIX systems and early versions of Linux. ELF has since become the most commonly used executable file format, though the name of the default executable file generated is still a.out. On all current systems, that a.out file is in the ELF format.

ELF stands for Executable and Linkable Format. A major reason for the migration from the a.out format to ELF is that the a.out format had only limited and awkward support for shared libraries and dynamic linking.

An ELF file consists of two parts. 1. ELF header and 2. File data

The contents of ELF file can be observed using the command:

readelf -a a.out

The following is the screenshot of an elf file.

Screenshot of ELF description

Understanding sections of ELF file:

1. ELF Header:

As shown in the screenshot, the ELF header contains several fields. In simple words, the ELF header holds metadata about the file, such as its class and its endianness. The terms that appear in the screenshot above are explained below.

Magic shows the first 16 bytes of the file, the identification bytes. They carry the file format id, the class (32-bit or 64-bit), the endianness, the version of the ELF format and the OS/ABI. Note that readelf prints these bytes in hexadecimal. The magic also acts as a quick sanity check: a file that does not begin with the expected bytes is not a valid ELF file, although matching bytes alone do not guarantee that the rest of the file is intact.

Many file formats begin with a magic number, which lets tools identify the format of a file without relying on its name or extension.

The following image shows a magic number, followed by an explanation of its most important bytes.

ELF description

Magic: The first byte (7f) is the id of the file format, which is 7f for the ELF file format. The next three bytes (45 4c 46) are the ASCII codes of ‘E’, ‘L’ and ‘F’, in that order.

Class: The fifth byte denotes the class of the ELF file, that is, whether it targets a 32-bit or a 64-bit architecture. 01 in this byte denotes the 32-bit format and 02 denotes the 64-bit format.

Data: The sixth byte denotes the endianness of the data. 01 in this byte represents little endian format and 02 represents big endian format.

Version: The seventh byte denotes the version of the ELF format. Only one version has been defined so far, version 1 (current). Hence, this byte is 01.

OS/ABI: ABI stands for Application Binary Interface. The eighth byte identifies the operating system and ABI that the file targets. Some fields elsewhere in the file have OS-specific or ABI-specific meanings, and this byte tells tools how to interpret them. On Linux systems, this field usually reads UNIX - System V (value 00).

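Machine: It represents the instruction set architecture the file was built for. It is a two-byte field in the header, outside the magic bytes. For example, 03 denotes Intel 80386 (32-bit x86) and 3e denotes x86-64.

Type: It gives the purpose of the file, viz.,

                01 – REL – Relocatable files, before they are linked into executable files
                02 – EXEC – Executable files for binaries
                03 – DYN – Shared object files for libraries (also used for position-independent executables)

All other fields in the header hold further metadata about the executable file, such as the entry point address and the locations and sizes of the program header and section header tables.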

2. File Data:

The file data of an ELF file consists of three parts.

Program headers or Segments
Section headers or Sections
Data

Program headers or Segments:

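Program headers describe the segments of the file. They are used at run time rather than by the linker: the kernel and the dynamic loader read them to decide which parts of the file to map into the memory of the process, and with which permissions, typically using the mmap(2) system call.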

Eg: GNU_EH_FRAME, GNU_STACK etc.

Section headers or Sections:

Section headers define all the sections of a file. Each section holds one kind of content, either instructions or data required for processing. Sections are used mainly by the linker and by debugging tools, while the loader works with segments.

Eg: .data, .rodata etc.

Of the many sections in an ELF file, four matter most for this discussion. They are: .text, .data, .rodata and .bss.

The .text section contains the executable code of the given program. Its contents are fixed when the program is compiled and linked, and it is mapped into memory as read-only, executable code when the program is loaded.
The .data section consists of initialized data with read/write access i.e., global and static variables that have an explicit nonzero initial value.
The .rodata section consists of initialized data with read access only i.e., string constants and const-qualified global data.
The .bss section consists of uninitialized data with read/write access i.e., global and static variables that are uninitialized or initialized to zero. It takes up no space in the file; the memory is zero-filled when the program is loaded.

However, these sections are most commonly grouped under two names: the text section and the data section. The following image can give you a better view of executable files.

ELF file contents

These are the details of the ELF file when it is just a file. But when the executable file is executing, the process has two more memory regions – Stack and Heap.

For every process, the operating system sets up an address space. The contents of the executable are mapped into this address space and the processor starts execution. The address space consists of the text section, data section, stack section and heap section.

Process during execution

Text section and data section were discussed above.

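The stack is the memory space used to allocate memory to function calls. The block given to one call is called a Stack Frame; it holds the local variables and the return address of that call. The stack frame for a given function is allocated only when the function is called, and it is released when the function returns.
The heap section is used for memory requested at run time through dynamic memory allocation, for example with malloc() and free(). The program reaches this memory through pointers.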

Any executable file must be brought into RAM for execution, as the CPU cannot execute code directly from secondary memory. During execution, it is called a Process. When the program is compiled without optimization, the executable generally contains code for every statement that is present in the source code, including variables that are declared but never used and functions that are defined but never called. With optimization enabled, the compiler may remove such unused code and data.

Utilities for .elf file description:

hexdump is used to display the raw bytes of a file in hexadecimal.
readelf is used to get the structure of an ELF file.
scanelf and execstack are two tools used to inspect the stack-related details of an ELF file, such as whether it requests an executable stack.
dumpelf, elfls and eu-readelf are used to get the headers of the ELF file.
objdump is used to see the headers, symbols and disassembly of the ELF file.
The elfutils package consists of utilities to perform analysis on an ELF file.

Commands:

hexdump a.out
readelf -a a.out
dumpelf a.out   # dumpelf is part of the pax-utils package
elfls -S /bin/ps
eu-readelf --program-headers /bin/ps
objdump -h /bin/ps

Advanced details of the ELF format can be found in the man page using the command: man 5 elf
