Compiling a program produces an executable file. The compiler driver can run
the four stages of compilation (preprocessing, compilation, assembly and
linking) one at a time, or run them all in a single step.
We already know that an executable file, while it is executing, is
called a process. This discussion covers the contents of an executable
file and its different sections.
Files in the ELF family appear under several file extensions on Linux
based operating systems, including .axf, .bin, .elf, .o, .prx, .puff and .so,
and an executable file often has no extension at all.
When we compile a C program without naming the output, the executable is
written to a file called a.out. The name is short for "assembler output", and
a.out was also the name of the executable file format used by early UNIX
systems and early versions of Linux. ELF has since replaced that format,
although compilers still use a.out as the default output file name. Current
Linux systems use ELF as the executable file format by default.
ELF stands for Executable and Linkable Format. A major
reason for the migration from a.out to ELF was that the a.out format made
shared libraries and dynamic linking difficult to support.
An ELF file consists of two parts:
1. ELF header
2. File data
The contents of ELF file can be observed using the command:
readelf -a a.out
The following screenshot shows this output for an ELF file.
1. ELF Header:
As shown in the screenshot, the ELF header contains several fields. In simple
words, the ELF header holds metadata about the file, such as its class and its
endianness. The fields that appear in the screenshot above are explained
below.
Magic shows the first 16 bytes of the file. The first four bytes, 7f 45 4c 46
(0x7f followed by the ASCII characters E, L and F), identify the file format.
The bytes that follow record the class (32-bit or 64-bit), the endianness, the
ELF version and the target OS/ABI. readelf prints these bytes in hexadecimal.
A file whose first four bytes do not match is not a valid ELF file, so the
magic number also acts as a basic sanity check.
Many file formats, not just ELF, begin with a magic number. It lets tools
identify the format of a file without relying on the file extension.
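The magic number check can be sketched in a few lines of Python. This is only an illustration: the helper names `is_elf` and `file_is_elf` are made up for this example and are not part of any standard tool.

```python
ELF_MAGIC = b"\x7fELF"  # bytes 7f 45 4c 46


def is_elf(first_bytes: bytes) -> bool:
    """Return True if the buffer starts with the 4-byte ELF magic number."""
    return first_bytes[:4] == ELF_MAGIC


def file_is_elf(path: str) -> bool:
    """Read only the first four bytes of a file and compare them to the magic."""
    with open(path, "rb") as f:
        return is_elf(f.read(4))
```

On a Linux system, calling `file_is_elf("a.out")` on a freshly compiled program would be expected to return True, while a text file or an image would return False.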
The following image shows a magic number, and the list after it
explains the most important fields that can be read from it.
[Figure: ELF description]
Class: The fifth byte denotes whether the file uses the 32-bit or the 64-bit
ELF format. 01 in this byte denotes the 32-bit format and 02 denotes the 64-bit
format.
Data: The sixth byte denotes the endianness of the data. 01
in this byte represents little endian format and 02 represents big endian
format.
Version: The seventh byte denotes the version of the ELF format.
Only one version has been defined so far, version 1 (current). Hence, this
byte is always 01.
OS/ABI: ABI stands for Application Binary Interface. Different operating
systems can define the same low-level interfaces in incompatible ways, so the
eighth byte identifies the operating system and ABI the file targets. Most
Linux binaries report UNIX - System V (00) here.
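Machine: It represents the instruction set architecture the file was built
for, for example 0x03 for 32-bit x86 (Intel 80386), 0x3E for x86-64, 0x28 for
ARM and 0xB7 for AArch64. Whether the file is 32-bit or 64-bit is recorded
separately in the Class byte.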
Type: It gives the purpose of the file viz.,
01 – REL – Relocatable files, before they are linked into executable files
02 – EXEC – Executable files for binaries
03 – DYN – Shared object files for libraries (position-independent executables also use this type)
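The remaining header fields hold more advanced metadata about the
executable file, such as the entry point address and the locations of the
program header and section header tables.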
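The header fields described above can also be decoded by hand. The following is a minimal sketch, not a replacement for readelf: the function name `parse_elf_header` and the lookup tables are assumptions made for this example, the tables cover only a few common values, and the numeric codes are the ET_* and EM_* constants from the ELF specification.

```python
import struct

CLASSES = {1: "ELF32", 2: "ELF64"}
DATA = {1: "little endian", 2: "big endian"}
TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
MACHINES = {0x03: "Intel 80386", 0x28: "ARM", 0x3E: "x86-64", 0xB7: "AArch64"}


def parse_elf_header(header: bytes) -> dict:
    """Decode the identification bytes plus the type and machine fields."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # Bytes 5 to 8 of the file: class, data (endianness), version, OS/ABI.
    ei_class, ei_data, ei_version, ei_osabi = header[4:8]
    if ei_data not in DATA:
        raise ValueError("unknown data encoding")
    # Multi-byte fields after the 16 identification bytes use the file's endianness.
    byte_order = "<" if ei_data == 1 else ">"
    e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)
    return {
        "class": CLASSES.get(ei_class, "unknown"),
        "data": DATA[ei_data],
        "version": ei_version,
        "osabi": ei_osabi,
        "type": TYPES.get(e_type, "other"),
        "machine": MACHINES.get(e_machine, hex(e_machine)),
    }
```

A possible use is `parse_elf_header(open("a.out", "rb").read(20))`, whose result can then be compared with the first lines of `readelf -h a.out`.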
2. File Data:
The file data of an ELF file consists of three parts.
- Program headers or Segments
- Section headers or Sections
- Data
Program headers or Segments:
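Program headers are used by the kernel and the dynamic loader when the
program is run. Each program header describes a segment: a region of the file,
the address at which it should be placed in memory and the permissions it
needs. The loader maps the loadable segments into memory, typically using the
mmap(2) system call.
Eg: LOAD, GNU_EH_FRAME, GNU_STACK etc.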
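To make the segment description concrete, here is a rough sketch that lists the program headers of an ELF image held in memory. It assumes a 64-bit little-endian file, knows only a handful of segment types, and uses an illustrative function name, `read_program_headers`, chosen for this example.

```python
import struct

PT_NAMES = {
    0: "NULL", 1: "LOAD", 2: "DYNAMIC", 3: "INTERP", 4: "NOTE", 6: "PHDR",
    0x6474E550: "GNU_EH_FRAME", 0x6474E551: "GNU_STACK", 0x6474E552: "GNU_RELRO",
}


def read_program_headers(image: bytes) -> list:
    """Return one dict per program header of a 64-bit little-endian ELF image."""
    if image[:4] != b"\x7fELF" or image[4] != 2 or image[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF image")
    # e_phoff is at offset 32; e_phentsize and e_phnum are at offsets 54 and 56.
    (e_phoff,) = struct.unpack_from("<Q", image, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 54)
    segments = []
    for i in range(e_phnum):
        (p_type, p_flags, p_offset, p_vaddr, _p_paddr,
         p_filesz, p_memsz, _p_align) = struct.unpack_from(
            "<IIQQQQQQ", image, e_phoff + i * e_phentsize)
        # Permission bits: 4 = read, 2 = write, 1 = execute.
        flags = (("R" if p_flags & 4 else "-") +
                 ("W" if p_flags & 2 else "-") +
                 ("X" if p_flags & 1 else "-"))
        segments.append({
            "type": PT_NAMES.get(p_type, hex(p_type)),
            "flags": flags,
            "offset": p_offset,
            "vaddr": p_vaddr,
            "filesz": p_filesz,
            "memsz": p_memsz,
        })
    return segments
```

Running it on the bytes of a real binary, for example `read_program_headers(open("/bin/ps", "rb").read())` on a 64-bit little-endian system, should give roughly the same rows as `readelf -l /bin/ps`.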
Section headers or Sections:
Section headers describe all the sections of a file. Each section holds either
instructions or one kind of data required for processing, and the linker uses
the section headers to combine the object files built from multiple source
files into a single executable.
Eg: .data, .rodata etc.
The four most commonly discussed sections are .text, .data, .rodata and .bss.
- The .text section contains the executable code of the given program. Its contents do not change at run time; it is mapped into memory as read-only, executable code when the program is loaded.
- The .data section consists of initialized data with read/write access i.e., initialized global and static variables.
- The .rodata section consists of initialized data with read access only i.e., string literals and other constants.
- The .bss section consists of uninitialized (zero-initialized) data with read/write access i.e., uninitialized global and static variables. It takes up no space in the file itself, only in memory.
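The section list above can be recovered from the section header table. The sketch below assumes a 64-bit little-endian file and a well-formed section name string table; `read_sections` and the dictionary keys it returns are names invented for this example.

```python
import struct

SHDR64 = "<IIQQQQIIQQ"  # layout of one 64-bit section header (64 bytes)
SHT_NOBITS = 8          # section type used by .bss: no bytes stored in the file


def read_sections(image: bytes) -> list:
    """List section names, flags and sizes from a 64-bit little-endian ELF image."""
    if image[:4] != b"\x7fELF" or image[4] != 2 or image[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF image")
    # e_shoff is at offset 40; e_shentsize, e_shnum, e_shstrndx start at offset 58.
    (e_shoff,) = struct.unpack_from("<Q", image, 40)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", image, 58)

    def header(index):
        return struct.unpack_from(SHDR64, image, e_shoff + index * e_shentsize)

    # The section at index e_shstrndx is a string table holding every section name.
    strtab_header = header(e_shstrndx)
    str_off, str_size = strtab_header[4], strtab_header[5]
    names = image[str_off:str_off + str_size]

    sections = []
    for i in range(e_shnum):
        sh_name, sh_type, sh_flags, _addr, _offset, sh_size = header(i)[:6]
        end = names.index(b"\0", sh_name)  # names are NUL-terminated
        # Flag bits: 1 = writable, 2 = occupies memory at run time, 4 = executable.
        flags = "".join(letter for bit, letter in ((1, "W"), (2, "A"), (4, "X"))
                        if sh_flags & bit)
        sections.append({
            "name": names[sh_name:end].decode("ascii"),
            "flags": flags,
            "size": sh_size,
            "stored_in_file": sh_type != SHT_NOBITS,
        })
    return sections
```

For a typical executable one would expect .text to carry the flags AX, .data and .bss to carry WA, and .bss to be the one with `stored_in_file` set to False, which matches the description of .bss as data that exists only in memory.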
However, these sections are most commonly grouped under two names: the
text section and the data section. The following image gives a better view
of an executable file.
[Figure: ELF file contents]
For every process, a region of memory is allocated, which is called the
address space of the process. The contents of the executable are mapped into
this address space and the processor starts execution. The address space
consists of the text section, data section, stack section and heap section.
Text section and data section were discussed above.
[Figure: Process during execution]
- The stack is the memory space used for function calls. Each call gets its own block of stack memory, called a stack frame, which is allocated only when the function is called and released when it returns.
- The heap section is used for memory that the program requests at run time through Dynamic Memory Allocation (DMA), for example with malloc(), and accesses through pointers.
Any executable file must be brought into RAM for execution,
as the CPU cannot directly process data that resides in secondary memory.
During execution, it is called a process. A process contains the code generated
for every statement in the source code. Unless the compiler or linker removes
them during optimization, this includes variables that are declared but never
used and functions that are defined but never called.
Utilities for inspecting ELF files:
- hexdump prints the raw bytes of a file in hexadecimal.
- readelf is used to get the structure of an ELF file.
- scanelf and execstack report stack-related properties of an ELF file, such as whether its stack is marked executable.
- dumpelf, elfls and eu-readelf are used to get the headers of the ELF file.
- objdump displays the section headers, disassembly and symbols of the ELF file.
- elfutils package consists of utilities to perform analysis on an ELF file.
Commands:
- hexdump a.out
- readelf -a a.out
- dumpelf a.out (dumpelf is part of the pax-utils package)
- elfls -S /bin/ps
- eu-readelf --program-headers /bin/ps
- objdump -h /bin/ps