Date created: 12/07/16 18:42:22. Last modified: 06/03/19 08:05:02

Preprocessor, Compiler, Assembler, Linker, Loader (gcc)


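The preprocessor handles the directives that start with #. It expands macros, replaces #include statements with the contents of the named files, evaluates conditional blocks such as #if and #ifdef, and strips comments. The result is the complete C/C++ source that is then passed to the compiler. Typedefs are not expanded by the preprocessor: they are declarations that the compiler itself handles. This is the example program: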

cat helloworld.c

#include <stdio.h>

int main() {
    printf("Hello, World!\n");
    return 0;
}

The preprocessor output is generated using "cc -E helloworld.c -o helloworld.pp".

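The compiler takes the fully expanded C/C++ code and generates machine-specific assembly code. Along the way it often applies various optimisations and performs checks such as detecting type mismatches; it can also insert runtime protections, for example gcc's stack protector against stack buffer overflows. The compiler output is generated using "cc -S helloworld.c -o helloworld.s" (cc -S runs the preprocessor itself, so it can be given the original .c file).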

The assembly code is then passed to the assembler, which translates it into binary machine code and writes it to an output file called an object file. Both cc and gcc use the same assembler by default, called "as". The machine code produced by the assembler does not yet contain absolute memory addresses for its data references and function calls; instead it refers to symbols (labels) and records relocation entries for them. A program is typically built from many object files, and only when the linker merges them do the final locations become known. The assembler output is generated using "as helloworld.s -o helloworld.out" (object files conventionally use a .o extension).

Next the linker takes one or more object files (assembly code assembled into binary) and resolves the symbols and references between them. The program's own object files are usually linked against precompiled C libraries, here libc, which provide the functions referenced in the original code, such as printf. The linker is called using "ld -o helloworld helloworld.out -lc --entry main" to produce the final executable file. Note that on the test system the resulting binary first fails to start because of a path issue, and once that is fixed it prints its output and then crashes:

$ ld -o helloworld helloworld.out -lc --entry main
$ ./helloworld 
bash: ./helloworld: No such file or directory

$ ldd helloworld
	=>  (0x00007ffecabbc000)
	=> /lib/x86_64-linux-gnu/ (0x00007f7a15f27000)
	/lib/ => /lib64/ (0x0000562ec9a25000)
$ readelf -a helloworld | grep interpreter
      [Requesting program interpreter: /lib/]
$ ls -l /lib/
ls: cannot access '/lib/': No such file or directory

# This ^ interpreter doesn't exist. It should probably be a symlink to /lib/x86_64-linux-gnu/

$ ld -o helloworld helloworld.out -lc --entry main --dynamic-linker=/lib/x86_64-linux-gnu/
$ readelf -a helloworld | grep interpreter
      [Requesting program interpreter: /lib/x86_64-linux-gnu/]
$ ldd helloworld
	=>  (0x00007ffffcef9000)
	=> /lib/x86_64-linux-gnu/ (0x00007f045dc06000)
	/lib/x86_64-linux-gnu/ => /lib64/ (0x000055cabc04c000)
$ ./helloworld 
Hello, World!
Segmentation fault (core dumped)
$ echo $?

$ uname -a
Linux ubuntu-laptop 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11 18:01:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Finally the loader loads the binary file into memory and executes its instructions. This example binary is in ELF format, so it contains an ELF header followed by program headers and sections. The execve system call starts the new process; the kernel then hands control to the program interpreter, the dynamic linker/loader ld.so (distinct from the static linker ld used above), which loads the libc shared library. The ELF magic number \177ELF (the bytes 0x7F, 'E', 'L', 'F') can be seen at the start of the read() of the libc library, which is how the loader confirms that it is an ELF file:

$ gcc -o helloworld helloworld.c
$ ./helloworld 
Hello, World!
$ strace ./helloworld
execve("./helloworld", ["./helloworld"], [/* 51 vars */]) = 0
brk(NULL)                               = 0x18ac000
access("/etc/", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f02f57f8000
access("/etc/", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=113650, ...}) = 0
mmap(NULL, 113650, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f02f57dc000
close(3)                                = 0
access("/etc/", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\t\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1864888, ...}) = 0
mmap(NULL, 3967392, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f02f520c000
mprotect(0x7f02f53cb000, 2097152, PROT_NONE) = 0
mmap(0x7f02f55cb000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bf000) = 0x7f02f55cb000
mmap(0x7f02f55d1000, 14752, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f02f55d1000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f02f57db000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f02f57da000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f02f57d9000
arch_prctl(ARCH_SET_FS, 0x7f02f57da700) = 0
mprotect(0x7f02f55cb000, 16384, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ)     = 0
mprotect(0x7f02f57fa000, 4096, PROT_READ) = 0
munmap(0x7f02f57dc000, 113650)          = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
brk(NULL)                               = 0x18ac000
brk(0x18cd000)                          = 0x18cd000
write(1, "Hello, World!\n", 14Hello, World!
)         = 14
exit_group(0)                           = ?
+++ exited with 0 +++