Date created: Wednesday, December 7, 2016 6:42:22 PM. Last modified: Monday, June 3, 2019 8:05:02 AM
Preprocessor, Compiler, Assembler, Linker, Loader (gcc)
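The preprocessor expands macros and #include directives, evaluates conditional compilation directives and strips comments to produce the final, complete C/C++ source, which is then passed to the compiler. Typedefs are not expanded at this stage; they are resolved later by the compiler. This is the example program: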
cat helloworld.c
#include <stdio.h>

int main() {
    printf("Hello, World!\n");
    return 0;
}
The preprocessor output is generated using "cc -E helloworld.c -o helloworld.pp".
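As a smaller illustration of what the preprocessor does and does not touch, the sketch below uses the standard two-level stringification trick to capture the text the preprocessor hands to the compiler. The names BUF_SIZE, counter_t and the three functions are hypothetical and not part of helloworld.c.

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification: XSTR expands its argument first, then STR
 * turns the result into a string literal. */
#define STR(x)  #x
#define XSTR(x) STR(x)

#define BUF_SIZE 64        /* hypothetical macro: known to the preprocessor */
typedef int counter_t;     /* hypothetical typedef: only the compiler understands it */

/* The macro name has been replaced by its definition: returns "64". */
const char *macro_text(void) { return XSTR(BUF_SIZE); }

/* The typedef name passes through untouched: returns "counter_t". */
const char *typedef_text(void) { return XSTR(counter_t); }

/* Self-check of the two claims above. */
void check_preprocessor_view(void) {
    assert(strcmp(macro_text(), "64") == 0);
    assert(strcmp(typedef_text(), "counter_t") == 0);
}
```

Running "cc -E" on a file like this should show 64 in place of BUF_SIZE, while counter_t appears unchanged wherever it is used.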
The compiler takes the fully expanded C/C++ code and generates machine-specific assembly code. It often applies various optimisations and can perform complex checks, for example for type mismatches or, with options such as -fstack-protector, for stack buffer overflows. The compiler output is generated using "cc -S helloworld.c -o helloworld.s".
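As a small illustration of one such optimisation, the sketch below contains a constant expression that a compiler can evaluate at compile time (constant folding). The function names are hypothetical. With a typical gcc, the assembly from "cc -S" would be expected to contain the literal 86400 rather than two multiplications, although the exact output depends on the compiler version and flags.

```c
#include <assert.h>

/* 60 * 60 * 24 is a constant expression, so the compiler can compute
 * 86400 itself instead of emitting multiply instructions. */
int seconds_per_day(void) {
    return 60 * 60 * 24;
}

/* Hypothetical helper: here the multiplication depends on a run-time
 * value, so it has to stay in the generated code. */
int seconds_in_days(int days) {
    assert(days >= 0);
    return days * seconds_per_day();
}
```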
The assembly code is now passed to the assembler, which turns it into binary machine code in an output file called an object file. Both cc and gcc use the same assembler by default, called "as". The machine code produced by the assembler is not yet absolute with regard to memory addresses and function calls; instead it refers to symbols (labels) and records relocation entries for the linker to fill in. A program is normally built from many object files, and only when they are merged do the final locations become known. The assembler output is generated using "as helloworld.s -o helloworld.out".
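To show where such unresolved references come from, here is a minimal sketch of a hypothetical source file (greet is an invented name). Once assembled into an object file, a symbol tool such as nm would be expected to list greet as a defined text symbol ("T") and puts as undefined ("U"), because its code lives in libc and is only located at link time.

```c
#include <assert.h>

/* Declared here, defined elsewhere (in libc). In the object file this
 * remains an undefined symbol until the linker resolves it. */
extern int puts(const char *s);

/* Defined here. In the object file its address is only an offset within
 * the code section, not a final memory address. */
int greet(const char *line) {
    assert(line != 0);
    /* puts returns a non-negative value on success. */
    return puts(line) >= 0;
}
```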
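Next the linker takes one or more object files (assembly code assembled into binary) and resolves the symbols and references between them. Often the program code is linked with pre-built versions of the C libraries that were referenced in the original code. The linker is called using "ld -o helloworld helloworld.out -lc --entry main" to produce the final binary executable file. Note that on the test system the first binary cannot be started, because ld records a program interpreter path (/lib/ld64.so.1) that does not exist; passing --dynamic-linker fixes that. The binary then prints its message but crashes with a segmentation fault, most likely because main is used directly as the entry point without the C runtime start-up code, so the return from main has no caller to return to: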
bensley@ubuntu-laptop:~/C$ ld -o helloworld helloworld.out -lc --entry main
bensley@ubuntu-laptop:~/C$ ./helloworld
bash: ./helloworld: No such file or directory
bensley@ubuntu-laptop:~/C$ ldd helloworld
    linux-vdso.so.1 => (0x00007ffecabbc000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7a15f27000)
    /lib/ld64.so.1 => /lib64/ld-linux-x86-64.so.2 (0x0000562ec9a25000)
bensley@ubuntu-laptop:~/C$ readelf -a helloworld | grep interpreter
    [Requesting program interpreter: /lib/ld64.so.1]
bensley@ubuntu-laptop:~/C$ ls -l /lib/ld64.so.1
ls: cannot access '/lib/ld64.so.1': No such file or directory
# This ^ interpreter doesn't exist. It should probably be a sym link to /lib/x86_64-linux-gnu/ld-2.23.so
bensley@ubuntu-laptop:~/C$ ld -o helloworld helloworld.out -lc --entry main --dynamic-linker=/lib/x86_64-linux-gnu/ld-2.23.so
bensley@ubuntu-laptop:~/C$ readelf -a helloworld | grep interpreter
    [Requesting program interpreter: /lib/x86_64-linux-gnu/ld-2.23.so]
bensley@ubuntu-laptop:~/C$ ldd helloworld
    linux-vdso.so.1 => (0x00007ffffcef9000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f045dc06000)
    /lib/x86_64-linux-gnu/ld-2.23.so => /lib64/ld-linux-x86-64.so.2 (0x000055cabc04c000)
bensley@ubuntu-laptop:~/C$ ./helloworld
Hello, World!
Segmentation fault (core dumped)
bensley@ubuntu-laptop:~/C$ echo $?
139
bensley@ubuntu-laptop:~/C$ uname -a
Linux ubuntu-laptop 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11 18:01:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Finally the loader loads the binary file into memory and executes its instructions. This example binary is in ELF format, so it contains an ELF header followed by code and data sections. The execve syscall is used to start the new process; the dynamic linker/loader (ld.so) then runs, consults /etc/ld.so.cache and maps the libc shared library into memory. The \177ELF magic number can be seen at the start of the read() of the libc library, which identifies it as an ELF file too:
bensley@ubuntu-laptop:~/C$ gcc -o helloworld helloworld.c
bensley@ubuntu-laptop:~/C$ ./helloworld
Hello, World!
bensley@ubuntu-laptop:~/C$ strace ./helloworld
execve("./helloworld", ["./helloworld"], [/* 51 vars */]) = 0
brk(NULL) = 0x18ac000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f02f57f8000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=113650, ...}) = 0
mmap(NULL, 113650, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f02f57dc000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\t\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1864888, ...}) = 0
mmap(NULL, 3967392, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f02f520c000
mprotect(0x7f02f53cb000, 2097152, PROT_NONE) = 0
mmap(0x7f02f55cb000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bf000) = 0x7f02f55cb000
mmap(0x7f02f55d1000, 14752, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f02f55d1000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f02f57db000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f02f57da000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f02f57d9000
arch_prctl(ARCH_SET_FS, 0x7f02f57da700) = 0
mprotect(0x7f02f55cb000, 16384, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ) = 0
mprotect(0x7f02f57fa000, 4096, PROT_READ) = 0
munmap(0x7f02f57dc000, 113650) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
brk(NULL) = 0x18ac000
brk(0x18cd000) = 0x18cd000
write(1, "Hello, World!\n", 14Hello, World!
) = 14
exit_group(0) = ?
+++ exited with 0 +++
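The magic number check seen in the read() call above can be sketched in a few lines of C. This is only an illustration of the idea, not how the kernel or ld.so implement it; has_elf_magic and file_is_elf are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* The four bytes every ELF file starts with: 0x7f 'E' 'L' 'F'.
 * strace prints 0x7f in octal, which is where "\177" comes from. */
static const unsigned char ELF_MAGIC[4] = { 0x7f, 'E', 'L', 'F' };

/* Returns 1 if the buffer starts with the ELF magic number, else 0. */
int has_elf_magic(const unsigned char *buf, size_t len) {
    assert(buf != NULL);
    return len >= sizeof ELF_MAGIC &&
           memcmp(buf, ELF_MAGIC, sizeof ELF_MAGIC) == 0;
}

/* Reads the first bytes of a file and checks them.
 * Returns 1 for ELF, 0 for anything else, -1 if the file cannot be opened. */
int file_is_elf(const char *path) {
    unsigned char head[4];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(head, 1, sizeof head, f);
    fclose(f);
    return has_elf_magic(head, n);
}
```

On the system traced above, calling file_is_elf("/lib/x86_64-linux-gnu/libc.so.6") would be expected to return 1, since that is the file whose first bytes appear in the read() call.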