Date created: Sunday, April 21, 2019 5:50:30 PM. Last modified: Sunday, June 21, 2020 9:23:20 AM

Inline Assembly on Linux with GCC

References:
http://flint.cs.yale.edu/cs421/papers/x86-asm/asm.html

 

When using inline assembly in C on Linux and compiling with GCC, the assembly code is a compile-time C constant string, and GCC doesn't parse it. In addition, GCC cannot tell whether this mystery code provides any outputs to the remainder of the code, or whether it is essentially redundant and can be "optimised away" (removed!) during compilation. Note the use of the volatile qualifier in the example code below: it tells GCC not to remove the assembly that follows, even though it seems to be doing nothing.
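
As a rough sketch of how GCC can be told about outputs: GCC's extended asm syntax (the operand lists after the colons, which the example in these notes does not use) lets the C code declare which variables the assembly reads and writes. The helper names add_with_asm and check_add_with_asm are hypothetical, and the sketch assumes an x86 or x86-64 target with GCC's default AT&T syntax. Because the result is declared as an output and is actually used, volatile should not be needed here.

```c
#include <assert.h>

/* Hypothetical helper (not from the original notes): adds b to a with the
 * x86 ADD instruction. Assumes x86 or x86-64 and GCC's default AT&T syntax. */
int add_with_asm(int a, int b)
{
    int sum = a;
    __asm__("addl %1, %0"   /* sum += b */
            : "+r"(sum)     /* %0: read and written by the assembly */
            : "r"(b)        /* %1: only read by the assembly */
            : "cc");        /* ADD modifies the flags register */
    return sum;
}

/* Small self-check; call it from main() to try the helper out. */
void check_add_with_asm(void)
{
    assert(add_with_asm(2, 3) == 5);
    assert(add_with_asm(-7, 7) == 0);
}
```

With the operands declared, GCC picks the registers itself and knows that the value of sum depends on the assembly.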

The inline assembly code is passed verbatim to the assembler, which means it needs to already be in the format the assembler expects. Two things are required to meet the assembler's formatting requirements. Firstly (this only applies when writing raw opcode bytes instead of instruction mnemonics), the ".byte" directive must be used to indicate that byte values are being supplied in the C string. This means that the code #define NOP ".byte 0x90\n\t" will be assembled as a single byte with the hex value 0x90 (which is the opcode for the NOP instruction). Another example would be ".byte 0xb8,0x01,0x00,0x00,0x00", which is the machine code for "mov $1, %eax"; note that multiple byte values must be comma separated. Secondly, when entering multiple assembly instructions, each one ends with the "\n\t" characters: the newline terminates the instruction and the tab indents the next one, so the assembler sees the text laid out as if it were in its own source file.
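
One way to convince yourself that the ".byte" form and the mnemonic form really are the same instruction is to read %eax back into a C variable after each one. The sketch below is an illustration only: the function names are hypothetical, it assumes an x86 or x86-64 target, and it relies on GCC's extended asm output constraint "=a" (bind the output to %eax), which these notes don't otherwise cover. Inside extended asm a literal register name is written with a doubled percent sign.

```c
#include <assert.h>

/* Hypothetical helper: loads 1 into %eax using the instruction mnemonic. */
int one_from_mnemonic(void)
{
    int value;
    __asm__ volatile("mov $1, %%eax\n\t"
                     : "=a"(value));   /* value is read from %eax */
    return value;
}

/* Hypothetical helper: the same instruction as raw, comma-separated bytes.
 * 0xb8 is "mov imm32, %eax"; the next four bytes are the little-endian 1. */
int one_from_bytes(void)
{
    int value;
    __asm__ volatile(".byte 0xb8,0x01,0x00,0x00,0x00\n\t"
                     : "=a"(value));   /* value is read from %eax */
    return value;
}

/* Small self-check; call it from main() to compare the two forms. */
void check_byte_encoding(void)
{
    assert(one_from_mnemonic() == 1);
    assert(one_from_bytes() == one_from_mnemonic());
}
```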


$ cat inline-asm.c

// ".byte" is a GNU GAS assembler directive
// https://ftp.gnu.org/old-gnu/Manuals/gas-2.9.1/html_chapter/as_7.html#SEC75
// The values can be comma separated: ".byte 0x51,0x90,0x59\n\t"
#define NOP ".byte 0x90\n\t"

int main(int argc, char* argv[]) {

  // This...
  asm volatile("nop");
  // is the same as this:
  asm volatile(".byte 0x90");
  // and the same as this:
  asm volatile(NOP);

  // This (GCC treats an asm statement with no output operands as implicitly
  // volatile, so the missing volatile makes no difference here)...
  asm("nop\n\t"
      "nop\n\t"
      "nop\n\t");
  // is the same as this:
  asm volatile(".byte 0x90\n\t"
               ".byte 0x90\n\t"
               ".byte 0x90\n\t");
  // and the same as this:
  asm volatile(NOP NOP NOP);

  // This will exit the process before the 'return 111' (0x6F) below executes,
  // the SYSCALL will exit the process with status code 222 (0xDE):
  __asm__ __volatile__("mov $60, %rax\n\t"
                       "mov $222, %rdi\n\t"
                       "syscall");

  // __asm__ is the same as asm, but __asm__ can be used if the name asm clashes
  // with something else in the code (e.g. a variable called asm).
  // The same is true of volatile and __volatile__.
  // volatile prevents the inline assembly from being deleted by compiler
  // optimisations. Note that GCC may still move a volatile asm relative to
  // other code, so volatile alone does not pin the assembly to its current
  // location within the code.

  return 111;

}

$ gcc -O0 -c inline-asm.c
$ file inline-asm.o
inline-asm.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped

$ objdump -d -M intel -S inline-asm.o

inline-asm.o: file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <main>:
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
4: 89 7d fc mov DWORD PTR [rbp-0x4],edi
7: 48 89 75 f0 mov QWORD PTR [rbp-0x10],rsi
b: 90 nop
c: 90 nop
d: 90 nop
e: 90 nop
f: 90 nop
10: 90 nop
11: 90 nop
12: 90 nop
13: 90 nop
14: 90 nop
15: 90 nop
16: 90 nop
17: 48 c7 c0 3c 00 00 00 mov rax,0x3c
1e: 48 c7 c7 de 00 00 00 mov rdi,0xde
25: 0f 05 syscall
27: b8 6f 00 00 00 mov eax,0x6f
2c: 5d pop rbp
2d: c3 ret

$ gcc -g -O0 -o inline-asm inline-asm.c

$ file inline-asm
inline-asm: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=2113be8486c0af883ca7424ca4f6b5b0b418162a, with debug_info, not stripped

$ ./inline-asm
$ echo $?
222
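
The same exit status can also be observed from inside a C program rather than with echo $?. The following is a minimal sketch, assuming x86-64 Linux (where syscall number 60 is exit and the first argument goes in %rdi, as in the example above); the function names are hypothetical. It forks a child that runs the raw SYSCALL and has the parent collect the status with waitpid().

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs the raw exit SYSCALL in a child process and
 * returns the child's exit status, or -1 if something went wrong. */
int status_from_raw_exit(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* fork failed */

    if (pid == 0) {
        /* Child: same assembly as in inline-asm.c above. */
        __asm__ __volatile__("mov $60, %rax\n\t"
                             "mov $222, %rdi\n\t"
                             "syscall");
        _exit(1);                       /* not reached if the syscall works */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;                      /* waitpid failed */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Small self-check; call it from main() to try the helper out. */
void check_raw_exit(void)
{
    assert(status_from_raw_exit() == 222);
}
```

Running the assembly in a child keeps the calling process alive, so the status can be inspected afterwards.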