NANDHOO.

Chapter 6: Compilers, Linkers, and Libraries


Introduction


When you write C code and run gcc hello.c -o hello, a complex multi-stage process transforms your source code into an executable binary. Understanding this toolchain - preprocessor, compiler, assembler, and linker - is essential for system programming. This chapter demystifies how code becomes a program.


Why This Matters


System programmers need to understand the compilation process because:

  • You'll debug linking errors and symbol resolution
  • You'll create and use libraries
  • You'll optimize compilation for specific architectures
  • You'll understand binary formats and how programs are loaded
  • You'll write makefiles and build systems

How to Study This Chapter


  1. Experiment - Compile code with different flags
  2. Inspect output - Look at assembly, object files, executables
  3. Break things - See what errors look like
  4. Use tools - nm, objdump, and readelf show what's inside binaries; ldd shows which shared libraries they need

The GCC Compilation Pipeline


GCC (GNU Compiler Collection) performs compilation in stages:


Source Code (.c)
      ↓
[Preprocessor]
      ↓
Preprocessed Code (.i)
      ↓
[Compiler]
      ↓
Assembly Code (.s)
      ↓
[Assembler]
      ↓
Object Code (.o)
      ↓
[Linker]
      ↓
Executable

Stage 1: Preprocessing


The preprocessor handles directives starting with #.


What It Does


#include <stdio.h>    // Include header file
#define MAX 100       // Define macro
#ifdef DEBUG          // Conditional compilation
    printf("Debug mode\n");
#endif

Actions:

  • #include - Paste entire header file contents
  • #define - Text substitution of macros
  • #ifdef/#ifndef - Conditional code inclusion
  • #pragma - Compiler-specific directives

Running Just the Preprocessor


gcc -E hello.c -o hello.i

Example:


Input (hello.c):

#include <stdio.h>
#define NUM 42

int main(void) {
    printf("Number: %d\n", NUM);
    return 0;
}


Output (hello.i):

// ... thousands of lines from stdio.h ...

int main(void) {
    printf("Number: %d\n", 42);   // NUM replaced with 42 (annotation only; gcc -E also strips comments)
    return 0;
}
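Because #define performs plain text substitution before the compiler checks anything, operator precedence inside a macro can surprise you. The following sketch uses hypothetical names (SQUARE_BAD, SQUARE, MODE_NAME and the demo functions). Running gcc -E macros.c on it shows each expansion, and the comments give the values the expressions evaluate to.

```c
/* macros.c - hypothetical example; inspect the expansions with: gcc -E macros.c */
#define NUM 42
#define SQUARE_BAD(x)  x * x          /* no parentheses: pasted in literally */
#define SQUARE(x)      ((x) * (x))    /* argument and body both parenthesized */

#ifdef DEBUG
#define MODE_NAME "debug"             /* chosen when compiled with -DDEBUG */
#else
#define MODE_NAME "release"           /* chosen otherwise */
#endif

/* SQUARE_BAD(1 + 2) becomes 1 + 2 * 1 + 2, which evaluates to 5 */
int square_bad_demo(void) { return SQUARE_BAD(1 + 2); }

/* SQUARE(1 + 2) becomes ((1 + 2) * (1 + 2)), which evaluates to 9 */
int square_good_demo(void) { return SQUARE(1 + 2); }

/* becomes: return 42; */
int num_demo(void) { return NUM; }

const char *mode_name(void) { return MODE_NAME; }
```

The compiler never sees NUM or SQUARE at all, only the substituted text, which is why macro errors are often reported against the expanded code.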


Stage 2: Compilation


The compiler converts C code to assembly language.


What It Does


  • Parses C syntax
  • Checks types
  • Optimizes code
  • Generates assembly for target architecture

Running Just the Compiler


gcc -S hello.c -o hello.s

Example Assembly Output (for a hello.c that prints "Hello, World!"; simplified x86-64, and the exact output varies with GCC version and flags):

    .file   "hello.c"
    .section    .rodata
.LC0:
    .string "Hello, World!"
    .text
    .globl  main
    .type   main, @function
main:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    $.LC0, %edi
    call    puts
    movl    $0, %eax
    popq    %rbp
    ret

You can read this! It is a human-readable form of the actual CPU instructions your code becomes.
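A quick way to watch the "Optimizes code" step is to compile one small function at two optimization levels and compare the assembly. The file name and function below are hypothetical, and the exact instructions depend on the GCC version and target: at -O0 GCC keeps total and i on the stack and reloads them on every iteration, while at -O2 they live in registers (and the loop may be transformed further).

```c
/* sum.c - hypothetical example for comparing optimization levels:
 *   gcc -S -O0 sum.c -o sum_O0.s
 *   gcc -S -O2 sum.c -o sum_O2.s
 *   diff sum_O0.s sum_O2.s
 */

/* Returns 1 + 2 + ... + n, or 0 when n <= 0. */
int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}
```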


Stage 3: Assembly


The assembler converts assembly to machine code (object file).


What It Does


  • Translates assembly mnemonics to binary opcodes
  • Creates object file (.o or .obj)
  • Includes symbol table (function/variable names and addresses)
  • Not yet executable (needs linking)

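Stopping After the Assembler


gcc -c hello.c -o hello.o

This runs the preprocessor, compiler, and assembler, then stops before linking. To run only the assembler on an existing .s file, use gcc -c hello.s -o hello.o, or call the GNU assembler directly: as hello.s -o hello.o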

Object files contain:

  • Machine code for your functions
  • Data sections for variables
  • Symbol table (exported/imported symbols)
  • Relocation information

Inspecting Object Files


# List symbols
nm hello.o

# Disassemble
objdump -d hello.o

# View sections
objdump -h hello.o
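To give nm something to report, here is a hypothetical symbols.c. After gcc -c symbols.c, running nm symbols.o typically lists the letters shown in the comments: uppercase for symbols visible to the linker, lowercase for file-local ones, and U for symbols the file uses but does not define. With optimization enabled, GCC may inline the static helper and drop its t entry.

```c
/* symbols.c - hypothetical file for exploring nm output:
 *   gcc -c symbols.c && nm symbols.o
 */
#include <stdio.h>

int call_count = 0;               /* B: global, zero-initialized (.bss) */
int default_step = 2;             /* D: global, initialized (.data) */

static int scale(int x)           /* t: local function, invisible to other files */
{
    return x * default_step;
}

int scaled_report(int x)          /* T: global function defined in .text */
{
    call_count++;
    int result = scale(x);
    printf("scaled %d -> %d\n", x, result);   /* U printf: used here, defined in libc */
    return result;
}
```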


Stage 4: Linking


The linker combines object files and libraries into an executable.


What It Does


  • Resolves symbol references (function calls, variables)
  • Combines code and data sections
  • Determines final memory addresses
  • Produces executable file

Example: Multi-File Program


main.c:

void greet(void);   // declaration only; the definition lives in greet.c

int main(void) {
    greet();
    return 0;
}


greet.c:

#include <stdio.h>

void greet(void) {
    printf("Hello!\n");
}


Compilation:

gcc -c main.c -o main.o
gcc -c greet.c -o greet.o
gcc main.o greet.o -o program

The linker:

  1. Sees that main.o references greet without defining it (an undefined symbol)
  2. Finds greet defined in greet.o
  3. Resolves the address
  4. Combines into single executable

Libraries


Libraries are collections of reusable code.


Static Libraries (.a on Linux, .lib on Windows)


Characteristics:

  • Needed object code is copied from the archive into your executable at link time
  • Larger executable size
  • No runtime dependency on the library file

Creating a Static Library:


# Compile source files
gcc -c lib1.c -o lib1.o
gcc -c lib2.c -o lib2.o

# Create archive (static library)
ar rcs libmylib.a lib1.o lib2.o

# Link with program
gcc main.c -L. -lmylib -o program


Explanation:

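  • ar - archive tool (r inserts or replaces members, c creates the archive, s writes a symbol index)
  • -L. - look for libraries in the current directory
  • -lmylib - link with libmylib (the linker adds the lib prefix and the .so or .a suffix; if both exist in the same directory, GNU ld uses the .so unless you pass -static)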
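The commands above assume source files lib1.c and lib2.c that are not shown. A minimal sketch of what lib1.c might contain follows; the mylib_ function names and the mylib.h header mentioned in the comment are hypothetical.

```c
/* lib1.c - hypothetical contents of one library member.
 * In a real project the two prototypes below would live in mylib.h,
 * included by both lib1.c and main.c.
 */
#include <stddef.h>

int mylib_max(const int *values, size_t count);
double mylib_mean(const int *values, size_t count);

/* Largest element; returns 0 for an empty array (a convention of this sketch). */
int mylib_max(const int *values, size_t count)
{
    if (count == 0) {
        return 0;
    }
    int best = values[0];
    for (size_t i = 1; i < count; i++) {
        if (values[i] > best) {
            best = values[i];
        }
    }
    return best;
}

/* Arithmetic mean; returns 0.0 for an empty array. */
double mylib_mean(const int *values, size_t count)
{
    if (count == 0) {
        return 0.0;
    }
    long long sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += values[i];
    }
    return (double)sum / (double)count;
}
```

After building the archive, ar t libmylib.a lists its member object files and nm libmylib.a lists the symbols in each member. At link time the linker copies in only the members that define a symbol that is still undefined, so an unused member adds nothing to the executable.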

Dynamic/Shared Libraries (.so on Linux, .dll on Windows, .dylib on macOS)


Characteristics:

  • Not copied into executable
  • Loaded at runtime
  • Smaller executable
  • Multiple programs can share one copy in memory
  • Can update the library without relinking the program, as long as its ABI stays compatible

Creating a Shared Library:


# Compile with position-independent code
gcc -fPIC -c lib1.c -o lib1.o
gcc -fPIC -c lib2.c -o lib2.o

# Create shared library
gcc -shared -o libmylib.so lib1.o lib2.o

# Link with program
gcc main.c -L. -lmylib -o program

# Run (the dynamic loader does not search the current directory by default)
LD_LIBRARY_PATH=. ./program


PIC (Position Independent Code):

  • Code can execute at any memory address
  • Required for shared libraries
  • Slight performance overhead
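To see what -fPIC actually changes, compile the same file both ways and compare the assembly. This x86-64 sketch uses a hypothetical pic_demo.c; the instruction forms in the comment are what GCC typically emits, but exact output varies by version.

```c
/* pic_demo.c - hypothetical file; compare:
 *   gcc -O2 -S -fno-pic pic_demo.c -o nopic.s
 *   gcc -O2 -S -fPIC    pic_demo.c -o pic.s
 * Without PIC, x86-64 code usually addresses the global directly,
 * e.g. "movl shared_counter(%rip), %eax".
 * With -fPIC it first loads the variable's address from the Global
 * Offset Table, e.g. "movq shared_counter@GOTPCREL(%rip), %rdx",
 * because the dynamic linker fills in that address at load time.
 */
int shared_counter = 0;

int bump_counter(void)
{
    shared_counter += 1;
    return shared_counter;
}
```

The extra indirection through the GOT is the "slight performance overhead" mentioned above; in exchange, the code bytes themselves never need patching, so they can be shared between processes.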

Static vs Dynamic Comparison


Aspect            Static Library                 Dynamic Library
Link time         Code copied into executable    Only a reference stored
Executable size   Larger                         Smaller
Load time         Faster                         Slower (loader must map and relocate the library)
Memory            Duplicated per program         Code pages shared among programs
Updates           Requires relinking             Library can be replaced independently (if ABI-compatible)
Dependencies      Self-contained                 Needs the .so/.dll present at runtime

Binary Executable Formats


ELF (Executable and Linkable Format)


Used on Linux and many Unix systems.


ELF Sections:

  • .text - Executable code
  • .data - Initialized global and static variables
  • .bss - Uninitialized or zero-initialized global and static variables (zero-filled at load time; takes no space in the file)
  • .rodata - Read-only data (constants, string literals)
  • .symtab - Symbol table
  • .strtab - String table (symbol names)
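The sketch below (a hypothetical sections.c) puts one object in each of the main code and data sections. Compiling it with gcc -c and running objdump -t sections.o shows the section of each symbol; the comments describe typical GCC behavior on Linux.

```c
/* sections.c - hypothetical file; see where each object lands with:
 *   gcc -c sections.c && objdump -t sections.o
 */
const char banner[] = "hello";     /* .rodata: read-only data */
int initialized_total = 100;       /* .data: initial value stored in the file */
int zeroed_buffer[1024];           /* .bss with GCC 10+ (older GCC marks it COMMON in the .o) */

int add_to_total(int amount)       /* .text: machine code */
{
    initialized_total += amount;
    return initialized_total;
}
```

Running size sections.o prints the text, data, and bss totals: the 1024-int zeroed_buffer is counted under bss, yet it does not make the .o file any larger.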

Viewing ELF Structure:


# View sections
readelf -S program

# View symbols
readelf -s program

# View program headers
readelf -l program


Example:

$ readelf -S hello

Section Headers:
  [Nr] Name      Type      Address           Off     Size
  [ 0]           NULL      0000000000000000  000000  000000
  [ 1] .text     PROGBITS  0000000000401000  001000  000185
  [ 2] .rodata   PROGBITS  0000000000402000  002000  000013
  [ 3] .data     PROGBITS  0000000000404000  003000  000010
  [ 4] .bss      NOBITS    0000000000404010  003010  000008
  ...
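Every ELF file begins with an ELF header, which readelf -h prints. As a sketch of what that header holds, the following C function (hypothetical name elf64_file_type) reads it using the structure definitions from glibc's <elf.h>. It assumes a 64-bit ELF file in the host's byte order and is meant for Linux.

```c
/* elfcheck.c - sketch: read the ELF header of a file (Linux, glibc <elf.h>).
 * Returns the e_type field (ET_REL, ET_EXEC, ET_DYN, ...) or -1 if the
 * file cannot be read or is not a 64-bit ELF file.
 * Assumes the file uses the host's byte order.
 */
#include <elf.h>
#include <stdio.h>
#include <string.h>

int elf64_file_type(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return -1;                                 /* cannot open */
    }
    Elf64_Ehdr header;
    size_t got = fread(&header, 1, sizeof header, f);
    fclose(f);

    if (got != sizeof header) {
        return -1;                                 /* too short to be ELF */
    }
    if (memcmp(header.e_ident, ELFMAG, SELFMAG) != 0) {
        return -1;                                 /* missing "\x7fELF" magic bytes */
    }
    if (header.e_ident[EI_CLASS] != ELFCLASS64) {
        return -1;                                 /* 32-bit ELF not handled here */
    }
    return header.e_type;
}
```

Called on /proc/self/exe, it typically returns ET_DYN (3) on current Linux distributions, whose GCC builds position-independent executables by default; a non-PIE executable gives ET_EXEC (2) and a .o file gives ET_REL (1).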


PE (Portable Executable)


Used on Windows (.exe, .dll).


Similar concept to ELF but different format.


The Linker's Job in Detail


Symbol Resolution


When you call a function, the compiler generates a reference:


int main() {
    printf("Hello");  // Compiler: "call function printf"
    return 0;
}

Object file contains:

UNDEFINED SYMBOL: printf

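Linker searches, in command-line order:

  1. The object files you specified
  2. Libraries named with -l (if a directory holds both libfoo.so and libfoo.a, the shared one is used unless you link with -static)
  3. Libraries the gcc driver adds automatically, such as the C library

If found: the reference is resolved to the definition's address.
If not found: the link fails with "undefined reference to `printf'".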


Relocation


Object files contain placeholder addresses, plus relocation entries that tell the linker what to patch (objdump -dr hello.o shows both):


call 0x0  ; Placeholder, don't know printf's address yet

Linker:

  1. Decides final memory layout
  2. Assigns actual addresses
  3. Patches all references

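Result (for code linked into the executable; calls into a shared library such as libc instead go through a PLT stub that is resolved at load time or on the first call):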

call 0x401234  ; printf is at this address

Common Linking Errors


Undefined Reference


undefined reference to `someFunction'

Cause: The function is declared but never defined, or the library that defines it was not linked.


Fix:

  • Define the function
  • Link the library: gcc main.c -lm (for math library)

Multiple Definition


multiple definition of `globalVar'

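Cause: The same global symbol is defined in more than one object file, often because a header defines a variable instead of only declaring it. Since GCC 10 made -fno-common the default, even an uninitialized int globalVar; in a shared header triggers this error.


Fix: Declare the variable with extern in the header and define it in exactly one .c file, or make it static if only one file needs it.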
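A minimal sketch of that fix, squeezed into one file for illustration (the file name and the tick function are hypothetical). The extern line is what you would put in a shared header; the single definition goes in exactly one .c file, and static keeps a helper variable private to its file.

```c
/* globals.c - hypothetical sketch of avoiding "multiple definition" errors. */

/* Declaration: normally in a header (e.g. globals.h) included everywhere
 * globalVar is used. It allocates no storage, so repeating it is harmless. */
extern int globalVar;

/* The one and only definition, placed in exactly one .c file. */
int globalVar = 0;

/* static gives internal linkage: other .c files may define their own
 * tick_count without clashing, because the linker never sees it as global. */
static int tick_count = 0;

int tick(void)
{
    tick_count++;
    globalVar += 10;
    return tick_count;
}
```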


Library Not Found


cannot find -lsomelib

Cause: The library file is not in any directory on the linker's search path.


Fix: Add the directory with -L/path/to/lib.


GCC Compilation Flags


Essential Flags


# Output file name
gcc -o program main.c

# Compile without linking
gcc -c main.c

# Enable common (-Wall) and extra (-Wextra) warnings
gcc -Wall -Wextra main.c

# Debug symbols
gcc -g main.c

# Optimization
gcc -O2 main.c      # -O0 (none, the default), -O1, -O2, -O3

# Specify C standard
gcc -std=c11 main.c

# Link library
gcc main.c -lm      # Link libm (math library)

# Add library search path
gcc main.c -L/usr/local/lib -lmylib

# Add include search path
gcc -I/usr/local/include main.c

# Define macro
gcc -DDEBUG -DMAX=100 main.c
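Macros passed with -D can be tested in the source like any other macro. In this hypothetical config.c, MAX falls back to a default when no -DMAX=... is given, and -DDEBUG (which GCC defines with the value 1) switches debug_enabled on.

```c
/* config.c - hypothetical sketch of reacting to -D flags:
 *   gcc -c config.c                    -> MAX is 10, debug off
 *   gcc -c -DDEBUG -DMAX=100 config.c  -> MAX is 100, debug on
 */
#ifndef MAX
#define MAX 10               /* default used when no -DMAX=... is given */
#endif

int buffer_limit(void)
{
    return MAX;
}

int debug_enabled(void)
{
#ifdef DEBUG
    return 1;                /* compiled with -DDEBUG */
#else
    return 0;                /* compiled without it */
#endif
}
```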


Debugging and Analysis Flags


# Generate assembly
gcc -S main.c

# Preprocess only
gcc -E main.c

# Verbose output (see what gcc is doing)
gcc -v main.c

# Save temporary files (.i, .s, .o)
gcc -save-temps main.c

# Position-independent code (for shared libraries)
gcc -fPIC -c lib1.c


Build Systems


For large projects with many files, typing gcc commands manually is tedious. Build systems automate compilation.


Makefiles (Make)


Example Makefile (each recipe line must start with a tab character, not spaces):


CC = gcc
CFLAGS = -Wall -Wextra -O2

program: main.o utils.o
	$(CC) $(CFLAGS) -o program main.o utils.o

main.o: main.c utils.h
	$(CC) $(CFLAGS) -c main.c

utils.o: utils.c utils.h
	$(CC) $(CFLAGS) -c utils.c

.PHONY: clean
clean:
	rm -f *.o program


Usage:

make           # Build program
make clean     # Remove generated files

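How it works:

  • Make builds a dependency graph from the rules (target: prerequisites)
  • It rebuilds a target only when a prerequisite is newer than the target, judged by file modification times
  • In large projects this avoids recompiling files that have not changed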

Key Concepts


  • Compilation has four stages: preprocessing, compiling, assembling, linking
  • Object files contain machine code but aren't executable yet
  • Linker resolves symbols and combines object files
  • Static libraries are copied into executable
  • Shared libraries are loaded at runtime
  • ELF is the binary format on Linux
  • Build systems automate compilation

Common Mistakes


  1. Forgetting -l flag - Can't find library
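  2. Wrong library order - Put libraries after the object files that use them, and each library before the libraries it depends on
  3. Missing -fPIC - Linking the shared library can fail with a relocation error telling you to recompile with -fPIC
  4. Not using -Wall - Miss important warnings
  5. Mixing debug and release builds - Object files compiled with different flags or macro settings can behave inconsistently when linked together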

Debugging Tips


  • Use nm - See what symbols are in object files/libraries
  • Use ldd - See what libraries executable depends on
  • Use objdump - Disassemble and inspect binaries
  • Check library paths - Use LD_LIBRARY_PATH for testing
  • Read linker errors carefully - Usually tell you exactly what's wrong

Mini Exercises


  1. Compile a C file in stages (stop at each stage and inspect output)
  2. Create a multi-file program and compile it
  3. Create a static library and link against it
  4. Create a shared library and use it
  5. Use nm to inspect symbols in an object file
  6. Disassemble an executable with objdump -d
  7. Write a simple Makefile
  8. Use readelf to examine ELF structure
  9. Intentionally cause an "undefined reference" error and fix it
  10. Compare executable sizes with static vs dynamic linking

Review Questions


  1. What are the four stages of compilation?
  2. What does the linker do?
  3. What's the difference between static and shared libraries?
  4. What is ELF and what sections does it contain?
  5. Why do we need position-independent code for shared libraries?

Reference Checklist


By the end of this chapter, you should be able to:

  • Understand the GCC compilation pipeline
  • Compile code in stages
  • Create and use static libraries
  • Create and use shared libraries
  • Understand ELF format basics
  • Use GCC flags effectively
  • Debug linking errors
  • Write basic Makefiles

Next Steps


Now that you understand how C code becomes executable machine code, the next chapter explores data structures implemented in C. You'll learn how arrays, linked lists, stacks, and queues are actually laid out in memory and how to implement them efficiently at a low level.




Key Takeaway: The journey from source code to executable involves preprocessing, compiling to assembly, assembling to object code, and linking. Understanding this process helps you debug errors, optimize builds, and create libraries for system programming.