NANDHOO.

Introduction to Assembly Language

Chapter 8: Introduction to Assembly Language


Introduction


Assembly language is the lowest level of programming above raw machine code. Each assembly instruction typically corresponds to a single machine instruction. While modern system programming is mostly done in C, understanding assembly is essential for debugging, optimization, writing bootloaders, and understanding how CPUs actually execute your code.


Why This Matters


Assembly language reveals what the CPU is truly doing. When you debug crashes, analyze performance, write device drivers, or develop operating systems, you'll encounter assembly. Understanding assembly helps you write better C code because you understand what your compiler generates. For certain system-level tasks like bootloaders and kernel initialization, assembly is unavoidable.


How to Study This Chapter


  1. Write code - Assembly makes sense by doing, not just reading
  2. Use NASM - We'll use NASM (Netwide Assembler) for examples
  3. Test in small steps - Start with tiny programs
  4. Read disassembly - See what your C compiler generates
  5. Be patient - Assembly is verbose but logical

What is Assembly Language?


Assembly is a human-readable representation of machine code.


Machine Code (binary):

10110000 01100001

Assembly (mnemonics):

mov al, 97

Meaning: Move the value 97 into the AL register.


Assembly vs Machine Code


Machine Code (hex):  B0 61
Assembly:            mov al, 97

The assembler converts assembly to machine code.
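
To make that translation concrete, here is a minimal C sketch of what an assembler does for this one instruction. The function name encode_mov_al_imm8 is hypothetical. It relies on the x86 encoding in which the opcode byte 0xB0 means "mov al, imm8" and is followed by the immediate value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode `mov al, imm8` into out[0..1].
 * On x86 the opcode byte 0xB0 selects "mov al, imm8"; the next
 * byte is the immediate value itself.
 * Returns the number of bytes written (always 2). */
size_t encode_mov_al_imm8(uint8_t imm, uint8_t out[2])
{
    out[0] = 0xB0;  /* opcode: mov al, imm8 */
    out[1] = imm;   /* immediate operand    */
    return 2;
}
```

Calling encode_mov_al_imm8(97, buf) fills buf with B0 61, the same two bytes shown above. A real assembler handles hundreds of such encodings plus labels and relocations, so treat this only as an illustration of the idea.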


Why Learn Assembly?


1. Understanding Your Code


C code:

int x = 5;
int y = x + 10;

Assembly equivalent (conceptually):

mov eax, 5      ; x = 5
add eax, 10     ; x + 10
mov ebx, eax    ; y = result

2. Debugging


When your program crashes, debuggers show assembly:

Segmentation fault at: mov dword [eax], 0

Understanding assembly helps you diagnose the problem.


3. Optimization


Compilers optimize well but not perfectly. Occasionally a performance-critical hot path, such as a tight inner loop, benefits from hand-written assembly.


4. System Programming Requirements


  • Bootloaders: Start in 16-bit real mode assembly
  • Kernel initialization: Switch CPU modes
  • Context switching: Save/restore all registers
  • Interrupt handlers: Direct hardware interaction

Choosing an Assembler: NASM


NASM (Netwide Assembler) is popular because:

  • Clean, readable syntax
  • Cross-platform (Linux, Windows, macOS)
  • Well-documented
  • Used in many bootloader/kernel tutorials

Install NASM:

# Ubuntu/Debian
sudo apt-get install nasm

# macOS
brew install nasm

# Verify
nasm -v


Alternative assemblers: GAS (GNU Assembler), MASM (Microsoft), FASM, YASM.


CPU Registers


Registers are tiny, ultra-fast storage locations inside the CPU.


x86 32-bit General Purpose Registers


EAX - Accumulator (arithmetic operations)
EBX - Base (base pointer for memory)
ECX - Counter (loop counter)
EDX - Data (I/O operations, arithmetic)
ESI - Source Index (string/memory operations)
EDI - Destination Index (string/memory operations)
EBP - Base Pointer (stack frame pointer)
ESP - Stack Pointer (points to top of stack)

Register Hierarchy (x86)


64-bit: RAX  (entire register)
         |
32-bit: EAX  (lower 32 bits)
         |
16-bit: AX   (lower 16 bits)
         |
 8-bit: AH AL (high and low bytes of AX)

Example:

mov rax, 0x1234567890ABCDEF  ; 64-bit
; Now:
; RAX = 0x1234567890ABCDEF
; EAX = 0x90ABCDEF (lower 32 bits)
; AX  = 0xCDEF (lower 16 bits)
; AL  = 0xEF (lower 8 bits)
; AH  = 0xCD (bits 8-15)
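
The same relationships can be sketched in C with masks and shifts. The function names below are hypothetical, and the sketch models only reading the sub-registers, not the side effects of writing to them.

```c
#include <stdint.h>

/* Each function returns the named sub-register view of a 64-bit RAX value. */
uint32_t view_eax(uint64_t rax) { return (uint32_t)(rax & 0xFFFFFFFFu); } /* bits 0-31 */
uint16_t view_ax(uint64_t rax)  { return (uint16_t)(rax & 0xFFFFu); }     /* bits 0-15 */
uint8_t  view_al(uint64_t rax)  { return (uint8_t)(rax & 0xFFu); }        /* bits 0-7  */
uint8_t  view_ah(uint64_t rax)  { return (uint8_t)((rax >> 8) & 0xFFu); } /* bits 8-15 */
```

Passing 0x1234567890ABCDEF to these functions reproduces the values listed in the comments above.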

Special Purpose Registers


EIP - Instruction Pointer (points to next instruction)
EFLAGS - Flags register (status flags: zero, carry, overflow, etc.)

Segment Registers (less commonly used in modern programming)


CS - Code Segment
DS - Data Segment
SS - Stack Segment
ES, FS, GS - Extra segments

Basic Assembly Syntax (NASM)


Instruction Format


label:  instruction  operands  ; comment

Example:

start:  mov eax, 5    ; Move 5 into EAX register

Data Sizes


byte    - 8 bits  (1 byte)
word    - 16 bits (2 bytes)
dword   - 32 bits (4 bytes)
qword   - 64 bits (8 bytes)

Directives


section .data      ; Data segment (initialized data)
section .bss       ; BSS segment (uninitialized data)
section .text      ; Code segment (executable instructions)

global _start      ; Make _start visible to linker

db      - define byte
dw      - define word
dd      - define double word
dq      - define quad word

resb    - reserve bytes
resw    - reserve words


Your First Assembly Program


Hello World (Linux x86-64)


section .data
    msg db "Hello, Assembly!", 0xA   ; String with newline
    len equ $ - msg                   ; Length of string

section .text
    global _start

_start:
    ; write(1, msg, len) system call
    mov rax, 1          ; sys_write
    mov rdi, 1          ; file descriptor: stdout
    mov rsi, msg        ; pointer to message
    mov rdx, len        ; message length
    syscall             ; invoke system call

    ; exit(0) system call
    mov rax, 60         ; sys_exit
    xor rdi, rdi        ; exit code 0
    syscall             ; invoke system call

Compile and run:

nasm -f elf64 hello.asm -o hello.o
ld hello.o -o hello
./hello

Output:

Hello, Assembly!

Breakdown


  1. section .data: Defines initialized data
  2. msg db: Defines byte string
  3. len equ: Calculates length ($ = current position)
  4. section .text: Code section
  5. global _start: Entry point
  6. mov: Move data between registers/memory
  7. syscall: Make system call (Linux x86-64)

Common Assembly Instructions


Data Movement


mov dest, src       ; Move data: dest = src
lea dest, [addr]    ; Load effective address: dest = addr (no memory access)
push value          ; Push onto stack (32-bit mode: ESP -= 4, [ESP] = value)
pop dest            ; Pop from stack (32-bit mode: dest = [ESP], ESP += 4)

Examples:

mov eax, 42         ; eax = 42
mov ebx, eax        ; ebx = eax
mov ecx, [var]      ; ecx = value at memory location var
mov [var], eax      ; Store eax value to memory location var

Arithmetic


add dest, src       ; dest = dest + src
sub dest, src       ; dest = dest - src
mul src             ; edx:eax = eax * src (unsigned)
imul src            ; edx:eax = eax * src (signed)
div src             ; eax = edx:eax / src, edx = remainder (unsigned)
inc dest            ; dest = dest + 1
dec dest            ; dest = dest - 1
neg dest            ; dest = -dest

Examples:

mov eax, 10
add eax, 5          ; eax = 15
sub eax, 3          ; eax = 12
inc eax             ; eax = 13
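
One detail worth seeing in code: the 32-bit, one-operand forms of mul and div work on the EDX:EAX register pair, with EDX holding the high half of the product or dividend. The C model below is a sketch of that behavior. The regs32 type and the model_mul and model_div names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the EDX:EAX register pair. */
typedef struct {
    uint32_t eax;
    uint32_t edx;
} regs32;

/* mul src: EDX:EAX = EAX * src (unsigned, 64-bit product). */
void model_mul(regs32 *r, uint32_t src)
{
    uint64_t product = (uint64_t)r->eax * src;
    r->eax = (uint32_t)product;          /* low 32 bits  */
    r->edx = (uint32_t)(product >> 32);  /* high 32 bits */
}

/* div src: EAX = EDX:EAX / src, EDX = EDX:EAX % src (unsigned).
 * The real instruction faults if src is 0 or the quotient does not
 * fit in 32 bits; this model asserts instead. */
void model_div(regs32 *r, uint32_t src)
{
    assert(src != 0);
    uint64_t dividend = ((uint64_t)r->edx << 32) | r->eax;
    uint64_t quotient = dividend / src;
    assert(quotient <= UINT32_MAX);
    r->edx = (uint32_t)(dividend % src); /* remainder */
    r->eax = (uint32_t)quotient;         /* quotient  */
}
```

This is why code usually clears EDX (for example with xor edx, edx) before an unsigned div: stale data in EDX would otherwise become the upper half of the dividend.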

Logical and Bitwise


and dest, src       ; dest = dest & src
or  dest, src       ; dest = dest | src
xor dest, src       ; dest = dest ^ src
not dest            ; dest = ~dest
shl dest, count     ; dest = dest << count (shift left)
shr dest, count     ; dest = dest >> count (shift right)

Examples:

mov al, 0b10101010
and al, 0b11110000  ; al = 0b10100000 (mask)
or  al, 0b00001111  ; al = 0b10101111 (set bits)
xor al, 0b11111111  ; al = 0b01010000 (invert)
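
For comparison, the same mask, set, and invert steps can be written with C's bitwise operators on an 8-bit value. The helper names are hypothetical, and hex constants stand in for the binary literals (0xAA is 0b10101010).

```c
#include <stdint.h>

/* and: keep only the bits of value that are set in mask. */
uint8_t mask_bits(uint8_t value, uint8_t mask)   { return (uint8_t)(value & mask); }

/* or: set every bit of value that is set in bits. */
uint8_t set_bits(uint8_t value, uint8_t bits)    { return (uint8_t)(value | bits); }

/* xor: flip every bit of value that is set in bits. */
uint8_t toggle_bits(uint8_t value, uint8_t bits) { return (uint8_t)(value ^ bits); }
```

Starting from 0xAA, mask_bits with 0xF0 gives 0xA0, set_bits with 0x0F gives 0xAF, and toggle_bits with 0xFF gives 0x50, matching the AL values in the assembly comments.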

Comparison and Jumps


cmp op1, op2        ; Compare (sets flags, doesn't change operands)
test op1, op2       ; Logical AND (sets flags, doesn't change operands)

jmp label           ; Unconditional jump
je  label           ; Jump if equal (ZF = 1)
jne label           ; Jump if not equal (ZF = 0)
jg  label           ; Jump if greater (signed)
jl  label           ; Jump if less (signed)
ja  label           ; Jump if above (unsigned)
jb  label           ; Jump if below (unsigned)
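
The signed/unsigned split between jg and ja matters because the same bit pattern compares differently under each interpretation. A small C sketch, with hypothetical function names, shows the two views of one 32-bit value. It assumes the usual two's complement conversion when casting to int32_t, which mainstream compilers provide.

```c
#include <stdbool.h>
#include <stdint.h>

/* What jg tests after cmp a, b: operands treated as signed. */
bool greater_signed(uint32_t a, uint32_t b)
{
    return (int32_t)a > (int32_t)b;
}

/* What ja tests after cmp a, b: operands treated as unsigned. */
bool above_unsigned(uint32_t a, uint32_t b)
{
    return a > b;
}
```

With a = 0xFFFFFFFF and b = 1, the signed view sees -1 > 1 (false) while the unsigned view sees 4294967295 > 1 (true), so jg would fall through where ja would jump.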


Example: If statement:

mov eax, 10
cmp eax, 5
jg  greater        ; Jump if eax > 5

; eax <= 5
mov ebx, 0
jmp done

greater:
; eax > 5
mov ebx, 1

done:
; Continue
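
Read back into C, this compare-and-jump pattern is an ordinary if/else. The sketch below uses a hypothetical function name and returns the value that ends up in ebx.

```c
/* C equivalent of the cmp/jg sequence: returns the value left in ebx. */
int ebx_after_compare(int eax)
{
    int ebx;
    if (eax > 5) {   /* cmp eax, 5 then jg greater (signed comparison) */
        ebx = 1;     /* code at the "greater" label */
    } else {
        ebx = 0;     /* fall-through path, followed by jmp done */
    }
    return ebx;      /* "done" label */
}
```

Note that the assembly places the "not greater" path first because it is the fall-through when the jump is not taken.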


Example: Loop:

mov ecx, 10        ; Counter

loop_start:
    ; Do something
    dec ecx            ; ecx--
    jnz loop_start     ; Jump if not zero

; Loop done
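
Because the test comes after the body, this dec/jnz pattern corresponds to a C do-while loop rather than a for or while loop. A minimal sketch, with a hypothetical function name:

```c
/* Runs a dec/jnz style countdown starting from count and returns how many
 * times the body ran. As in the assembly, count must be nonzero: starting
 * at 0 would wrap around and loop about four billion times. */
unsigned countdown_iterations(unsigned count)
{
    unsigned iterations = 0;
    unsigned ecx = count;        /* mov ecx, count */
    do {
        iterations++;            /* "Do something" */
        ecx--;                   /* dec ecx        */
    } while (ecx != 0);          /* jnz loop_start */
    return iterations;
}
```

Starting from 10, the body runs exactly 10 times, once for each value of ecx from 10 down to 1.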


Memory Addressing Modes


Immediate


mov eax, 42        ; eax = 42 (value is in instruction)

Register


mov eax, ebx       ; eax = ebx

Direct Memory


mov eax, [var]     ; eax = value at memory address 'var'

Indirect


mov eax, [ebx]     ; eax = value at address stored in ebx

Indexed


mov eax, [ebx + 4]           ; eax = *(ebx + 4)
mov eax, [array + ecx*4]     ; eax = array[ecx] (for int array)
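
These two indexed forms map onto familiar C pointer operations. The sketch below uses hypothetical function names and assumes 4-byte int32_t elements, matching the *4 scale factor.

```c
#include <stdint.h>
#include <string.h>

/* [ebx + 4]: load a 32-bit value located 4 bytes past a base address. */
int32_t load_base_plus_disp(const void *base)
{
    int32_t value;
    memcpy(&value, (const uint8_t *)base + 4, sizeof value);
    return value;
}

/* [array + ecx*4]: the index is scaled by the 4-byte element size,
 * which is exactly what C array indexing does for int32_t. */
int32_t load_indexed(const int32_t *array, uint32_t ecx)
{
    return array[ecx];
}
```

For an array holding 10, 20, 30, load_base_plus_disp returns 20 (the element 4 bytes in) and load_indexed with ecx = 2 returns 30.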

The Stack


The stack is a region of memory for temporary storage, function calls, and local variables.


Stack Operations


push eax           ; ESP -= 4, [ESP] = eax
pop ebx            ; ebx = [ESP], ESP += 4

Visualization:

Before push eax (eax = 0x1234, esp = 0x2000):

  Address   Value
  0x2000    ...      <- ESP

After push eax:

  Address   Value
  0x1FFC    0x1234   <- ESP
  0x2000    ...

Stack grows downward (toward lower addresses).
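
The push and pop rules can be modeled in a few lines of C. This is a sketch under stated assumptions: the stack_model type, its function names, and the STACK_BASE address are hypothetical, and a byte array stands in for real memory. It follows the 32-bit rules above (4-byte slots).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BASE 0x1000u  /* lowest modeled address (assumption) */
#define STACK_TOP  0x2000u  /* initial ESP, as in the diagram      */

/* Hypothetical model: a byte array standing in for stack memory. */
typedef struct {
    uint8_t  mem[STACK_TOP - STACK_BASE];
    uint32_t esp;
} stack_model;

void stack_init(stack_model *s)
{
    memset(s->mem, 0, sizeof s->mem);
    s->esp = STACK_TOP;
}

/* push: ESP -= 4, then store the value at [ESP]. */
void stack_push(stack_model *s, uint32_t value)
{
    assert(s->esp >= STACK_BASE + 4);   /* room for one more slot */
    s->esp -= 4;
    memcpy(&s->mem[s->esp - STACK_BASE], &value, 4);
}

/* pop: load the value at [ESP], then ESP += 4. */
uint32_t stack_pop(stack_model *s)
{
    uint32_t value;
    assert(s->esp + 4 <= STACK_TOP);    /* stack is not empty */
    memcpy(&value, &s->mem[s->esp - STACK_BASE], 4);
    s->esp += 4;
    return value;
}
```

After stack_init, pushing 0x1234 moves esp from 0x2000 to 0x1FFC, and popping returns 0x1234 and restores esp to 0x2000.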


Function Calls


call function      ; Push return address, jump to function
ret                ; Pop return address, jump to it

What call does:

  1. Push address of next instruction onto stack
  2. Jump to function

What ret does:

  1. Pop address from stack
  2. Jump to that address

Flags Register


The EFLAGS register contains status flags set by operations.


Common Flags


ZF - Zero Flag (set if result is zero)
CF - Carry Flag (set if unsigned overflow)
SF - Sign Flag (set if result is negative)
OF - Overflow Flag (set if signed overflow)

Example:

mov eax, 5
sub eax, 5    ; eax = 0, ZF = 1 (zero flag set)

mov al, 255
add al, 1     ; al = 0 (wrap), CF = 1 (carry flag set)
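
To see how all four flags fall out of a single operation, here is a C sketch that models an 8-bit add. The add8_result type and add8 function are hypothetical names, and the real EFLAGS register holds more flags than the four modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the result and flags of an 8-bit add. */
typedef struct {
    uint8_t result;
    bool zf;  /* zero flag     */
    bool cf;  /* carry flag    */
    bool sf;  /* sign flag     */
    bool of;  /* overflow flag */
} add8_result;

add8_result add8(uint8_t a, uint8_t b)
{
    add8_result r;
    unsigned wide = (unsigned)a + (unsigned)b;  /* 9-bit exact sum */

    r.result = (uint8_t)wide;                   /* truncated to 8 bits */
    r.zf = (r.result == 0);                     /* result is zero */
    r.cf = (wide > 0xFF);                       /* carry out of bit 7 */
    r.sf = (r.result & 0x80) != 0;              /* top bit of result */
    /* Signed overflow: both operands share a sign bit that differs
     * from the sign bit of the result. */
    r.of = ((~(a ^ b)) & (a ^ r.result) & 0x80) != 0;
    return r;
}
```

add8(255, 1) yields result 0 with ZF and CF set, matching the example above. add8(127, 1) yields 0x80 with SF and OF set, because 127 + 1 does not fit in a signed byte.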


System Calls (Linux x86-64)


System calls request services from the kernel.


Making a System Call


mov rax, syscall_number
mov rdi, arg1
mov rsi, arg2
mov rdx, arg3
syscall

Common System Calls


rax = 0:  read(fd, buf, count)
rax = 1:  write(fd, buf, count)
rax = 2:  open(filename, flags, mode)
rax = 3:  close(fd)
rax = 60: exit(status)

Example: Read from stdin:

section .bss
    buffer resb 64

section .text
    global _start

_start:
    ; read(0, buffer, 64)
    mov rax, 0          ; sys_read
    mov rdi, 0          ; stdin
    mov rsi, buffer     ; buffer address
    mov rdx, 64         ; bytes to read
    syscall

    ; exit(0)
    mov rax, 60
    xor rdi, rdi
    syscall

Key Concepts


  • Assembly is human-readable machine code
  • Registers are fast storage inside the CPU
  • Instructions perform operations on registers and memory
  • The stack stores temporary data and function call information
  • Flags indicate results of operations
  • System calls request kernel services
  • Addressing modes access data in different ways

Common Mistakes


  1. Wrong operand order - NASM/Intel syntax is dest, src
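  2. Forgetting stack alignment - x86-64 calling conventions require the stack pointer to be 16-byte aligned at function calls
  3. Register size mismatch - Operand sizes must match; mov al, ebx is invalid
  4. Not preserving registers - Follow the caller-saved/callee-saved conventions of your ABI
  5. Stack imbalance - Every push needs a matching pop (or an equivalent stack pointer adjustment) before ret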

Debugging Tips


  • Use GDB - Step through assembly instructions
  • Print registers - info registers in GDB
  • Start simple - Get "Hello World" working first
  • Read disassembly - objdump -d disassembles a binary's machine code back into assembly
  • Check flags - Understand how flags are set

Mini Exercises


  1. Write "Hello, World!" in assembly
  2. Add two numbers and print the result
  3. Implement a loop that counts from 1 to 10
  4. Write a function that returns a value
  5. Use the stack to save and restore registers
  6. Read a character from stdin
  7. Implement a simple if-else statement
  8. Create a program that exits with a specific exit code
  9. Use bitwise operations to test/set bits
  10. Write inline assembly in a C program

Review Questions


  1. What is the difference between assembly and machine code?
  2. Name four general-purpose registers in x86.
  3. What does the syscall instruction do?
  4. How does the stack grow (up or down)?
  5. What flag is set when the result of an operation is zero?

Reference Checklist


By the end of this chapter, you should be able to:

  • Understand what assembly language is
  • Know x86 register names and purposes
  • Write basic assembly programs with NASM
  • Use common instructions (mov, add, sub, jmp)
  • Make Linux system calls
  • Understand the stack
  • Use different addressing modes
  • Compile and run assembly programs

Next Steps


Now that you understand basic assembly, the next chapter explores CPU architectures in detail. You'll learn about x86, x64, ARM, and AArch64 architectures, their differences, and how to write code for different processors.




Key Takeaway: Assembly language provides direct control over the CPU. While verbose, it gives you precise understanding and control over what the computer does, which is essential for system-level programming, bootloaders, and kernels.