NANDHOO.

Chapter 19: Debugging and Optimization


Introduction


Writing correct, efficient code requires systematic debugging and optimization techniques. This chapter explores strategies for finding and fixing bugs, profiling performance, and optimizing algorithms and data structures. Mastering these skills transforms good programmers into exceptional ones.


Why This Matters


Debugging and optimization skills are essential:

  • Production systems: Bugs cause outages and data loss
  • Performance: Slow code wastes resources and frustrates users
  • Scalability: Inefficient algorithms don't scale
  • Career: Senior developers excel at debugging complex issues
  • Competitive programming: Optimization makes the difference
  • Interviews: Optimization questions are common

How to Study This Chapter


  1. Practice systematic debugging - Follow methodical approaches
  2. Profile before optimizing - Measure, don't guess
  3. Learn debugging tools - GDB, Valgrind, profilers
  4. Study common bugs - Recognize patterns
  5. Benchmark code - Compare implementations

Debugging Strategies


Systematic Approach


1. Reproduce the bug reliably
2. Isolate the problem (binary search through code)
3. Form hypothesis about cause
4. Test hypothesis
5. Fix bug
6. Verify fix doesn't break anything else
7. Understand why bug occurred

Rubber Duck Debugging


Explain your code line by line to a rubber duck (or a colleague). The act of explaining often reveals the bug.


Print Debugging


Insert strategic print statements to trace execution.


#include <iostream>
#include <vector>
using namespace std;

void debugBinarySearch(vector<int>& arr, int target) {
    int left = 0, right = arr.size() - 1;
    int iteration = 0;


while (left <= right) {
    iteration++;
    int mid = left + (right - left) / 2;

    cout << "Iteration " << iteration << ": "
         << "left=" << left << ", mid=" << mid
         << ", right=" << right
         << ", arr[mid]=" << arr[mid] << endl;

    if (arr[mid] == target) {
        cout << "Found at index " << mid << endl;
        return;
    }

    if (arr[mid] < target) {
        left = mid + 1;
    } else {
        right = mid - 1;
    }
}

cout << "Not found" << endl;

}


int main() {
    vector<int> arr = {1, 3, 5, 7, 9, 11, 13};
    debugBinarySearch(arr, 7);
    return 0;
}


Assertions


Verify assumptions during development.


#include <algorithm>  // is_sorted
#include <cassert>
#include <iostream>
#include <vector>
using namespace std;

int binarySearch(vector<int>& arr, int target) {
    // Preconditions
    assert(!arr.empty() && "Array must not be empty");
    assert(is_sorted(arr.begin(), arr.end()) && "Array must be sorted");


int left = 0, right = arr.size() - 1;

while (left <= right) {
    int mid = left + (right - left) / 2;

    // Invariant: if target exists, it's in [left, right]
    assert(left >= 0 && right < (int)arr.size());

    if (arr[mid] == target) {
        return mid;
    }

    if (arr[mid] < target) {
        left = mid + 1;
    } else {
        right = mid - 1;
    }
}

return -1;

}


int main() {
    vector<int> arr = {1, 3, 5, 7, 9};
    int result = binarySearch(arr, 5);
    cout << "Found at: " << result << endl;
    return 0;
}


Debugger (GDB Example)


# Compile with debug symbols
g++ -g program.cpp -o program

# Run in GDB

gdb ./program


Common GDB commands:

break main        # Set breakpoint at main
run               # Start program
next              # Next line (step over)
step              # Step into function
print variable    # Print variable value
backtrace         # Show call stack
continue          # Continue execution
quit              # Exit GDB


Common Bug Patterns


Off-by-One Errors


// Wrong: Misses last element
for (int i = 0; i < arr.size() - 1; i++) {  // Bug!
    process(arr[i]);
}

// Correct
for (int i = 0; i < arr.size(); i++) {
    process(arr[i]);
}


// Wrong: left + right can overflow, producing a bogus mid
// and an out-of-bounds access
int mid = (left + right) / 2;  // Can overflow!


// Correct
int mid = left + (right - left) / 2;


Null Pointer Dereferencing


#include <iostream>
using namespace std;

struct Node { int data; Node* next; };


// Buggy version
int getLength(Node* head) {
    int count = 0;
    while (head->next != nullptr) {  // Bug: What if head is nullptr?
        count++;
        head = head->next;
    }
    return count;
}


// Fixed version
int getLengthFixed(Node* head) {
    if (head == nullptr) return 0;  // Check null first!


int count = 0;
while (head != nullptr) {
    count++;
    head = head->next;
}
return count;

}


int main() {
    Node* emptyList = nullptr;
    cout << "Length: " << getLengthFixed(emptyList) << endl;
    return 0;
}


Memory Leaks


#include <cstdlib>   // malloc, free
#include <iostream>
using namespace std;

// Buggy: Memory leak
int* createArray() {
    int* arr = new int[100];
    // ... use array ...
    return arr;  // Caller must remember to delete[]!
}


// Better: Use RAII
#include <vector>

vector<int> createVector() {
    vector<int> vec(100);
    // ... use vector ...
    return vec;  // Automatically cleaned up
}


// C version with manual cleanup
void buggyFunction() {
    int* arr = (int*)malloc(100 * sizeof(int));
    // ... use array ...
    // Forgot to free! Memory leak
}


void fixedFunction() {
    int* arr = (int*)malloc(100 * sizeof(int));
    if (arr == NULL) return;


// ... use array ...

free(arr);  // Always free!

}


Integer Overflow


#include <iostream>
#include <climits>   // INT_MAX, INT_MIN
using namespace std;

// Buggy: Integer overflow
int sumArray(int arr[], int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += arr[i];  // Can overflow!
    }
    return sum;
}


// Fixed: Use larger type
long long sumArrayFixed(int arr[], int n) {
    long long sum = 0;
    for (int i = 0; i < n; i++) {
        sum += arr[i];
    }
    return sum;
}


// Check for overflow before adding
bool willOverflow(int a, int b) {
    if (a > 0 && b > 0 && a > INT_MAX - b) return true;
    if (a < 0 && b < 0 && a < INT_MIN - b) return true;
    return false;
}


int main() {
    int arr[] = {INT_MAX, 1};
    cout << "Buggy sum: " << sumArray(arr, 2) << endl;      // Overflow!
    cout << "Fixed sum: " << sumArrayFixed(arr, 2) << endl;


if (willOverflow(INT_MAX, 1)) {
    cout << "Overflow detected!" << endl;
}

return 0;

}


Performance Profiling


Timing Code


#include <iostream>
#include <chrono>
#include <vector>
using namespace std;
using namespace chrono;

void bubbleSort(vector<int>& arr) {
    int n = arr.size();
    for (int i = 0; i < n - 1; i++) {
        for (int j = 0; j < n - i - 1; j++) {
            if (arr[j] > arr[j + 1]) {
                swap(arr[j], arr[j + 1]);
            }
        }
    }
}


void measureTime(void (*func)(vector<int>&), vector<int>& arr, const string& name) {
    auto start = high_resolution_clock::now();


func(arr);

auto end = high_resolution_clock::now();
auto duration = duration_cast<microseconds>(end - start);

cout << name << " took " << duration.count() << " microseconds" << endl;

}


int main() {
    vector<int> data(1000);
    for (int i = 0; i < data.size(); i++) {
        data[i] = data.size() - i;  // Reverse order
    }


measureTime(bubbleSort, data, "Bubble Sort");

return 0;

}


Profiling with gprof


# Compile with profiling enabled
g++ -pg program.cpp -o program

# Run the program

./program


# Generate profile report

gprof program gmon.out > analysis.txt


# View hotspots

less analysis.txt


Memory Profiling with Valgrind


# Check for memory leaks
valgrind --leak-check=full ./program

# Profile memory usage
valgrind --tool=massif ./program
ms_print massif.out.<pid>


# Detect cache misses
valgrind --tool=cachegrind ./program
cg_annotate cachegrind.out.<pid>


Optimization Techniques


Algorithm Selection


#include <iostream>
#include <vector>
#include <algorithm>
#include <chrono>
using namespace std;
using namespace chrono;

// O(n²) - Slow for large n
bool hasDuplicateSlow(vector<int>& arr) {
    for (int i = 0; i < arr.size(); i++) {
        for (int j = i + 1; j < arr.size(); j++) {
            if (arr[i] == arr[j]) return true;
        }
    }
    return false;
}


// O(n log n) - Much faster
bool hasDuplicateFast(vector<int>& arr) {
    vector<int> sorted = arr;
    sort(sorted.begin(), sorted.end());


for (int i = 0; i + 1 < sorted.size(); i++) {
    if (sorted[i] == sorted[i + 1]) return true;
}
return false;

}


int main() {
    vector<int> data(10000);
    for (int i = 0; i < data.size(); i++) {
        data[i] = i;
    }


auto start = high_resolution_clock::now();
hasDuplicateSlow(data);
auto end = high_resolution_clock::now();
cout << "Slow: " << duration_cast<milliseconds>(end - start).count() << "ms" << endl;

start = high_resolution_clock::now();
hasDuplicateFast(data);
end = high_resolution_clock::now();
cout << "Fast: " << duration_cast<milliseconds>(end - start).count() << "ms" << endl;

return 0;

}


Data Structure Selection


#include <iostream>
#include <vector>
#include <unordered_set>
#include <chrono>
using namespace std;
using namespace chrono;

// Vector: O(n) search
bool containsVector(vector<int>& vec, int value) {
    for (int x : vec) {
        if (x == value) return true;
    }
    return false;
}


// Unordered set: O(1) average search
bool containsSet(unordered_set<int>& set, int value) {
    return set.find(value) != set.end();
}


int main() {
    const int SIZE = 100000;


vector<int> vec;
unordered_set<int> set;

for (int i = 0; i < SIZE; i++) {
    vec.push_back(i);
    set.insert(i);
}

auto start = high_resolution_clock::now();
for (int i = 0; i < 1000; i++) {
    containsVector(vec, SIZE - 1);  // Search for last element
}
auto end = high_resolution_clock::now();
cout << "Vector: " << duration_cast<milliseconds>(end - start).count() << "ms" << endl;

start = high_resolution_clock::now();
for (int i = 0; i < 1000; i++) {
    containsSet(set, SIZE - 1);
}
end = high_resolution_clock::now();
cout << "Set: " << duration_cast<milliseconds>(end - start).count() << "ms" << endl;

return 0;

}


Cache-Friendly Code


#include <iostream>
#include <vector>
#include <chrono>
using namespace std;
using namespace chrono;

// Cache-unfriendly: Column-major access
long sumColumnMajor(vector<vector<int>>& matrix) {
    long sum = 0;
    for (int col = 0; col < matrix[0].size(); col++) {
        for (int row = 0; row < matrix.size(); row++) {
            sum += matrix[row][col];  // Poor cache locality
        }
    }
    return sum;
}


// Cache-friendly: Row-major access
long sumRowMajor(vector<vector<int>>& matrix) {
    long sum = 0;
    for (int row = 0; row < matrix.size(); row++) {
        for (int col = 0; col < matrix[row].size(); col++) {
            sum += matrix[row][col];  // Good cache locality
        }
    }
    return sum;
}


int main() {
    const int SIZE = 1000;
    vector<vector<int>> matrix(SIZE, vector<int>(SIZE, 1));


auto start = high_resolution_clock::now();
long sum1 = sumColumnMajor(matrix);
auto end = high_resolution_clock::now();
cout << "Column-major: " << duration_cast<milliseconds>(end - start).count() << "ms" << endl;

start = high_resolution_clock::now();
long sum2 = sumRowMajor(matrix);
end = high_resolution_clock::now();
cout << "Row-major: " << duration_cast<milliseconds>(end - start).count() << "ms" << endl;

return 0;

}


Loop Optimization


// Inefficient: Function call in loop condition
for (int i = 0; i < vec.size(); i++) {  // size() called every iteration
    process(vec[i]);
}

// Better: Cache the size
int n = vec.size();
for (int i = 0; i < n; i++) {
    process(vec[i]);
}


// Inefficient: Repeated computation
for (int i = 0; i < n; i++) {
    result[i] = expensiveFunction() * data[i];  // Called n times!
}


// Better: Hoist the loop-invariant call
int constant = expensiveFunction();
for (int i = 0; i < n; i++) {
    result[i] = constant * data[i];
}


Space-Time Tradeoff


// Fibonacci: Slow (exponential time)
int fibSlow(int n) {
    if (n <= 1) return n;
    return fibSlow(n - 1) + fibSlow(n - 2);
}

// Fibonacci: Fast (linear time, linear space)
int fibFast(int n, vector<int>& memo) {
    if (n <= 1) return n;
    if (memo[n] != -1) return memo[n];
    memo[n] = fibFast(n - 1, memo) + fibFast(n - 2, memo);
    return memo[n];
}


// Fibonacci: Fast (linear time, constant space)
int fibOptimal(int n) {
    if (n <= 1) return n;
    int prev2 = 0, prev1 = 1;
    for (int i = 2; i <= n; i++) {
        int curr = prev1 + prev2;
        prev2 = prev1;
        prev1 = curr;
    }
    return prev1;
}


Optimization Rules


  1. Profile first - Don't guess where the bottleneck is
  2. Optimize the right thing - Focus on hotspots (80/20 rule)
  3. Choose right algorithm - O(n log n) beats O(n²) for large n
  4. Use appropriate data structures - Hash table vs array vs tree
  5. Reduce allocations - Reuse objects, reserve capacity
  6. Consider cache locality - Access memory sequentially
  7. Avoid premature optimization - "Premature optimization is the root of all evil" (Donald Knuth)

Common Mistakes


  1. Premature optimization - Optimize before profiling
  2. Optimizing wrong code - Not the bottleneck
  3. Breaking correctness - Optimization introduces bugs
  4. Micro-optimizations - Negligible impact
  5. Ignoring compiler optimizations - Compiler often does better
  6. Not testing after optimization - Verify correctness
  7. Over-engineering - Complexity without benefit

Debugging Tips


  1. Reproduce consistently - Can't fix what you can't reproduce
  2. Use version control - Git bisect to find regression
  3. Simplify test case - Minimal example that shows bug
  4. Read error messages carefully - They often tell you exactly what's wrong
  5. Check recent changes - Bug likely in new code
  6. Take breaks - Fresh eyes see bugs
  7. Ask for help - Pair debugging is powerful

Mini Exercises


  1. Add assertions to binary search
  2. Profile sorting algorithms
  3. Find and fix memory leaks with Valgrind
  4. Optimize matrix multiplication
  5. Benchmark hash table vs binary search tree
  6. Debug off-by-one error in array code
  7. Profile recursive vs iterative Fibonacci
  8. Fix integer overflow in sum function
  9. Optimize cache locality in 2D array access
  10. Measure impact of compiler optimization flags

Review Questions


  1. What's the difference between debugging and optimization?
  2. When should you optimize code?
  3. How do you profile a C++ program?
  4. What tools help find memory leaks?
  5. How does cache locality affect performance?

Reference Checklist


By the end of this chapter, you should be able to:

  • Use systematic debugging strategies
  • Apply debugging tools (GDB, Valgrind)
  • Profile code to find bottlenecks
  • Recognize common bug patterns
  • Choose appropriate optimizations
  • Measure performance improvements
  • Use assertions effectively
  • Optimize without breaking correctness

Next Steps


Chapter 20 concludes the course with practical projects that apply all the concepts learned. You'll build real-world applications using data structures and algorithms.




Key Takeaway: Effective debugging requires systematic approaches and the right tools. Profile before optimizing—measure, don't guess. Choose the right algorithm and data structure for the problem. Optimize hotspots, not the entire codebase. Maintain correctness while improving performance. Master these skills to write production-quality code.