Chapter 19: Debugging and Optimization
Introduction
Writing correct, efficient code requires systematic debugging and optimization techniques. This chapter explores strategies for finding and fixing bugs, profiling performance, and optimizing algorithms and data structures. Mastering these skills transforms good programmers into exceptional ones.
Why This Matters
Debugging and optimization skills are essential:
- Production systems: Bugs cause outages and data loss
- Performance: Slow code wastes resources and frustrates users
- Scalability: Inefficient algorithms don't scale
- Career: Senior developers excel at debugging complex issues
- Competitive programming: Optimization makes the difference
- Interviews: Optimization questions are common
How to Study This Chapter
- Practice systematic debugging - Follow methodical approaches
- Profile before optimizing - Measure, don't guess
- Learn debugging tools - GDB, Valgrind, profilers
- Study common bugs - Recognize patterns
- Benchmark code - Compare implementations
Debugging Strategies
Systematic Approach
1. Reproduce the bug reliably
2. Isolate the problem (binary search through code)
3. Form hypothesis about cause
4. Test hypothesis
5. Fix bug
6. Verify fix doesn't break anything else
7. Understand why bug occurred
Rubber Duck Debugging
Explain your code line-by-line to a rubber duck (or colleague). Often reveals the bug.
Print Debugging
Strategic print statements to trace execution.
#include <iostream>
#include <vector>
using namespace std;
void debugBinarySearch(vector
while (left <= right) {
iteration++;
int mid = left + (right - left) / 2;
cout << "Iteration " << iteration << ": "
<< "left=" << left << ", mid=" << mid
<< ", right=" << right
<< ", arr[mid]=" << arr[mid] << endl;
if (arr[mid] == target) {
cout << "Found at index " << mid << endl;
return;
}
if (arr[mid] < target) {
left = mid + 1;
} else {
right = mid - 1;
}
}
cout << "Not found" << endl;
}
int main() {
vector
Assertions
Verify assumptions during development.
#include <cassert>
#include <iostream>
#include <vector>
using namespace std;
int binarySearch(vector
int left = 0, right = arr.size() - 1;
while (left <= right) {
int mid = left + (right - left) / 2;
// Invariant: if target exists, it's in [left, right]
assert(left >= 0 && right < arr.size());
if (arr[mid] == target) {
return mid;
}
if (arr[mid] < target) {
left = mid + 1;
} else {
right = mid - 1;
}
}
return -1;
}
int main() {
vector
Debugger (GDB Example)
# Compile with debug symbols
g++ -g program.cpp -o program
Run in GDB
gdb ./program
Common GDB commands:
break main # Set breakpoint at main run # Start program next # Next line (step over) step # Step into function print variable # Print variable value backtrace # Show call stack continue # Continue execution quit # Exit GDB
Common Bug Patterns
Off-by-One Errors
// Wrong: Misses last element
for (int i = 0; i < arr.size() - 1; i++) { // Bug!
process(arr[i]);
}
// Correct for (int i = 0; i < arr.size(); i++) { process(arr[i]); }
// Wrong: Array access out of bounds int mid = (left + right) / 2; // Can overflow!
// Correct int mid = left + (right - left) / 2;
Null Pointer Dereferencing
#include <iostream>
using namespace std;
struct Node { int data; Node* next; };
// Buggy version int getLength(Node* head) { int count = 0; while (head->next != nullptr) { // Bug: What if head is nullptr? count++; head = head->next; } return count; }
// Fixed version int getLengthFixed(Node* head) { if (head == nullptr) return 0; // Check null first!
int count = 0;
while (head != nullptr) {
count++;
head = head->next;
}
return count;
}
int main() { Node* emptyList = nullptr; cout << "Length: " << getLengthFixed(emptyList) << endl; return 0; }
Memory Leaks
#include <iostream>
using namespace std;
// Buggy: Memory leak int* createArray() { int* arr = new int[100]; // ... use array ... return arr; // Caller must remember to delete[]! }
// Better: Use RAII
#include
// C version with manual cleanup void buggyFunction() { int* arr = (int*)malloc(100 * sizeof(int)); // ... use array ... // Forgot to free! Memory leak }
void fixedFunction() { int* arr = (int*)malloc(100 * sizeof(int)); if (arr == NULL) return;
// ... use array ...
free(arr); // Always free!
}
Integer Overflow
#include <iostream>
#include <limits>
using namespace std;
// Buggy: Integer overflow int sumArray(int arr[], int n) { int sum = 0; for (int i = 0; i < n; i++) { sum += arr[i]; // Can overflow! } return sum; }
// Fixed: Use larger type long long sumArrayFixed(int arr[], int n) { long long sum = 0; for (int i = 0; i < n; i++) { sum += arr[i]; } return sum; }
// Check for overflow bool willOverflow(int a, int b) { if (a > 0 && b > 0 && a > INT_MAX - b) return true; if (a < 0 && b < 0 && a < INT_MIN - b) return true; return false; }
int main() { int arr[] = {INT_MAX, 1}; cout << "Buggy sum: " << sumArray(arr, 2) << endl; // Overflow! cout << "Fixed sum: " << sumArrayFixed(arr, 2) << endl;
if (willOverflow(INT_MAX, 1)) {
cout << "Overflow detected!" << endl;
}
return 0;
}
Performance Profiling
Timing Code
#include <iostream>
#include <chrono>
#include <vector>
using namespace std;
using namespace chrono;
void bubbleSort(vector
void measureTime(void (*func)(vector
func(arr);
auto end = high_resolution_clock::now();
auto duration = duration_cast<microseconds>(end - start);
cout << name << " took " << duration.count() << " microseconds" << endl;
}
int main() {
vector
measureTime(bubbleSort, data, "Bubble Sort");
return 0;
}
Profiling with gprof
# Compile with profiling enabled
g++ -pg program.cpp -o program
Run program
./program
Generate profile report
gprof program gmon.out > analysis.txt
View hotspots
less analysis.txt
Memory Profiling with Valgrind
# Check for memory leaks
valgrind --leak-check=full ./program
Profile memory usage
valgrind --tool=massif ./program
ms_print massif.out.
Detect cache misses
valgrind --tool=cachegrind ./program
cg_annotate cachegrind.out.
Optimization Techniques
Algorithm Selection
#include <iostream>
#include <vector>
#include <algorithm>
#include <chrono>
using namespace std;
using namespace chrono;
// O(n²) - Slow for large n
bool hasDuplicateSlow(vector
// O(n log n) - Much faster
bool hasDuplicateFast(vector
for (int i = 0; i < sorted.size() - 1; i++) {
if (sorted[i] == sorted[i + 1]) return true;
}
return false;
}
int main() {
vector
auto start = high_resolution_clock::now();
hasDuplicateSlow(data);
auto end = high_resolution_clock::now();
cout << "Slow: " << duration_cast<milliseconds>(end - start).count() << "ms" << endl;
start = high_resolution_clock::now();
hasDuplicateFast(data);
end = high_resolution_clock::now();
cout << "Fast: " << duration_cast<milliseconds>(end - start).count() << "ms" << endl;
return 0;
}
Data Structure Selection
#include <iostream>
#include <vector>
#include <unordered_set>
#include <chrono>
using namespace std;
using namespace chrono;
// Vector: O(n) search
bool containsVector(vector
// Unordered set: O(1) average search
bool containsSet(unordered_set
int main() { const int SIZE = 100000;
vector<int> vec;
unordered_set<int> set;
for (int i = 0; i < SIZE; i++) {
vec.push_back(i);
set.insert(i);
}
auto start = high_resolution_clock::now();
for (int i = 0; i < 1000; i++) {
containsVector(vec, SIZE - 1); // Search for last element
}
auto end = high_resolution_clock::now();
cout << "Vector: " << duration_cast<milliseconds>(end - start).count() << "ms" << endl;
start = high_resolution_clock::now();
for (int i = 0; i < 1000; i++) {
containsSet(set, SIZE - 1);
}
end = high_resolution_clock::now();
cout << "Set: " << duration_cast<milliseconds>(end - start).count() << "ms" << endl;
return 0;
}
Cache-Friendly Code
#include <iostream>
#include <vector>
#include <chrono>
using namespace std;
using namespace chrono;
// Cache-unfriendly: Column-major access
long sumColumnMajor(vector<vector
// Cache-friendly: Row-major access
long sumRowMajor(vector<vector
int main() {
const int SIZE = 1000;
vector<vector
auto start = high_resolution_clock::now();
long sum1 = sumColumnMajor(matrix);
auto end = high_resolution_clock::now();
cout << "Column-major: " << duration_cast<milliseconds>(end - start).count() << "ms" << endl;
start = high_resolution_clock::now();
long sum2 = sumRowMajor(matrix);
end = high_resolution_clock::now();
cout << "Row-major: " << duration_cast<milliseconds>(end - start).count() << "ms" << endl;
return 0;
}
Loop Optimization
// Inefficient: Function call in loop condition
for (int i = 0; i < vec.size(); i++) { // size() called every iteration
process(vec[i]);
}
// Better: Cache size int n = vec.size(); for (int i = 0; i < n; i++) { process(vec[i]); }
// Inefficient: Repeated computation for (int i = 0; i < n; i++) { result[i] = expensiveFunction() * data[i]; // Called n times! }
// Better: Hoist invariant int constant = expensiveFunction(); for (int i = 0; i < n; i++) { result[i] = constant * data[i]; }
Space-Time Tradeoff
// Fibonacci: Slow (exponential time)
int fibSlow(int n) {
if (n <= 1) return n;
return fibSlow(n - 1) + fibSlow(n - 2);
}
// Fibonacci: Fast (linear time, linear space)
int fibFast(int n, vector
// Fibonacci: Fast (linear time, constant space) int fibOptimal(int n) { if (n <= 1) return n; int prev2 = 0, prev1 = 1; for (int i = 2; i <= n; i++) { int curr = prev1 + prev2; prev2 = prev1; prev1 = curr; } return prev1; }
Optimization Rules
- Profile first - Don't guess where the bottleneck is
- Optimize the right thing - Focus on hotspots (80/20 rule)
- Choose right algorithm - O(n log n) beats O(n²) for large n
- Use appropriate data structures - Hash table vs array vs tree
- Reduce allocations - Reuse objects, reserve capacity
- Consider cache locality - Access memory sequentially
- Avoid premature optimization - "Root of all evil" - Donald Knuth
Common Mistakes
- Premature optimization - Optimize before profiling
- Optimizing wrong code - Not the bottleneck
- Breaking correctness - Optimization introduces bugs
- Micro-optimizations - Negligible impact
- Ignoring compiler optimizations - Compiler often does better
- Not testing after optimization - Verify correctness
- Over-engineering - Complexity without benefit
Debugging Tips
- Reproduce consistently - Can't fix what you can't reproduce
- Use version control - Git bisect to find regression
- Simplify test case - Minimal example that shows bug
- Read error messages carefully - They often tell you exactly what's wrong
- Check recent changes - Bug likely in new code
- Take breaks - Fresh eyes see bugs
- Ask for help - Pair debugging is powerful
Mini Exercises
- Add assertions to binary search
- Profile sorting algorithms
- Find and fix memory leaks with Valgrind
- Optimize matrix multiplication
- Benchmark hash table vs binary search tree
- Debug off-by-one error in array code
- Profile recursive vs iterative Fibonacci
- Fix integer overflow in sum function
- Optimize cache locality in 2D array access
- Measure impact of compiler optimization flags
Review Questions
- What's the difference between debugging and optimization?
- When should you optimize code?
- How do you profile a C++ program?
- What tools help find memory leaks?
- How does cache locality affect performance?
Reference Checklist
By the end of this chapter, you should be able to:
- Use systematic debugging strategies
- Apply debugging tools (GDB, Valgrind)
- Profile code to find bottlenecks
- Recognize common bug patterns
- Choose appropriate optimizations
- Measure performance improvements
- Use assertions effectively
- Optimize without breaking correctness
Next Steps
Chapter 20 concludes the course with practical projects that apply all the concepts learned. You'll build real-world applications using data structures and algorithms.
Key Takeaway: Effective debugging requires systematic approaches and the right tools. Profile before optimizing—measure, don't guess. Choose the right algorithm and data structure for the problem. Optimize hotspots, not the entire codebase. Maintain correctness while improving performance. Master these skills to write production-quality code.