
Using Libraries: NumPy, Pandas & Matplotlib


One of Python's greatest strengths is its ecosystem of libraries. This chapter covers three libraries that are essential for data science, analysis, and visualization: NumPy, Pandas, and Matplotlib.


Why This Chapter Matters


These three libraries are the foundation of Python's data science stack. Understanding them opens doors to machine learning, data analysis, scientific computing, and business intelligence.


NumPy — Numerical Computing


NumPy (Numerical Python) provides a fast, multi-dimensional array object called ndarray and hundreds of mathematical functions.


Installing NumPy


pip install numpy

Creating Arrays


import numpy as np

# From a list
arr = np.array([1, 2, 3, 4, 5])
print(arr)        # [1 2 3 4 5]
print(arr.dtype)  # int64 on most platforms
print(arr.shape)  # (5,)


# 2D array (matrix)
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix.shape)  # (2, 3)


# Convenience constructors
zeros = np.zeros((3, 4))            # 3x4 array of zeros
ones = np.ones((2, 3))              # 2x3 array of ones
identity = np.eye(3)                # 3x3 identity matrix
range_arr = np.arange(0, 10, 2)     # array([0, 2, 4, 6, 8])
linspace = np.linspace(0, 1, 5)     # 5 evenly spaced points from 0 to 1, both ends included
random_arr = np.random.rand(3, 3)   # 3x3 random floats in [0, 1)


Array Operations (Vectorized)


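NumPy applies arithmetic and comparisons element-wise across a whole array, with no explicit Python loop. On large arrays this is typically much faster than looping over a Python list.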


arr = np.array([1, 2, 3, 4, 5])

print(arr * 2)    # [ 2  4  6  8 10]
print(arr + 10)   # [11 12 13 14 15]
print(arr ** 2)   # [ 1  4  9 16 25]
print(arr > 3)    # [False False False  True  True]


# Element-wise operations between arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b)         # [5 7 9]
print(a * b)         # [ 4 10 18]
print(np.dot(a, b))  # 32 (dot product)
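
To make "without loops" concrete, the following minimal sketch computes the same squares twice, once with an explicit Python loop and once with a single vectorized expression. The variable names are illustrative only.

```python
import numpy as np

values = np.arange(1, 6)  # array([1, 2, 3, 4, 5])

# Loop version: square each element one at a time.
squares_loop = [int(v) ** 2 for v in values]

# Vectorized version: one expression over the whole array.
squares_vec = values ** 2

print(squares_loop)          # [1, 4, 9, 16, 25]
print(squares_vec.tolist())  # [1, 4, 9, 16, 25]

# The scalar 10 is applied to every element (broadcasting).
shifted = values + 10
print(shifted.tolist())      # [11, 12, 13, 14, 15]
```

Both versions give the same numbers. The vectorized form pushes the loop into NumPy's compiled code, which is where the speed advantage on large arrays comes from.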


Indexing and Slicing


arr = np.array([10, 20, 30, 40, 50])
print(arr[0])      # 10
print(arr[1:4])    # [20 30 40]
print(arr[-1])     # 50

# Boolean indexing
print(arr[arr > 25])   # [30 40 50]


# 2D indexing
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix[0, :])   # first row: [1 2 3]
print(matrix[:, 1])   # second column: [2 5 8]
print(matrix[1, 2])   # row 1, col 2: 6
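
Boolean indexing also works with combined conditions and on the left-hand side of an assignment. The sketch below reuses the array from the example above; the names mid and capped are illustrative only.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Combine conditions with & (and) or | (or); each condition needs parentheses.
mid = arr[(arr > 15) & (arr < 45)]
print(mid.tolist())     # [20, 30, 40]

# A mask on the left-hand side modifies only the matching elements.
capped = arr.copy()     # work on a copy so arr itself is unchanged
capped[capped > 30] = 30
print(capped.tolist())  # [10, 20, 30, 30, 30]
```

Python's plain `and` / `or` keywords do not work element-wise on arrays, which is why `&` and `|` are used here.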


Useful Math Functions


arr = np.array([4, 9, 16, 25])
print(np.sqrt(arr))   # [2. 3. 4. 5.]
print(np.mean(arr))   # 13.5
print(np.std(arr))    # standard deviation
print(np.sum(arr))    # 54
print(np.min(arr))    # 4
print(np.max(arr))    # 25
print(np.sort(arr))   # returns a sorted copy; arr itself is unchanged
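
The difference between sorting a copy and sorting in place is easy to miss. As a small illustration, using an unsorted array chosen for this example:

```python
import numpy as np

arr = np.array([25, 4, 16, 9])

sorted_copy = np.sort(arr)   # new, sorted array
print(sorted_copy.tolist())  # [4, 9, 16, 25]
print(arr.tolist())          # [25, 4, 16, 9]  (original order kept)

arr.sort()                   # method form: sorts in place and returns None
print(arr.tolist())          # [4, 9, 16, 25]
```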

Pandas — Data Analysis


Pandas introduces two powerful data structures: Series (1D) and DataFrame (2D table).


Installing Pandas


pip install pandas

Series


A Series is a labeled 1D array.


import pandas as pd

scores = pd.Series([95, 88, 72, 91], index=["Asha", "Leo", "Mina", "Sam"])
print(scores)
print(scores["Asha"])        # 95
print(scores[scores > 85])   # filter


DataFrame


A DataFrame is a 2D table — like a spreadsheet.


data = {
    "Name": ["Asha", "Leo", "Mina", "Sam"],
    "Score": [95, 88, 72, 91],
    "Grade": ["A", "B", "C", "A"]
}

df = pd.DataFrame(data)
print(df)
print(df.shape)        # (4, 3)
print(df.dtypes)       # column types
print(df.describe())   # stats summary
print(df.head(2))      # first 2 rows
print(df.tail(2))      # last 2 rows


Selecting Data


# Select a column
print(df["Name"])
print(df[["Name", "Score"]])   # multiple columns

# Row selection
print(df.iloc[0])   # by integer position
print(df.loc[0])    # by label (same here)
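
With the default 0, 1, 2, ... index, iloc and loc look identical. The difference shows up once the index holds other labels. A minimal sketch, assuming a DataFrame indexed by student name (a setup chosen for this illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {"Score": [95, 88, 72, 91]},
    index=["Asha", "Leo", "Mina", "Sam"],  # string labels instead of 0..3
)

print(df.iloc[0]["Score"])      # 95 (first row, by position)
print(df.loc["Asha", "Score"])  # 95 (row labeled "Asha")

# Slices differ: iloc excludes the end position, loc includes the end label.
by_position = df.iloc[0:2]        # positions 0 and 1
by_label = df.loc["Asha":"Mina"]  # Asha, Leo, and Mina
print(len(by_position), len(by_label))  # 2 3
```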


# Conditional filtering
top = df[df["Score"] >= 90]
print(top)


Adding and Modifying Columns


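df["Passed"] = df["Score"] >= 60          # add a boolean column
df["Score_Boosted"] = df["Score"] + 5     # add a column derived from another
df = df.drop(columns=["Score_Boosted"])   # remove a column
renamed = df.rename(columns={"Score": "Final Score"})  # returns a renamed copy; df keeps "Score" for the examples that follow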

Handling Missing Data


import numpy as np

df.loc[2, "Score"] = np.nan   # set a missing value
print(df.isnull())            # boolean mask
print(df.isnull().sum())      # count missing per column
df_clean = df.dropna()        # drop rows with any NaN
df_filled = df.fillna(0)      # fill missing with 0
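
Filling with 0 is not always appropriate, since it can distort averages. As one possible alternative, the sketch below fills a missing score with the column mean. It builds its own small DataFrame so it can run on its own; the data is made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Asha", "Leo", "Mina", "Sam"],
    "Score": [95.0, 88.0, np.nan, 91.0],
})

print(int(df["Score"].isnull().sum()))   # 1 missing value

dropped = df.dropna(subset=["Score"])    # only consider the Score column
mean_score = df["Score"].mean()          # NaN is skipped by default
filled = df.fillna({"Score": mean_score})

print(len(dropped))                      # 3
print(round(filled.loc[2, "Score"], 2))  # 91.33
```

Whether to drop, fill with a constant, or fill with a statistic depends on the data and the analysis; none of these is a universal default.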


Reading and Writing Files


# CSV
df = pd.read_csv("students.csv")
df.to_csv("output.csv", index=False)

# Excel (reading .xlsx files needs an engine such as openpyxl installed)
df = pd.read_excel("data.xlsx")

# JSON
df = pd.read_json("data.json")
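
The file names above are placeholders. To try the CSV round trip without any files on disk, one option is to read from and write to in-memory text buffers, as in this sketch (the CSV content is invented for the example):

```python
import io
import pandas as pd

csv_text = "Name,Score\nAsha,95\nLeo,88\n"

# read_csv accepts a file-like object as well as a path.
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (2, 2)

# index=False leaves out the 0, 1, ... row labels when writing.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
header = buffer.getvalue().splitlines()[0]
print(header)    # Name,Score

# Reading the written text back gives an identical DataFrame.
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))
print(round_trip.equals(df))  # True
```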


Grouping and Aggregation


# Group by Grade and compute mean score
summary = df.groupby("Grade")["Score"].mean()
print(summary)

# Multiple aggregations
summary2 = df.groupby("Grade").agg({"Score": ["mean", "max", "count"]})
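
The grouping examples above assume the CSV has Grade and Score columns. Here is a self-contained sketch using the chapter's student data. It also shows named aggregation, an alternative to the dictionary form that yields flat column names; the output column names are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Grade": ["A", "B", "C", "A"],
    "Score": [95, 88, 72, 91],
})

mean_by_grade = df.groupby("Grade")["Score"].mean()
print(mean_by_grade["A"])  # 93.0

# Named aggregation: new_column=(source_column, function)
stats = df.groupby("Grade").agg(
    mean_score=("Score", "mean"),
    max_score=("Score", "max"),
    n=("Score", "count"),
)
print(stats.loc["A", "n"])          # 2
print(stats.loc["A", "max_score"])  # 95
```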


Sorting


df_sorted = df.sort_values("Score", ascending=False)

Matplotlib — Data Visualization


Matplotlib is the foundational plotting library for Python.


Installing Matplotlib


pip install matplotlib

Line Plot


import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [10, 20, 15, 30, 25]

plt.plot(x, y, marker="o", color="blue", linestyle="--")
plt.title("My Line Chart")
plt.xlabel("X Axis")
plt.ylabel("Y Axis")
plt.grid(True)
plt.savefig("chart.png")
plt.show()


Bar Chart


names = ["Asha", "Leo", "Mina"]
scores = [95, 88, 72]

plt.bar(names, scores, color=["green", "orange", "red"])
plt.title("Student Scores")
plt.ylabel("Score")
plt.show()


Scatter Plot


import numpy as np

x = np.random.rand(50)
y = np.random.rand(50)

plt.scatter(x, y, alpha=0.7, color="purple")
plt.title("Scatter Plot")
plt.show()


Histogram


data = np.random.randn(1000)
plt.hist(data, bins=30, color="teal", edgecolor="black")
plt.title("Distribution")
plt.show()

Subplots


fig, axes = plt.subplots(1, 2, figsize=(10, 4))

axes[0].plot([1, 2, 3], [10, 20, 15])
axes[0].set_title("Line")

axes[1].bar(["A", "B", "C"], [5, 10, 8])
axes[1].set_title("Bar")

plt.tight_layout()
plt.show()
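
The subplot example calls methods on axes objects (set_title) instead of the plt.title style used earlier. The same object-based style works for a single plot too. The sketch below is one way to write it for a script that runs without a display: it selects the non-interactive "Agg" backend and saves into an in-memory buffer, which stands in for a real file name such as "chart.png".

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5, 3))
ax.plot([1, 2, 3], [10, 20, 15], marker="o", label="series A")
ax.set_title("Line")
ax.set_xlabel("X Axis")
ax.set_ylabel("Y Axis")
ax.legend()

buffer = io.BytesIO()  # in-memory stand-in for a file on disk
fig.savefig(buffer, format="png", dpi=100)
plt.close(fig)         # release the figure once it is saved

print(buffer.getvalue()[:4])  # b'\x89PNG'
```

In an interactive session you would normally skip the matplotlib.use("Agg") line and call plt.show() instead.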


Putting It Together — A Mini Analysis


import pandas as pd
import matplotlib.pyplot as plt

# Load data
df = pd.read_csv("sales.csv")

# Clean
df = df.dropna(subset=["revenue"])

# Analyze
monthly = df.groupby("month")["revenue"].sum()

# Visualize
monthly.plot(kind="bar", color="steelblue")
plt.title("Monthly Revenue")
plt.xlabel("Month")
plt.ylabel("Revenue ($)")
plt.tight_layout()
plt.savefig("revenue.png")
plt.show()
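
The analysis above needs a sales.csv file with month and revenue columns. To run the same pipeline without one, the sketch below uses a small made-up CSV held in a string. It also passes sort=False to groupby, an addition for this example: by default the group keys are sorted, which would put month names in alphabetical order instead of the order they appear in the data.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for sales.csv; one revenue value is missing.
csv_text = """month,revenue
Jan,100
Jan,50
Feb,200
Feb,
Mar,75
"""

df = pd.read_csv(io.StringIO(csv_text))
df = df.dropna(subset=["revenue"])  # drop the row with no revenue

# sort=False keeps months in order of first appearance (Jan, Feb, Mar).
monthly = df.groupby("month", sort=False)["revenue"].sum()
print(monthly.to_dict())  # {'Jan': 150.0, 'Feb': 200.0, 'Mar': 75.0}

ax = monthly.plot(kind="bar", color="steelblue")
ax.set_title("Monthly Revenue")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue ($)")
plt.tight_layout()

buffer = io.BytesIO()  # stand-in for "revenue.png"
ax.figure.savefig(buffer, format="png")
plt.close(ax.figure)
```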


Common Mistakes


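  • Forgetting the imports: import numpy as np, import pandas as pd, import matplotlib.pyplot as plt
  • Modifying a filtered or sliced DataFrame without knowing whether it is a copy or a view (call .copy() when you want an independent DataFrame)
  • Looping over DataFrame rows with a for loop where a vectorized operation would do the job
  • Forgetting to call plt.show() or plt.savefig(), so the plot is never displayed or saved; when using both, call plt.savefig() before plt.show()
  • Ignoring SettingWithCopyWarning from Pandas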
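
The copy-versus-view item is the one that most often causes silent bugs. A minimal sketch of the two safe patterns, using made-up student data: take an explicit copy when you want an independent subset, and assign through .loc in a single step when you want to change the original.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Asha", "Leo", "Mina"], "Score": [95, 88, 72]})

# Pattern 1: an explicit, independent copy of a filtered subset.
top = df[df["Score"] >= 85].copy()
top["Score"] = top["Score"] + 5
print(top["Score"].tolist())  # [100, 93]
print(df["Score"].tolist())   # [95, 88, 72]  (original untouched)

# Pattern 2: change the original directly with one .loc assignment.
df.loc[df["Score"] < 85, "Score"] = 75
print(df["Score"].tolist())   # [95, 88, 75]
```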

Mini Exercises


  1. Create a NumPy array of 1–20 and select all values greater than 10.
  2. Create a DataFrame from a dictionary of your choice and filter rows by a condition.
  3. Read a CSV file with Pandas and print the 5 rows with the highest values in one column.
  4. Plot a bar chart comparing at least 4 categories.
  5. Combine NumPy and Matplotlib to plot a sine wave.

Review Questions


  1. What is the key advantage of NumPy arrays over Python lists for math?
  2. What is the difference between iloc and loc in Pandas?
  3. How do you handle missing values in a Pandas DataFrame?
  4. What is groupby() used for?
  5. How do you save a Matplotlib figure to a file?

Reference Checklist


  • I can create and manipulate NumPy arrays
  • I can create DataFrames from dicts and CSVs
  • I can filter, sort, and group Pandas DataFrames
  • I can handle missing data with dropna() and fillna()
  • I can create line, bar, scatter, and histogram plots
  • I can save plots and build multi-panel figures with subplots