CSIP12.in
UNIT 1 : CH 3 Dec 14, 2025

📊 Plotting with PyPlot

# Data Visualization using Matplotlib
## 1. 📖 Definitions & Key Terminology
* **Data Visualization:** The art of translating information into a visual context (maps, graphs) to help the human brain understand data and pull insights quickly. 🧠
* **Matplotlib:** A comprehensive **Python library** for creating static, animated, and interactive visualizations. It is the foundation for many other libraries (like Seaborn) and comes pre-installed with Anaconda. 🐍
* **Pyplot:** A specific *module* within Matplotlib (imported as `plt`) that mimics the interface of MATLAB, allowing users to create 2D plots easily.
* **Figure:** The "Canvas". The top-level container holding all plot elements (axes, titles, legends). 🖼️
* **Axes:** The actual region where data is plotted. A figure can have multiple axes (subplots), but an axes belongs to only one figure.
* **Axis:** The number-lines that handle scales, limits, and ticks (marks). 📏
* **Marker:** A symbol (dot, star, square) representing a specific data point.
* **Legend:** The key that identifies what different colors or line styles represent. 🗺️
* **Histogram:** A graph showing the **frequency distribution** of continuous data (grouped into "bins").
* **Box Plot:** Displays data distribution based on a five-number summary (Min, Q1, Median, Q3, Max). 📦
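
The Figure / Axes / Axis distinction above is easier to see in code. The following is a minimal sketch with made-up data; the `matplotlib.use("Agg")` line is an assumption added so the snippet runs without a display, and can be dropped when you want a window.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt

# One Figure (the canvas) holding two Axes (plotting regions) side by side
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# Marker = symbol drawn at each data point; label feeds the Legend
ax_left.plot([1, 2, 3], [2, 4, 6], marker="o", label="doubles")
ax_left.legend()

# Histogram: values grouped into 3 bins
ax_right.hist([1, 2, 2, 3, 3, 3], bins=3)

print(len(fig.axes))                     # how many Axes the Figure holds
print(ax_left.figure is fig)             # each Axes belongs to one Figure
print(type(ax_left.xaxis).__name__,      # each Axes owns an x Axis ...
      type(ax_left.yaxis).__name__)      # ... and a y Axis
```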

---

## 2. 🧠 Concepts & Architecture
### 2.1 Why Visualize Data?
In the era of **Big Data**, raw tables are hard to read. Visualization helps because:
1. **Better Analysis:** Reveals hidden trends and correlations. 📉
2. **Quick Action:** The brain processes visuals faster than text. ⚡
3. **Pattern Recognition:** Identifies seasonal trends or exponential growth.
4. **Error Spotting:** Visual outliers (spikes) help find bad data. 🐛
5. **Business Insights:** Helps decision-makers grasp facts instantly. 💼

### 2.2 The Matplotlib Architecture
Matplotlib has three layers:
1. **Backend Layer:** Renders the plot to screen or file.
2. **Artist Layer:** Contains visuals like titles, lines, and text.
3. **Scripting Layer (Pyplot):** The user-friendly interface for writing code.
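
To make the three layers concrete, here is a rough sketch that skips Pyplot and works with the Artist and Backend layers directly. The data and the title text are illustrative only; in everyday code the Scripting Layer (`plt.plot`, `plt.title`, `plt.savefig`) does the same work for you.

```python
import io

from matplotlib.figure import Figure                          # Artist layer
from matplotlib.backends.backend_agg import FigureCanvasAgg   # Backend layer

fig = Figure(figsize=(4, 3))
canvas = FigureCanvasAgg(fig)            # attach a renderer (Agg produces raster images)

ax = fig.add_subplot(111)                # an Axes artist inside the Figure
(line,) = ax.plot([1, 2, 3], [1, 4, 9])  # a Line2D artist
title = ax.set_title("Squares")          # a Text artist

buffer = io.BytesIO()                    # in-memory file, used here instead of a real path
fig.savefig(buffer, format="png")        # the backend turns the artists into PNG bytes

print(type(line).__name__, type(title).__name__)
```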

#### 2.2.1 Installation & Import
**Install:**

```bash
pip install matplotlib
```

**Standard Import:**

```python
import matplotlib.pyplot as plt
```

#### 2.2.2 The Pyplot "State Machine"
Pyplot tracks the *current* figure and axes. Any command you type (like `plt.plot()`) applies to the currently active chart.
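
A small sketch of this behaviour, assuming two figures numbered 1 and 2 (the numbers and titles are arbitrary). `plt.gcf()` and `plt.gca()` return the current figure and current axes; the `Agg` backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt

plt.figure(1)            # figure 1 becomes the current figure
plt.plot([1, 2, 3])
plt.title("First")       # applies to figure 1

plt.figure(2)            # figure 2 is now current
plt.plot([3, 2, 1])
plt.title("Second")      # applies to figure 2, not figure 1

current_fig = plt.gcf()  # "get current figure"
current_ax = plt.gca()   # "get current axes"
print(current_fig.number, current_ax.get_title())

plt.figure(1)            # switch back: later plt.* calls target figure 1 again
print(plt.gca().get_title())
```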

---

## 3. 🎨 Creating Charts
### 3.1 Line & Scatter Charts
#### 📈 Line Chart (`plt.plot()`)
The default plot type. Best for showing **trends over time** (time-series).
* **Syntax:** `plt.plot(x, y)`
* **Note:** If you pass a single list, as in `plt.plot(y)`, Matplotlib treats it as the Y-values and automatically generates the X-values `[0, 1, 2, ...]`.
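
The auto-generated X-values can be checked directly. This is a minimal sketch with invented temperature readings; the `Agg` backend line is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt

temps = [22, 25, 19, 30]          # illustrative Y-values only

(line,) = plt.plot(temps)         # no X given, so Matplotlib uses the list indexes
plt.xlabel("Index (auto-generated)")
plt.ylabel("Temperature")

print(list(line.get_xdata()))     # the X-values Matplotlib generated
print(list(line.get_ydata()))     # the Y-values we passed in
```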

#### 🌌 Scatter Chart (`plt.scatter()`)
Displays individual data points **without connecting lines**.

* **Use Case:** Observing relationships or **correlations** between two variables (e.g., Study Hours vs. Marks).
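
A short sketch of the Study Hours vs. Marks use case. The numbers are made up for illustration, and the `Agg` backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt

hours = [1, 2, 3, 4, 5, 6]          # illustrative data
marks = [35, 42, 50, 61, 68, 80]

# Each (hours, marks) pair becomes one standalone star marker
points = plt.scatter(hours, marks, color="g", marker="*")

plt.title("Study Hours vs. Marks")
plt.xlabel("Study Hours")
plt.ylabel("Marks")

print(len(points.get_offsets()))    # number of points drawn
```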

### 3.2 Bar & Pie Charts
#### 📊 Bar Chart (`plt.bar()`)
Used for comparing **categorical data** (discrete categories).

* **Vertical:** `plt.bar(x, height)`
* **Horizontal:** `plt.barh(y, width)` (Good for long category names).
* **Multiple Bars:** You must manually offset the X-coordinates so bars don't overlap.
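
One common way to do the manual offset for multiple bars is to shift each series by half a bar width. The sketch below assumes two invented sales series and a bar width of `0.4`; the `Agg` backend line is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

cities = ["Delhi", "Mumbai", "Pune"]   # illustrative categories
sales_2022 = [40, 55, 30]
sales_2023 = [50, 60, 45]

x = np.arange(len(cities))             # base positions 0, 1, 2
width = 0.4                            # width of one bar

# Shift each series half a bar left / right so the pairs sit side by side
bars_a = plt.bar(x - width / 2, sales_2022, width=width, label="2022")
bars_b = plt.bar(x + width / 2, sales_2023, width=width, label="2023")

plt.xticks(x, cities)                  # put the category names back on the X-axis
plt.ylabel("Sales")
plt.legend()
```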

#### 🥧 Pie Chart (`plt.pie()`)
Shows numerical proportions (composition of a whole).

* **Key Params:** `autopct` (shows %), `explode` (highlights a slice).
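
A minimal sketch of both parameters, using invented study-hour data. `autopct="%1.1f%%"` prints each share with one decimal place, and the `explode` list pulls the third slice outward. The `Agg` backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt

subjects = ["Maths", "Physics", "CS", "English"]   # illustrative data
hours = [4, 3, 2, 1]
explode = [0, 0, 0.1, 0]       # one offset per slice; only "CS" is pulled out

plt.pie(hours, labels=subjects, explode=explode, autopct="%1.1f%%")
plt.title("Weekly Study Hours")

# Collect the text drawn on the chart (slice labels and percentages)
chart_texts = [t.get_text() for t in plt.gca().texts]
print(chart_texts)
```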

### 3.3 Histograms & Box Plots
#### 🧱 Histogram (`plt.hist()`)
Shows frequency of **continuous** data.

* **Bins:** The ranges into which data is grouped.
* **Visual Distinction:** Unlike bar charts, histograms usually have **no gaps** between bars.
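
Bins can also be given as explicit edges instead of a count. The sketch below uses invented ages and the edges `[10, 20, 30, 40, 50]`; `plt.hist()` returns the per-bin counts and the edges, which makes the grouping easy to inspect. The `Agg` backend line is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt

ages = [12, 15, 17, 21, 22, 25, 28, 31, 34, 38, 41, 47]   # illustrative data

# Four bins: 10-20, 20-30, 30-40, 40-50
counts, edges, patches = plt.hist(ages, bins=[10, 20, 30, 40, 50], edgecolor="black")

plt.title("Age Distribution")
plt.xlabel("Age")
plt.ylabel("Frequency")

print(list(counts))   # how many values fell into each bin
print(list(edges))    # the bin boundaries that were used
```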

#### 📦 Box Plot (`plt.boxplot()`)
Visualizes statistical summary (IQR, Median, Outliers).
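
A sketch that links the box plot to its summary statistics, using invented test scores. The quartiles are computed with NumPy's `percentile` for comparison. Note that with Matplotlib's default settings the whiskers stop at 1.5 × IQR from the box, and anything beyond that is drawn as an outlier point. The `Agg` backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

scores = [45, 50, 52, 55, 58, 60, 62, 65, 70, 95]   # illustrative data

q1, median, q3 = np.percentile(scores, [25, 50, 75])
iqr = q3 - q1                                        # interquartile range
print("Q1:", q1, "Median:", median, "Q3:", q3, "IQR:", iqr)

box = plt.boxplot(scores)
plt.title("Test Scores")
plt.ylabel("Score")

# Points drawn beyond the whiskers (the outliers)
outliers = list(box["fliers"][0].get_ydata())
print("Outliers:", outliers)
```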

---

## 4. ⚙️ Customization & Syntax
### 4.1 Anatomy Customization
| Function | Description |
| --- | --- |
| `plt.figure(figsize=(w,h))` | Sets chart size in inches. |
| `plt.title("Text")` | Adds a heading. |
| `plt.xlabel("Text")` | Labels the X-axis. |
| `plt.ylabel("Text")` | Labels the Y-axis. |
| `plt.grid(True)` | Turns on grid lines. 🕸️ |
| `plt.legend()` | Displays the legend (requires `label=` in plot). |
| `plt.savefig("name.png")` | Saves the chart. **Must be called before `show()`.** |
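
The functions in this table can be combined as in the sketch below. The data and label text are illustrative. Saving to an in-memory buffer is an assumption made to keep the example self-contained; passing a filename such as `"name.png"` works the same way. The `Agg` backend line is also an assumption so the code runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 4))                    # 6 x 4 inches
plt.plot([1, 2, 3, 4], [10, 20, 15, 25], label="Visitors")

plt.title("Daily Visitors")
plt.xlabel("Day")
plt.ylabel("Count")
plt.grid(True)
plt.legend()                                        # uses label="Visitors"

buffer = io.BytesIO()                               # stands in for a file on disk
plt.savefig(buffer, format="png")                   # save first ...
plt.show()                                          # ... then show
```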

### 4.2 Style Parameters 🎨
**Common Color Codes:**

| Code | Color | Code | Color |
| :---: | :--- | :---: | :--- |
| `'b'` | 🔵 Blue | `'r'` | 🔴 Red |
| `'g'` | 🟢 Green | `'k'` | ⚫ Black |
| `'y'` | 🟡 Yellow | `'w'` | ⚪ White |

**Line Styles & Markers:**

| Style Code | Description | Marker | Description |
| :---: | :--- | :---: | :--- |
| `'-'` | Solid (Default) | `'o'` | Circle |
| `'--'` | Dashed | `'*'` | Star |
| `':'` | Dotted | `'s'` | Square |
| `'-.'` | Dash-dot | `'^'` | Triangle |
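
These codes can be passed as keyword arguments or combined into one short format string. The sketch below draws the same kind of line both ways with invented data; the `Agg` backend line is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt

x = [1, 2, 3]

# Shorthand format string: 'g' = green, ':' = dotted, 's' = square markers
(short_line,) = plt.plot(x, [3, 1, 2], "g:s")

# Same idea written out with keyword arguments: black, dash-dot, triangles
(long_line,) = plt.plot(x, [2, 3, 1], color="k", linestyle="-.", marker="^")

print(short_line.get_color(), short_line.get_linestyle(), short_line.get_marker())
print(long_line.get_color(), long_line.get_linestyle(), long_line.get_marker())
```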

### 4.3 Plotting from Pandas 🐼
```python
# 'x_col' and 'y_col' are placeholders for column names in your DataFrame
dataframe.plot(kind='bar', x='x_col', y='y_col', color='red')
```

*Kinds:* `'line'`, `'bar'`, `'barh'`, `'hist'`, `'box'`, `'pie'`, `'scatter'`
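
A runnable sketch of the Pandas route. The DataFrame, its column names (`city`, `population`) and the values are all hypothetical; `DataFrame.plot()` returns the Matplotlib Axes, so the usual customization methods still apply. The `Agg` backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: column names and values are for illustration only
df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Pune"],
    "population": [32, 21, 7],
})

# plot() draws with Matplotlib and hands back the Axes it used
ax = df.plot(kind="bar", x="city", y="population", color="red")
ax.set_title("Population by City")
ax.set_ylabel("Population (millions)")

bar_heights = [bar.get_height() for bar in ax.patches]
print(bar_heights)
```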

---

## 5. 💻 Code Examples
### Example 1: Custom Line Chart
```python
import matplotlib.pyplot as plt

years = [2020, 2021, 2022, 2023]
sales = [5000, 7000, 6500, 8000]

# Customization: Red dashed line with circle markers
plt.plot(years, sales, color='r', linestyle='--', marker='o', label='Sales')

plt.title("Annual Sales Report")
plt.xlabel("Year")
plt.ylabel("Sales ($)")
plt.grid(True)
plt.legend()

plt.show()
```

### Example 2: Histogram (Frequency)
```python
import matplotlib.pyplot as plt

marks = [10, 15, 20, 20, 25, 30, 35, 40, 45, 50, 50, 55]

# bins=5 groups the data into 5 equal-width ranges
plt.hist(marks, bins=5, edgecolor='black', color='cyan')

plt.title("Marks Distribution")
plt.xlabel("Marks")
plt.ylabel("Number of Students")
plt.show()
```

---

## 6. 🆚 Comparisons
### Line Chart vs. Scatter Plot
| Feature | 📈 Line Chart | 🌌 Scatter Plot |
| --- | --- | --- |
| **Purpose** | Trends over time (continuity). | Correlation between variables. |
| **Connection** | Points connected by lines. | Standalone markers. |
| **Function** | `plt.plot()` | `plt.scatter()` |

### Bar Chart vs. Histogram
| Feature | 📊 Bar Chart | 🧱 Histogram |
| --- | --- | --- |
| **Data Type** | Categorical / Discrete. | Continuous / Numerical. |
| **X-Axis** | Distinct Categories (e.g., Cities). | Numerical Bins (Ranges). |
| **Gaps** | Bars usually have gaps. | Bars touch (no gaps). |
| **Ordering** | Can be reordered. | Must follow numerical order. |

---

## 7. ⚠️ Common Errors & Troubleshooting
* **🚫 The Blank Image Error:**
    * *Mistake:* Calling `plt.savefig()` *after* `plt.show()`.
    * *Reason:* Once the window opened by `show()` is closed, the figure is released, so a later `savefig()` writes an empty canvas.
    * *Fix:* **Always save first!** `savefig` -> `show`.
* **📉 Mismatched Lists:**
    * *Error:* `ValueError: x and y must have same first dimension`.
    * *Fix:* Ensure `len(x) == len(y)`.
* **😡 Confusing Bar Arguments:**
    * `plt.plot(y)` works (auto-generates x).
    * `plt.bar(y)` **fails** with a `TypeError` because `height` is missing. You must provide `plt.bar(x, height)`.
* **🖌️ Style Fail:**
    * Passing `linestyle='--'` to `plt.scatter()` and expecting connected points. Scatter plots draw no connecting lines; use `plt.plot()` if you need them.
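
Two of the errors above can be reproduced safely inside `try` / `except` blocks. This is a sketch with invented lists; the exact wording of the messages may differ between Matplotlib versions. The `Agg` backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [10, 20, 30, 40]            # one value too many on purpose

mismatch_message = ""
try:
    plt.plot(x, y)              # lengths 3 and 4 do not match
except ValueError as err:
    mismatch_message = str(err)
    print("ValueError:", err)

heights = [5, 7, 3]
bar_message = ""
try:
    plt.bar(heights)            # only one argument: 'height' is missing
except TypeError as err:
    bar_message = str(err)
    print("TypeError:", err)

# Fixes: make the lists the same length, and give bar() both x and height
plt.plot(x, y[:len(x)])
bars = plt.bar(range(len(heights)), heights)
```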



---

## 8. 🎓 Exam-Oriented Short Notes
* **Import:** `import matplotlib.pyplot as plt`
* **Order of Ops:** Define Data -> Plot Data -> Customize (Labels/Title) -> Save -> Show.
* **Grid:** `plt.grid(True)` adds grid lines, which make graph values easier to read accurately.
* **Legend:** Requires `label='Name'` inside the plot function to work.
* **Case Sensitive:** `plt.plot()` is correct; `plt.Plot()` is wrong. ❌
* **Pandas:** `df.plot(kind='...')` is a quick shortcut for plotting an existing DataFrame.

---