CSIP12.in
UNIT 1 : CH 1 Dec 11, 2025

📘 Data Handling Using Pandas – I

## ⭐ 1. Definitions: The Building Blocks

* **🎯 Pandas**: Think of this as "Excel on Steroids" for Python. It is a library used to analyze and manipulate data at lightning speed.
* **📌 Series**: A **1-D labeled array**. Imagine a single column in a spreadsheet.
* **📊 DataFrame**: A **2-D labeled data structure**. Imagine a whole table with rows and columns.
* **🏷️ Index**: The "address" of your data. These are labels for rows to help you find data instantly.
* **⚡ Vectorization**: The superpower of Pandas. It performs math on the *entire* column at once without using slow `for` loops.

-----

## ⭐ 2. Concepts Explained

### 🔥 2.1 Introduction

Pandas is built on top of **NumPy** and is one of the most widely used Python libraries for data analysis.

**Why use it?**

* ✔ **Cleaning:** Fix messy data (like missing values).
* ✔ **Filtering:** Find exactly what you need (e.g., "Show me students with > 90 marks").
* ✔ **Visualization:** Works with Matplotlib to draw graphs.
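
Here is a quick preview of the first two points, as a minimal sketch only. The names and marks are made-up sample data, and replacing a missing value with `0` via `fillna(0)` is just one possible cleaning choice, assumed here for illustration.

```python
import pandas as pd

# Made-up sample marks; None stands for a missing value (stored as NaN)
marks = pd.Series([95, None, 72, 91], index=['Amit', 'Neha', 'Raj', 'Sara'])

cleaned = marks.fillna(0)        # Cleaning: replace the missing value with 0
toppers = cleaned[cleaned > 90]  # Filtering: keep only marks above 90

print(cleaned)
print(toppers)
```

Both ideas are covered step by step in the sections that follow.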

### 🔌 2.2 Using Pandas

To use it, we must import it first. We give it a nickname `pd` so we don't have to type "pandas" every time.

```python
import pandas as pd
```

-----

## ⭐ 2.3 Pandas Data Structures

Pandas gives us two main containers:

1. **Series (1-D)** ➡️ Like a List, but smarter.
2. **DataFrame (2-D)** ➡️ Like a Table.

-----

## ⭐ 2.4 Series: The Smart Column 🚀

A Series stores data + labels.

### Method 1: Creating from a List

The most basic way. If you don't supply an index, Pandas labels the elements 0, 1, 2, ...

```python
import pandas as pd
marks = [85, 90, 78]
s = pd.Series(marks)
print(s)
```

**🖥️ Output:**

```text
0    85
1    90
2    78
dtype: int64
```

### Method 2: Creating with Custom Index

You can name your rows!

```python
import pandas as pd
marks = [85, 90, 78]  # same list as in Method 1
students = ['Amit', 'Neha', 'Raj']
s = pd.Series(marks, index=students)
print(s)
```

**🖥️ Output:**

```text
Amit    85
Neha    90
Raj     78
dtype: int64
```

### Method 3: Creating from Dictionary

Keys become Index, Values become Data.

```python
data = {'Math': 95, 'Sci': 88, 'Eng': 72}
s = pd.Series(data)
print(s)
```

**🖥️ Output:**

```text
Math    95
Sci     88
Eng     72
dtype: int64
```

### Method 4: Creating from Scalar (Constant)

Repeats the same value for every label in the index.

```python
s = pd.Series(50, index=['A', 'B', 'C'])
print(s)
```

**🖥️ Output:**

```text
A    50
B    50
C    50
dtype: int64
```

-----

## ⭐ 2.5 Series Attributes (Know Your Data)

Let's use this Series for examples:
`s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])`

| Attribute | Function | Example Output |
| :--- | :--- | :--- |
| **`s.values`** | Returns data as a NumPy array (removes labels). | `[10 20 30]` |
| **`s.index`** | Returns the index labels. | `Index(['a', 'b', 'c'], dtype='object')` |
| **`s.dtype`** | Returns the data type of values. | `int64` |
| **`s.size`** | Counts total elements (including empty/NaN). | `3` |
| **`s.shape`** | Returns dimensions (Rows, ). | `(3,)` |
| **`s.nbytes`** | Returns memory usage in bytes. | `24` |
| **`s.empty`** | Returns True if Series is empty. | `False` |
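
To see these attributes in action, here is a small sketch using the same Series as the table. The `nothing` variable is a made-up name, added only to show a case where `empty` is `True`.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

print(s.values)  # the data as a NumPy array, without labels
print(s.index)   # the labels
print(s.dtype)   # data type of the values
print(s.size)    # number of elements
print(s.shape)   # one-element tuple: (number of rows,)
print(s.empty)   # False, because the Series holds data

# An empty Series for comparison (hypothetical example)
nothing = pd.Series(dtype='float64')
print(nothing.empty)
```

Note that attributes are written without brackets: `s.size`, not `s.size()`.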

-----

## ⭐ 2.6 Operations on Series 🛠️

### ➕ 1. Math Operations (Vectorization)

Math happens on every element automatically!

```python
s = pd.Series([10, 20, 30])
print(s + 5)
```

**🖥️ Output:**

```text
0    15
1    25
2    35
dtype: int64
```
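
To make the idea concrete, the sketch below does the same addition twice: once with a loop and once the vectorized way. The names `prices` and `bonus` and their values are assumptions chosen for illustration.

```python
import pandas as pd

prices = pd.Series([10, 20, 30])  # sample values, assumed for illustration

# Loop version: handle one element at a time
loop_result = [p + 5 for p in prices.tolist()]

# Vectorized version: one expression covers the whole Series
vector_result = prices + 5

# Two Series can be combined too; values are matched by index label
bonus = pd.Series([1, 2, 3])
total = prices + bonus

print(loop_result)
print(vector_result.tolist())
print(total.tolist())
```

Both versions give the same numbers, but the vectorized one is shorter and avoids a Python-level loop.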

### 🔍 2. Filtering

Select data based on conditions.

```python
print(s[s > 15])
```

**🖥️ Output:**

```text
1    20
2    30
dtype: int64
```
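
It helps to look at what the condition produces on its own. This is a minimal sketch; the variable names `mask` and `middle` are made up for illustration.

```python
import pandas as pd

s = pd.Series([10, 20, 30])

# The condition by itself gives a Series of True/False values
mask = s > 15
print(mask)

# Combine conditions with & (and) or | (or).
# Each condition must sit inside its own parentheses.
middle = s[(s > 15) & (s < 30)]
print(middle)
```

Writing `s[mask]` and `s[s > 15]` gives the same result; the second form just skips the extra variable.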

### 🔃 3. Sorting

* `sort_values()`: Sorts by data.
* `sort_index()`: Sorts by labels.

```python
s = pd.Series([30, 10, 20], index=['c', 'a', 'b'])
print(s.sort_values())
```

**🖥️ Output:**

```text
a    10
b    20
c    30
dtype: int64
```
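
In the example above, sorting by value and sorting by label happen to give the same order. The sketch below uses a differently labelled Series (an assumption made for illustration) so the two methods can be told apart, and also shows descending order.

```python
import pandas as pd

s = pd.Series([30, 10, 20], index=['b', 'c', 'a'])

by_label = s.sort_index()                    # order by labels: a, b, c
descending = s.sort_values(ascending=False)  # order by data: 30, 20, 10

print(by_label)
print(descending)
print(s)  # the original Series keeps its order
```

Both methods return a new sorted Series; `s` itself stays as it was.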

-----

## 🟦 2.7 DataFrame: Your Smart Table 📊

A DataFrame is a collection of Series (one per column) that share the same row index.
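
One way to picture this is to build a table from two Series with matching labels. This is only a sketch; the row labels `r1` and `r2` and the name `table` are made up for illustration.

```python
import pandas as pd

# Two Series that share the same index labels
names = pd.Series(['Amit', 'Neha'], index=['r1', 'r2'])
marks = pd.Series([85, 90], index=['r1', 'r2'])

# Each Series becomes one column of the table
table = pd.DataFrame({'Name': names, 'Marks': marks})

print(table)
print(type(table['Marks']))  # taking one column back out gives a Series
```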

## ⭐ 2.8 Creating DataFrames

### Method 1: Dictionary of Lists (Most Common)

Keys become **Columns**.

```python
data = {
    'Name': ['Amit', 'Neha'],
    'Marks': [85, 90]
}
df = pd.DataFrame(data)
print(df)
```

**🖥️ Output:**

```text
   Name  Marks
0  Amit     85
1  Neha     90
```

### Method 2: List of Dictionaries

Each dictionary becomes one **Row**, and its keys become **Columns**.

```python
data = [
    {'Name': 'Amit', 'Marks': 85},
    {'Name': 'Neha', 'Marks': 90}
]
df = pd.DataFrame(data)
print(df)
```

**🖥️ Output:**

```text
   Name  Marks
0  Amit     85
1  Neha     90
```
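
A useful detail with a list of dictionaries: if one dictionary is missing a key, Pandas fills that cell with `NaN`. The sketch below shows this with made-up records; `records` and `df_gaps` are hypothetical names.

```python
import pandas as pd

records = [
    {'Name': 'Amit', 'Marks': 85},
    {'Name': 'Neha'},  # no 'Marks' key in this record
]
df_gaps = pd.DataFrame(records)

print(df_gaps)
print(df_gaps['Marks'].isna().tolist())  # True marks the missing cell
```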

-----

## ⭐ 2.9 DataFrame Attributes & Methods

Let's assume `df` is the student table above.

| Attribute/Method | Explanation | Output Example |
| :--- | :--- | :--- |
| **`df.shape`** | (Rows, Columns) | `(2, 2)` |
| **`df.columns`** | Column labels | `Index(['Name', 'Marks'], dtype='object')` |
| **`df.index`** | Row labels | `RangeIndex(start=0, stop=2, step=1)` |
| **`df.T`** | Transpose (Swaps Rows & Cols) | (Table flips sideways) |
| **`df.head(n)`** | Shows top `n` rows | First 5 rows if `n` is not given |
| **`df.tail(n)`** | Shows bottom `n` rows | Last 5 rows if `n` is not given |
| **`df.info()`** | Summary of data types & non-null counts | (Technical summary) |
| **`df.describe()`** | Statistical summary (count, mean, std, min, max) of numeric columns | (Math stats table) |
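
The table above can be tried out directly. This sketch rebuilds the two-row student table so it runs on its own; the name `students` is made up for illustration.

```python
import pandas as pd

students = pd.DataFrame({'Name': ['Amit', 'Neha'], 'Marks': [85, 90]})

print(students.shape)          # (rows, columns)
print(list(students.columns))  # column labels as a plain list
print(students.T)              # rows and columns swapped
print(students.head(1))        # first row only
print(students.tail(1))        # last row only
students.info()                # prints its summary by itself
print(students.describe())     # statistics for the numeric column 'Marks'
```

Notice that `shape`, `columns`, and `T` are attributes (no brackets), while `head()`, `tail()`, `info()`, and `describe()` are methods.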

-----

## ⭐ 2.10 Accessing Data (The Most Important Part!) 🎯

This is where students make the most mistakes. Pay attention!

### 1️⃣ Selecting Columns

```python
print(df['Name'])
```

**Output:** Returns a Series of names.

### 2️⃣ Selecting Rows: `loc` vs `iloc`

| Feature | **`loc`** (Label Based) | **`iloc`** (Integer Position Based) |
| :--- | :--- | :--- |
| **Uses** | Uses the index *label* (the row's name). | Uses the row's *position* (0, 1, 2, ...). |
| **Slicing** | End value is **INCLUSIVE**. | End value is **EXCLUSIVE** (like Python lists). |
| **Syntax** | `df.loc['row_label']` | `df.iloc[row_position]` |

**Example:**

```python
df = pd.DataFrame({'A': [10, 20], 'B': [30, 40]}, index=['x', 'y'])

print(df.loc['x']) # Access row with label 'x'
print(df.iloc[0]) # Access 0th row (which is 'x')
```

**🖥️ Output (Same for both):**

```text
A    10
B    30
Name: x, dtype: int64
```
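
The inclusive vs exclusive slicing rule from the table is easiest to see side by side. This is a minimal sketch with a made-up three-row table named `df_xyz`.

```python
import pandas as pd

df_xyz = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]},
                      index=['x', 'y', 'z'])

by_label = df_xyz.loc['x':'y']   # label slice: 'y' is included
by_position = df_xyz.iloc[0:1]   # position slice: stops before position 1

print(by_label)
print(by_position)

# A single cell can be picked with row and column together
print(df_xyz.loc['y', 'B'])      # by labels
print(df_xyz.iloc[1, 1])         # by positions, same cell
```

Here `loc['x':'y']` returns two rows, while `iloc[0:1]` returns only one.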

-----

## ⭐ 2.11 Modifying Data

### ➕ Add a Column

Just treat it like a dictionary key assignment. Here `df` is the student table (Name, Marks) from Section 2.8 again.

```python
df['Grade'] = ['A', 'B']
print(df)
```

**🖥️ Output:**

```text
   Name  Marks Grade
0  Amit     85     A
1  Neha     90     B
```

### ❌ Delete a Column (`drop`)

* `axis=0`: Delete Row
* `axis=1`: Delete Column

```python
# drop() returns a new DataFrame; the original df is left unchanged
df2 = df.drop('Grade', axis=1)
print(df2)
```

**🖥️ Output:**

```text
   Name  Marks
0  Amit     85
1  Neha     90
```
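
Deleting a row works the same way with `axis=0`. The sketch below rebuilds a small student table so it runs on its own; the names `students`, `no_first_row`, and `no_marks` are made up for illustration.

```python
import pandas as pd

students = pd.DataFrame({'Name': ['Amit', 'Neha'], 'Marks': [85, 90]})

no_first_row = students.drop(0, axis=0)       # remove the row labelled 0
no_marks = students.drop(columns=['Marks'])   # same effect as drop('Marks', axis=1)

print(no_first_row)
print(no_marks)
print(students.shape)  # the original table still has every row and column
```

To keep the change, assign the result back, for example `students = students.drop(0, axis=0)`.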

-----

## ⭐ 2.12 Statistical Functions 🧮

These work on both Series and DataFrames.

| Function | Description | Code Example |
| :--- | :--- | :--- |
| `sum()` | Total sum of values | `df['Marks'].sum()` |
| `mean()` | Average value | `df['Marks'].mean()` |
| `max()` | Highest value | `df['Marks'].max()` |
| `min()` | Lowest value | `df['Marks'].min()` |
| `count()` | Count of non-missing (non-NaN) values | `df['Marks'].count()` |
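
Here are the functions from the table run on a small column. This is a sketch with made-up data; one mark is left missing on purpose to show that these functions skip `NaN` by default.

```python
import pandas as pd

scores = pd.DataFrame({'Name': ['Amit', 'Neha', 'Raj'],
                       'Marks': [85, 90, None]})  # Raj's mark is missing

marks = scores['Marks']

print(marks.sum())    # adds 85 and 90, skipping the missing value
print(marks.mean())   # average of the two available marks
print(marks.max())
print(marks.min())
print(marks.count())  # counts only non-missing values
print(marks.size)     # counts every element, missing or not
```

Comparing `count()` with `size` is a quick way to check whether a column has missing values.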

-----

## ⚠️ 3. Common Errors & Fixes (Don't do these!)

1. **❌ KeyError:**

* *Error:* `df['Names']` (when column is 'Name').
* *Fix:* Check spelling! Column names are case-sensitive. Use `df.columns` to check names.

2. **❌ Confusing `loc` slicing:**

* *Error:* Expecting `df.loc[0:2]` and `df.iloc[0:2]` to return the same rows.
* *Fix:* `df.loc[0:2]` selects labels 0, 1, and 2 (it includes 2!), while `df.iloc[0:2]` selects positions 0 and 1 (it excludes 2!).

3. **❌ Forgetting Brackets:**

* *Error:* `df['Name', 'Marks']`
* *Fix:* If selecting multiple columns, use **double brackets**: `df[['Name', 'Marks']]` (You are passing a list of columns).
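
The first and third mistakes can be reproduced safely with `try`/`except`. This is a sketch on a made-up two-row table; the names `students` and `subset` are assumptions for illustration.

```python
import pandas as pd

students = pd.DataFrame({'Name': ['Amit', 'Neha'], 'Marks': [85, 90]})

# Mistake 1: misspelled column name
try:
    students['Names']
except KeyError as err:
    print('KeyError for column:', err)

# Check the real column names before selecting
print('Names' in students.columns)  # False
print('Name' in students.columns)   # True

# Mistake 3: single brackets with two names also raises KeyError
try:
    students['Name', 'Marks']
except KeyError as err:
    print('KeyError for columns:', err)

# Correct: pass a list of column names
subset = students[['Name', 'Marks']]
print(subset)
```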

-----

## 🎯 4. Common Board Questions

* Create Series of subjects and marks
* Create DataFrame with 3 columns
* Display rows with marks > 75
* Add Grade column
* Delete a row/column
* Difference: loc vs iloc
* Sort DataFrame
* Find max/min/mean
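
Two of these tasks (rows with marks > 75, and sorting a DataFrame) can be practised with the sketch below. It is one possible answer, not a model solution; the table contents and the names `above_75` and `ranked` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Amit', 'Neha', 'Raj'],
    'Subject': ['Math', 'Sci', 'Eng'],
    'Marks': [85, 90, 72],
})

# Display rows with marks > 75
above_75 = df[df['Marks'] > 75]
print(above_75)

# Sort the DataFrame by marks, highest first
ranked = df.sort_values('Marks', ascending=False)
print(ranked)
```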