## 1. Definitions: The Building Blocks
* **Pandas**: Think of this as "Excel on Steroids" for Python. It is a library used to analyze and manipulate data at lightning speed.
* **Series**: A **1-D labeled array**. Imagine a single column in a spreadsheet.
* **DataFrame**: A **2-D labeled data structure**. Imagine a whole table with rows and columns.
* **Index**: The "address" of your data. These are labels for rows to help you find data instantly.
* **Vectorization**: The superpower of Pandas. It performs math on the *entire* column at once without using slow `for` loops.
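
To make the vectorization idea concrete, here is a minimal sketch that adds a bonus of 5 marks in two ways: with an explicit loop and with one vectorized expression. The marks and the bonus value are made up for illustration.

```python
import pandas as pd

marks = pd.Series([85, 90, 78])

# Loop version: handle one element at a time.
bonus_loop = [int(m) + 5 for m in marks]

# Vectorized version: one expression applies to the whole Series.
bonus_vec = marks + 5

print(bonus_loop)          # [90, 95, 83]
print(bonus_vec.tolist())  # [90, 95, 83]
```

Both give the same numbers; the vectorized form is shorter and usually much faster on large data.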
-----
## 2. Concepts Explained
### 2.1 Introduction
Pandas is built on top of **NumPy**. It is the best friend of Data Scientists.
**Why use it?**
* **Cleaning:** Fix messy data (like missing values).
* **Filtering:** Find exactly what you need (e.g., "Show me students with > 90 marks").
* **Visualization:** Works with Matplotlib to draw graphs.
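
As a rough illustration of cleaning and filtering, the sketch below fills one missing mark with 0 and then keeps only the students above 90. The names, the marks, and the choice of 0 as the fill value are assumptions made for this example.

```python
import pandas as pd

# Hypothetical marks; None stands for a missing entry.
marks = pd.Series([95, None, 72, 91], index=['Amit', 'Neha', 'Raj', 'Sara'])

# Cleaning: replace the missing value with 0.
cleaned = marks.fillna(0)

# Filtering: keep only students with more than 90 marks.
toppers = cleaned[cleaned > 90]
print(toppers)
```

Because the list contains a missing value, pandas stores these marks as floats (for example `95.0`).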
### 2.2 Using Pandas
To use it, we must import it first. We give it a nickname `pd` so we don't have to type "pandas" every time.
```python
import pandas as pd
```
-----
## 2.3 Pandas Data Structures
Pandas gives us two main containers:
1. **Series (1-D)** → Like a List, but smarter.
2. **DataFrame (2-D)** → Like a Table.
-----
## 2.4 Series: The Smart Column
A Series stores data + labels.
### Method 1: Creating from a List
Basic creation. If you don't give an index, Pandas counts from 0, 1, 2...
```python
import pandas as pd
marks = [85, 90, 78]
s = pd.Series(marks)
print(s)
```
**Output:**
```text
0    85
1    90
2    78
dtype: int64
```
### Method 2: Creating with Custom Index
You can name your rows!
```python
marks = [85, 90, 78]  # same list as in Method 1
students = ['Amit', 'Neha', 'Raj']
s = pd.Series(marks, index=students)
print(s)
```
**Output:**
```text
Amit    85
Neha    90
Raj     78
dtype: int64
```
### Method 3: Creating from Dictionary
Keys become Index, Values become Data.
```python
data = {'Math': 95, 'Sci': 88, 'Eng': 72}
s = pd.Series(data)
print(s)
```
**Output:**
```text
Math    95
Sci     88
Eng     72
dtype: int64
```
### Method 4: Creating from Scalar (Constant)
Fills the whole series with the same number.
```python
s = pd.Series(50, index=['A', 'B', 'C'])
print(s)
```
**Output:**
```text
A    50
B    50
C    50
dtype: int64
```
-----
## 2.5 Series Attributes (Know Your Data)
Let's use this Series for examples:
`s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])`
| Attribute | Function | Example Output |
| :--- | :--- | :--- |
| **`s.values`** | Returns data as a NumPy array (removes labels). | `[10 20 30]` |
| **`s.index`** | Returns the index labels. | `Index(['a', 'b', 'c'], dtype='object')` |
| **`s.dtype`** | Returns the data type of values. | `int64` |
| **`s.size`** | Counts total elements (including empty/NaN). | `3` |
| **`s.shape`** | Returns dimensions (Rows, ). | `(3,)` |
| **`s.nbytes`** | Returns memory usage in bytes. | `24` |
| **`s.empty`** | Returns True if Series is empty. | `False` |
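
The table above can be checked with a short script. This is only a sketch: the exact text printed for `s.index` and `s.dtype` may vary with your pandas version and platform.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

print(s.values)   # the data as a NumPy array
print(s.index)    # the index labels
print(s.dtype)    # data type of the values
print(s.size)     # 3
print(s.shape)    # (3,)
print(s.nbytes)   # bytes used by the values (size x bytes per value)
print(s.empty)    # False
```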
-----
## 2.6 Operations on Series
### 1. Math Operations (Vectorization)
Math happens on every element automatically!
```python
s = pd.Series([10, 20, 30])
print(s + 5)
```
**Output:**
```text
0    15
1    25
2    35
dtype: int64
```
### 2. Filtering
Select data based on conditions.
```python
print(s[s > 15])
```
**Output:**
```text
1    20
2    30
dtype: int64
```
### 3. Sorting
* `sort_values()`: Sorts by data.
* `sort_index()`: Sorts by labels.
```python
s = pd.Series([30, 10, 20], index=['c', 'a', 'b'])
print(s.sort_values())
```
**Output:**
```text
a    10
b    20
c    30
dtype: int64
```
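
In the example above, sorting by values and sorting by labels happen to give the same order. The sketch below uses different (made-up) labels so the two methods can be told apart, and also shows descending order with `ascending=False`.

```python
import pandas as pd

s = pd.Series([30, 10, 20], index=['b', 'c', 'a'])

by_value = s.sort_values()                       # values 10, 20, 30
by_value_desc = s.sort_values(ascending=False)   # values 30, 20, 10
by_label = s.sort_index()                        # labels a, b, c

print(by_value)
print(by_value_desc)
print(by_label)
```

None of these calls changes `s` itself; each one returns a new sorted Series.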
-----
## 2.7 DataFrame: Your Smart Table
A DataFrame is a collection of Series sharing the same index.
## 2.8 Creating DataFrames
### Method 1: Dictionary of Lists (Most Common)
Keys become **Columns**.
```python
data = {
'Name': ['Amit', 'Neha'],
'Marks': [85, 90]
}
df = pd.DataFrame(data)
print(df)
```
**Output:**
```text
   Name  Marks
0  Amit     85
1  Neha     90
```
### Method 2: List of Dictionaries
Each dictionary becomes one **Row**; its keys become **Columns**.
```python
data = [
{'Name': 'Amit', 'Marks': 85},
{'Name': 'Neha', 'Marks': 90}
]
df = pd.DataFrame(data)
print(df)
```
**Output:**
```text
   Name  Marks
0  Amit     85
1  Neha     90
```
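
Since a DataFrame is a collection of Series sharing the same index, it can also be built from a dictionary of Series. This is a sketch beyond the two methods listed above; the row labels `r1` and `r2` are made up for illustration.

```python
import pandas as pd

# Two Series that share the same index labels.
names = pd.Series(['Amit', 'Neha'], index=['r1', 'r2'])
marks = pd.Series([85, 90], index=['r1', 'r2'])

# Each Series becomes one column; the shared labels become the row index.
df = pd.DataFrame({'Name': names, 'Marks': marks})
print(df)

# Pulling a column back out gives a Series again.
print(df['Marks'])
```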
-----
## 2.9 DataFrame Attributes & Methods
Let's assume `df` is the student table above.
| Attribute/Method | Explanation | Output Example |
| :--- | :--- | :--- |
| **`df.shape`** | (Rows, Columns) | `(2, 2)` |
| **`df.columns`** | List of column names | `Index(['Name', 'Marks'], dtype='object')` |
| **`df.index`** | List of row labels | `RangeIndex(start=0, stop=2, step=1)` |
| **`df.T`** | Transpose (Swaps Rows & Cols) | (Table flips sideways) |
| **`df.head(n)`** | Shows top `n` rows | First 5 rows (default) |
| **`df.tail(n)`** | Shows bottom `n` rows | Last 5 rows (default) |
| **`df.info()`** | Summary of data types & nulls | (Technical summary) |
| **`df.describe()`** | Statistical summary (mean, max, min) | (Math stats table) |
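
A quick way to get familiar with these is to run them on the small student table. This is a minimal sketch; the exact layout of the `info()` and `describe()` output depends on your pandas version.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Amit', 'Neha'], 'Marks': [85, 90]})

print(df.shape)          # (2, 2)
print(list(df.columns))  # ['Name', 'Marks']
print(df.T)              # rows and columns swapped
print(df.head(1))        # first row only
print(df.tail(1))        # last row only
df.info()                # prints its summary by itself
print(df.describe())     # statistics for the numeric column 'Marks'
```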
-----
## 2.10 Accessing Data (The Most Important Part!)
This is where students make the most mistakes. Pay attention!
### 1. Selecting Columns
```python
print(df['Name'])
```
**Output:** Returns a Series of names.
### 2. Selecting Rows: `loc` vs `iloc`
| Feature | **`loc`** (Label Based) | **`iloc`** (Integer Position Based) |
| :--- | :--- | :--- |
| **Uses** | Uses the *label* (name) of the row. | Uses the *integer position* of the row (0, 1, 2...). |
| **Slicing** | End value is **INCLUSIVE**. | End value is **EXCLUSIVE** (like Python lists). |
| **Syntax** | `df.loc['row_label']` | `df.iloc[row_index]` |
**Example:**
```python
df = pd.DataFrame({'A': [10, 20], 'B': [30, 40]}, index=['x', 'y'])
print(df.loc['x']) # Access row with label 'x'
print(df.iloc[0]) # Access 0th row (which is 'x')
```
**Output (Same for both):**
```text
A    10
B    30
Name: x, dtype: int64
```
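
The slicing row of the table is the part that is easiest to get wrong, so here is a small sketch of it. The three-row table and its labels `x`, `y`, `z` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]}, index=['x', 'y', 'z'])

# loc slices by label: the end label 'y' IS included.
by_label = df.loc['x':'y']

# iloc slices by position: the end position 1 is NOT included.
by_position = df.iloc[0:1]

print(by_label)     # rows x and y
print(by_position)  # row x only
```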
-----
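## 2.11 Modifying Data
### Add a Column
Just treat it like a dictionary key assignment.
```python
# Rebuild the student table (df was reassigned in the loc/iloc example)
df = pd.DataFrame({'Name': ['Amit', 'Neha'], 'Marks': [85, 90]})
df['Grade'] = ['A', 'B']
print(df)
```
**Output:**
```text
   Name  Marks Grade
0  Amit     85     A
1  Neha     90     B
```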
### Delete a Column (`drop`)
* `axis=0`: Delete Row
* `axis=1`: Delete Column
```python
# drop() returns a new DataFrame; the original df is left unchanged
df2 = df.drop('Grade', axis=1)
print(df2)
```
**Output:**
```text
   Name  Marks
0  Amit     85
1  Neha     90
```
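
Deleting a row works the same way with `axis=0`, passing the row label instead of a column name. A minimal sketch on the student table:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Amit', 'Neha'], 'Marks': [85, 90]})

# Drop the row whose label is 0 (Amit). A new DataFrame is returned.
without_first = df.drop(0, axis=0)

print(without_first)  # only Neha's row remains
print(df)             # the original still has both rows
```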
-----
## 2.12 Statistical Functions
These work on both Series and DataFrames.
| Function | Description | Code Example |
| :--- | :--- | :--- |
| `sum()` | Total sum of values | `df['Marks'].sum()` |
| `mean()` | Average value | `df['Marks'].mean()` |
| `max()` | Highest value | `df['Marks'].max()` |
| `min()` | Lowest value | `df['Marks'].min()` |
| `count()` | Count of non-empty values | `df['Marks'].count()` |
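
The sketch below runs these functions on a column with one missing mark, to show that they skip missing values by default and that `count()` counts only the non-empty ones. The third student and the missing mark are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Amit', 'Neha', 'Raj'], 'Marks': [85, 90, None]})
marks = df['Marks']

print(marks.sum())    # 175.0 (missing value is skipped)
print(marks.mean())   # 87.5  (average of 85 and 90)
print(marks.max())    # 90.0
print(marks.min())    # 85.0
print(marks.count())  # 2     (Raj's missing mark is not counted)
```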
-----
## 3. Common Errors & Fixes (Don't do these!)
1. **KeyError:**
* *Error:* `df['Names']` (when the column is 'Name').
* *Fix:* Check spelling! Column names are case-sensitive. Use `df.columns` to check names.
2. **Confusing `loc` and `iloc` slicing:**
* *`loc`:* `df.loc[0:2]` selects labels 0, 1, and 2. It includes 2!
* *`iloc`:* `df.iloc[0:2]` selects positions 0 and 1. It excludes 2!
3. **Forgetting Brackets:**
* *Error:* `df['Name', 'Marks']` raises a `KeyError`.
* *Fix:* If selecting multiple columns, use **double brackets**: `df[['Name', 'Marks']]` (You are passing a list of columns).
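
The first and third mistakes can be reproduced safely with `try`/`except`. This is a sketch on the student table; the `caught` flag is only there to show that the error really happened.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Amit', 'Neha'], 'Marks': [85, 90]})

# Mistake 1: misspelled column name.
caught = False
try:
    df['Names']
except KeyError as err:
    caught = True
    print('KeyError:', err)
    print('Available columns:', list(df.columns))

# Mistake 3, done correctly: pass a list of column names.
subset = df[['Name', 'Marks']]
print(subset)
```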
-----
## 4. Common Board Questions
* Create Series of subjects and marks
* Create DataFrame with 3 columns
* Display rows with marks > 75
* Add Grade column
* Delete a row/column
* Difference: loc vs iloc
* Sort DataFrame
* Find max/min/mean
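
Two of these questions ("Display rows with marks > 75" and "Sort DataFrame") are not worked out earlier in these notes, so here is one possible sketch. The three-column table is made up, and `sort_values` on a DataFrame is used with a column name.

```python
import pandas as pd

# A DataFrame with 3 columns.
df = pd.DataFrame({
    'Name': ['Amit', 'Neha', 'Raj'],
    'Subject': ['Math', 'Sci', 'Eng'],
    'Marks': [85, 90, 72],
})

# Display rows with marks > 75.
above_75 = df[df['Marks'] > 75]
print(above_75)

# Sort the DataFrame by marks, highest first.
ranked = df.sort_values('Marks', ascending=False)
print(ranked)
```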