Python Pandas Tutorial in Hindi - CodeWithHarry
1. Summary
This comprehensive tutorial by CodeWithHarry introduces Python's Pandas library for data analysis, specifically targeting beginners. The video covers the fundamental concepts of Pandas, including DataFrames and Series, and demonstrates how to create, manipulate, analyze, and import/export data in various formats like CSV and Excel. It emphasizes Pandas as a crucial tool for Exploratory Data Analysis (EDA) for aspiring data scientists. The tutorial is presented in Hindi and aims to equip viewers with the necessary skills to start working with data in Python.
2. Key Takeaways
* **Pandas for Data Analysis:** Pandas is a powerful Python library essential for data manipulation and analysis, forming a strong foundation for Exploratory Data Analysis (EDA).
* **Core Data Structures:** The tutorial focuses on two primary Pandas data structures:
* **Series:** A one-dimensional labeled array capable of holding any data type.
* **DataFrame:** A two-dimensional labeled data structure with columns of potentially different types, similar to a spreadsheet or SQL table.
* **Data Creation:** Learn to create DataFrames from various sources, including dictionaries and lists.
* **Data Inspection:** Understand how to view and inspect your data, including checking the head, tail, info, and descriptive statistics.
* **Data Selection & Filtering:** Master techniques for selecting specific rows and columns, and filtering data based on conditions.
* **Data Manipulation:** Explore methods for adding, deleting, and modifying columns and rows.
* **Handling Missing Data:** Learn strategies for identifying and dealing with missing values (NaNs) in your datasets.
* **Importing & Exporting Data:** Practical guidance on reading data from and writing data to common file formats like CSV and Excel.
* **Beginner-Friendly:** The tutorial is designed for beginners with no prior Pandas experience, explained clearly in Hindi.
3. Detailed Notes
3.1. Introduction to Pandas
* **What is Pandas?**
* A powerful open-source Python library.
* Used for data manipulation and analysis.
* Provides high-performance, easy-to-use data structures and data analysis tools.
* Crucial for Exploratory Data Analysis (EDA) in Data Science.
* **Why use Pandas?**
* Handles datasets that fit in memory efficiently, using vectorized operations built on NumPy.
* Simplifies complex data operations.
* Integrates well with other Python libraries (NumPy, Matplotlib, Scikit-learn).
3.2. Core Pandas Data Structures
* **Series:**
* One-dimensional array-like object.
* Has an index (labels for each element).
* Can hold any data type (integers, strings, floats, Python objects, etc.).
* **Example:** Creating a Series from a list.
```python
import numpy as np
import pandas as pd

data = [1, 3, 5, np.nan, 6, 8]  # np.nan marks a missing value
s = pd.Series(data)
print(s)
```
* **DataFrame:**
* Two-dimensional labeled data structure with columns of potentially different types.
* Think of it as a table or spreadsheet.
* Has both row and column indices.
* Can be thought of as a collection of Series sharing the same index.
* **Example:** Creating a DataFrame from a dictionary of lists.
```python
import pandas as pd

# Keys become column names; each list holds one column's values
data = {'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']}
df = pd.DataFrame(data)
print(df)
```
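The point that a DataFrame behaves like a collection of Series sharing one index can be sketched as follows. This is a minimal illustration, not code from the video; the column names, index labels, and values are made up.

```python
import pandas as pd

# A DataFrame with explicit row labels (all names and values are illustrative)
df = pd.DataFrame(
    {"name": ["Asha", "Ravi", "Meena"], "marks": [82, 91, 77]},
    index=["r1", "r2", "r3"],
)

column = df["marks"]                   # selecting one column returns a Series
print(type(column).__name__)           # Series
print(column.index.equals(df.index))   # True: the column shares the DataFrame's index
print(column.loc["r2"])                # 91
```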
3.3. Creating DataFrames
* **From a Dictionary:**
* Keys become column names.
* Values (lists or arrays) become column data.
* **From a List of Dictionaries:**
* Each dictionary represents a row.
* Keys from dictionaries become column names.
* **From NumPy Arrays:**
* Can be used to create DataFrames with specified column and index names.
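The three construction routes above might look like this in practice. It is a sketch with made-up data; the column names (`city`, `temp_c`, `a`, `b`) and index labels are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# 1) Dictionary of lists: keys become column names
from_dict = pd.DataFrame({"city": ["Delhi", "Mumbai"], "temp_c": [31.5, 29.0]})

# 2) List of dictionaries: each dictionary is one row
from_rows = pd.DataFrame([
    {"city": "Delhi", "temp_c": 31.5},
    {"city": "Mumbai", "temp_c": 29.0},
])

# 3) NumPy array with explicit column and index names
from_array = pd.DataFrame(
    np.arange(6).reshape(3, 2),
    columns=["a", "b"],
    index=["x", "y", "z"],
)

print(from_dict)
print(from_rows)
print(from_array)
```

The first two calls describe the same table from different angles (column-wise versus row-wise), so which one to use usually depends on how the raw data arrives.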
3.4. Inspecting Data
* **Viewing Data:**
* `df.head(n)`: Displays the first `n` rows (default is 5).
* `df.tail(n)`: Displays the last `n` rows (default is 5).
* `df.sample(n)`: Returns `n` randomly chosen rows (default is 1).
* **Getting Information:**
* `df.info()`: Prints a concise summary of the DataFrame, including the index dtype, column dtypes, non-null counts, and memory usage.
* `df.describe()`: Generates descriptive statistics (count, mean, std, min, 25%, 50%, 75%, max) that summarize the central tendency, dispersion, and shape of each column's distribution. By default it covers numeric columns only and excludes NaN values.
* **Shape:**
* `df.shape`: Returns a tuple representing the dimensionality of the DataFrame (rows, columns).
* **Data Types:**
* `df.dtypes`: Shows the data type of each column.
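A quick way to try the inspection calls listed above is on a small throwaway DataFrame. The data below is invented for illustration only.

```python
import pandas as pd

# Small illustrative dataset (values are made up)
df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "score": [10, 20, 30, 40, 50, 60],
})

print(df.head(2))       # first 2 rows
print(df.tail(2))       # last 2 rows
print(df.shape)         # (6, 2)
print(df.dtypes)        # dtype of each column
df.info()               # prints its summary directly and returns None
stats = df.describe()   # numeric columns only by default
print(stats.loc["mean", "score"])  # 35.0
```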
3.5. Selection and Indexing
* **Selecting Columns:**
* `df['column_name']`: Selects a single column as a Series.
* `df[['col1', 'col2']]`: Selects multiple columns as a DataFrame.
* **Selecting Rows (using `.loc` and `.iloc`):**
* `.loc[]`: Label-based indexing (uses index labels and column names).
* `df.loc[label]`
* `df.loc[[label1, label2]]`
* `df.loc[start_label:end_label]` (the end label is included)
* `df.loc[rows, columns]` (e.g., `df.loc[:, ['col1', 'col2']]`)
* `.iloc[]`: Integer-location based indexing (uses integer positions).
* `df.iloc[0]`
* `df.iloc[[0, 1]]`
* `df.iloc[start_index:end_index]` (the end position is excluded, as in Python slicing)
* `df.iloc[rows, columns]` (e.g., `df.iloc[:, [0, 1]]`)
* **Boolean Indexing (Filtering):**
* Selecting rows based on a condition.
* `df[df['column_name'] > value]`
* Combining conditions with `&` (AND) and `|` (OR). Wrap each condition in parentheses, and do not use Python's `and`/`or` here.
* `df[(df['col1'] > value1) & (df['col2'] == value2)]`
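The difference between label-based and position-based selection is easiest to see on a DataFrame whose index is not the default 0, 1, 2. The sketch below uses made-up labels (`w` to `z`) and values.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [5, 15, 25, 35], "col2": ["a", "b", "a", "b"]},
    index=["w", "x", "y", "z"],
)

by_label = df.loc["x":"y"]        # label slice: includes both "x" and "y"
by_position = df.iloc[1:3]        # position slice: rows 1 and 2, end excluded
cell = df.loc["z", "col1"]        # single value by row label and column name

# Each condition gets its own parentheses before combining with &
filtered = df[(df["col1"] > 10) & (df["col2"] == "a")]

print(by_label.equals(by_position))  # True
print(cell)                          # 35
print(filtered)                      # only row "y" satisfies both conditions
```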
3.6. Data Manipulation
* **Adding Columns:**
* Assigning a Series or a scalar value to a new column name.
```python
df['new_column'] = df['existing_column'] * 2
df['constant_column'] = 10
```
* **Deleting Columns/Rows:**
* `df.drop('column_name', axis=1)`: Drops a column. `axis=1` specifies columns.
* `df.drop(index_label)`: Drops a row.
* `drop` returns a new DataFrame by default; assign the result back (e.g., `df = df.drop(...)`) or pass `inplace=True` to modify the DataFrame directly.
* **Modifying Values:**
* Using `.loc` or `.iloc` to target specific cells or ranges and assign new values.
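The manipulation steps above can be combined in one short sketch. The column names (`price`, `qty`, `total`, `currency`) and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"price": [100, 200, 300], "qty": [1, 2, 3]})

df["total"] = df["price"] * df["qty"]   # new column from existing columns
df["currency"] = "INR"                  # a scalar is repeated for every row

no_qty = df.drop("qty", axis=1)         # returns a new DataFrame; df is unchanged
no_first = df.drop(0)                   # drops the row whose index label is 0

df.loc[1, "price"] = 250                # overwrite one cell by row label and column

print(df)
print(no_qty.columns.tolist())          # ['price', 'total', 'currency']
print(no_first.index.tolist())          # [1, 2]
```

Note that `total` is not recomputed when `price` changes afterwards; derived columns hold plain values, not formulas.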
3.7. Handling Missing Data (NaN)
* **Identifying Missing Values:**
* `df.isnull()`: Returns a boolean DataFrame indicating `True` for missing values.
* `df.isnull().sum()`: Counts the number of missing values per column.
* **Dealing with Missing Values:**
* **Dropping:**
* `df.dropna()`: Drops rows containing any NaN value (the default, `axis=0`).
* `df.dropna(axis=1)`: Drops columns with NaN values.
* `df.dropna(how='all')`: Drops rows/columns only if ALL values are NaN.
* **Filling:**
* `df.fillna(value)`: Fills NaN values with a specified `value` (e.g., 0, mean, median).
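* `df['column'] = df['column'].fillna(df['column'].mean())`: Fills NaNs in a specific column with its mean. Assigning the result back is more reliable than calling `fillna(..., inplace=True)` on a selected column, which recent pandas versions warn about and which may not update the original DataFrame.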
* `df.ffill()` (forward fill) or `df.bfill()` (backward fill).
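The following sketch runs the main missing-data operations on a tiny DataFrame so the effect of each is visible. The columns `a` and `b` and their values are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
})

missing_per_column = df.isnull().sum()   # a: 1, b: 2
rows_complete = df.dropna()              # keeps only rows with no NaN (row 2)
rows_not_empty = df.dropna(how="all")    # drops row 1, where every value is NaN

# Fill column "a" with its mean and assign the result back to the column
filled = df.copy()
filled["a"] = filled["a"].fillna(filled["a"].mean())   # mean of 1.0 and 3.0 is 2.0

forward = df.ffill()                     # carries the last seen value downward

print(missing_per_column)
print(filled)
print(forward)
```

Forward fill leaves the leading NaNs in column `b` untouched because there is no earlier value to carry down.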
3.8. Importing and Exporting Data
* **Reading CSV:**
* `df = pd.read_csv('file.csv')`
* Common parameters: `sep`, `header`, `index_col`, `usecols`.
* **Writing CSV:**
* `df.to_csv('output.csv', index=False)`
* `index=False` prevents writing the DataFrame index as a column.
* **Reading Excel:**
* `df = pd.read_excel('file.xlsx')`
* Requires an Excel engine to be installed: `openpyxl` for `.xlsx` files, or `xlrd` for legacy `.xls` files.
* Common parameters: `sheet_name`.
* **Writing Excel:**
* `df.to_excel('output.xlsx', sheet_name='Sheet1', index=False)`
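A CSV round trip can be sketched without any external file by writing to a temporary directory. The file name and columns below are hypothetical; the Excel calls follow the same pattern but additionally need an engine such as `openpyxl`, so they are left out of this runnable example.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.csv")             # hypothetical file name
    df.to_csv(path, index=False)                       # do not write the index column
    loaded = pd.read_csv(path)                         # read every column back
    only_names = pd.read_csv(path, usecols=["name"])   # read a subset of columns

print(loaded)
print(list(only_names.columns))   # ['name']
```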
3.9. Further Topics (Brief Mention/Implied)
* Data Aggregation (groupby)
* Merging and Joining DataFrames
* Time Series Analysis
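For orientation only, the first two topics might look like the sketch below. It is not drawn from the video; the `sales` and `stores` tables and their columns are invented for illustration.

```python
import pandas as pd

sales = pd.DataFrame({
    "store": ["A", "A", "B"],
    "amount": [10, 20, 5],
})
stores = pd.DataFrame({"store": ["A", "B"], "city": ["Pune", "Goa"]})

# Aggregation: total amount per store
totals = sales.groupby("store")["amount"].sum()

# Merging: attach each store's city to its sales rows
merged = sales.merge(stores, on="store", how="left")

print(totals)
print(merged)
```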
---
**Note:** The provided information is based on the description and title of the YouTube video. The detailed notes reflect common topics covered in an introductory Pandas tutorial. For the exact content and specific code examples, please refer to the actual video.
Disclaimer: This content is an AI-generated summary of a public YouTube video. The views and opinions expressed in the original video belong to the content creator. YouTube Note is not affiliated with the video creator or YouTube.

