SYLLABUS: Manipulating and visualizing data with Pandas: defining data frames, creating and manipulating data frames, visualization with Pandas
Matplotlib: features of Matplotlib, anatomy and customization of a Matplotlib plot, plotting and plot customization, customizing a plot, visualization examples
Manipulating and Visualizing Data with Pandas
Pandas is a powerful Python library used for data manipulation, analysis, and visualization. It
provides easy-to-use data structures like Series and DataFrame, making it ideal for handling
structured data such as tables, CSV files, and Excel sheets. Pandas integrates closely with
Matplotlib: a Series or DataFrame can draw basic charts directly through its .plot() method, which uses Matplotlib behind the scenes.
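A minimal sketch of the two core structures, assuming Pandas is installed and imported under the conventional alias pd. A Series is a single labeled column of values, and a DataFrame is a table built from such columns. The city names and temperatures are hypothetical sample values.

```python
import pandas as pd

# A Series: one-dimensional labeled array (labels default to 0, 1, 2, ...)
temps = pd.Series([31.5, 29.0, 33.2], name="Temp_C")

# A DataFrame: two-dimensional table; each column is itself a Series
cities = pd.DataFrame({
    "City": ["Chennai", "Pune", "Hyderabad"],  # hypothetical sample values
    "Temp_C": temps,
})

print(type(cities["Temp_C"]))  # selecting one column returns a Series
print(cities)
```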
Defining DataFrames in Pandas
In data analysis, information is usually available in the form of tables containing rows and
columns. Python’s Pandas library provides a powerful data structure called the DataFrame to
store, manipulate, and analyze such tabular data efficiently. DataFrames are widely used in data
science, machine learning, business analytics, and academic research.
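Tabular data usually arrives from a file. A minimal sketch, assuming the CSV content is held in a string and wrapped in io.StringIO so that no file on disk is needed; with a real file you would pass its path to pd.read_csv instead. The column names and values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV content; a real program would read from a file path instead
csv_text = """Product,Units,Price
Pen,120,10.5
Notebook,45,55.0
Bag,8,499.0
"""

# read_csv turns the header line into column names and each later line into a row
sales = pd.read_csv(io.StringIO(csv_text))

print(sales)
print(sales.dtypes)  # Pandas infers one data type per column
```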
DataFrame:
A DataFrame is a two-dimensional, mutable, and heterogeneous data structure in Pandas with
labeled rows and labeled columns.
● Two-dimensional → Data is arranged in rows and columns
● Mutable → Data can be changed after creation
● Heterogeneous → Each column can have a different data type
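The three properties above can be checked directly. A minimal sketch with an illustrative shopping table (the column names Item, Qty, Price, and Total are hypothetical): shape reports the two dimensions, dtypes shows a separate type per column, and assignments change the table after it has been created.

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ["Pen", "Book", "Lamp"],   # strings
    "Qty": [10, 2, 1],                 # integers
    "Price": [5.0, 250.0, 899.5],      # floats
})

# Two-dimensional: (number of rows, number of columns)
print(df.shape)

# Heterogeneous: each column keeps its own data type
print(df.dtypes)

# Mutable: values can be changed, and columns added, after creation
df.loc[1, "Qty"] = 3
df["Total"] = df["Qty"] * df["Price"]
print(df)
```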
A DataFrame is similar to:
● An Excel spreadsheet
● A SQL table
● A CSV file
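The comparison with a SQL table can be made concrete. A minimal sketch, using the same student names and marks as the dictionary example later in this section, of how a query like SELECT Name FROM students WHERE Marks > 80 maps onto a DataFrame selection:

```python
import pandas as pd

students = pd.DataFrame({
    "Name": ["Ravi", "Sita", "Kiran", "Anu"],
    "Marks": [75, 82, 90, 68],
})

# Roughly: SELECT Name FROM students WHERE Marks > 80
# The boolean condition picks rows; the column label picks the column
high_scorers = students.loc[students["Marks"] > 80, "Name"]
print(high_scorers.tolist())  # ['Sita', 'Kiran']
```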
Structure of a DataFrame
A DataFrame consists of the following components:
1. Rows (Index):
Each row has a label, and together these labels form the index (0, 1, 2, … by default). Index labels are usually unique, although Pandas does not enforce this.
2. Columns:
Each column has a name and represents a specific attribute.
3. Data:
The actual values stored in the table.
4. Data Types:
Each column can have a different data type such as int, float, string, or boolean.
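Each of the four components can be inspected through a DataFrame attribute. A minimal sketch; the Passed column is an illustrative addition so that a boolean column appears alongside the integer and string ones.

```python
import pandas as pd

df = pd.DataFrame({
    "Roll_No": [101, 102, 103],
    "Name": ["Ravi", "Sita", "Kiran"],
    "Passed": [True, True, False],  # illustrative boolean column
})

print(df.index)       # 1. row labels (0, 1, 2 by default)
print(df.columns)     # 2. column names
print(df.to_numpy())  # 3. the data values as a 2-D NumPy array
print(df.dtypes)      # 4. one data type per column
```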
Ways to Define (Create) a DataFrame
1. Creating a DataFrame from a Dictionary
This is the most common method of creating a DataFrame.
● Dictionary keys become column names
● Dictionary values become column data
● Pandas automatically assigns row indices
Syntax
pd.DataFrame(dictionary)
Example: Student Data
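import pandas as pd
# Each key becomes a column name; each list holds that column's values
data = {
    'Roll_No': [101, 102, 103, 104],
    'Name': ['Ravi', 'Sita', 'Kiran', 'Anu'],
    'Marks': [75, 82, 90, 68]
}
df = pd.DataFrame(data)   # row index 0, 1, 2, 3 is assigned automatically
print(df)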
OUTPUT:
   Roll_No   Name  Marks
0      101   Ravi     75
1      102   Sita     82
2      103  Kiran     90
3      104    Anu     68
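By default the index is 0, 1, 2, …, as the output above shows. A minimal sketch of overriding it with the index parameter of pd.DataFrame and of choosing which dictionary keys become columns, and in what order, with the columns parameter. Using the roll numbers as row labels is an illustrative choice.

```python
import pandas as pd

data = {
    "Roll_No": [101, 102, 103, 104],
    "Name": ["Ravi", "Sita", "Kiran", "Anu"],
    "Marks": [75, 82, 90, 68],
}

# columns= selects and orders the dictionary keys used as columns;
# index= replaces the default 0, 1, 2, ... row labels
df = pd.DataFrame(data, columns=["Name", "Marks"], index=data["Roll_No"])
print(df)

print(df.loc[103, "Name"])  # look up a row by its label: Kiran
```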
2. Creating a DataFrame from a List of Lists
Each inner list represents a row in the DataFrame.
● Each row is represented by a list
● Column names are explicitly specified
● Useful when data is available in row format
CODE:
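import pandas as pd
# Each inner list is one row: [Emp_ID, Emp_Name, Salary]
employees = [
    [1, 'Asha', 35000],
    [2, 'Bala', 42000],
    [3, 'Charan', 39000]
]
# columns= supplies the column names, since plain lists carry no labels
df_emp = pd.DataFrame(employees, columns=['Emp_ID', 'Emp_Name', 'Salary'])
print(df_emp)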
OUTPUT:
   Emp_ID Emp_Name  Salary
0       1     Asha   35000
1       2     Bala   42000
2       3   Charan   39000
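If the columns argument is left out, Pandas has no names to use and labels the columns 0, 1, 2, … in the same way it labels the rows. A minimal sketch using the same employee rows, with names assigned afterwards through the columns attribute:

```python
import pandas as pd

employees = [
    [1, "Asha", 35000],
    [2, "Bala", 42000],
    [3, "Charan", 39000],
]

# Without columns=, the columns are labeled 0, 1, 2, ... like the rows
df_default = pd.DataFrame(employees)
print(df_default)

# Names can still be given later by assigning a list to .columns
df_default.columns = ["Emp_ID", "Emp_Name", "Salary"]
print(df_default["Salary"].mean())
```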
3. Creating a DataFrame from a List of Dictionaries
Each dictionary represents one row, and its keys become the column names. This format is common
for survey data, where each respondent's answers form one record with multiple attributes.
CODE:
import pandas as pd
data = [