Step 1️⃣: Importing Required Libraries
import numpy as np
➡ numpy is used for numerical operations (arrays, math, statistics).
import pandas as pd
➡ pandas is used to load and work with datasets in table format (rows & columns).
import matplotlib.pyplot as plt
➡ matplotlib helps in creating visual graphs and plots.
import seaborn as sns
➡ seaborn builds on matplotlib and provides higher-level statistical plots with cleaner default styling.
Step 2️⃣: Load Dataset
df = pd.read_excel('/content/Fraud_Dataset_with_Issues.xlsx')
➡ Reads the Excel file and stores it in a DataFrame named df. Reading .xlsx files requires the openpyxl package to be installed.
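To try the later steps without the Excel file, a small stand-in table can be built in memory. This is only an illustrative sketch: the column names TransactionAmount and IsFraud and all values are made up (only CustomerLocation appears in this walkthrough), and pd.read_csv on an in-memory string stands in for pd.read_excel.

```python
import io
import pandas as pd

# Hypothetical sample data; the real file's columns and values may differ.
csv_text = """TransactionAmount,CustomerLocation,IsFraud
120.5,Local,0
980.0,Foreign,1
,Local,0
45.0,,0
"""

# read_csv on an in-memory string stands in for read_excel on the real file.
toy_df = pd.read_csv(io.StringIO(csv_text))

print(type(toy_df).__name__)  # the loader returns a DataFrame
print(toy_df.shape)           # (rows, columns)
```

Empty fields in the text are loaded as missing values, which is the kind of issue handled in the cleaning step.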
Step 3️⃣: Exploring the Dataset
df.head()
➡ Shows the first 5 rows of the dataset.
df.head(15)
➡ Shows the first 15 rows for a longer preview.
df.tail()
➡ Shows the last 5 rows of the dataset.
df.tail(10)
➡ Shows the last 10 rows of the dataset.
df.columns
➡ Lists all column names.
df.shape
➡ Shows the number of rows and columns as a tuple: (rows, columns).
df.dtypes
➡ Displays the data type of each column (int, float, object/text).
df['CustomerLocation']
➡ Displays the “CustomerLocation” column values.
df['CustomerLocation'].unique()
➡ Shows the unique values present in that column.
df.describe()
➡ Gives a summary of basic statistics (count, mean, std deviation, min, quartiles, max) for the numeric columns.
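The exploration calls above can be tried on a small table first. The sketch below uses a hypothetical toy_df; the TransactionAmount column and all values are invented for illustration, and only the CustomerLocation name comes from this walkthrough.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
toy_df = pd.DataFrame({
    "TransactionAmount": [120.5, 980.0, 45.0, 300.0, 15.25, 640.0],
    "CustomerLocation": ["Local", "Foreign", "Local", "Local", "Foreign", "Local"],
})

n_rows, n_cols = toy_df.shape                    # shape is a (rows, columns) tuple
first_rows = toy_df.head(3)                      # head(n) returns the first n rows
last_rows = toy_df.tail(2)                       # tail(n) returns the last n rows
locations = toy_df["CustomerLocation"].unique()  # distinct values, in order of appearance
summary = toy_df.describe()                      # covers numeric columns only by default

print(n_rows, n_cols)
print(list(locations))
print(summary)
```

Note that describe() leaves out the text column here; passing include="all" would add it to the summary.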
Step 4️⃣: Handling Missing Values
df.isnull().sum()
➡ Counts the missing values (NaN/None) in each column.
df = df.dropna()
➡ Removes every row that contains at least one missing value.
df.shape
➡ Checks the new dataset size after removing missing values.
df.describe()
➡ Shows the updated statistical summary after cleaning.
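To see how much data dropna() removes, it helps to compare row counts before and after. A minimal sketch on a hypothetical toy_df (the TransactionAmount column and the values are assumptions, not taken from the real dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in each column.
toy_df = pd.DataFrame({
    "TransactionAmount": [120.5, np.nan, 45.0, 300.0],
    "CustomerLocation": ["Local", "Foreign", None, "Local"],
})

missing_per_column = toy_df.isnull().sum()  # one count per column
rows_before = toy_df.shape[0]

cleaned = toy_df.dropna()                   # drops a row if ANY column is missing
rows_after = cleaned.shape[0]

print(missing_per_column.to_dict())
print(rows_before, rows_after)
```

Each column has only one missing value, yet two of the four rows are dropped because the gaps fall in different rows. If only certain columns matter, dropna(subset=[...]) restricts the check to those columns.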
Step 5️⃣: Convert Text Labels to Numbers (Encoding)
df['CustomerLocation'] = df['CustomerLocation'].map({'Foreign': 1, 'Local': 0})
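➡ Converts the text categories to numbers, because most ML models require numeric input. Any value that is not a key in the mapping (for example a misspelled or differently capitalized label) becomes NaN.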
df.head()
➡ Checks the updated dataset.
df['CustomerLocation'].unique()
➡ Confirms the conversion worked (should show only 0 and 1; a NaN here means some labels were not in the mapping).
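The behaviour of map() with labels that do not match the dictionary exactly can be checked on a small Series. The sketch below is illustrative only: the inconsistent label "foreign " is a made-up example, and the strip/capitalize cleanup is one possible approach, not something the walkthrough itself applies.

```python
import pandas as pd

# Hypothetical labels, including one inconsistent spelling ("foreign ").
locations = pd.Series(["Local", "Foreign", "foreign ", "Local"])
mapping = {"Foreign": 1, "Local": 0}

encoded = locations.map(mapping)      # "foreign " is not a key, so it becomes NaN
unmapped = locations[encoded.isna()]  # original labels that failed to map

# One possible cleanup (an assumption, not part of the walkthrough):
# trim whitespace and normalize capitalization before mapping.
normalized = locations.str.strip().str.capitalize()
encoded_clean = normalized.map(mapping)

print(unmapped.tolist())
print(encoded_clean.tolist())
```

Listing the unmapped labels before encoding shows whether the mapping dictionary covers every spelling that occurs in the column.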
df['CustomerLocation'].value_counts()