Problem Definition:
○ Objective: Predict student final exam scores based on features
such as hours studied, attendance percentage, and previous exam
scores.
○ Ask: "Is there a pattern in the data that correlates hours studied,
attendance, and previous scores with the final exam score?"
○ Action: Students should understand that the target is a numerical
outcome, which makes this a regression problem.
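One way to explore the question above is to compute the correlation of each feature with the final score. The sketch below uses a few made-up rows rather than result.csv; the feature column names (Hours_studied, Attendance, Previous_score) are assumptions for illustration and may differ from the real dataset.

```python
import pandas as pd

# Hypothetical sample rows; not taken from result.csv.
# Feature column names are assumed for illustration only.
sample = pd.DataFrame({
    "Hours_studied": [2, 4, 6, 8, 10],
    "Attendance": [60, 70, 80, 90, 100],
    "Previous_score": [50, 55, 65, 70, 85],
    "Final_exam_score": [52, 58, 68, 75, 88],
})

# Pearson correlation of each feature with the target column
correlations = sample.corr()["Final_exam_score"].drop("Final_exam_score")
print(correlations)
```

Values close to 1 or -1 suggest a strong linear relationship with the final score; values near 0 suggest a weak one.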
Data Collection:
○ Step 1: Define the necessary data features: hours studied,
attendance percentage, and previous exam scores.
Dataset: result.csv
Unit 2: Model Construction and Evaluation
Data Preprocessing:
○ Objective: Prepare the data for model training by splitting it into
training and testing sets.
○ Action:
■ Load the dataset.
■ Split the dataset into features (X) and target (y).
■ Use train_test_split() to divide the data into training and
testing sets.
○ Code:
import pandas as pd
from sklearn.model_selection import train_test_split
# Load data
data = pd.read_csv('result.csv')
# Define features (X) and target (y)
X = data.drop(columns=['Final_exam_score'])
y = data['Final_exam_score']
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
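To see what the split produces, the following sketch runs the same steps on a small made-up table instead of result.csv. The feature column names are assumptions; only Final_exam_score comes from the code above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 10-row dataset; feature column names are assumed
data = pd.DataFrame({
    "Hours_studied": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Attendance": [55, 60, 65, 70, 75, 80, 85, 90, 95, 100],
    "Previous_score": [45, 50, 52, 58, 60, 66, 70, 75, 80, 88],
    "Final_exam_score": [48, 52, 55, 60, 63, 69, 72, 78, 84, 90],
})

# Separate features (X) from the target (y)
X = data.drop(columns=["Final_exam_score"])
y = data["Final_exam_score"]

# test_size=0.2 holds out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Training rows:", X_train.shape[0])
print("Testing rows:", X_test.shape[0])
```

With 10 rows and test_size=0.2, 8 rows go to training and 2 to testing. Fixing random_state makes the split reproducible between runs.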
Model Construction:
● Objective: Build and train a model using Random Forest Regression to
predict student exam scores.
● Action:
○ Define the model using RandomForestRegressor().
○ Train the model on the training data (X_train, y_train).
○ Use the trained model to make predictions on the testing data.
● Code:
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
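The three actions listed under Model Construction can be sketched end to end as follows. This is a minimal, self-contained illustration on synthetic data, not the course's result.csv; the feature column names, the formula used to generate scores, and the n_estimators value are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only (not result.csv)
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "Hours_studied": rng.uniform(0, 10, n),
    "Attendance": rng.uniform(50, 100, n),
    "Previous_score": rng.uniform(40, 100, n),
})
# Assumed relationship between features and final score, plus noise
data["Final_exam_score"] = (
    0.5 * data["Previous_score"]
    + 3.0 * data["Hours_studied"]
    + 0.2 * data["Attendance"]
    + rng.normal(0, 2, n)
)

X = data.drop(columns=["Final_exam_score"])
y = data["Final_exam_score"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Define the model
model = RandomForestRegressor(n_estimators=100, random_state=42)

# Train the model on the training data
model.fit(X_train, y_train)

# Predict on the testing data and measure the average absolute error
predictions = model.predict(X_test)
mae = mean_absolute_error(y_test, predictions)
print("Mean absolute error:", round(mae, 2))
```

Mean absolute error is reported in the same units as the exam score, so a lower value means the predictions are, on average, closer to the true scores.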