Implemented feature engineering

Drew Giffin
2025-10-20 18:30:48 -04:00
parent 5e375d1e6d
commit 01b815deeb
4 changed files with 49 additions and 12 deletions
+12 -2
@@ -29,7 +29,6 @@ The target variable is the **stress level**, indicated as *low*, *moderate* or *
**Figures:**
![Feature Distributions Histogram](images/feature_distributions_histogram.png)
![Scatter Plot Matrix](images/scatter_plot_matrix.png)
![Correlation Heatmap](images/correlation_heatmap.png)
![Study Boxplot](images/boxplots_study_hours_per_day.png)
![Sleep Boxplot](images/boxplots_sleep_hours_per_day.png)
![Extracurricular Boxplot](images/boxplots_extracurricular_hours_per_day.png)
@@ -45,4 +44,15 @@ No missing values or duplicate rows were found in the dataset. Outliers in numer
![Missing Values](images/missing_values.png)
![Duplicate Entries](images/duplicate_entries.png)
![Removed Outliers](images/removed_outliers.png)
---
## Feature Engineering
To improve model performance and reduce redundancy, I performed feature engineering before training:
- **GPA** was removed because it was highly correlated with **study time**, reducing redundant information and potential multicollinearity.
- Features such as **extracurricular activity time** and **social time** were removed due to low predictive importance, minimizing noise and helping the model focus on the most relevant factors.
![Correlation Heatmap](images/correlation_heatmap.png)
![Feature Importance](images/feature_importance.png)
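
The two selection steps described above could be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the column names, the synthetic data, the importance scores, and both thresholds are hypothetical placeholders, and dropping the second column of each correlated pair is just a simple convention assumed here.

```python
import numpy as np
import pandas as pd


def correlated_pairs(features: pd.DataFrame, threshold: float = 0.7):
    """Return (col_a, col_b, r) for each column pair with |Pearson r| >= threshold."""
    corr = features.corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = float(corr.loc[a, b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


def low_importance(importances: pd.Series, min_importance: float = 0.1):
    """Return the names of features whose importance score is below the cutoff."""
    return importances[importances < min_importance].index.tolist()


# Synthetic stand-in data; every column name here is hypothetical.
rng = np.random.default_rng(0)
study = rng.uniform(5, 10, 200)
df = pd.DataFrame({
    "study_hours": study,
    "gpa": 0.3 * study + rng.normal(0, 0.05, 200),  # built to track study_hours
    "sleep_hours": rng.uniform(5, 10, 200),
    "social_hours": rng.uniform(0, 6, 200),
    "extracurricular_hours": rng.uniform(0, 4, 200),
})

# Placeholder scores standing in for a fitted model's feature importances.
importances = pd.Series({
    "study_hours": 0.62,
    "sleep_hours": 0.30,
    "social_hours": 0.05,
    "extracurricular_hours": 0.03,
})

pairs = correlated_pairs(df)
# Assumed convention: keep the first column of each correlated pair, drop the second.
redundant = [b for _, b, _ in pairs]
to_drop = sorted(set(redundant) | set(low_importance(importances)))
selected = df.drop(columns=to_drop)
print(list(selected.columns))
```

In practice the importance scores would come from a model fitted on the training data, and the cutoffs would be chosen by inspecting the correlation heatmap and feature importance plot rather than fixed in advance.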