Power Plant Energy Output: Regression Model Comparison
Compared four regression models (simple linear, multiple linear, polynomial, and KNN) for predicting power plant energy output on the UCI CCPP dataset, and identified the most informative atmospheric predictors.
Problem
Power plant operators need accurate predictions of energy output from atmospheric conditions to optimize operational efficiency and grid management. Meeting that need requires a systematic evaluation of regression approaches to determine which one best captures the relationships between environmental factors and electrical generation.
Approach
Analyzed the UCI Combined Cycle Power Plant (CCPP) dataset in three stages. First, exploratory data analysis: descriptive statistics (means, medians, quartiles, ranges), pairwise scatter plots, and correlation analysis. Second, implementation of four modeling approaches: (1) simple linear regression on each predictor individually, (2) multiple linear regression on all predictors together, (3) polynomial regression with interaction terms, and (4) k-nearest neighbors (KNN) regression with normalized features and a tuned k. Third, evaluation: tested the statistical significance of the predictors, identified outliers, and compared model performance to select the best prediction strategy.
Impact
Identified, through systematic model comparison, the atmospheric and operational predictors that significantly affect the plant's electrical output. Demonstrated the performance trade-offs between parametric linear models and non-parametric KNN regression, providing data-driven input for power plant optimization and predictive maintenance.
Key Metrics
Technologies
Links
My Role
Sole developer. Conducted exploratory data analysis with descriptive statistics and visualizations; implemented simple and multiple linear regression models; developed polynomial regression with interaction terms; tuned KNN regression with feature normalization; performed the comparative model evaluation; and documented findings and methodology in a Jupyter Notebook. Completed as a data science course project at USC.
Team Size: 1 person