Classification with NN and Tree-based Machine Learning Models: Wine Quality Dataset
In this project, I classified wine quality using several tree-based machine learning models and a simple neural network model:
- Gradient Boosting Machines (GBM)
- XGBoost
- RandomForest
- Neural Network
The key steps included:
- Data Preparation and Preprocessing:
- Data Exploration: Checked the data types in the dataset and identified missing values and duplicates to ensure the dataset was clean for analysis.
- Feature Selection: After exploratory data analysis, I selected features based on their correlation with wine quality and on domain knowledge.
- Data Transformation: Applied a log transformation to certain skewed features to reduce their skewness and bring their distributions closer to normal, with the aim of improving model performance.
- Model Training and Evaluation: Tuned the models' hyperparameters with GridSearchCV (exhaustive grid search with cross-validation) to find the best parameters.
- Results and Analysis:
On the one hand, RandomForest performed best among the tree-based methods, outperforming XGBoost and Gradient Boosting Machines (GBM), which points to its robustness in handling variability in the data.
On the other hand, the Neural Network model, with its simple architecture, outperformed the tree-based models.
However, the choice of model depends on how important interpretability is, as well as on deployment requirements and operational considerations.
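The data-exploration step from the key steps above could be sketched with pandas as follows. This assumes the data is loaded into a pandas DataFrame. The toy values are hypothetical, and the column names follow the public UCI Wine Quality dataset, which is an assumption about the file used here. Dropping incomplete rows is only one simple way to handle missing values.

```python
import pandas as pd

# Hypothetical toy data; column names follow the UCI Wine Quality dataset (assumed).
df = pd.DataFrame({
    "fixed acidity": [7.4, 7.8, 7.4, 11.2, None],
    "alcohol":       [9.4, 9.8, 9.4, 9.8, 10.5],
    "quality":       [5, 5, 5, 6, 6],
})

# 1. Data type of each column
print(df.dtypes)

# 2. Missing values per column (None is stored as NaN)
missing = df.isna().sum()
print(missing)

# 3. Exact duplicate rows (rows 0 and 2 are identical here)
n_duplicates = int(df.duplicated().sum())
print(f"duplicate rows: {n_duplicates}")

# One simple cleaning option: drop duplicates, then drop incomplete rows
clean = df.drop_duplicates().dropna().reset_index(drop=True)
print(clean.shape)
```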
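The correlation-based part of the feature-selection step might look like the sketch below. The toy values and the `THRESHOLD` of 0.3 on the absolute Pearson correlation are hypothetical. In the project, correlation was combined with domain knowledge, and no fixed threshold captures that part.

```python
import pandas as pd

# Hypothetical toy data; column names follow the UCI Wine Quality dataset (assumed).
df = pd.DataFrame({
    "alcohol":          [9.4, 9.8, 10.5, 11.2, 12.0, 12.8],
    "volatile acidity": [0.70, 0.66, 0.52, 0.45, 0.40, 0.31],
    "residual sugar":   [2.0, 2.4, 2.2, 1.9, 2.3, 2.1],
    "quality":          [5, 5, 6, 6, 7, 7],
})

THRESHOLD = 0.3  # hypothetical cut-off on |Pearson r|

# Absolute Pearson correlation of every feature with the target
corr_with_target = df.corr()["quality"].drop("quality").abs()

# Keep features whose |r| reaches the threshold
selected = corr_with_target[corr_with_target >= THRESHOLD].index.tolist()

print(corr_with_target.sort_values(ascending=False))
print("selected:", selected)
```

In this toy frame, alcohol rises with quality and volatile acidity falls with it, so both are kept. Residual sugar is uncorrelated with quality and is dropped.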
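The log transformation of skewed features could be sketched with pandas and NumPy as shown below. The `SKEW_THRESHOLD` of 1.0 and the toy values are hypothetical. `np.log1p` computes log(1 + x), so features that can be exactly zero stay finite.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values: "residual sugar" has a long right tail, "pH" does not.
df = pd.DataFrame({
    "residual sugar": [1.2, 1.5, 1.9, 2.0, 2.2, 2.6, 3.1, 15.5],
    "pH":             [3.2, 3.3, 3.4, 3.3, 3.5, 3.2, 3.4, 3.3],
})

SKEW_THRESHOLD = 1.0  # hypothetical: treat |skew| > 1 as "skewed"

# Sample skewness per column; select the strongly skewed ones
skewness = df.skew()
skewed_cols = skewness[skewness.abs() > SKEW_THRESHOLD].index.tolist()

# Apply log(1 + x) only to the skewed columns, leaving the rest unchanged
df_log = df.copy()
df_log[skewed_cols] = np.log1p(df_log[skewed_cols])

print("skewed:", skewed_cols)
print(df[skewed_cols].skew().round(2).to_dict(), "->",
      df_log[skewed_cols].skew().round(2).to_dict())
```

Tree splits depend only on the ordering of a feature's values, so a monotonic transform such as `log1p` mainly matters for the neural network. The tree-based models are largely unaffected by it.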
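The hyperparameter tuning step could be sketched with scikit-learn's `GridSearchCV`, shown here for `RandomForestClassifier`. The synthetic data from `make_classification`, the parameter grid, 5-fold cross-validation, and accuracy scoring are all assumptions, because the project's actual grids and metric are not listed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the preprocessed wine features and quality classes.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Hypothetical search space; every combination is evaluated.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
    "min_samples_leaf": [1, 3],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,                # 5-fold cross-validation on the training split
    scoring="accuracy",  # assumed metric
)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print(f"CV accuracy:   {search.best_score_:.3f}")
# The best configuration is refit on the full training split and scored on held-out data
print(f"test accuracy: {search.score(X_test, y_test):.3f}")
```

To tune the other models, swap in a different estimator and its own grid, for example `GradientBoostingClassifier`, scikit-learn's `MLPClassifier`, or XGBoost's scikit-learn-compatible `XGBClassifier`. Keeping a test split outside the grid search gives a less optimistic estimate than `best_score_` alone.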
Technologies used
- Python
- NN
- Tree-based ML