Project Description:
This project focuses on building a predictive Credit Scoring Model using supervised Machine Learning techniques to assess whether a credit card customer is likely to default on their payment next month. The model is trained on the UCI Credit Card Dataset, which contains a mix of financial, behavioral, and demographic features.
GitHub Repo: [https://github.com/Sohag016/CodeAlpha...]
LinkedIn Post: [ / sohag-a5550a374 ]
✅ Key Highlights:
Dataset: UCI Credit Card Dataset (25 columns including the ID and target, 30,000 samples)
Target Variable: default.payment.next.month (1 = default, 0 = not default)
ML Models Used:
Logistic Regression
Decision Tree
Random Forest (⭐️ Best Performer)
Handling Imbalanced Data: SMOTE from imbalanced-learn
Best Performance:
Accuracy: 79.72%
ROC-AUC: 0.753 (Random Forest)
Confusion Matrix: [[4178, 495], [722, 605]]
Top Features: PAY_0, LIMIT_BAL, AGE, PAY_2, PAY_AMT1, etc.
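The reported accuracy can be checked directly from the confusion matrix above. The sketch below assumes scikit-learn's layout (rows are true classes, columns are predicted classes, class 0 first), which this description does not state. Under that assumption it also gives precision and recall for the default class.

```python
# Confusion matrix as reported. The layout [[TN, FP], [FN, TP]] is an
# assumption (scikit-learn's convention: rows = true, columns = predicted).
cm = [[4178, 495], [722, 605]]
(tn, fp), (fn, tp) = cm

total = tn + fp + fn + tp
accuracy = (tn + tp) / total
precision = tp / (tp + fp)  # of predicted defaults, the share that were real
recall = tp / (tp + fn)     # of real defaults, the share that were caught

print(f"samples:   {total}")
print(f"accuracy:  {accuracy:.4f}")
print(f"precision: {precision:.4f}")
print(f"recall:    {recall:.4f}")
```

Under that reading the matrix covers 6,000 samples and reproduces the 79.72% accuracy, while recall on the default class is about 0.46, so accuracy alone understates how many defaulters are missed.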
📈 Cross-Validation (ROC-AUC):
Model | Mean ROC-AUC | Std Dev
Logistic Regression | 0.7356 | 0.0042
Decision Tree | 0.8226 | 0.0536
Random Forest | 0.9371 | 0.0350
🛠️ Tech Stack:
Programming: Python
Libraries: Pandas, NumPy, scikit-learn, imbalanced-learn, Matplotlib, Seaborn
🎓 Learnings:
How to handle class imbalance using SMOTE
How feature importance scores show which inputs drive the model's predictions
How an ensemble model (Random Forest) outperformed simpler models (Logistic Regression, a single Decision Tree) on this dataset
👨‍💻 Author:
Md. Sohag Hossain
Role: Machine Learning Intern @CodeAlpha
📅 Date: August 2025