Credit Scoring Model using Machine Learning (UCI Dataset)

Published: 21 May 2026
Channel: Sohag H-75

Project Description:
This project focuses on building a predictive Credit Scoring Model using supervised Machine Learning techniques to assess whether a credit card customer is likely to default on their payment next month. The model is trained on the UCI Credit Card Dataset, which contains a mix of financial, behavioral, and demographic features.

GitHub Repo: [https://github.com/Sohag016/CodeAlpha...]
LinkedIn Post: [  / sohag-a5550a374  ]

✅ Key Highlights:
Dataset: UCI Credit Card Dataset (25 columns including the ID and target, 30,000 samples)

Target Variable: default.payment.next.month (1 = default, 0 = not default)

ML Models Used:

Logistic Regression

Decision Tree

Random Forest (⭐️ Best Performer)

Handling Imbalanced Data: SMOTE from imbalanced-learn

Best Performance:

Accuracy: 79.72%

ROC-AUC: 0.753 (Random Forest)

Confusion Matrix: [[4178, 495], [722, 605]]

Top Features: PAY_0, LIMIT_BAL, AGE, PAY_2, PAY_AMT1, etc.
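The reported accuracy can be checked directly against the confusion matrix. The sketch below assumes the scikit-learn layout (rows are true classes, columns are predicted classes, class 0 first), which the description does not state. The precision and recall values are derived here and are not figures quoted by the author.

```python
# Confusion matrix as reported above.
# Assumption: scikit-learn layout, i.e. [[TN, FP], [FN, TP]].
cm = [[4178, 495], [722, 605]]
(tn, fp), (fn, tp) = cm

total = tn + fp + fn + tp
accuracy = (tn + tp) / total
precision = tp / (tp + fp)  # share of predicted defaults that were real defaults
recall = tp / (tp + fn)     # share of real defaults the model caught

print(f"samples:   {total}")
print(f"accuracy:  {accuracy:.4f}")
print(f"precision: {precision:.4f}")
print(f"recall:    {recall:.4f}")
```

Under that layout the accuracy works out to 79.72%, matching the reported figure, while recall on the default class is roughly 46%. Accuracy alone therefore overstates how well defaults are caught.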

📈 Cross-Validation (ROC-AUC):
Model               | Mean ROC-AUC | Std Dev
Logistic Regression | 0.7356       | 0.0042
Decision Tree       | 0.8226       | 0.0536
Random Forest       | 0.9371       | 0.0350
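A table like the one above can be produced with scikit-learn's `cross_val_score`. The following is a minimal sketch, not the author's script. The synthetic data, the 5-fold split, and every hyperparameter are assumptions chosen for illustration, so the printed numbers will not match the table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the credit data (assumption: about 22% positives).
X, y = make_classification(
    n_samples=1000, n_features=23, n_informative=8,
    weights=[0.78, 0.22], random_state=42,
)

# Hypothetical model settings; the project's actual settings are not given.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    results[name] = (scores.mean(), scores.std())
    print(f"{name:<20} {scores.mean():.4f} {scores.std():.4f}")
```

When oversampling is part of the workflow, fitting it inside each training fold (for example with imbalanced-learn's `Pipeline`) keeps synthetic samples out of the validation folds.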

🛠️ Tech Stack:
Programming: Python

Libraries: Pandas, NumPy, scikit-learn, imbalanced-learn, Matplotlib, Seaborn

🎓 Learnings:
How to handle class imbalance using SMOTE

How feature importance scores show which inputs (such as PAY_0 and LIMIT_BAL) drive the model's predictions

How an ensemble model (Random Forest) outperformed simpler models (Logistic Regression, Decision Tree) on this classification problem
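To make the SMOTE idea concrete, here is a toy sketch of its core step: each synthetic minority sample is placed on the line segment between a real minority sample and one of its nearest minority neighbours. This is only an illustration in plain Python, not the imbalanced-learn implementation the project uses. The function name `smote_sketch`, the sample points, and `k=3` are all hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # The new point lies on the segment between base and the chosen neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class rows: (scaled LIMIT_BAL, scaled AGE).
minority = [(0.10, 0.30), (0.15, 0.35), (0.20, 0.25), (0.12, 0.40)]
new_points = smote_sketch(minority, n_new=6)
print(len(minority) + len(new_points))
```

Because every new point is an interpolation between two real minority rows, the synthetic samples stay inside the region the minority class already occupies and are not exact duplicates of existing rows.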

👨‍💻 Author:
Md. Sohag Hossain
Role: Machine Learning Intern @CodeAlpha
📅 Date: August 2025