ML · Imbalanced Classification

Credit Card Fraud Detection.

An end-to-end fraud pipeline on a 284k-transaction dataset with a 0.17 per cent fraud rate, where the threshold is the real work.

Challenge

The dataset has 284,807 transactions, of which only 0.17 per cent are fraudulent. A model that never predicts fraud scores about 99.8 per cent accuracy while catching zero fraud cases, so accuracy is a trap and the real problem is how you handle the imbalance.
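The arithmetic behind that trap is worth seeing once. The sketch below uses only the two figures quoted above; the fraud count is an approximation implied by the rounded 0.17 per cent rate, not a number taken from the dataset itself.

```python
# Figures quoted in the text.
n_transactions = 284_807
fraud_rate = 0.0017  # 0.17 per cent, rounded

# Assumption: approximate fraud count implied by the rounded rate.
n_fraud = round(n_transactions * fraud_rate)
n_legit = n_transactions - n_fraud

# Confusion matrix of a "model" that always predicts "not fraud".
true_pos = 0          # no fraud case is ever flagged
false_neg = n_fraud   # every fraud case is missed
true_neg = n_legit    # every legitimate transaction is passed
false_pos = 0

accuracy = (true_pos + true_neg) / n_transactions
recall = true_pos / (true_pos + false_neg)

print(f"accuracy = {accuracy:.4f}")  # accuracy = 0.9983
print(f"recall   = {recall:.1f}")    # recall   = 0.0
```

Accuracy looks excellent while recall on the fraud class is zero, which is why the rest of the project reports precision and recall instead.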

Approach

A pipeline that takes class imbalance seriously: SMOTE oversampling on the training fold only, a head-to-head comparison of XGBoost, Random Forest and Logistic Regression, and precision-recall threshold tuning rather than chasing accuracy.
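The ordering "split first, oversample second" can be sketched as follows. This is not the project's code: it is a dependency-free illustration in which smote_like is a simplified, hypothetical stand-in for SMOTE (a real pipeline would more likely use the SMOTE class from the imbalanced-learn library), and the two-feature toy data, the 80/20 split and k=3 are all assumptions chosen for the example.

```python
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Simplified SMOTE-style oversampling (illustrative stand-in).

    Each synthetic point is a random interpolation between a minority
    point and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # k nearest neighbours of base by squared Euclidean distance
        neighbours = sorted(
            others, key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base))
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, mate)))
    return synthetic


def stratified_split(rows, test_fraction, rng):
    """Split (features, label) rows, keeping the class ratio in both parts."""
    train, test = [], []
    for label in (0, 1):
        group = [r for r in rows if r[1] == label]
        rng.shuffle(group)
        cut = int(len(group) * test_fraction)
        test += group[:cut]
        train += group[cut:]
    return train, test


# Hypothetical toy data: 200 legitimate rows (label 0), 10 fraud rows (label 1).
rng = random.Random(42)
legit = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
fraud = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(10)]
rows = [(x, 0) for x in legit] + [(x, 1) for x in fraud]

# 1. Split BEFORE any resampling, so the test rows stay untouched.
train, test = stratified_split(rows, test_fraction=0.2, rng=rng)

# 2. Oversample the minority class using training rows only.
train_fraud = [x for x, y in train if y == 1]
train_legit = [x for x, y in train if y == 0]
synthetic = smote_like(train_fraud, n_new=len(train_legit) - len(train_fraud))
train_balanced = train + [(x, 1) for x in synthetic]

print(len(train_balanced), sum(y for _, y in train_balanced))  # 320 160
print(len(test), sum(y for _, y in test))                      # 42 2
```

The training set ends up balanced, while the test set keeps its original imbalance and contains no synthetic rows, so any metric computed on it reflects the conditions the model would face in production.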

Outcome

A model evaluated on what actually matters for fraud: the precision-recall trade-off at a usable operating threshold. The full analysis is open for review on Hugging Face and GitHub.

Key decisions

SMOTE applied inside the training fold only. Oversampling before the split lets synthetic points derived from held-out transactions leak into training, which inflates naive imbalanced-data results.
Three models compared head to head: XGBoost, Random Forest and Logistic Regression.
Operating point chosen on the precision-recall curve, not on accuracy, because accuracy is meaningless at a 0.17 per cent base rate.
Reproducible notebook plus an interactive demo so the trade-offs are inspectable, not just claimed.
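Choosing an operating point on the precision-recall curve can be sketched as below. The selection rule here (maximise recall subject to a minimum precision) is one common policy and is an assumption, since the write-up does not say which rule the project uses; pick_threshold, precision_recall_at, the labels and the scores are all hypothetical.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when every score >= threshold is flagged as fraud."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if s >= threshold and y == 1)
    fp = sum(1 for y, s in pairs if s >= threshold and y == 0)
    fn = sum(1 for y, s in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(y_true, scores, min_precision):
    """Return (threshold, precision, recall) with the highest recall among
    thresholds whose precision is at least min_precision, or None if no
    threshold qualifies. Ties on recall keep the lowest threshold."""
    best = None
    for t in sorted(set(scores)):
        p, r = precision_recall_at(y_true, scores, t)
        if p >= min_precision and (best is None or r > best[2]):
            best = (t, p, r)
    return best


# Hypothetical held-out labels (1 = fraud) and model scores.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.80, 0.90]

print(pick_threshold(y_true, scores, min_precision=0.75))  # (0.45, 0.75, 1.0)
print(pick_threshold(y_true, scores, min_precision=0.90)[0])  # 0.8
```

Raising the precision floor from 0.75 to 0.90 moves the threshold from 0.45 to 0.80 and drops recall from 1.0 to two thirds: the trade-off the demo is meant to make inspectable.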
