Vikrant Sharma
← All work

ML · Imbalanced Classification

Credit Card Fraud Detection.

An end-to-end fraud pipeline on a 284k-transaction dataset with a 0.17 per cent fraud rate, where the threshold is the real work.

Screenshot of Credit Card Fraud Detection

Challenge

The dataset has 284,807 transactions and a 0.17 per cent fraud rate. A naive model scores 99.8 per cent accuracy by predicting that nothing is ever fraud, so accuracy is a trap and the whole problem lives in how you handle the imbalance.

Approach

A pipeline that takes class imbalance seriously: SMOTE oversampling on the training fold only, a head-to-head comparison of XGBoost, Random Forest and Logistic Regression, and precision-recall threshold tuning rather than chasing accuracy.

Outcome

A model evaluated on the metric that actually matters for fraud, the precision-recall trade-off at a usable operating threshold, with the full analysis open for review on Hugging Face and GitHub.

Key decisions

  • SMOTE applied inside the training fold only, avoiding the leakage that inflates naive imbalanced-data results.
  • Three models compared head to head: XGBoost, Random Forest and Logistic Regression.
  • Operating point chosen on the precision-recall curve, not on accuracy, because accuracy is meaningless at a 0.17 per cent base rate.
  • Reproducible notebook plus an interactive demo so the trade-offs are inspectable, not just claimed.