About
Data Scientist with 5 years of experience building, validating, and deploying statistical and machine learning models that quantify credit risk and forecast loan portfolio performance at scale. Skilled in predictive modeling, time series forecasting, clustering, and model governance on billion-record datasets using Python, Conda, AWS, Spark, and H2O. Experienced in partnering with cross-functional teams of data scientists, software engineers, and product managers to translate ambiguous business problems into validated, production-ready solutions for bank volume, revenue, and expense forecasting.
Skills & Tools
Programming Languages
ML Frameworks & Libraries
Big Data & Cloud
MLOps & Model Governance
Databases & Visualization
Statistical & ML Methods
Experience
- Partnered with a cross-functional team of data scientists, software engineers, and product managers to build machine learning models forecasting loan portfolio volume, revenue, and credit losses across a multi-billion-dollar book.
- Used Python, Conda, AWS, H2O AutoML, and Apache Spark to surface insights from large volumes of structured and unstructured loan, market, and transaction data, informing credit risk and portfolio strategy decisions.
- Built classification and time-series forecasting models end to end in Amazon SageMaker, covering design, training, evaluation, validation, and deployment, and improved credit loss forecast precision by 22%.
- Developed challenger models to benchmark against deployed champion models, applying rigorous backtesting and statistical validation, including confusion matrix and ROC/AUC analysis, to recommend production updates.
- Built scalable PySpark and SQL pipelines on AWS Glue and EMR to retrieve, combine, and analyze data from structured and unstructured sources, improving data availability and reproducibility by 40% across the team.
- Presented the loss outlook and allowance impacts to non-technical executives through clear visualizations and storytelling, translating complex model risk findings into actionable decisions.
- Built, validated, and backtested classification and risk-scoring models, including logistic regression and XGBoost, with scikit-learn on large-scale claims datasets, achieving an 87% F1-score and 92% recall in production.
- Developed a clinical risk stratification platform that automated patient risk scoring, improving population risk segmentation accuracy by 22% and cutting manual triage time by 30% for the care management team.
- Built scalable ETL pipelines with Python, PySpark, and SQL on AWS to combine claims, billing codes, and clinical notes into high-quality, analysis-ready datasets for downstream modeling and reporting.
- Tuned models via grid search and Bayesian optimization and applied SHAP explainability to deliver statistically grounded interpretations to clinical and compliance stakeholders ahead of model approval.
- Partnered with product, actuarial, and engineering teams across Agile sprints to translate model outputs into actionable insights, presenting findings to leadership through Tableau and Power BI dashboards.
Education
Certificates
Let's Connect
Got an idea, a role, or a problem worth solving? Drop me a message.