Project
Financial ETL Modernization on Databricks
Refactored legacy financial ETL jobs into modular PySpark pipelines and SQL models for faster, cleaner reporting.
Highlights
- Reduced repetitive SQL logic by modularizing shared transformations.
- Added validation checks to improve downstream data quality.
- Simplified change requests by standardizing pipeline structure.
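
To illustrate what modularizing shared transformations can look like, here is a minimal sketch in plain Python, not the production code. The column names (`currency`, `gross_amount`, `tax_amount`), the step functions, and the sample rows are all hypothetical. In PySpark the same shape is usually written as functions that take and return a DataFrame, chained with `DataFrame.transform`.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Transform = Callable[[List[Row]], List[Row]]


def normalize_currency(rows: List[Row]) -> List[Row]:
    """Trim and upper-case the (hypothetical) currency column."""
    return [{**r, "currency": str(r["currency"]).strip().upper()} for r in rows]


def add_net_amount(rows: List[Row]) -> List[Row]:
    """Derive net_amount once, instead of repeating the expression per report."""
    return [{**r, "net_amount": r["gross_amount"] - r["tax_amount"]} for r in rows]


def run_pipeline(rows: List[Row], steps: List[Transform]) -> List[Row]:
    """Apply each step in order, feeding the output of one into the next."""
    return reduce(lambda acc, step: step(acc), steps, rows)


# One shared list of steps, reused by every dataset that needs it.
SHARED_STEPS: List[Transform] = [normalize_currency, add_net_amount]

# Hypothetical sample data for two different datasets.
invoices = [{"id": 1, "currency": " usd ", "gross_amount": 120, "tax_amount": 20}]
refunds = [{"id": 7, "currency": "eur", "gross_amount": 50, "tax_amount": 5}]

clean_invoices = run_pipeline(invoices, SHARED_STEPS)
clean_refunds = run_pipeline(refunds, SHARED_STEPS)
```

Because both datasets run through the same `SHARED_STEPS`, a change to one rule only has to be made in one place.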
Outcome
A streamlined ETL workflow now powers finance reporting with clearer logic, faster updates, and fewer manual fixes.
What I did
- Rebuilt ingestion and transformation steps in Python and PySpark.
- Standardized shared logic across datasets to reduce duplication.
- Added validation checks before publishing to reporting tables.
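
The validation step before publishing can be sketched as a simple gate: run the checks, and write to the reporting table only if none fail. This is an illustrative plain-Python version under assumed rules (required columns and a unique key). The function names, column names, and the in-memory list standing in for a reporting table are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def validate(rows: List[Row], required: List[str], key: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means the batch passed."""
    errors: List[str] = []
    if not rows:
        errors.append("dataset is empty")
    seen = set()
    for i, row in enumerate(rows):
        # Required columns must be present and non-null.
        for col in required:
            if row.get(col) is None:
                errors.append(f"row {i}: missing {col}")
        # The key column must be unique across the batch.
        k = row.get(key)
        if k is not None and k in seen:
            errors.append(f"row {i}: duplicate {key}={k}")
        seen.add(k)
    return errors


def publish_if_valid(rows: List[Row], target: List[Row]) -> List[str]:
    """Append rows to the target only when every check passes."""
    errors = validate(rows, required=["id", "amount"], key="id")
    if errors:
        return errors  # nothing is published
    target.extend(rows)
    return []


# Hypothetical batches: one clean, one with a null amount and a duplicate id.
reporting_table: List[Row] = []
good_batch = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
bad_batch = [{"id": 1, "amount": None}, {"id": 1, "amount": 5}]

good_result = publish_if_valid(good_batch, reporting_table)
bad_result = publish_if_valid(bad_batch, reporting_table)
```

The bad batch is rejected as a whole, so the reporting table never receives a partially valid load.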
Next steps
Expand automated monitoring to flag data anomalies before stakeholders see them.
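
One possible starting point for that monitoring, sketched under assumptions: compare the latest value of a daily metric against its recent history and flag it when it sits too many standard deviations from the mean. The metric, the sample values, and the threshold of 3.0 are hypothetical. This is a simple z-score check, not a description of any monitoring already in place.

```python
from statistics import mean, stdev
from typing import List


def is_anomalous(history: List[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it is more than `threshold` sample standard deviations from the mean."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # History is perfectly flat, so any change at all is unusual.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical daily totals for a reporting table.
daily_totals = [1000, 1020, 980, 1010, 990]

normal_day = is_anomalous(daily_totals, 1005)
spike_day = is_anomalous(daily_totals, 1400)
```

A check like this could run after each load and raise an alert before the data reaches a dashboard.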