Open to opportunities

Sai Charan Pulugam

Data Engineer @ Walmart

I process 2TB+ of data daily at Walmart, building the data infrastructure behind enterprise-scale retail analytics. From architecting cloud data lakes on Azure to tuning PySpark jobs for a 35% runtime cut, I turn raw data into reliable, production-grade systems that 50+ stakeholders depend on every day.

3+ Years

2TB+ Daily

50+ Users

35% Faster


Kafka → PySpark → Snowflake → dbt → Airflow → Power BI | Azure ADF → Databricks → Delta Lake → Synapse → Docker → Terraform

// tech_stack

Tools I work with

Cloud & Warehousing

Azure (ADF, Databricks, Synapse, ADLS Gen2), Snowflake, AWS (S3, Redshift), BigQuery

Big Data & Streaming

PySpark, Kafka, Event Hubs, Hive, HDFS

Languages & Tools

Python, SQL, Spark SQL, Airflow, dbt, Docker, Git, CI/CD, Terraform

Data Modeling

ETL/ELT, Kimball Star Schema, SCD Type I/II, Lakehouse Architecture, Delta Lake

Visualization

Power BI, Tableau, Looker, Streamlit

// experience

Where I've worked

Data Engineer

Walmart — USA

Jul 2025 - Present

Architected a cloud-based data lake on ADLS Gen2, integrating 10+ retail systems for 50+ stakeholders
Engineered 20+ automated ETL pipelines using Azure Data Factory, eliminating 15 hrs/week of manual work
Led migration to event-driven pipelines using Kafka, increasing throughput by 40%
Optimized PySpark jobs through partitioning and join tuning, cutting runtime by 35%

Data Engineer (Internship)

Walmart — USA

May 2024 - Dec 2024

Migrated retail datasets to Snowflake with Snowpipe, reducing query latency by 30%
Built ETL workflows using PySpark and ADF, processing 2TB+ daily with 99.5% uptime
Established automated validation checks that resolved discrepancies across 3+ downstream systems

Data Engineer

Maxgen Technologies — India

Aug 2021 - Jul 2023

Modernized ETL from on-prem SQL Server to cloud-native Spark, boosting reliability by 30%
Processed 5TB of transactional records daily using HDFS and MapReduce with zero data loss
Designed star and snowflake schemas with slowly changing dimensions (SCD), enabling 20% faster downstream analytics

// projects

What I've built

dbt E-Commerce Warehouse

Production-ready Kimball star schema warehouse with dbt and Snowflake. 35+ automated tests, SCD Type 2 snapshots, CI/CD pipeline.

dbt, Snowflake, SQL, Jinja, GitHub Actions
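
For readers unfamiliar with SCD Type 2, the idea can be sketched in plain Python. This is only an illustration of the technique, not the project's dbt snapshot code; the `DimRow` fields (`customer_id`, `city`) and the `apply_scd2` helper are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DimRow:
    customer_id: int                 # hypothetical business key
    city: str                        # hypothetical tracked attribute
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(history: List[DimRow], customer_id: int, city: str,
               as_of: date) -> List[DimRow]:
    """Return a new history with the incoming value applied as SCD Type 2."""
    current = next(
        (r for r in history
         if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is not None and current.city == city:
        return list(history)  # attribute unchanged, nothing to version

    updated = []
    for row in history:
        if row is current:
            # Close the old version instead of overwriting it.
            updated.append(replace(row, valid_to=as_of))
        else:
            updated.append(row)
    # Open a new current version that starts at the change date.
    updated.append(DimRow(customer_id, city, valid_from=as_of))
    return updated
```

Each change closes the previous row and opens a new one, so the full attribute history stays queryable by date range.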

Real-Time Retail Pipeline

Streaming and batch pipeline ingesting retail events via Kafka, processing with PySpark, loading to Snowflake with medallion architecture.

Kafka, PySpark, Snowflake, Airflow, Docker, Streamlit

Azure Cloud Lakehouse

End-to-end Azure Lakehouse with ADF ingestion, Databricks transformation, ADLS Gen2 storage, and Synapse analytics layer.

Azure ADF, Databricks, ADLS Gen2, Synapse, Delta Lake, Terraform

Healthcare Data Pipeline

Automated pipeline pulling CMS Medicare and FDA API data into Snowflake with dbt transformations and quality monitoring.

Python, Snowflake, dbt, Airflow, Great Expectations, Streamlit

Data Quality Framework

YAML-configurable monitoring framework with freshness, volume, schema, and anomaly detection monitors plus Slack alerts.

Python, SQL, Airflow, Slack API, Streamlit, Docker
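
A rough sketch of how config-driven freshness and volume monitors might look is below. It is an assumption-laden illustration rather than the framework's actual code: the `CONFIG` dict stands in for a parsed YAML file, and the table name, thresholds, and function names are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical config, shaped like what a parsed YAML file might yield.
CONFIG = {
    "orders": {"max_age_hours": 24, "min_rows": 100},
}


def check_freshness(last_loaded: datetime, now: datetime,
                    max_age_hours: int) -> bool:
    """Pass when the latest load is no older than the allowed age."""
    return now - last_loaded <= timedelta(hours=max_age_hours)


def check_volume(row_count: int, min_rows: int) -> bool:
    """Pass when the table holds at least the expected number of rows."""
    return row_count >= min_rows


def run_monitors(table: str, last_loaded: datetime, row_count: int,
                 now: datetime, config: dict = CONFIG) -> list:
    """Run each monitor for a table and return the names of failed checks."""
    rules = config[table]
    results = {
        "freshness": check_freshness(last_loaded, now, rules["max_age_hours"]),
        "volume": check_volume(row_count, rules["min_rows"]),
    }
    return [name for name, ok in results.items() if not ok]
```

The returned list of failed check names is the kind of payload an alerting step could forward to a channel such as Slack.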

// education

Academic background

M.S. Data Science

University of Massachusetts Dartmouth — May 2025

// get_in_touch

Let's connect

I'm actively looking for Data Engineer opportunities. Whether you have an opening, a project idea, or just want to talk data — I'd love to hear from you.

saipcharan2023@gmail.com