Hamza Gabajiwala

Software Development Engineer at Yahoo

Building large-scale data pipelines and audience targeting systems at Yahoo. Working with Spark, Airflow, and Flink on AWS to process audience segments for programmatic advertising at scale. Recently integrating GenAI/LLM capabilities into search retargeting pipelines.

01 Skills

Data Engineering

Apache Spark · PySpark · Apache Airflow · Apache Flink · OpenSearch · Amazon MSK (Kafka) · Protobuf · Avro/ORC

GenAI / AI Tooling

Claude (Anthropic) · Claude Code · Amazon Bedrock · LLM API integration · Prompt engineering · Agentic development

Cloud & DevOps

AWS EMR · EMR Serverless · AWS S3 · AWS Glue · AWS Lambda · EC2 · AWS Bedrock · Chronosphere · OpenTelemetry · Docker · Kubernetes · Jenkins · CI/CD

Languages

Python · Scala · Java · C++ · JavaScript · TypeScript · Bash

Databases & Web

MySQL · PostgreSQL · MongoDB · Redis · FastAPI · SQLModel · React

02 Experience

February 2024 — Present

Yahoo

Software Development Engineer I · Dublin, Ireland

  • Built Accelerated Audience Activation end-to-end — Spark + Avro + bucketed user partitioning + backward-compatible named-parameter scoring app — that cut new-segment activation latency from 00 hours to ~4 hours.
  • Built the GenAI keyword-expansion DAG (Airflow + EMR Serverless + Amazon Bedrock, currently Claude Sonnet 4.5) — scaled LLM concurrency from 0 to 0 threads to saturate the 0-RPM model quota and shipped inference-profile reuse that stopped Bedrock's 0-profile per-region cap from killing hourly runs.
  • Shipped 3-level real-time re-engagement targeting in Flink (Line → Package → Campaign) — extended the segment cache with TLongObjectMap package/order indexes, added a DSP line mapping cache loaded from S3, and feature-flagged the rollout so it activates only when the new rule types are exposed.
  • Designed a segment-reprocessing system covering Yahoo DSP's 0K-segment audience catalog — daily health-check DAG with Slack alerting, a forward-compatible write_target toggle for the upcoming OpenSearch → S3 cutover, and remediation playbooks that resolved live customer incidents (traced 0 segments, restored 0 for a major travel advertiser — ~0M users brought back).
  • Migrated 4 production scoring systems — upgraded data access layers, moved from EMR v6 → v7, replaced Glue catalog reads with direct S3, and migrated monitoring to Chronosphere via OpenTelemetry — cutting batch-scoring cost 0% and unlocking sub-minute alerting.
  • Led the team's adoption of agentic development tooling — consolidated 4 product repos under a shared submodule layout, published a Claude Code plugin marketplace with 7+ shared skills, and ran weekly knowledge-sharing for the Dublin team.

June 2021 — June 2022

TIAA GBS

Software Developer · Mumbai, India

  • Migrated 0+ test cases from Selenium to WebDriver in 0 days — manual regression to automated nightly runs.
  • Cut data-collection downtime by 0% with REST API ingestion, enabling same-day reporting.

03 Projects

04 Publications

05 Education

2022 — 2023

Trinity College Dublin

M.Sc. Computer Science — AR/VR · 1:1

2018 — 2022

NMIMS University

B.Tech Computer Engineering · GPA: 3.45/4