Position Summary
We’re hiring a Senior Data Engineer to own data at truly massive scale. You’ll design and run pipelines that clean, enrich, and serve data spanning hundreds of attributes across 80M+ companies and 800M+ people. The role blends classic data engineering with data operations, vendor/BPO orchestration, and data partnerships.
What we’re looking for
Core stack: Python, Dagster, DuckDB
Pipelines at scale: Build resilient ELT/ETL with strong contracts, idempotency, and lineage.
Data operations: Set quality bars, manage BPO workflows, and run SLAs with external data partners.
Serving & access: Position data for production use with serving infrastructure, documentation, and SLAs for internal consumers.
Cost & performance: Tune storage/compute and keep a sharp eye on unit economics.
Opinionated: Deep understanding of the technology landscape; you make both high-level system and granular code design decisions from understanding rather than preference, diving deep on unfamiliar patterns to build the best product.
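To illustrate the idempotency bar above: a pipeline step is idempotent when replaying the same input batch leaves the data unchanged. A minimal sketch in Python, using the stdlib sqlite3 module in place of DuckDB so it runs anywhere (the table and column names here are hypothetical, not part of this role's actual schema):

```python
import sqlite3

def upsert_companies(conn, rows):
    """Idempotently load company rows: re-running with the same
    input leaves the table unchanged (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS companies ("
        "  company_id TEXT PRIMARY KEY,"
        "  name TEXT,"
        "  employee_count INTEGER)"
    )
    # Keyed upsert instead of blind INSERT: replays update in place
    # rather than duplicating rows.
    conn.executemany(
        "INSERT INTO companies (company_id, name, employee_count) "
        "VALUES (?, ?, ?) "
        "ON CONFLICT(company_id) DO UPDATE SET "
        "  name = excluded.name, "
        "  employee_count = excluded.employee_count",
        rows,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
rows = [("c1", "Acme", 120), ("c2", "Globex", 4500)]
upsert_companies(conn, rows)
upsert_companies(conn, rows)  # replaying the batch is a no-op
count = conn.execute("SELECT COUNT(*) FROM companies").fetchone()[0]
print(count)  # 2
```

The same pattern carries over to DuckDB, which supports `INSERT ... ON CONFLICT DO UPDATE` against a primary key.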
Responsibilities
Own end-to-end data flows: ingestion, normalization, entity resolution, enrichment, and delivery.
Stand up monitoring for freshness, completeness, and accuracy; drive root-cause analysis (RCA) and prevention.
Build internal tools that make data discoverable and usable by engineering and product.
Recruit, onboard, and manage BPO vendors; negotiate and run data partnerships.
Nice to have
Experience with big data, columnar storage formats, vector indexes, and privacy/compliance in data products.
Logistics
Compensation: $165K-$250K salary, plus $100K-$200K equivalent in new-hire equity (4-year vest)
Location: New York City. We are a fully in-office team working out of Midtown Manhattan Monday through Friday. We allow WFH days when someone is traveling, but we do not offer permanent remote work.
Benefits
Generous health, dental, and vision insurance
401(k) with 3% automatic contribution (no vesting)
Paid lunches
Wellness and Citi Bike benefits