About Abaka AI
Abaka AI is built on one mission: to be the world's most trusted data partner for AI companies. More than 1,000 industry leaders across Generative AI, Embodied AI, and Automotive AI rely on us to power their data pipelines. Headquartered in Silicon Valley, with teams in Paris, Singapore, and Tokyo, we support global partners with fast, reliable, and scalable data solutions.
Our offerings include a diverse catalog of off-the-shelf datasets (image, video, multimodal, reasoning, 3D, and beyond) as well as comprehensive data collection and annotation services. Whether teams need raw data, curated datasets, or full-cycle data engineering, Abaka AI provides the foundation for building high-performance AI systems.
About the Role
We’re hiring an AI Data Infrastructure Engineer to build systems that power how large-scale datasets for LLM and multimodal models are discovered, evaluated, and scaled. This is a builder-first engineering role focused on designing LLM-powered agents, automation systems, and data pipelines. You’ll work on problems like:
- Automatically discovering new data sources across the internet
- Using LLMs and agents to evaluate and filter data sources at scale
- Building systems that significantly increase data throughput without increasing headcount

This role sits at the intersection of data engineering, LLM systems, and applied AI infrastructure, and is ideal for someone who enjoys building from scratch and shipping fast.
Responsibilities
- Build LLM-powered agents and automation systems for data discovery and evaluation
- Design and implement data pipelines for ingesting, filtering, and transforming large-scale datasets
- Develop internal tools for data quality scoring, ranking, and selection
- Experiment with scraping, APIs, and programmatic data collection at scale
- Rapidly prototype and iterate on systems that improve data acquisition speed and quality
- Collaborate closely with the Data Engineering and Research teams to align data systems with model needs
- Build scalable systems that increase data throughput and efficiency
Qualifications
- Strong technical foundation (engineering, scripting, systems, or data-focused background)
- Experience building tools, automation, or pipelines from 0→1
- Comfortable with Python, APIs, scraping, or backend workflows
- Interest in LLMs, agents, or applied AI systems
- Strong problem-solving ability and a builder mindset
- Ability to operate independently in fast-paced, ambiguous environments
Nice to have:
- Experience with LLM frameworks or agent systems
- Experience with large-scale data processing or distributed systems
- Familiarity with automation tools, workflow builders, or AI-assisted development (e.g., Cursor)
- Startup or high-growth environment experience
Compensation & Benefits
The base salary range for this position is $110,000 to $160,000 USD annually.
Compensation may vary outside of this range depending on a number of factors, including a candidate’s qualifications, skills, competencies, and experience. Base pay is one part of the total package provided to compensate and recognize employees for their work at Abaka AI. This role is eligible for equity, as well as a comprehensive benefits package including health, dental, vision, PTO, and a flexible work schedule.