About Abaka AI
Abaka AI is built on one mission: to be the world's most trusted data partner for AI companies. More than 1,000 industry leaders across Generative AI, Embodied AI, and Automotive AI rely on us to power their data pipelines. Headquartered in Silicon Valley, with teams in Paris, Singapore, and Tokyo, we support global partners with fast, reliable, and scalable data solutions.
Our offerings include a diverse catalog of off-the-shelf datasets (image, video, multimodal, reasoning, 3D, and beyond) as well as comprehensive data collection and annotation services. Whether teams need raw data, curated datasets, or full-cycle data engineering, Abaka AI provides the foundation for building high-performance AI systems.
About the Role
We’re hiring an AI Data Infrastructure Engineer to build systems that power how large-scale datasets for LLM and multimodal models are discovered, evaluated, and scaled. This is a builder-first engineering role focused on designing LLM-powered agents, automation systems, and data pipelines. You’ll work on problems like:
- Automatically discovering new data sources across the internet
- Using LLMs and agents to evaluate and filter data sources at scale
- Building systems that significantly increase data throughput without increasing headcount

This role sits at the intersection of data engineering, LLM systems, and applied AI infrastructure, and is ideal for someone who enjoys building from scratch and shipping fast.
Responsibilities
- Build LLM-powered agents and automation systems for data discovery and evaluation
- Design and implement data pipelines for ingesting, filtering, and transforming large-scale datasets
- Develop internal tools for data quality scoring, ranking, and selection
- Experiment with scraping, APIs, and programmatic data collection at scale
- Rapidly prototype and iterate on systems that improve data acquisition speed and quality
- Collaborate closely with the Data Engineering and Research teams to align data systems with model needs
- Build scalable systems that increase data throughput and efficiency
Qualifications
- Strong technical foundation (engineering, scripting, systems, or data-focused background)
- Experience building tools, automation, or pipelines from 0→1
- Comfortable with Python, APIs, scraping, or backend workflows
- Interest in LLMs, agents, or applied AI systems
- Strong problem-solving ability and a builder mindset
- Ability to operate independently in fast-paced, ambiguous environments
Nice to have:
- Experience with LLM frameworks or agent systems
- Experience with large-scale data processing or distributed systems
- Familiarity with automation tools, workflow builders, or AI-assisted development (e.g., Cursor)
- Startup or high-growth environment experience
Compensation & Benefits
The base salary range for this position is $110,000 to $160,000 USD annually.
Compensation may vary outside of this range depending on a number of factors, including a candidate’s qualifications, skills, competencies, and experience. Base pay is one part of the total package provided to compensate and recognize employees for their work at Abaka AI. This role is eligible for equity, as well as a comprehensive benefits package including health, dental, vision, PTO, and a flexible work schedule.