Senior AI Data Engineer III
Posted: 06/02/2026
Job Number: 42375
Job Description
Senior AI Data Engineer III
Hybrid
Summary
Generative AI models are only as good as the data they consume. Unlike traditional data engineering, building data pipelines for generative AI requires orchestrating ML model invocations (content understanding classifiers, embedding models, LLM-based cleaners) alongside standard SQL-based transformations, all at billion-row scale.
This role sits at the intersection of Data Engineering and ML Systems. The Senior AI Data Engineer will own end-to-end data pipelines that don't just move and transform data, but enrich it through remote model inference, managing the systems complexity of async execution, capacity allocation, retry/fallback logic, and throughput optimization that comes with it. This is not a pure ETL-with-SQL role; it demands hands-on systems experience with distributed inference infrastructure.
Our team develops comprehensive data curation and evaluation solutions for image generation models across quality dimensions including visual quality, prompt adherence, identity preservation, naturalness, and visual text generation.
Responsibilities
- AI-Augmented Data Pipelines: Design and maintain AI-augmented, large-scale data pipelines (billions of images) integrating traditional transformations with ML models (classifiers, embeddings, LLMs) for cleaning and annotation.
- Remote Inference Orchestration: Own the systems for remote ML model inference orchestration within pipelines, managing batching, retries, async jobs, and ensuring graceful degradation.
- Feature Pipelines: Build and maintain scalable pipelines for generating, storing, and serving vector embeddings, including nearest-neighbor index management and quality validation.
- Data Curation at Scale: Source, filter, and curate training datasets using a combination of SQL and model-derived signals (e.g., aesthetic scores, NSFW classifiers), owning the end-to-end data flow and maintaining governance, quality, and compliance.
- LLM-Assisted Annotation: Design and operate pipelines that use LLMs and vision models for automated annotation of training data, including auditing workflows to measure and improve annotation model performance.
- Tooling & Frameworks: Contribute to shared tooling and frameworks that make it easier for the broader team to build AI-augmented data pipelines - e.g., reusable operators for model invocation, standard patterns for async job management.
Qualifications
- Advanced SQL & data pipeline expertise. Complex queries, query optimization, pipeline orchestration frameworks (Airflow, Dataswarm, or equivalent).
- Experience integrating ML models into data pipelines. Calling inference endpoints, managing model versions, batching requests, handling inference failures at scale.
- Proficiency with AI-assisted coding agents (e.g., Copilot, Cursor, Codex). Expected to leverage AI tools as a force multiplier for writing, debugging, and reviewing code, building pipelines faster, and accelerating day-to-day engineering workflows.
- Strong verbal and written communication skills, problem-solving ability, and cross-functional collaboration.
- Be onsite in Menlo Park (MPK), working closely with engineers and researchers.
- Working knowledge of embeddings and vector representations: generating, storing, indexing, and querying embeddings (FAISS, Milvus, or equivalent).
- Familiarity with content-understanding models such as image classifiers, object detection, OCR, NSFW detection, and aesthetic scoring.
- Experience with LLMs for data tasks such as prompt engineering for annotation, data cleaning, or evaluation using LLM APIs.
- Knowledge of generative AI, including diffusion models, image generation, and evaluation metrics (FID, CLIP score, etc.).
- Bachelor's degree or higher in Computer Science, Data Engineering, Machine Learning, or a related STEM field.
- 5+ years of industry experience in data engineering, ML engineering, or a hybrid role involving both data pipelines and model serving/inference.
- Demonstrated track record of building and operating production data pipelines that invoke ML models at scale.
Pay range is $60 - $65 per hour with full benefits available, including paid time off, medical/dental/vision/life insurance, 401K, parental leave, and more. Our compensation reflects the cost of labor across several US geographic markets. Pay is based on several factors including market location and may vary depending on job-related knowledge, skills, and experience.
THE PROMISES WE MAKE:
At Crystal Equation, we empower people and advance technology initiatives by building trust. Your recruiter will prep you for the interview, obtain feedback, guide you through any necessary paperwork and provide everything you need for a successful start. We will serve to empower you along the way and provide the path for your professional journey.
For more information regarding our Privacy Policy, please visit crystalequation.com/privacy.
About Menlo Park, CA
Ready to jumpstart your career in the heart of Menlo Park, California? Located in Silicon Valley, the area offers strong growth opportunities and a tech-centric environment, with institutions like Facebook and Stanford University driving new advancements. With the Stanford University campus, the Menlo Park Caltrain station, and a downtown area full of cafes and boutiques, the locale blends charm and modernity. Explore career opportunities near parks like Bedwell Bayfront Park, or catch a game at Stanford Stadium.