Software Engineer – Data & Infrastructure

LLM Trust & Safety

Our client is a fast-growing AI company building foundational AI safety and reasoning systems designed to keep advanced AI models aligned, reliable, and under human oversight. As AI capabilities accelerate, the safety infrastructure behind them must scale just as quickly. The team is developing the core data, evaluation, and training systems that ensure models behave safely, consistently, and with human-aligned reasoning at scale.

If you want to build the data backbone behind cutting-edge AI safety research, this role gives you the opportunity to shape the pipelines and infrastructure that power safe, trustworthy AI.

What You’ll Do

  • Design, build, and maintain the data lake, warehouse, and ingestion pipelines that power training, evaluation, and safety research
  • Develop scalable ETL/ELT processes, ingesting structured and unstructured data from diverse internal and external sources
  • Build orchestration workflows using tools like Airflow, Prefect, Dagster, or Argo, ensuring reliability and observability
  • Collaborate with ML engineers to deliver high-quality datasets for model training, safety evaluations, and RLHF pipelines
  • Implement robust data quality checks, validation layers, and monitoring systems for safety-critical data
  • Optimize data storage, compute usage, and distributed processing systems
  • Contribute to infrastructure decisions related to storage, schema design, data governance, and scaling
  • Help develop tooling that accelerates annotation, evaluation, and rubric-driven feedback loops
  • Improve internal developer experience across data pipelines, environments, and CI/CD workflows

What We’re Looking For

  • 4+ years of experience in software or data engineering building production-grade pipelines or infrastructure
  • Strong experience with Python, SQL, and modern data engineering frameworks
  • Hands-on expertise with data lakes, warehouses, ETL pipelines, and distributed data processing
  • Familiarity with cloud infrastructure (AWS, GCP, or Azure), containerization, and orchestration
  • Experience building and scaling systems with tools like Spark, Ray, Kafka, Airflow, Dagster, or Argo
  • Strong debugging and systems-thinking mindset across data, infrastructure, and backend components
  • Understanding of versioning, schema evolution, and reliability principles for critical data assets
  • Ability to collaborate closely with ML teams and translate ambiguous requirements into clear data workflows

Bonus Points

  • Experience with:
    • DevOps / platform engineering
    • distributed compute (Ray, Spark, Kubernetes)
    • data governance, cataloging, or lineage systems
    • automated evaluation or annotation pipelines
    • safety-, ML-, or research-oriented data environments
  • Prior work supporting ML training, evaluation pipelines, or AI safety initiatives

Adam@intelletec.com