Sr ML Engineer | Member of Technical Staff
Responsible AI | LLM Trust & Safety
AI, Machine Learning & Data Science | Hybrid
San Francisco, CA
$180k-$230k base + equity
December 5, 2025
Our client is a fast-growing AI company building foundational AI safety and reasoning systems designed to keep advanced AI models aligned, reliable, and under human control. As AI capabilities accelerate, the gap between what AI can do and what we can trust it to do is widening. The team is developing the core intelligence, evaluators, and training pipelines needed to ensure AI systems behave safely, consistently, and with human-aligned reasoning at scale.
If you're excited about working on cutting-edge model safety, interpretability, and evaluation challenges—this is one of the few places where your work will shape the future of safe, trustworthy AI.
What You’ll Do
- Build, train, and deploy large-scale ML models focused on reasoning, evaluation, and safe RLHF workflows
- Develop advanced rubrics, evaluators, and automated feedback systems for model alignment
- Lead research + engineering projects on safety-critical behaviors, adversarial robustness, and reliability
- Design experiments and scalable training pipelines across multi-GPU and distributed compute environments
- Collaborate with research scientists to prototype new methods for safe supervision, model critique, and long-horizon reasoning
- Optimize model performance (latency, quality, throughput) while maintaining strict safety and reliability standards
- Analyze model failures, edge cases, and safety-relevant behaviors—and turn insights into improved datasets or training methods
- Contribute to system architecture decisions and establish best practices for evaluation, data quality, and ML safety engineering
What We’re Looking For
- 4+ years of experience training, evaluating, or deploying ML models in production
- Deep knowledge of transformers, LLMs, diffusion models, or large-scale neural architectures
- Strong experience with PyTorch, distributed training (FSDP/DP/TP), and multi-GPU experimentation
- Background in evaluation, model safety, rubric-building, RLHF, or alignment research
- Experience analyzing failure cases, building evaluators, or designing datasets for safe/controlled behavior
- Ability to translate ambiguous safety goals into technical modeling tasks
- Excellent communication skills and ability to collaborate with research + engineering teams
- Advanced degree (MSc or PhD)
Bonus Points
- Experience with:
  - safety benchmarking tools
  - adversarial or red-team evaluations
  - long-context or agentic LLMs
  - reinforcement learning (RLHF, RLAIF, DPO)
  - automated evaluation pipelines or synthetic data generation
- Prior work at an AI safety org, research lab, or AGI-oriented startup
- Demonstrated experience publishing papers and/or speaking at tier-1 conferences
