Hybrid
San Francisco, CA
$180k–$220k + Equity
December 5, 2025
Our client is a fast-growing AI start-up building foundational AI safety and reasoning systems designed to keep advanced AI models aligned, reliable, and under human oversight. As AI models become more powerful, the metrics, evaluations, and data intelligence behind them become mission-critical. The team is developing the measurement systems that ensure models behave safely, consistently, and according to human-aligned objectives.
If you want to shape the analytics, benchmarking, and insight engine behind cutting-edge AI safety work, this role gives you the chance to define how next-generation AI systems are evaluated and understood.
What You’ll Do
- Build and own the company’s internal and external metrics frameworks, defining how model quality, safety, reliability, and performance are measured
- Develop BI dashboards, reporting layers, and analytics tools to track model behavior, product adoption, and system health
- Run post-hoc data analysis to guide product decisions, uncover failure modes, and validate new features
- Benchmark models and systems against leading LLMs, designing comparative evaluations and performance studies
- Build, design, and analyze A/B tests to measure impact and drive data-backed product improvements
- Collaborate with ML, product, and research teams to translate ambiguous questions into structured analyses and actionable insights
- Work with the engineering team to ensure high-quality data flows, instrumentation, and measurement pipelines
What We’re Looking For
- 3–6+ years of experience in data science, applied analytics, or research-driven data work
- Strong proficiency with Python, SQL, and statistical analysis libraries
- Experience building metrics systems, dashboards, instrumentation, or analytical reporting tools
- Ability to design and run A/B tests, experiments, and causal inference analyses
- Familiarity with evaluating machine learning models—especially LLMs—through benchmarks or custom evaluation suites
- Comfortable with ambiguous problem spaces and translating qualitative goals into quantitative frameworks
Bonus Points
- Experience in any of the following:
  - AI Safety, AI Security, or Trust & Safety evaluations
  - LLM evaluation frameworks, rubric-based scoring, or red-teaming data
  - Building analytics or evaluation systems for ML-driven products
  - Working with experiment platforms, BI tools, or observability stacks
- Prior experience partnering with ML research teams on model evaluation, bias testing, or safety metrics
