
Research Scientist, Interpretability

Anthropic
Posted 3 April 2026

About the Role

About Anthropic

Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

When you see what modern language models are capable of, do you wonder, "How do these things work? How can we trust them?"

The Interpretability team at Anthropic is working to reverse-engineer how trained models work, because we believe that a mechanistic understanding is the most robust way to make advanced systems safe. We're looking for researchers and engineers to join our efforts.

People mean many different things by "interpretability". We're focused on mechanistic interpretability, which aims to discover how neural network parameters map to meaningful algorithms. Some useful analogies are to think of us as doing "biology" or "neuroscience" of neural networks using "microscopes" we build, or as treating neural networks as binary computer programs we're trying to reverse-engineer.

A few places to learn more about our work and team at a high level are this introduction to Interpretability (https://www.youtube.com/watch?v=TxhhMTOTMDg) from our research lead, Chris Olah (https://colah.github.io/about.html); a discussion of our work (https://open.spotify.com/episode/5UF79Uu94ia0fwC32a89LU) on the New York Times' Hard Fork podcast (https://www.nytimes.com/column/hard-fork); and this blog post (https://www.anthropic.com/research/engineering-challenges-interpretability), with an accompanying video, on some of the engineering challenges we had to solve to get these results. Some of our team's notable publications include A Mathematical Framework for Transformer Circuits (https://transformer-circuits.pub/2021/framework/index.html), In-context Learning and Induction Heads (https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html), Toy Models of Superposition (https://transformer-circuits.pub/2022/toy_model/index.html), Scaling Monosemanticity (https://transformer-circuits.pub/2024/scaling-monosemanticity/), and our circuits Methods (https://transformer-circuits.pub/2025/attribution-graphs/methods.html) and Biology (https://transformer-circuits.pub/2025/attribution-graphs/biology.html) papers. This work builds on members' research prior to Anthropic, such as the original circuits thread (https://distill.pub/2020/circuits/), Multimodal Neurons (https://distill.pub/2021/multimodal-neurons/), Activation Atlases (https://distill.pub/2019/activation-atlas/), and Building Blocks (https://distill.pub/2018/building-blocks/).

We aim to create a solid foundation for mechanistically understanding neural networks and making them safe (see our …)
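For a concrete flavor of the dictionary-learning direction behind papers like Toy Models of Superposition and Scaling Monosemanticity, here is a minimal sparse-autoencoder sketch in PyTorch. The architecture, shapes, and hyperparameters are illustrative assumptions for this posting, not the team's actual setup.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy dictionary learner: maps d_model-dim activations into an
    overcomplete set of n_features sparse features and back."""
    def __init__(self, d_model: int = 512, n_features: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.encoder(x))   # sparse feature activations
        x_hat = self.decoder(f)           # reconstruction of the input
        return x_hat, f

# One training step: reconstruct the activations while an L1 penalty
# pushes most features to zero, so each surviving feature fires on a
# narrow, hopefully interpretable, pattern in the underlying model.
model = SparseAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
l1_coeff = 1e-3  # sparsity strength (illustrative value)

acts = torch.randn(64, 512)  # stand-in for residual-stream activations
x_hat, f = model(acts)
loss = ((x_hat - acts) ** 2).mean() + l1_coeff * f.abs().mean()
opt.zero_grad()
loss.backward()
opt.step()
```

In the cited work, activations come from a real language model rather than random tensors, and the learned features are then inspected for human-interpretable meaning; the sketch only shows the basic reconstruction-plus-sparsity objective.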
