Senior Staff Applied AI Engineer - Context Retrieval
About the Role
<p>P-1549</p> <p>At Databricks, we are passionate about enabling data teams to solve the world's toughest problems — from making the next mode of transportation a reality to accelerating the development of medical breakthroughs. We do this by building and running the world's best data and AI infrastructure platform so our customers can use deep data insights to improve their business.</p> <p><strong>The Mission</strong></p> <p>Databricks agents are only as good as the context they can retrieve. Whether an agent is answering a question about last quarter's revenue, debugging a failing job, generating SQL against a 10,000-table lakehouse, or summarizing a wiki page, its quality is bounded by what it can find — and how well it understands what it finds.</p> <p>We are hiring a <strong>Senior Staff Applied AI Engineer</strong> to own <strong>context retrieval for Databricks agents across SaaS providers</strong>. This is a zero-to-one role with two deeply connected charters:</p> <ol> <li><strong>Build the retrieval stack</strong> — query understanding, content understanding, ranking, retrieval, and evaluation — across the enterprise SaaS data stored in multiple systems.</li> <li><strong>Build the search subagents</strong> that sit on top of that stack and reason about <em>what context is needed</em>, <em>how to retrieve it</em>, and <em>whether the right thing actually came back</em> — closing the loop between an agent's intent and the substrate that serves it.</li> </ol> <p>If you have deep information-retrieval expertise, have shipped retrieval systems for RAG and agentic workloads, and want to build the substrate — and the agents on top of it — that make every Databricks agent measurably smarter, this role is for you.</p> <p><strong>What You Will Do</strong></p> <ul> <li><strong>Build the full retrieval stack from scratch.</strong> Own the end-to-end system: query understanding, content understanding and indexing, hybrid retrieval, ranking, and evaluation. 
Make the architectural calls that will define how Databricks agents access context for years to come.</li> <li><strong>Retrieve across heterogeneous data — structured and unstructured.</strong> Index and rank across structured assets (tables, columns, SQL queries, dashboards, code, notebooks, jobs) and unstructured content (docs, wikis, tickets, chat, images, video, audio). Each modality has its own signals — design retrieval that exploits them rather than flattens them.</li> <li><strong>Connect to the SaaS surface area customers actually use.</strong> Build connectors and retrieval adapters for the systems where enterprise knowledge lives. Treat each retrieval source with its own freshness, permissions, and ranking signals.</li> <li><strong>Optimize for two consumers at once.</strong> Retrieval must serve both LLMs (grounded, token-efficient, hallucination-resistant context) and humans (intuitive, explainable discovery). These are different objectives and require different signals — own both.</li> <li><strong>Crack query understanding for agents.</strong> Agent queries don't look like web queries. Build query rewriting, decomposition, intent classification, and entity resolution tuned for multi-turn agentic workflows.</li> <li><strong>Crack content understanding at scale.</strong> Build the pipelines that extract structure, entities, embeddings, summaries, and metadata from every supported asset type — and keep them fresh as customer data evolves.</li> <li><strong>Build search subagents that reason about retrieval.</strong> Design the agentic layer that decides <em>what context is needed</em>, <em>which sources to query</em>, <em>how to decompose and route the search</em>, and — critically — <em>whether the retrieved content is actually sufficient to answer the question</em>. 
These subagents will plan multi-hop searches, issue follow-up queries when results are weak, ground claims against retrieved evidence, and hand back high-confidence context (or signal failure) to upstream agents. This is where IR meets agentic reasoning.</li> <li><strong>Build the evaluation flywheel for both retrieval and subagents.</strong> Stand up offline evals (nDCG, MRR, Recall@K, Precision@K), LLM-as-judge harnesses, human-in-the-loop labeling, and online experimentation. Extend evaluation beyond ranking metrics to measure subagent decision quality — <em>did it ask the right follow-up?</em> and <em>did it correctly recognize when retrieval failed?</em></li> </ul>