About the Role
<div class="content-intro"><p><strong data-stringify-type="bold">Why work at Nebius<br></strong>Nebius is leading a new era in cloud computing to serve the global AI economy. We create the tools and resources our customers need to solve real-world challenges and transform industries, without massive infrastructure costs or the need to build large in-house AI/ML teams. Our employees work at the cutting edge of AI cloud infrastructure alongside some of the most experienced and innovative leaders and engineers in the field.</p> <p><strong>Where we work<br></strong>Headquartered in Amsterdam and listed on Nasdaq, Nebius has a global footprint with R&D hubs across Europe, North America, and Israel. The team of over 1400 employees includes more than 400 highly skilled engineers with deep expertise across hardware and software engineering, as well as an in-house AI R&D team.</p></div><h3><strong><span data-contrast="auto">The role</span></strong><span data-ccp-props="{}"> </span></h3> <p>We are establishing a global L3 Support Line from scratch to own the highest level of technical escalation for server and rack infrastructure across Europe and the US. 
Operating at the intersection of datacenter operations, R&D engineering, and ODM partners, this team will take full ownership of complex server and firmware incidents — driving root-cause resolution and converting recurring failures into scalable architectural improvements.</p> <p>You will lead a team of ~10 L3 engineers in Europe (Amsterdam HQ + other DC areas), partnering closely with the regional L3 Lead to deliver 24/7 global coverage.</p> <p data-start="2991" data-end="3255">In this role, you will act as Incident Commander for high-severity production events, establish formal problem management practices, and design enterprise-grade support frameworks for contracted bare-metal customers — including two large FAANG clients at launch.</p> <p data-start="3262" data-end="3491">This is a managerial role with deep technical accountability: you will lead people and processes while retaining the capability to drive advanced Linux, hardware, and firmware investigations when L2 reaches its technical ceiling.</p> <p>You’re welcome to work in our office in <strong>Amsterdam</strong>, <strong>the Netherlands.</strong></p> <p><strong><span data-ccp-props="{"335559685":720,"335559991":360}"><span class="TextRun MacChromeBold SCXW95146830 BCX0" lang="EN-US" data-contrast="auto"><span class="NormalTextRun SCXW95146830 BCX0" data-ccp-charstyle="Strong">Your responsibilities will include:</span></span><span class="EOP SCXW95146830 BCX0" data-ccp-props="{"134233117":true,"134233118":true}"> </span></span></strong></p> <h4 data-start="1302" data-end="1344"><span style="text-decoration: underline;">Incident Command (Highest Priority)</span></h4> <ul data-start="1345" data-end="1624"> <li data-start="1345" data-end="1415"> <p data-start="1347" data-end="1415">Act as Incident Commander for high-severity infrastructure incidents</p> </li> <li data-start="1416" data-end="1477"> <p data-start="1418" data-end="1477">Lead structured triage and drive permanent root-cause fixes</p> </li> 
<li data-start="1478" data-end="1564"> <p data-start="1480" data-end="1564">Align L2, Cloud Ops, R&D, NOC, DC Automation, and ODM vendors during critical events</p> </li> <li data-start="1565" data-end="1624"> <p data-start="1567" data-end="1624">Establish clear postmortems and follow-through mechanisms</p> </li> </ul> <h4 data-start="1626" data-end="1665"><span style="text-decoration: underline;">Problem Management & Reliability</span></h4> <ul data-start="1666" data-end="1926"> <li data-start="1666" data-end="1740"> <p data-start="1668" data-end="1740">Identify recurring failure patterns and convert them into scalable fixes</p> </li> <li data-start="1741" data-end="1797"> <p data-start="1743" data-end="1797">Build structured escalation loops with R&D and vendors</p> </li> <li data-start="1798" data-end="1875"> <p data-start="1800" data-end="1875">Lead quarterly reliability reviews across platforms, firmware, and hardware</p> </li> <li data-start="1876" data-end="1926"> <p data-start="1878" data-end="1926">Translate analytics into preventive improvements</p> </li> </ul> <h4 data-start="1928" data-end="1964"><span style="text-decoration: underline;">Build & Scale the L3 Function</span></h4> <ul data-start="1965" data-end="2232"> <li data-start="1965" data-end="2044"> <p data-start="1967" data-end="2044">Design the L3 operating model (intake, prioritization, ownership, escalation)</p> </li> <li data-start="2045" data-end="2096"> <p data-start="2047" data-end="2096">Hire and grow a distributed team across EU and US</p> </li> <li data-start="2097" data-end="2169"> <p data-start="2099" data-end="2169">Define collaboration models across internal teams and external vendors</p> </li> <li data-start="2170" data-end="2232"> <p data-start="2172" data-end="2232">Influence cross-functional outcomes without direct authority</p> </li> </ul> <h4 data-start="2234" data-end="2270"><span style="text-decoration: underline;">Enterprise Bare Metal Support</span></h4> <ul 
data-start="2271" data-end="2439"> <li data-start="2271" data-end="2364"> <p data-start="2273" data-end="2364">Define enterprise-grade support processes (SLA handling, escalation paths, severity models)</p> </li> <li data-start="2365" data-end="2439"> <p data-start="2367" data-end="2439">Act as senior escalation interface for complex customer-impacting issues</p> </li> </ul> <p><strong><span data-contrast="auto"><span data-ccp-charstyle="Strong">We expect you to have:</span></span><span data-ccp-props="{"134233117":true,"134233118":true}"> </span></strong></p> <ul> <li data-start="2755" data-end="2848"> <p data-start="2757" data-end="2848">Experience building or leading L3 / escalation support for datacenter server infrastructure</p> </li> <li data-start="2849" data-end="2914"> <p data-start="2851" data-end="2914">Strong Incident Commander experience in production environments</p> </li> <li data-start="2915" data-end="2982"> <p data-start="2917" data-end="2982">Background supporting enterprise customers under contractual SLAs</p> </li> <li data-start="2983" data-end="3061"> <p data-start="2985" data-end="3061">Proven ability to build incident & problem management processes from scratch</p> </li> <li data-start="3062" data-end="3126"> <p data-start="3064" data-end="3126">People leadership experience (hiring, coaching, scaling teams)</p> </li> <li data-start="3127" data-end="3164"> <p data-start="3129" data-end="3164">Strong English communication skills</p> </li> </ul> <p><strong><span data-contrast="auto"><span data-ccp-charstyle="Strong">It will be </span><span data-ccp-charstyle="Strong">an added bonus</span><span data-ccp-charstyle="Strong"> if you have:</span></span><span data-ccp-props="{"134233117":true,"134233118":true}"> </span></strong></p> <ul> <li data-start="3188" data-end="3251"> <p data-start="3190" data-end="3251">Deep Linux, hardware, and firmware troubleshooting capability</p> </li> <li data-start="3252" data-end="3311"> <p data-start="3254" 
data-end="3311">GPU server platform experience (e.g., NVIDIA diagnostics)</p> </li> </ul>