Senior Site Reliability Engineer - AI/ML-optimized GPU clusters

The organization

Our client operates one of the largest GPU infrastructures in the world: 90,000+ GPUs and 10 InfiniBand fabrics across five global data centers. Their infrastructure doubles in size every year. We're looking for engineers who love getting deep into Linux systems, pushing hardware and software to their limits, and making the world's fastest AI and HPC workloads run even faster. They develop their own proprietary stack/cloud environment, fully optimized for AI/ML applications.

The role

Your responsibilities will include:
Ensure fault tolerance, scale, and uninterrupted operations for the service;
Use cutting-edge cloud technology to solve a variety of infrastructure problems;
Implement and improve CI/CD processes.

Your profile

Solid experience with programming languages (like Go, Python, or C++), beyond scripting;
Experience in environments with many GPUs distributed over multiple nodes;
Good understanding of classic algorithms and data structures;
Commercial experience with, and deep understanding of, Unix/Linux systems and network technology;
Solid experience with CI/CD and IaC;
Experience with containerization and configuration management (Ansible, Salt, Terraform, Docker, Kubernetes, Helm).

It will be an added bonus if you have:
A desire to be involved in backend development;
Experience designing, developing, and running high-load distributed systems;
Experience with a variety of cloud platforms.

Coding interviews are part of the process.

What's offered

Competitive salary and comprehensive benefits package.
Opportunities for professional growth and taking ownership in a massively scaling environment.
Flexible working arrangements.
A dynamic and collaborative work environment that values initiative and innovation.
On-site in Amsterdam or fully remote (across Europe).

Place of work

Talent Job Seeker
Noord-Brabant
Netherlands

About us

Identify the best talent with Talent Job Seeker



Job ID: 10310398 / Ref: cbf465c0e4609b6dd1b0c7956869183a
