Manager - Site Reliability Engineer
- Talent Job Seeker
- Colombo
- Full-time
Overview
A dynamic and skilled Manager - SRE is required to drive reliability, performance, and operational excellence across critical systems and services. This role involves working closely with engineering teams to build scalable infrastructure, streamline processes, and ensure seamless service delivery. The ideal candidate will have strong troubleshooting skills, deep technical understanding, and leadership capability to guide SRE practices.

Key Responsibilities:
- Lead and manage a team of Site Reliability Engineers, providing guidance, mentorship, and support.
- Collaborate with cross-functional teams to define and implement strategies for improving system reliability, scalability, and performance.
- Monitor and analyze system performance metrics, identifying areas for improvement and implementing proactive solutions.
- Troubleshoot and resolve complex technical issues, ensuring minimal impact on system availability.
- Implement and maintain monitoring, alerting, and incident response systems.
- Develop and maintain documentation for system configurations, processes, and procedures.
- Stay up-to-date with industry trends and emerging technologies, recommending and implementing innovative solutions.

Job requirements
- Previous experience in a similar role, managing a team of Site Reliability Engineers.
- Strong knowledge of Kubernetes.
- Proficiency in scripting and automation using languages like Python, Bash, or PowerShell.
- Experience with monitoring and logging tools, such as Prometheus, Grafana, or ELK stack.
- Excellent problem-solving and troubleshooting skills.
- Strong communication and leadership abilities.
Place of work
Talent Job Seeker
Colombo, Sri Lanka
About us
Identify the best talent with Talent Job Seeker
Job ID: 10450564 / Ref: 797be19edd25b2051bcd9a4f6a624da5