Senior DevOps Engineer
- Company: Talent Job Seeker
- Employment type: Full-time
About the Role
We are looking for a Senior DevOps Engineer with deep expertise in platform infrastructure to support a commercial SaaS product. You will be responsible for designing, building, and maintaining secure, scalable cloud infrastructure with a strong emphasis on Kubernetes orchestration, identity management, and security best practices. This role offers the opportunity to work on cutting-edge infrastructure supporting AI/ML workloads and GPU-accelerated computing.

Key Responsibilities
- Design, deploy, and manage production Kubernetes clusters across cloud environments (AWS EKS, GCP GKE, Azure AKS, Native)
- Implement and maintain Infrastructure as Code using Terraform
- Architect and implement authentication and authorization systems (OAuth 2.0, OIDC, SAML, RBAC)
- Design and enforce security policies, network segmentation, and zero-trust architecture
- Build and optimize CI/CD pipelines for automated testing, security scanning, and deployment
- Implement secrets management solutions
- Monitor infrastructure health, performance, and security using observability tools
- Manage cloud costs and optimize resource utilization
- Document infrastructure architecture, runbooks, and operational procedures
- Collaborate with development teams to ensure smooth deployments and platform reliability

Required Qualifications
- 6+ years of experience in DevOps, SRE, or Infrastructure Engineering roles
- Strong hands-on experience with Kubernetes (cluster administration, Helm, operators, CRDs)
- Proficiency with at least one major cloud provider (AWS, GCP, or Azure) and associated services
- Experience implementing authentication/authorization systems (OAuth 2.0, OIDC, SAML, JWT)
- Solid understanding of cloud security principles, IAM policies, and network security
- Advanced proficiency with multiple AI coding assistants, using guardrails to ensure high-quality software
- Experience with Infrastructure as Code (Terraform strongly preferred)
- Proficiency in scripting languages (Bash, Python, Go)
- Experience with CI/CD platforms (GitHub Actions, GitLab CI, Jenkins, ArgoCD)
- Familiarity with container security, vulnerability scanning, and compliance frameworks
- Strong troubleshooting skills and experience with production incident management

Preferred Qualifications
- Experience with GPU workloads and AI/ML infrastructure (NVIDIA GPU Operator, CUDA, vGPU)
- Familiarity with ML platforms and model serving infrastructure (Kubeflow, MLflow, Ray, Triton)
- Experience supporting commercial SaaS products with high availability requirements
- Knowledge of service mesh technologies (Istio, Linkerd, Cilium)
- Experience with identity providers and SSO integration (Okta, Auth0, Keycloak)
- Familiarity with compliance frameworks (SOC 2, HIPAA, GDPR, FedRAMP)
- Experience with GitOps workflows and tools (ArgoCD, Flux)
- Cloud certifications (AWS Solutions Architect, CKA/CKAD, GCP Professional Cloud Architect)
- Multi-cloud or hybrid cloud architecture experience

Technical Environment
- Cloud Platforms: AWS (primary), GCP, Azure
- Orchestration: Kubernetes (EKS/GKE), Helm, Kustomize, ArgoCD
- Infrastructure as Code: Terraform, Pulumi, CloudFormation
- Security & Identity: OAuth 2.0, OIDC, Vault, AWS IAM, Cert-Manager
- Observability: Prometheus, Grafana, DataDog, ELK Stack, OpenTelemetry
- Networking: VPC, Load Balancers, Ingress Controllers, Service Mesh
- AI/ML (Preferred): NVIDIA GPU Operator, Kubeflow, Ray, Triton Inference Server
United States of America
Place of work
Talent Job Seeker, United States of America
About us
Identify the best talent with Talent Job Seeker
Job ID: 10463879 / Ref: a7664e22518b113fc1699a4e5e1f3dc4