Site Reliability Engineer (Expert) 0630

Job Location

Midrand, South Africa

Job Description

Why Join Us?

- Work on high-availability, multi-region deployments
- Shape our observability strategy and implement automation at scale
- Collaborate with development teams to enhance service reliability
- Lead incident response and drive systematic improvements

Essential Skills & Experience

- 10 years in SRE, DevOps, or similar roles
- Strong networking fundamentals
- Skilled with AWS and cloud-native technologies
- Proficiency in Python, Go, or JavaScript/TypeScript
- Experience with Docker, Kubernetes, CI/CD, and GitOps (Flux/ArgoCD)
- Knowledge of monitoring tools (Grafana, Prometheus, Loki, Tempo)

Bonus Skills

- Advanced Kubernetes certification (CKA/CKAD)
- Experience with Terraform, PostgreSQL, MongoDB
- Expertise in performance optimization & cost management
- Security hardening & compliance implementation

Tech Stack You'll Work With

- Containerization: Kubernetes, Docker
- Observability: Grafana Stack, Prometheus
- Infrastructure: Cloud-native technologies
- Programming: Go, Python, TypeScript/JavaScript
- CI/CD: Modern pipeline tools
- Multi-region deployments & microservices architecture

Key Responsibilities

- System Reliability: Design and implement scalable infrastructure solutions
- Observability: Architect and maintain monitoring & alerting systems
- Automation: Develop automated workflows to reduce manual effort
- Incident Management: Lead major incident response and drive improvements
- Technical Leadership: Mentor team members and influence engineering decisions
- Tool Development: Build internal tools to enhance operational efficiency
- Best Practices: Establish and enforce SRE methodologies

Ready to take on this challenge? Apply now with your latest and detailed CV!

Location: Midrand, ZA

Posted Date: 7/4/2025

Contact Information

Contact	Human Resources