We’re looking for a talented Senior Site Reliability Engineer to join our team. In this role, you will ensure the reliability, scalability, and performance of our systems while enabling rapid development and deployment of features. You will work at the intersection of software engineering and operations, helping to design, automate, and maintain production systems that serve millions of users.
Key Responsibilities
- Design, implement, and maintain scalable, highly available, and fault-tolerant systems.
- Automate infrastructure, deployments, monitoring, and incident response processes.
- Troubleshoot and resolve complex production issues, performing root cause analysis and implementing preventive measures.
- Collaborate closely with development teams to improve system reliability, performance, and scalability.
- Implement monitoring, alerting, and observability systems to ensure system health and proactive issue detection.
- Participate in capacity planning, disaster recovery, and performance tuning initiatives.
- Mentor junior engineers and share best practices in SRE, DevOps, and cloud-native operations.
- Contribute to the design and evolution of the platform’s architecture with a focus on reliability and operational excellence.
Qualifications
- 5+ years of experience in site reliability, DevOps, or system engineering roles.
- Strong experience with containers and container orchestration (Docker, Kubernetes).
- Solid understanding of CI/CD pipelines and infrastructure as code (Terraform, Ansible, or similar).
- Knowledge of database scaling and caching strategies (SQL, NoSQL, Redis, or similar).
- Deep knowledge of monitoring, logging, and observability tools (Prometheus, Grafana, ELK, OpenTelemetry).
- Hands-on experience with Linux systems, networking, and security best practices.
- Programming or scripting skills (.NET, Go, Bash, or similar).
- Experience designing for high availability, disaster recovery, and large-scale distributed systems.
- Excellent communication skills for collaborating with cross-functional teams and mentoring others.
- Experience with high-traffic web applications or microservices architectures.