Collaborate with development and infrastructure teams to design and implement scalable, reliable, and secure DevOps solutions.
Build, maintain, and optimize CI/CD pipelines using GitLab CI/CD for automated testing, deployment, and delivery.
Manage and operate containerized environments with Docker and Kubernetes, ensuring high availability, fault tolerance, and observability.
Design and manage infrastructure using Infrastructure as Code (IaC) principles and configuration management tools.
Implement and enhance monitoring, alerting, and visualization using Prometheus, Grafana, and related tools to ensure system reliability and performance.
Work closely with backend teams (Node.js) to support microservices architecture and service-level reliability goals.
Maintain and optimize Nginx configurations for traffic management, load balancing, and security hardening.
Design, deploy, and manage PostgreSQL databases in high-availability and performance-optimized environments, including replication, backup, and monitoring strategies.
Implement centralized logging and analysis pipelines using the ELK Stack (Elasticsearch, Logstash, Kibana) for better observability and troubleshooting.
Contribute to the architecture and scaling of Ceph storage clusters and distributed storage systems.
Participate in incident management, root cause analysis, and continuous improvement of system resilience.
Automate operational workflows, provisioning, and recovery procedures to improve efficiency and reduce manual overhead.
Ensure system and application security following best practices for container hardening, network security, and secrets management.
Qualifications
3+ years of professional experience as a DevOps Engineer, Site Reliability Engineer, or in a related role.
Strong experience managing Linux-based production systems.
Proficiency in Bash or Python scripting for automation and system administration.
Proven hands-on experience with Docker and Kubernetes in production.
Strong understanding of Git, GitLab CI/CD, and modern DevOps practices.
Experience with monitoring and observability stacks (Prometheus, Grafana, Alertmanager).
Experience managing Nginx for high-performance web serving, load balancing, and reverse proxying.
Experience managing and tuning PostgreSQL databases.
Experience implementing and maintaining the ELK Stack for centralized logging and observability.
Familiarity with Infrastructure as Code (IaC) tools (e.g., Terraform, Ansible).
Experience working with Ceph or similar distributed storage solutions.
Good understanding of microservices and containerized architecture.
Nice to Have
Experience with Node.js application environments and debugging.
Familiarity with WebRTC, WebSocket, or other real-time communication systems.
Experience with security hardening and vulnerability management in containerized deployments.
Hands-on experience with load testing, performance tuning, and capacity planning.