

We are seeking a motivated NOC Engineer to join our SRE team, supporting our mission to deliver reliable, high-availability services and maintain the health of our infrastructure. In this role, you will be responsible for 24/7 monitoring of system performance and uptime, ensuring rapid detection and escalation of incidents, and collaborating closely with technical teams to maintain operational excellence.
Responsibilities
. Proactively monitor production systems, applications, and infrastructure using industry-standard tools;
. Respond to alerts and incidents, performing initial triage and escalating issues to relevant teams as needed;
. Ensure the continuous availability and health of Java-based applications, as well as critical frontend and backend services;
. Assist in investigating recurring issues, identifying patterns, and contributing to root cause analysis (RCA);
. Maintain accurate shift logs and incident documentation, providing clear and concise reports to technical stakeholders;
. Collaborate with SRE, DevOps, and development teams to improve monitoring coverage and alerting rules;
. Identify opportunities for automation or process improvement, and support their implementation as experience grows;
. Adhere to established operational procedures and contribute to their continuous improvement.
Requirements
. At least 2 years of experience in a NOC, IT operations, or similar monitoring-focused role;
. Willingness to work in a 24/7 shift rotation, including nights, weekends, and holidays;
. Good working knowledge of Linux system administration and command-line troubleshooting (equivalent to LPIC-1 level);
. Solid understanding of networking concepts and common protocols (equivalent to CCNA level);
. Familiarity with monitoring and logging tools such as Prometheus, Grafana, the ELK stack, or similar platforms;
. Exposure to backend technologies and databases such as MySQL, MongoDB, Redis, HAProxy, Kafka, or RabbitMQ;
. Ability to analyze alerts and logs, and perform effective initial troubleshooting;
. Strong sense of responsibility and attention to detail in operational environments;
. Good communication skills, with the ability to document incidents and escalate effectively;
. Self-motivated, organized, and adaptable, with a passion for continuous learning and quality improvement.
Preferred Qualifications
. Experience supporting Java-based applications in production environments;
. Familiarity with incident management and escalation workflows;
. Exposure to automation scripting (e.g., Bash, Python) or basic SRE tasks;
. Familiarity with Docker and Kubernetes for container management;
. Prior experience in a high-availability or cloud-based infrastructure environment;
. Eagerness to learn and grow into more advanced SRE responsibilities.
Benefits
. Transportation discount and voucher
. Organizational food discount
. Learning budget
. Team-building budget
. Wellness budget
. Comprehensive health, dental, and vision insurance