We are seeking a motivated NOC Engineer to join our SRE team, supporting our mission to deliver reliable, high-availability services and maintain the health of our infrastructure. In this role, you will monitor system performance and uptime around the clock, ensure rapid detection and escalation of incidents, and collaborate closely with technical teams to maintain operational excellence.
Responsibilities
- Proactively monitor production systems, applications, and infrastructure using industry-standard tools;
- Respond to alerts and incidents, performing initial triage and escalating issues to relevant teams as needed;
- Ensure the continuous availability and health of Java-based applications, as well as critical frontend and backend services;
- Assist in investigating recurring issues, identifying patterns, and contributing to root cause analysis (RCA);
- Maintain accurate shift logs and incident documentation, providing clear and concise reports to technical stakeholders;
- Collaborate with SRE, DevOps, and development teams to improve monitoring coverage and alerting rules;
- Identify opportunities for automation or process improvement, and support their implementation as your experience grows;
- Adhere to established operational procedures and contribute to their continuous improvement.
Requirements
- At least 2 years of experience in a NOC, IT operations, or similar monitoring-focused role;
- Willingness to work in a 24/7 shift rotation, including nights, weekends, and holidays;
- Good working knowledge of Linux system administration and command-line troubleshooting (equivalent to LPIC-1 level);
- Solid understanding of networking concepts and common protocols (equivalent to CCNA level);
- Familiarity with monitoring and logging tools such as Prometheus, Grafana, ELK stack, or similar platforms;
- Exposure to backend technologies and databases such as MySQL, MongoDB, Redis, HAProxy, Kafka, or RabbitMQ;
- Ability to analyze alerts and logs, and perform effective initial troubleshooting;
- Strong sense of responsibility and attention to detail in operational environments;
- Good communication skills, with the ability to document incidents and escalate effectively;
- Self-motivated, organized, and adaptable, with a passion for continuous learning and quality improvement.
Preferred Qualifications
- Experience supporting Java-based applications in production environments;
- Familiarity with incident management and escalation workflows;
- Exposure to automation scripting (e.g., Bash, Python) or basic SRE tasks;
- Familiarity with Docker and Kubernetes for container management;
- Prior experience in a high-availability or cloud-based infrastructure environment;
- Eagerness to learn and grow into more advanced SRE responsibilities.
Benefits
- Transportation discount and voucher
- Organizational food discount
- Learning budget
- Team-building budget
- Wellness budget
- Comprehensive health, dental, and vision insurance