Digikala

Operation & Performance Monitoring Engineer

Tehran / Vanak
Full Time
Saturday to Wednesday
More than 5001 employees
Internet Provider / E-commerce / Online Services
Iranian company dealing only with Iranian entities
Founded 1385 (Solar Hijri calendar; 2006/07)
Privately held

Key Requirements

3 years of experience in a similar position

Job Description

About the Role:

We are seeking a dedicated and proactive Monitoring Specialist to join our 24x7 operations team.
The candidate will be responsible for ensuring the seamless performance, availability, and reliability of
applications and systems through advanced monitoring tools like Grafana, ELK (Elasticsearch,
Logstash, Kibana), Prometheus, and other APM (Application Performance Monitoring) tools. This role
demands real-time monitoring, swift troubleshooting, and effective communication to maintain and
enhance operational performance.

Responsibilities:

Monitoring and Incident Management:

  • Continuously monitor application and system performance using Grafana, ELK, Prometheus, and other tools.
  • Identify, analyze, and resolve performance bottlenecks, latency issues, and system alerts.
  • Manage and escalate incidents according to defined SLAs and protocols.

Proactive Issue Resolution:

  • Conduct root cause analysis (RCA) for recurring issues and provide recommendations for resolution.
  • Develop and implement automated alerting and escalation mechanisms to improve operational efficiency.

System Health and Optimization:

  • Analyze metrics and logs to ensure optimal performance and system reliability.
  • Collaborate with development, infrastructure, and DevOps teams for performance tuning and capacity planning.

Documentation and Reporting:

  • Maintain detailed logs of incidents, resolutions, and RCA outcomes.
  • Generate periodic reports on system performance, availability, and incident trends for stakeholders.

Continuous Improvement:

  • Recommend and implement enhancements to monitoring dashboards and tools.
  • Stay updated on the latest monitoring technologies and integrate them into existing workflows.


Requirements:

  • Proficiency in monitoring tools such as Grafana, ELK (Elasticsearch, Logstash, Kibana), Prometheus, and similar platforms.
  • Hands-on experience with Application Performance Monitoring (APM) tools.
  • Good understanding of Linux/Unix operating systems.
  • Familiarity with cloud platforms and containerized environments (Docker, Kubernetes).

Analytical and Problem-Solving Skills:

  • Ability to interpret logs, metrics, and system alerts effectively.
  • Strong troubleshooting skills for applications, infrastructure, and network layers.

Communication and Collaboration:

  • Excellent communication skills to report and escalate issues promptly.
  • Experience collaborating with cross-functional teams, including DevOps, developers, and infrastructure specialists.

Job Requirements

Gender
Men / Women

