Site Reliability Engineer

(2 days ago)

KHODRO45 4.5

Tehran / Mirdamad

Full Time

Working days and hours

Saturday to Wednesday

Business trips

Facilities and Benefits

Loan, Bonus, Health insurance, Coffee shop, Occasional packages and gifts

Company Size

201 - 500 employees

Industry

Internet Provider / E-commerce / Online Services

Company Type

Iranian company dealing only with Iranian entities

Establishment year

1397 (Solar Hijri; 2018/2019 CE)

Ownership type

Privately held

Key Requirements

3 years of experience in a similar position

Job Description

Role Summary

We are looking for a Site Reliability Engineer who will work closely with our development teams to continuously improve the uptime, scalability, and reliability of our services. This role focuses on application‑level reliability, architecture best practices, automation, and developer enablement; it does not involve day‑to‑day infrastructure maintenance or sysadmin responsibilities.

Key Responsibilities

Partner with development teams to design and improve service architectures with a strong focus on reliability, scalability, and reducing operational toil
Contribute to defining and promoting 12‑Factor App and cloud‑native best practices across teams
Support development teams in deploying and optimizing their services on Kubernetes, without being responsible for infrastructure operations
Build internal tools, scripts, and automation (primarily in Python) to enhance delivery quality, observability, and operational efficiency
Define and implement SLOs/SLIs/SLAs and establish well‑structured reliability standards
Improve service observability by designing metrics, dashboards, and alerting
Participate in incident analysis and root cause investigations, focusing on application and service layers
Identify and automate repetitive processes to reduce operational overhead
Explore and leverage AI‑powered tools to improve development, testing, and operational workflows

Required Skills & Experience

Hands‑on experience deploying and debugging services on Kubernetes
Strong programming skills, preferably in Python
Solid understanding of SRE principles including SLO/SLA/SLI, error budgets, monitoring, and alerting
Strong familiarity with the 12‑Factor methodology and cloud‑native application design
Experience with observability tools (e.g., Prometheus, Grafana)
Ability to analyze complex service‑level issues and propose pragmatic solutions
Familiarity with CI/CD pipelines and release engineering practices

Nice to Have

Experience using AI tools to enhance development, debugging, testing, or operational workflows
Knowledge of containerization and modern deployment practices
Experience designing developer golden paths or platform engineering practices
Understanding of DevOps concepts and ability to collaborate effectively with DevOps and Infra teams

Personal Attributes

A passion for reducing toil and improving software quality through automation
Strong communication skills and ability to collaborate closely with development teams
Product‑oriented thinking with a focus on end‑to‑end service reliability
System‑level thinking and the ability to identify architectural bottlenecks

Job Requirements

Age

22 - 50 Years Old

Gender

Men / Women
