SRE Expert

(10 days ago)

Agah Brokerage

Tehran / Vanak

Full Time

Working days and hours

8:00 to 17:00

Business trips

Facilities and Benefits

Loan, health insurance, flexible working hours, game room, lunch, snacks, resting space, breakfast

About the Company

Company Size

1001 - 5000 employees

Industry

Finance / Investment

Company Type

Iranian company dealing with Iranian and foreign customers

Establishment year

1384 (Solar Hijri calendar; 2005/2006)

Ownership type

Privately held

More Details

Key Requirements

5 years of experience in a similar position

Docker - Intermediate

Kubernetes - Intermediate

Job Description

A Site Reliability Engineer (SRE) plays a pivotal role in ensuring that an organization's IT services and infrastructure are highly available, scalable, and efficient. This position often involves a blend of development, operations, and troubleshooting tasks.

System Reliability and Availability: Ensure high availability and reliability of services and infrastructure. This includes proactive monitoring, incident response, and post-mortem analysis to prevent recurrence of incidents.
Performance Management: Monitor and optimize system performance to meet service level objectives (SLOs) and service level agreements (SLAs). This includes capacity planning and managing the scalability of services.
Incident Management and Response: Lead the response to system outages and performance issues, including on-call duties. Develop automation tools that speed up incident resolution and prevent recurrence.
Automation and Tooling: Design and implement automation tools and frameworks to reduce manual operational work. This could include scripts for deployment, monitoring, and infrastructure management.
Cross-functional Collaboration: Work closely with development teams to design and implement scalable, reliable, and efficient systems. This involves providing input on architectural decisions, optimizing resource utilization, and ensuring system resilience.
Continuous Improvement: Continuously analyze current processes and systems for improvement opportunities. Implement best practices for system reliability and availability.
Disaster Recovery and Backup: Develop and maintain disaster recovery plans, including regular testing to ensure system resilience.
Documentation: Maintain detailed documentation of the system architecture, configurations, processes, and service records to ensure that the knowledge is shared and accessible within the team.

Requirements / Skills

Education: A bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
Experience: Proven experience in a site reliability engineering role or similar, with a strong background in software development and system administration.
Technical Skills:
- Proficiency in programming languages.
- Experience with cloud services, containers (Docker), and container orchestration (Kubernetes).
- Strong understanding of networking principles and protocols.
- Experience with continuous integration and deployment (CI/CD) practices.
Problem-Solving Skills: Ability to troubleshoot and resolve complex technical issues under pressure.
Communication Skills: Excellent verbal and written communication skills, including the ability to explain technical concepts clearly to non-technical stakeholders.
Teamwork: Ability to work collaboratively in a cross-functional team and interact effectively with developers, operations teams, and management.

Job Requirements

Age

27 to 38 years old

Gender

Men / Women

Military service

Military service must be completed

Education

Bachelor's | Computer and IT

Software

Kubernetes | Intermediate

Docker | Intermediate
