Agah Brokerage Company

SRE Expert

Tehran / Vanak
Full Time
Working hours: 8:00-17:00
Benefits: Loan, health insurance, flexible working hours, game room, lunch, snacks, resting space, breakfast
501 - 1000 employees
Finance / Investment
Iranian company dealing with Iranian and foreign customers
Founded: 1384 (Iranian calendar; 2005/2006)
Privately held

Key Requirements

5 years of experience in a similar position
Docker - Intermediate
Kubernetes - Intermediate

Job Description

A Site Reliability Engineer (SRE) plays a pivotal role in ensuring that an organization's IT services and infrastructure are highly available, scalable, and efficient. This position often involves a blend of development, operations, and troubleshooting tasks.

  • System Reliability and Availability: Ensure high availability and reliability of services and infrastructure. This includes proactive monitoring, incident response, and post-mortem analysis to prevent recurrence of incidents.
  • Performance Management: Monitor and optimize system performance to meet the service level objectives (SLOs) and service level agreements (SLAs). This involves understanding and managing the capacity and scalability of services.
  • Incident Management and Response: Lead the response to system outages and performance issues, including on-call duties. Develop automation tools that speed up incident resolution and help prevent incidents from recurring.
  • Automation and Tooling: Design and implement automation tools and frameworks to reduce manual operational work. This could include scripts for deployment, monitoring, and infrastructure management.
  • Cross-functional Collaboration: Work closely with development teams to design and implement scalable, reliable, and efficient systems. This involves providing input on architectural decisions, optimizing resource utilization, and ensuring system resilience.
  • Continuous Improvement: Continuously analyze current processes and systems for improvement opportunities. Implement best practices for system reliability and availability.
  • Disaster Recovery and Backup: Develop and maintain disaster recovery plans, including regular testing to ensure system resilience.
  • Documentation: Maintain detailed documentation of the system architecture, configurations, processes, and service records to ensure that the knowledge is shared and accessible within the team.


Requirements / Skills

  • Education: A bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
  • Experience: Proven experience in a site reliability engineering or similar role, with a strong background in software development and system administration.
  • Technical Skills:
      - Proficiency in programming languages.
      - Experience with cloud services, containerization (Docker), and container orchestration (Kubernetes).
      - Strong understanding of networking principles and protocols.
      - Experience with continuous integration and continuous deployment (CI/CD) practices.
  • Problem-Solving Skills: Ability to troubleshoot and resolve complex technical issues under pressure.
  • Communication Skills: Excellent verbal and written communication skills, with the ability to effectively communicate technical concepts to non-technical stakeholders.
  • Teamwork: Ability to work collaboratively in a cross-functional team and interact effectively with developers, operations teams, and management.

Job Requirements

Age: 27-38 years old
Gender: Men or women
Military service: Must be completed
Education: Bachelor's degree in Computer and IT
Software: Kubernetes (Intermediate), Docker (Intermediate)
