We seek an experienced engineer to join our team and help us fulfill our mission by delivering a rich feature set, ensuring high availability, and maintaining stellar performance. In this role, you will be responsible for designing and developing the monitoring tools and platforms that keep our infrastructure stable and performant.
Responsibilities
- Maintain and configure monitoring services to ensure reliability and uptime;
- Implement monitoring strategies to track the health and performance of systems and services;
- Troubleshoot and resolve issues within the monitoring platforms;
- Optimize and enhance existing monitoring tools for better performance and scalability;
- Collaborate with other technical teams to integrate monitoring solutions across the infrastructure;
- Develop both backend and frontend components of monitoring solutions;
- Implement automated processes for data protection, disaster recovery, and failover procedures;
- Develop, implement, and maintain procedures to measure and track service performance and quality;
- Document problems, assess their impact, prioritize issues, and define solutions;
Requirements
- At least 2 years of work experience as a Software Engineer, SRE, or in a related position;
- Proven experience in backend development, preferably using Go or Python (Flask);
- Strong knowledge of Linux system management and administration;
- Extensive experience configuring and automating monitoring tools (Prometheus, Grafana, Zabbix, etc.);
- Experience with performance tuning and optimization for high-traffic systems;
- Experience with at least one logging stack, preferably ELK (Elasticsearch, Logstash, Kibana);
- Experience with CI/CD pipelines (e.g., GitLab) and infrastructure as code (IaC) tools such as Ansible;
- Knowledge of microservices architecture, containerization, and orchestration tools like Docker Swarm and Kubernetes;
- Familiarity with open-source services such as HAProxy, MySQL, Redis, and Memcached;
- Self-motivated, proactive, and capable of multi-tasking in a collaborative environment;
- Excellent problem-solving mindset and the ability to diagnose complex technical issues;
- Detail-oriented and able to manage multiple projects and meet deadlines;
- Excellent communication and collaboration skills, essential for effectively working with and supporting team members;
Preferred Qualifications
- Experience in software design and architecture, with a strong understanding of data structures and algorithms;
- Familiarity with design patterns, with the ability to create scalable and maintainable architectures;
- Familiarity with frontend development and modern JavaScript frameworks;
- Understanding of operating system networking principles (DNS, routing, firewalls, etc.);
- Knowledge of cloud platforms such as OpenStack;
- Prior experience in a similar role within a cloud-based environment;
- Experience applying artificial intelligence to anomaly detection;
If you are a skilled developer with experience in system operations and infrastructure management, and you are eager to work in a dynamic and challenging environment, we'd love to hear from you!