This role is responsible for designing, operating, and continuously improving the organization’s cloud and virtualization platforms based on OpenStack and FusionSphere. It ensures high availability, scalability, automation, and security across the compute, storage, and networking layers, and it includes operating, maintaining, and optimizing Kubernetes clusters that support reliable, scalable containerized applications.
Responsibilities:
- Develop service operations and troubleshooting processes to maintain technological excellence and improve operational efficiency and service uptime.
- Provide operational input to the architecture and strategies of Charging Systems, Mediation, and Roaming.
- Implement operational strategies related to Charging Systems, PCRF, and VAS Cloud Infrastructure.
- Ensure the stability, availability, and continuous operation of Charging, PCRF, and VAS services to meet business requirements.
- Collaborate with Network, ITS, and other related teams to ensure proper integration between systems, applications, and nodes.
- Operate, maintain, and optimize cloud infrastructure based on OpenStack and FusionSphere.
- Design and implement high availability (HA), load balancing, and scalability solutions for critical services.
- Perform advanced troubleshooting across compute, network, and storage layers.
- Administer core OpenStack components such as Nova, Neutron, Cinder, Glance, Keystone, and Heat.
- Manage virtualization clusters and hypervisors (KVM).
- Implement and maintain SDN and network virtualization technologies, including overlay protocols such as VXLAN and Geneve, and SR‑IOV for direct device access.
- Manage and tune storage backends including Ceph, FusionSphere Storage, and SAN.
- Conduct capacity planning, monitoring, and performance optimization.
- Implement infrastructure automation using Ansible (Terraform is considered an advantage).
- Maintain accurate technical documentation and operational runbooks.
- Work closely with DevOps, Security, and Application teams.
- Communicate with the NOC (Network Operations Center) regarding service impacts, changes, and escalation procedures.
- Coordinate with ITS and business teams during major incidents to ensure rapid response and proper support.
- Manage second‑line support for Charging Systems, PCRF, and VAS Cloud services, and guide first‑line support teams.
- Analyze industry trends in service operations and identify areas for internal improvement.
- Deploy and manage workloads using Helm, Kubernetes manifests, and GitOps workflows.
- Implement observability solutions using Prometheus, Alertmanager, and Grafana.
- Operate and support ELK / Elastic Stack for logging and analytics.
- Participate in on‑call rotations for infrastructure or cluster‑related incidents when required.
Requirements:
- Bachelor’s degree or higher in Computer Engineering, Information Technology, or a related field.
- Minimum 3 years of experience in a relevant specialization.
- Experience working in medium to large organizations.
- Strong knowledge of cloud computing and virtualization platforms.
- Advanced experience with OpenStack components and architecture.
- Strong understanding of Kubernetes clusters and container orchestration.
- Expertise in virtualization technologies and KVM hypervisors.
- Experience with SDN/overlay networking technologies.
- Knowledge of storage systems and backend technologies such as Ceph or SAN.
- Advanced troubleshooting skills across compute, network, and storage layers.
- Experience with observability tools including Prometheus, Alertmanager, and Grafana.
- Knowledge of ELK / Elastic Stack for logging and analytics.
- Proficiency in automation tools such as Ansible.
- Strong understanding of system performance, capacity management, and disaster recovery.
- Familiarity with IT service management, configuration management, and infrastructure architecture.