دیجی پی (Digipay)

MLOps Engineer

Location: Tehran (Jordan)
Employment type: Full Time
Working days: Saturday to Wednesday
Company size: 201–500 employees
Industry: Banking
Scope: Iranian company dealing only with Iranian entities
Founded: 1397 (2018/19)
Ownership: Privately held

Key Requirements

3 years of experience in a similar position

Job Description

We are looking for an MLOps Engineer to support and enhance our machine learning platform across retail and financial data domains. The role focuses on building automated ML pipelines, maintaining training and serving infrastructure, improving observability, and ensuring reliable model operations across our modern data ecosystem.
Our stack includes Airflow, Docker, MLflow, FastAPI, MinIO, Apache Iceberg, Trino, pandas, Spark, SQL Server, PostgreSQL, ClickHouse, and containerized environments for data and ML workflows.

 

Responsibilities: 

1. ML Pipelines & Feature Engineering
• Develop and maintain ML training pipelines using:
  - Airflow
  - Python
  - pandas, Ray, or Spark for scalable feature processing
• Implement feature engineering jobs that run on Spark or Ray, depending on workload characteristics.
• Automate dataset preparation and scheduled retraining workflows.
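
For illustration, a minimal sketch of a feature job that fans a pandas aggregation out across Ray workers; the transaction schema (customer_id, amount) and the partition count are hypothetical, not part of this posting.

    import numpy as np
    import pandas as pd
    import ray

    ray.init(ignore_reinit_error=True)

    @ray.remote
    def build_features(part: pd.DataFrame) -> pd.DataFrame:
        # Hypothetical schema: one row per transaction (customer_id, amount)
        return (
            part.groupby("customer_id")
            .agg(txn_count=("amount", "size"), txn_sum=("amount", "sum"))
            .reset_index()
        )

    def run(transactions: pd.DataFrame, n_parts: int = 8) -> pd.DataFrame:
        # Fan the partitions out to Ray workers, then gather the results
        parts = [build_features.remote(p) for p in np.array_split(transactions, n_parts)]
        out = pd.concat(ray.get(parts))
        # Re-aggregate because a customer may span several partitions
        return out.groupby("customer_id", as_index=False).sum()

The same aggregation translates to a PySpark groupBy/agg almost line for line, which is why the Spark-or-Ray choice can stay a per-workload decision.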
2. Model Lifecycle & Versioning
• Use MLflow (PostgreSQL backend + MinIO artifacts) for experiment tracking, model registry, and reproducible runs.
• Maintain model versioning, metadata, and deployment-readiness checks.
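
A minimal sketch of what logging a run against such a setup looks like; the endpoints, experiment, and model names below are placeholders, not the actual configuration.

    import os
    import numpy as np
    import mlflow
    import mlflow.sklearn
    from sklearn.linear_model import LogisticRegression

    # MinIO is S3-compatible, so MLflow's artifact store only needs an endpoint override
    os.environ["MLFLOW_S3_ENDPOINT_URL"] = "http://minio:9000"  # hypothetical endpoint
    mlflow.set_tracking_uri("http://mlflow:5000")               # hypothetical server

    # Toy model so the run is reproducible end to end
    X, y = np.random.rand(200, 4), np.random.randint(0, 2, 200)
    model = LogisticRegression().fit(X, y)

    mlflow.set_experiment("retail-churn")  # hypothetical experiment
    with mlflow.start_run():
        mlflow.log_param("solver", "lbfgs")
        mlflow.log_metric("train_accuracy", model.score(X, y))
        # Registering under a name is what makes the model promotable later
        mlflow.sklearn.log_model(model, "model", registered_model_name="churn")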
3. Model Deployment
• Package and deploy ML models into production using FastAPI-based inference services.
• Build CI/CD deployment pipelines for promoting models across staging and production.
• Implement testing, logging, and payload validation for inference endpoints.
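
As a sketch of such an endpoint, assuming a model registered in MLflow under the hypothetical name "churn"; Pydantic handles the payload validation mentioned above.

    import mlflow.pyfunc
    import pandas as pd
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()
    # Stage-based registry URI; model name and stage are illustrative
    model = mlflow.pyfunc.load_model("models:/churn/Production")

    class Payload(BaseModel):
        # Pydantic rejects malformed requests before they reach the model
        customer_id: int
        txn_count: int
        txn_sum: float

    @app.post("/predict")
    def predict(p: Payload) -> dict:
        features = pd.DataFrame([p.model_dump()]).drop(columns=["customer_id"])
        return {"customer_id": p.customer_id, "score": float(model.predict(features)[0])}

A service like this would typically run under uvicorn inside the same Docker image that the CI/CD pipeline promotes between stages.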
4. Data & Model Storage
• Manage training datasets, model binaries, and artifacts stored in MinIO.
• Integrate model workflows with Apache Iceberg tables and expose features through Trino.
• Ensure efficient data reads/writes for Spark and Ray jobs.
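
Reading features back out through Trino is a plain DB-API call; the coordinator host, catalog, and table below are placeholders.

    import pandas as pd
    import trino

    # Hypothetical Trino coordinator fronting the Iceberg catalog
    conn = trino.dbapi.connect(
        host="trino.internal", port=8080, user="mlops",
        catalog="iceberg", schema="features",
    )
    df = pd.read_sql("SELECT customer_id, txn_count, txn_sum FROM customer_features", conn)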
5. Automation & Orchestration
• Create reliable Airflow DAGs for:
  - Data ingestion
  - Feature pipelines
  - Model training
  - Retraining + evaluation workflows
• Apply DataHub metadata/lineage to Airflow tasks (inlets/outlets).
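
A skeletal retraining DAG of that shape, with lineage attached through task inlets/outlets; the DAG id, schedule, and dataset names are illustrative, and the Dataset entity assumes the datahub-airflow-plugin package is installed.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from datahub_airflow_plugin.entities import Dataset  # assumes datahub-airflow-plugin

    def train():
        ...  # pull features via Trino, fit, log the run to MLflow

    with DAG(
        dag_id="churn_retraining",  # hypothetical
        schedule="@weekly",
        start_date=datetime(2024, 1, 1),
        catchup=False,
    ):
        PythonOperator(
            task_id="train_model",
            python_callable=train,
            # DataHub lineage: the task reads the feature table, produces a model
            inlets=[Dataset("trino", "iceberg.features.customer_features")],
            outlets=[Dataset("mlflow", "churn")],
        )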
6. Monitoring & Observability
• Implement model monitoring, drift detection, and pipeline health checks.
• Add metrics, logs, and alerts for:
  - Model serving latency
  - Data quality issues
  - Training/retraining failures
  - Resource usage (Spark/Ray/Airflow workers)
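
For the drift-detection piece, a dependency-free Population Stability Index check is a common starting point; the 0.25 threshold is a conventional rule of thumb, and the score samples here are synthetic.

    import numpy as np

    def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
        # Population Stability Index between a reference and a live sample
        edges = np.histogram_bin_edges(expected, bins=bins)
        e = np.histogram(expected, bins=edges)[0] / len(expected)
        a = np.histogram(actual, bins=edges)[0] / len(actual)
        e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)  # avoid log(0)
        return float(np.sum((a - e) * np.log(a / e)))

    rng = np.random.default_rng(0)
    train_scores = rng.normal(0.0, 1.0, 10_000)  # reference distribution
    live_scores = rng.normal(0.3, 1.0, 10_000)   # shifted serving sample
    if psi(train_scores, live_scores) > 0.25:    # > 0.25 ≈ significant drift
        ...  # alert / trigger the retraining DAG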
7. Platform Support
• Manage containerized environments for ML workflows (Docker).
• Work with data engineers and ML engineers to optimize Spark and Ray cluster configurations.
• Support troubleshooting across Airflow tasks, Spark/Ray jobs, and model-serving APIs.

Requirements:

Technical Skills
• 2–4+ years of experience in MLOps, ML Engineering, or Data Engineering with ML pipelines.
• Strong Python (pandas, ML, data-processing workflows).
• Experience with Apache Airflow in production (Docker environments preferred).
• Practical experience with at least one distributed framework:
  - Spark (PySpark)
  - Ray (bonus)
• Hands-on experience with:
  - MLflow for tracking and model registry
  - FastAPI for model serving
  - MinIO or other S3-compatible object storage
• Solid understanding of SQL; experience with PostgreSQL, SQL Server, or ClickHouse.
• Experience with Docker-based development and deployment.


Nice to Have

• Experience with Apache Iceberg and Trino.
• Familiarity with DataHub or other metadata/lineage tools.
• Experience monitoring ML systems (Prometheus/Grafana, ELK, etc.).


Soft Skills

• Strong debugging skills for data/ML pipelines.
• Ability to work cross-functionally with data science and engineering teams.
• Good documentation habits and clear communication.

Job Requirements

Gender: Men / Women
