
We are seeking a Senior Data Engineer to develop data pipelines, support our ML feature engineering workflows, and contribute to the ongoing evolution of our data lakehouse, which is built on Iceberg, MinIO, Spark, Trino, and SQL databases. You will work closely with senior engineers, ML engineers, data scientists, and analysts to deliver reliable, well-tested data flows.
Responsibilities:
• Develop ETL/ELT pipelines using Apache Airflow and Spark, integrating data from MySQL, MongoDB, PostgreSQL, and ClickHouse into Iceberg.
• Build and maintain ingestion scripts and feature extraction jobs using Python, Spark, and SQL.
• Build real-time data pipelines using Kafka, Spark Streaming, and Flink.
• Maintain data assets stored in MinIO and queried via Trino.
• Contribute to the design and improvement of data lakehouse domains and staging layers.
• Collaborate with ML engineers to prepare feature sets for ML models.
• Help maintain FastAPI-based ML model serving pipelines (I/O schemas, data validation, transformations).
• Implement data quality tests, anomaly monitoring, and automated alerts.
• Apply your experience with data cataloging and lineage management.
• Participate in code reviews, write documentation, and support platform reliability.
• Troubleshoot production issues related to data ingestion, storage, or compute.
Requirements:
• 3–5 years of experience in Data Engineering or related roles.
• Strong experience with Spark and Kafka.
• Good SQL knowledge and the ability to work with PostgreSQL, MySQL, and SQL Server.
• Experience working with Airflow DAGs.
• Understanding of object storage (S3/MinIO) and data lake concepts.
• Familiarity with data modeling, staging layers, and transformation patterns.
• Knowledge of Iceberg/Delta/Parquet-based workflows (nice to have).
• Strong analytical mindset and willingness to learn distributed systems.