Job Description:
We are looking for a skilled Data Engineer experienced in building and managing data systems. You will design and maintain scalable data pipelines that enable efficient data ingestion, storage, and processing across multiple platforms, and optimize data for analytical workloads.
Key Responsibilities:
- Design and develop data pipelines: Create and manage scalable ETL/ELT pipelines, ensuring efficient data flow into SQL Server, ClickHouse, and MinIO.
- Data lake and warehouse management: Architect and maintain data storage solutions using MinIO for object storage and ClickHouse for analytical queries, ensuring optimal performance for large-scale datasets.
- Big data processing: Implement and optimize real-time and batch data processing workflows using Kafka, Apache Flink, and SQL Server.
- OLAP systems: Develop and manage OLAP systems using ClickHouse and SQL Server to support high-performance analytical queries.
- Data governance and security: Implement and enforce best practices for data security, integrity, and governance.
- Collaboration: Partner with data scientists, analysts, and stakeholders to ensure that data infrastructure meets the needs of business-critical analytics.
- Automation and orchestration: Use Apache Airflow to orchestrate and automate data workflows, ensuring the reliability and scalability of data pipelines (see the sketch below).
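
To give a concrete flavor of this responsibility, below is a minimal Airflow DAG sketch, assuming Airflow 2.4+ with the TaskFlow API; the task, bucket, and object names are illustrative assumptions, not part of this role's actual stack.

    # A minimal sketch, assuming Airflow 2.4+ with the TaskFlow API.
    # All names below are illustrative, not this team's real pipeline.
    from datetime import datetime

    from airflow.decorators import dag, task


    @dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
    def minio_to_clickhouse_pipeline():
        @task
        def extract_to_minio() -> str:
            # Hypothetical step: pull data from a source system, stage it
            # as an object in a MinIO bucket, and return the object key.
            return "raw/events/latest.parquet"

        @task
        def load_to_clickhouse(object_key: str) -> None:
            # Hypothetical step: read the staged object and bulk-insert
            # the rows into a ClickHouse table for analytical queries.
            print(f"loading {object_key} into ClickHouse")

        # Airflow infers the task dependency from this data flow.
        load_to_clickhouse(extract_to_minio())


    minio_to_clickhouse_pipeline()

In practice the extract and load steps would call MinIO and ClickHouse client libraries; the sketch only shows how Airflow expresses the dependency between them.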
Qualifications:
- Education: Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- Experience:
  - 2+ years of experience in data engineering.
  - Experience building and managing OLAP systems for large-scale analytical workloads.
  - Experience working with both SQL and NoSQL databases.
  - Advanced Python and SQL programming skills for data processing and automation.
- Skills:
  - Expertise in designing and maintaining data lakes and warehouses, particularly with SQL Server, ClickHouse, and MinIO.
  - Strong understanding of OLAP system optimization and performance tuning.
  - Experience with ETL/ELT development using Python and orchestration tools such as Apache Airflow.
  - Familiarity with CI/CD pipelines and version control for managing data engineering projects.
  - Solid understanding of stream-processing frameworks for data ingestion and transformation (illustrated in the sketch after this list).
  - Familiarity with data governance, security, and compliance best practices.
  - Experience with Docker for containerization and Kubernetes for orchestration.
  - Strong problem-solving skills and the ability to work effectively in a collaborative environment.
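
As an illustration of the stream-processing skill above, here is a minimal ingestion sketch, assuming the kafka-python and clickhouse-driver packages; the topic, host, table, and column names are illustrative assumptions only.

    # A minimal stream-ingestion sketch, assuming the kafka-python and
    # clickhouse-driver packages. Topic, host, table, and column names
    # are illustrative assumptions, not part of this posting.
    import json

    from kafka import KafkaConsumer
    from clickhouse_driver import Client

    consumer = KafkaConsumer(
        "events",                          # hypothetical topic name
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    clickhouse = Client(host="localhost")  # hypothetical ClickHouse host

    batch = []
    for message in consumer:
        batch.append((message.value["user_id"], message.value["event_type"]))
        if len(batch) >= 1000:
            # Flush in batches: ClickHouse performs far better with
            # bulk inserts than with many single-row writes.
            clickhouse.execute(
                "INSERT INTO events (user_id, event_type) VALUES", batch
            )
            batch = []

Batching the inserts reflects a common ClickHouse design choice: it favors large bulk inserts over frequent small writes.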
Preferred Qualifications:
- Experience with distributed storage systems and large-scale data infrastructure.
- Certifications in SQL Server, Kafka, or other relevant technologies.
- Experience in database administration (DBA).
- Familiarity or hands-on experience with AI and machine learning (ML) workloads.
Benefits:
- Competitive salary with performance-based bonuses.
- Comprehensive health and wellness benefits.
- Opportunities for professional development and career growth.
- Flexible working environment.