We are looking for an experienced Data Engineer to join our growing Data Infrastructure team. The successful candidate will be responsible for expanding and optimizing our data ingestion pipeline architecture, will support our data scientists and business intelligence team on data initiatives, and will ensure that our data infrastructure remains consistent, stable, and robust throughout ongoing projects. The ideal candidate is proactive and self-motivated, comfortable supporting the data needs of multiple teams, systems, and products, and excited by the prospect of optimizing or even re-designing TAPSI’s data infrastructure architecture to support our next generation of products and data initiatives.
Tasks in detail:
- Creating and maintaining optimal data ingestion pipeline architectures.
- Assembling large, complex data sets that meet functional / non-functional business requirements.
- Identifying, designing, and implementing internal process improvements, such as automating manual processes, optimizing data delivery, and re-designing infrastructure for greater scalability.
- Building the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and big data technologies.
- Building analytic tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.
- Creating data tools that help our data analysts and data scientists build and optimize our product into an innovative industry leader.
- Developing ETL routines to populate databases from source systems and to create aggregates.
- Troubleshooting data issues within and across the business and presenting solutions to these issues.
- Leading innovation by exploring, benchmarking, recommending, and implementing big data technologies for our platforms.
- Performing data updates, indexing, and maintenance in the application database.
Requirements:
- Minimum 3 years of experience as a Data Engineer.
- Bachelor’s degree in Computer Engineering, Computer Science or another quantitative field.
- Experience with big data tools, such as Hadoop (YARN, HDFS), Spark, Kafka, etc.
- Experience with relational SQL and NoSQL databases, including PostgreSQL and MongoDB.
- Experience with data pipeline and workflow management tools: Airflow, Luigi, etc.
- Experience with stream-processing systems: Storm, Spark Streaming, etc.
- Experience with object-oriented/object function scripting languages: Python, etc.
- Experience in building and optimizing data ingestion pipelines.
- Experience in manipulating, processing, and extracting value from large, disconnected datasets.
- Experience in supporting and working with cross-functional teams in a dynamic environment.
- Familiarity with Linux-based machines.
- Experience with DevOps tools, including Docker and Kubernetes.
- Knowledge of network infrastructure concepts (mandatory).