Design, create, maintain, and improve ETL and data pipelines to support the entire data usage cycle for the data analytics team
Create and maintain the data catalog, data lineage, and data life cycle for the data analyst team
Create and improve automation across the entire data process, from data ingestion and ETL orchestration to model deployment
Analyze problems and find resolutions, and work closely with administrators to monitor performance and advise on any necessary infrastructure changes
Work with business domain experts, data scientists, and developers to identify relevant data for analysis
Qualifications
Strong coding skills, e.g. Python, Spark, SQL
Experience working with various forms of data infrastructure, including relational databases (SQL), Delta Lake, and Spark
Experience with cloud platforms, especially Azure Databricks and Power BI
Experience with orchestration tools for batch and real-time data processing
Experience with CI/CD pipelines for data and machine learning model deployment
Knowledge of ETL and data pipeline frameworks and of data patterns such as structured and semi-structured data; understanding of data in various types of databases and data stores, such as Hive, HBase, Impala, MongoDB, Delta Lake, and graph databases, using batch and streaming mechanisms that leverage on-premise and cloud big data architectures
Knowledge of computer science fundamentals such as databases, operating systems, algorithms, data structures, etc.
Interest in learning and working with new technology stacks