Role Description
Responsible for constructing and validating sustainable data pipelines, data structures, and big data solutions, including data lake platforms, to facilitate easy data search and retrieval within the organization. You will work closely with the data architect to develop a comprehensive enterprise data architecture.
Core Responsibilities
Data Quality and Reliability:
- Design and implement methods to improve data reliability and quality.
- Combine raw data from different sources into consistent, machine-readable formats.
Structural Development:
- Lead the development and testing of data structures that enable data extraction and transformation for predictive or prescriptive modeling.
Process Definition:
- Define and set development, test, release, update, and support processes for data engineering operations.
- Troubleshoot and fix code bugs.
Big Data Models:
- Lead the development of big data models and use cases based on the data structure, and prepare them for data operations.
Query Execution:
- Create and execute queries on structured and unstructured data sources to identify process issues or perform mass updates.
Feature Layer:
- Lead the development and implementation of the feature layer, providing the features and KPIs required by different stakeholders and supporting the construction of robust machine learning models.
Batch Scheduling and Reporting:
- Ensure that batch production scheduling and report distribution are accurate and timely.
Data Transformation Processes:
- Design processes supporting data transformation, data structures, metadata, dependency, and workload management.
Competencies
- Performance Excellence
- Leadership & Empowerment
- Collaboration & Creating Synergy
- Agility & Resilience
- Innovation & Going Digital
- Strategic Thinking
- People Centricity
Technical Competencies
- Proficiency in Java, C#, and Python for developing robust applications
- Strong experience with Cloudera or any other big data platform and its complementary services, such as Apache Hive, Scala, BizSpark, Impala, Apache Spark, data security, Kafka, HBase, Sqoop, NiFi, and Python for programming, data analysis, and ML
- Proficiency in handling and processing large datasets using distributed computing frameworks
- Strong knowledge of SQL for querying and managing relational databases
- Deep understanding of data warehousing principles and best practices
- Proficiency in data analysis and visualization tools