Join us in forging a path to greatness. ESDS is a crucible of innovation.
Brief Background
The Data Lake Engineer plays a critical role in architecting, developing, and maintaining robust data lake environments that support enterprise-wide data needs. This role involves designing scalable data pipelines, ensuring high data quality, managing secure and efficient cloud infrastructure, and enabling seamless access to data for analytics and business intelligence. The ideal candidate will bridge the gap between data engineering and infrastructure, driving data strategy and collaboration across teams to unlock actionable insights from vast and diverse data sources.
What the Role Needs to Achieve
Design robust data pipelines, manage cloud infrastructure, and ensure data quality for seamless analytics.
ROLES AND RESPONSIBILITIES
Architecture & Deployment: Design, build, and manage cloud-native data lake infrastructure with a focus on scalability, resilience, and cost-efficiency.
Data Pipeline Engineering: Develop and maintain robust ETL/ELT workflows using big data technologies like Apache Spark, Hadoop, Kafka, and cloud-native services.
Security & Compliance: Implement advanced security controls, including fine-grained access policies, encryption, and auditing mechanisms to ensure compliance with data privacy regulations (e.g., GDPR, HIPAA, PDPB).
Data Governance & Quality: Collaborate with data stewards and governance teams to establish and enforce standards for metadata management, lineage tracking, and data quality monitoring.
Monitoring & Optimization: Continuously track the health and performance of the data lake environment, proactively identifying and resolving bottlenecks or anomalies to ensure reliable, high-throughput operations.
Issue Resolution: Troubleshoot and resolve challenges in data ingestion, transformation, access, and storage with a focus on minimizing downtime and enhancing user experience.
Architecture Optimization: Tune infrastructure components and cloud resource allocation to maximize performance and control costs across data workloads.
Cross-functional Collaboration: Partner with data scientists, analysts, and business stakeholders to understand data needs and deliver scalable, tailored solutions that support analytics and ML initiatives.
Documentation & Standards: Maintain detailed technical documentation covering architecture, data pipelines, metadata structures, access protocols, and operational procedures.
ESSENTIAL KNOWLEDGE AND SKILLS REQUIRED
Strong proficiency with cloud platforms such as AWS, Azure, or GCP, and big data tools like Apache Spark, Hadoop, and Kafka.
Skilled in programming languages such as Python, Java, or Scala, and experienced with both SQL and NoSQL database technologies.
Solid understanding of data governance concepts, data quality management, and metadata practices.
Demonstrated ability to troubleshoot and resolve complex data engineering challenges effectively.
Excellent analytical thinking, problem-solving abilities, and communication skills to engage with cross-functional teams and stakeholders.
EDUCATIONAL QUALIFICATIONS
Bachelor’s degree in Computer Science or a related field; a Master’s degree in Data Engineering or a relevant discipline is preferred.
EXPERIENCE
Minimum of 3 years of hands-on experience in data engineering.