Job reference #6170_8470618
We’re seeking talented Data Engineers to help build, scale, maintain, and optimize the data workflows feeding our data lake.
Key Responsibilities
- Design and develop high-volume, scalable, extensible, and reliable big data processing and analytics pipelines.
- Provide technical solutions for product needs as well as to support operations.
- Acquire, clean, and analyze large, messy data sets from big data sources.
- Integrate data from multiple internal and external data sources and APIs.
- Automate, extend, and scale the data processing and analytics pipeline.
- Create custom tools that streamline and optimize workflows and enable cohesive data-driven applications.
- Design and develop SQL scripts and tools to support ad hoc analytical requests.
Requirements
- Proficiency in distributed computing principles; experience with Cloudera, MapR, or Hortonworks.
- Proficiency with Spark, Hadoop MapReduce, and HDFS; strong expertise in Scala/Spark is highly desirable.
- Excellent knowledge of data structures, algorithms, and design patterns.
- Deep understanding of SQL/NoSQL and system performance on big data platforms.
- Proficiency in SQL and/or at least one high-level programming language such as Java, Scala, or Python.
- Experience with large-scale data analysis on Hadoop/Spark platforms using Pig, Hive, or Spark.
- Experience with NoSQL databases such as HBase, Cassandra, or MongoDB is a plus.
- Ability to work independently and collaboratively within a team.
- Flexible, adaptive, and a quick learner.