Today, data operations and delivery are expensive, time-consuming, and frustrating. But it doesn’t have to be that way. Our client helps companies reliably process and onboard the data they need, when and where they need it, allowing their customers to focus on driving productivity and creating better business outcomes. Their cloud platform and expert services connect data users and suppliers so data can flow. They have raised $40m and are now looking to add to the team!
- Use a solid command of scripting languages (Python, R, SQL) to take any source of data and perform EVL(ST) — Extract, Validate, Load, Standardize, Transform — into the correct data store, in the form agreed upon by the Data Engineer and the end user
- Contribute to the design and development of our Python data workflow management platform
- Design and develop tools to wrangle datasets of small and large volumes of data into cleaned, normalized, and enriched datasets
- Build and enhance a large, scalable Big Data platform (Spark, Hadoop)
- Refine processes for normalization and performance-tuning analytics
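The EVL(ST) flow named in the responsibilities above can be sketched in plain Python. This is a minimal, illustrative sketch only: the stage functions, the toy CSV input, and the list-backed "data store" are assumptions for demonstration, not part of the client's actual platform.

```python
import csv
import io

def extract(raw_csv: str) -> list[dict]:
    """Extract: parse raw CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def validate(rows: list[dict]) -> list[dict]:
    """Validate: drop rows missing required fields."""
    return [r for r in rows if r.get("id") and r.get("amount")]

def load(rows: list[dict], store: list) -> list:
    """Load: append validated rows to the target store (a plain list here)."""
    store.extend(rows)
    return store

def standardize(store: list) -> list:
    """Standardize: coerce string fields to proper types."""
    return [{"id": int(r["id"]), "amount": float(r["amount"])} for r in store]

def transform(store: list) -> list:
    """Transform: derive fields in the shape agreed with the end user."""
    return [{**r, "amount_cents": round(r["amount"] * 100)} for r in store]

# Hypothetical input: one row is missing its "id" and gets validated out.
raw = "id,amount\n1,9.99\n,3.50\n2,1.25\n"
result = transform(standardize(load(validate(extract(raw)), [])))
```

In a real pipeline each stage would target a proper data store and schema registry rather than in-memory lists, but the ordering of the five stages is the point of the acronym.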
- 5+ years of full-time experience in a professional environment
- Expertise in Python
- Experience with ETL and/or other big data processes
- Experience with at least two popular big data / distributed computing frameworks, e.g. Spark, Hive, Kafka, MapReduce, Flink
- Experience working independently or with minimal guidance