An opportunity to join a leading data engineering and information supply chain operator, with Series B funding from Two Sigma and others.
Our client’s platform provides an industry-wide solution by operating and maintaining firms’ data supply chains, helping firms get past the first-mile challenge of ingesting data and move straight to generating returns. The company continues to scale and enhance the platform so clients have access to best-in-class technologies and services.
They clean, normalize, and enrich datasets, delivering them delightfully through their cloud platform. That way, businesses can say goodbye to the burdens of data and hello to the benefits. Their team provides data engineering support to partners, so those partners have more time and energy for finding insights and creating value.
A network where data is simple. So, businesses can get straight to the point.
By market definition, a data engineer is responsible for creating, maintaining, and understanding data and the delivery infrastructure that supports it. They are the connection between smart business users and not-so-smart data repositories.
Through a solid command of various scripting languages (Python, R, SQL), they are capable of taking any source of data and performing an EVL(ST) process (Extract, Validate, Load, Standardize, Transform) into the correct data store, in the form agreed upon by the data engineer and the end user.
Data engineers are often responsible for the efficacy, quality, and elegance of their solutions. They are business-savvy and understand the importance of the data they are piping in, at least to an extent.
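For candidates unfamiliar with the term, the EVL(ST) flow described above can be sketched as a minimal Python pipeline. The function names, sample CSV, and SQLite store below are illustrative assumptions, not part of the client’s actual stack:

```python
# Minimal EVL(ST) sketch: Extract, Validate, Load, then
# Standardize and Transform inside the data store.
# All names and the record shape are illustrative assumptions.
import csv
import io
import sqlite3

RAW_CSV = "id,price\n1, 9.99 \n2,\n3,12.50\n"

def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def validate(rows):
    """Validate: drop rows with a missing or non-numeric price."""
    ok = []
    for row in rows:
        try:
            float(row["price"])
            ok.append(row)
        except (ValueError, TypeError):
            continue
    return ok

def load(conn, rows):
    """Load: land validated rows in the store as raw text."""
    conn.execute("CREATE TABLE raw (id TEXT, price TEXT)")
    conn.executemany("INSERT INTO raw VALUES (:id, :price)", rows)

def standardize_transform(conn):
    """Standardize + Transform: cast types and derive a clean table."""
    conn.execute(
        "CREATE TABLE clean AS "
        "SELECT CAST(id AS INTEGER) AS id, "
        "ROUND(CAST(price AS REAL), 2) AS price FROM raw"
    )

conn = sqlite3.connect(":memory:")
load(conn, validate(extract(RAW_CSV)))
standardize_transform(conn)
print(conn.execute("SELECT * FROM clean").fetchall())  # [(1, 9.99), (3, 12.5)]
```

The ordering matters: validation happens before the load so the store only ever receives rows that can be standardized, which is the distinction EVL(ST) draws against plain ETL.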
- Contribute to the design and development of our Python data workflow management platform
- Design and develop tools to wrangle data at both small and large volumes into cleaned, normalized, and enriched datasets
- Build and enhance a large, scalable Big Data platform (Spark, Hadoop)
- Refine processes for normalization and performance-tuning analytics
- You love building elegant solutions that scale
- You bring deep experience in the architecture and development of quality backend production systems, specifically in Python
- You love working on high-performing teams, collaborating with team members, and improving our ability to deliver delightful experiences to our clients
- You are excited by the opportunity to solve challenging technical problems, and you find learning about data fascinating
- You understand Server, Network, and Hosting Environments, RESTful and other common APIs, common data distribution, and hosted storage solutions
- 5+ years of full-time experience in a professional environment
- Expertise in Python
- Experience with ETL and/or other big data processes
- Experience with at least two popular big data / distributed computing frameworks, e.g. Spark, Hive, Kafka, MapReduce, Flink
- Experience working independently, or with minimal guidance
- Strong problem solving and troubleshooting skills
- Ability to exercise judgment to make sound decisions
- Proficiency in multiple programming languages
- Strong communication skills, interpersonal skills, and a sense of humor
- Data skills: SQL (RDBMS) and NoSQL, structured and unstructured data, BigQuery
- Proficiency in Jupyter, C24; familiarity with ETL, CDC, and workflow tools
- Experience working in a cloud-based environment, such as GCP or AWS
We’ve moved firmly into a data-driven world. Companies that can get their hands on more data quickly and efficiently have an edge. We help companies achieve this edge.
Today, data delivery and operations are expensive, time-consuming, and frustrating. But they don’t have to be. We help companies reliably process and onboard the data they need, when they need it, and where they need it. Our cloud platform and expert services connect data users and suppliers so data can flow. This allows our customers to focus on driving productivity and creating better business outcomes.
NO REMOTE - NO CONTRACT - NO C2C - NO 3rd PARTIES