Lead Data Engineer – Leadership track
- Work from the new Soho (NYC) or Boston HQ of a leading healthcare firm, with a market-leading package, as it goes through an unprecedented merger
- The Advanced Analytics teams include 100+ of the best Data Scientists, Data Engineers and Consultants in the country
- Manage and own the successful delivery of large-scale data structures, pipelines and efficient ETL workflows
- Design and build ETL pipelines that deliver data with measurable quality
- Lead the data engineering team on large, complex projects involving multiple resources and tasks, providing one-on-one mentoring in support of company objectives
Day to day
- Designs and develops complex, large-scale data structures and pipelines to organize, collect and standardize data, generating insights and addressing reporting needs.
- Builds real-time data pipelines using Redshift, S3, Kinesis, Spark Structured Streaming, Akka Streams and similar stacks on leading cloud platforms.
- Writes complex ETL (Extract / Transform / Load) processes, designs database systems and develops tools for real-time and offline analytic processing.
- Develops frameworks, standards and reference materials for the architecture and associated products.
- Designs data marts and data models to support Data Science and other internal customers.
- Mentors junior team members and provides technical advice.
- Applies knowledge of systems and products to consult and advise on additional efforts across multiple domains spanning the broader enterprise.
- Collaborates with the data science team to transform data and integrate algorithms and models into highly available production systems.
- Uses in-depth knowledge of Hadoop architecture and HDFS commands, and experience designing and optimizing queries, to build scalable, modular and efficient data pipelines.
- Uses advanced programming skills in Python, Java or another major language to build robust data pipelines and dynamic systems.
- Integrates data from a variety of sources, ensuring they adhere to data quality and accessibility standards.
- Experiments with available tools and advises on new tools to determine the optimal solution given the requirements dictated by the model/use case.
What will help you get this role
- Master's or PhD in Computer Science preferred
- 8 or more years of progressively complex related experience
- In-depth knowledge of large-scale search applications and building high volume data pipelines
- Experience building and implementing data transformation and processing solutions
- Advanced knowledge of Hadoop or Spark architecture and HDFS commands, and experience designing and optimizing queries against data in the HDFS environment
- Advanced knowledge of Java, Python, Hive, Cassandra, Pig, MySQL, NoSQL databases or similar
- Experience with Bash shell scripts, UNIX utilities and UNIX commands
- Ability to understand and build complex systems and solve challenging analytical problems
- Ability to leverage multiple tools and programming languages to analyze and manipulate large data sets from disparate data sources
- Proven ability to create innovative solutions to highly complex technical problems
- Ability to communicate technical ideas and results to non-technical clients in written and verbal form
What's in it for you
- A tight-knit team of passionate people and a tech-first business
- Autonomy and end-to-end ownership
- Highly competitive pay, equity, full medical, dental and vision benefits, and more
- Opportunity for fast growth and promotion
- Opportunity to work in one of our other offices