Data Engineer - Python and Spark

  • Location:

    New York

  • Sector:

    Data Science

  • Contact:

    Mia Monaghan

A fast-growing data analytics startup in New York that has been featured on CNN, The Washington Post, The Associated Press and The Huffington Post is looking to add Data Engineers to its growing team.

Still fewer than 20 people but growing, our client has a leading proprietary social graph that leverages publicly available data to map real-world relationships. As a Data Engineer, you will be responsible for sourcing, enhancing and integrating data sources. Your work will directly affect their clients by influencing election outcomes, increasing political and non-profit fundraising yields, and optimizing advertising spend and risk assessments.

Responsibilities:

  • Collaboratively architect, build and launch new social graph components that enhance profiles, increase coverage and edge accuracy.
  • Create, maintain, and scale data pipelines between and for data ingesters, the social graph, machine learning predictors, client deliverables, and data warehousing.
  • Implement systems for monitoring of streaming and batch data processing (e.g. DataDog, Nagios). Track data quality and consistency.
  • Evangelize solid coding practices (e.g. test-driven development, code reviews, continuous deployment, automated linting, staging environments).
  • Contribute to the architectural designs and decision making around data stores, schemas, data security and cloud storage.

Requirements:

  • BS or MS degree in Computer Science, Math, Statistics or another technical field.
  • 2-3+ years of applied software engineering experience (especially startups, Python).
  • Python expertise: classes & inheritance, map & filter functions, list comprehensions, generators, decorators, style guides, pylint, pytest, pdb.
  • SQL/Hive expertise: where clauses, joins, group bys, windowing functions, exploding.
  • Spark expertise: Spark SQL, caching, checkpointing, DataFrames, RDDs.
  • Expertise in building and maintaining reliable ETL jobs.
  • Ability to write well-abstracted, reusable, object-oriented code components.
  • Enjoy working in a fast-paced, highly collaborative and ambitious startup environment.
  • Understanding of summary statistics and basic mathematical modelling.
  • Experience working in teams, packaging and deploying code in a production setting.
  • Experience with Amazon Web Services (RDS, S3, EC2, EMR, Data Pipeline).

Additional Skills:

  • Experience with open-source search platforms such as Solr, Elasticsearch or similar.
  • Background in wrangling structured and unstructured data sets and in consuming APIs (e.g. handling rate limiting and exponential back-off).
  • Knowledge of graph storage and computation frameworks (e.g. GraphX, TitanDB, Neo4j).
  • Familiarity with Scala and/or Java, Apache Spark internals and job optimization.
  • Involvement in a variety of coding projects, such as browser extensions, full-stack development, web scraping and Mechanical Turk automation.
  • Significant interest or background in politics, advertising technology and/or behavior modelling is a big plus.

Our client is paying up to $160,000 plus benefits.