Senior Data Engineer: Massive Graph Processing

Help us process a trillion-edge graph as quickly and efficiently as possible. We’re Identity Engineering, and we maintain a massive graph that connects the different identifiers for consumers (e.g., anonymized email addresses and phone numbers) with the devices they use online. The engineering systems we’ve developed constantly ingest new edges from thousands of different sources and find numerous types of relevant paths to power our suite of core products.

Here are some highlights of projects we’re working on

  • Pregel Path Computer: This system finds relevant graph paths using the Pregel graph computation framework as implemented in Apache Giraph. There are challenges in running Giraph at the scale of our graph, and we’re constantly looking to refine our Pregel algorithms.
  • Edge Ingestion and Partitioning Framework: We could never process all trillion edges at once, and luckily we don’t have to. Instead, we process subgraphs that contain specific types of edges. Our edge ingestion and partitioning framework manages different Hadoop datastores for different types of edges and automates the ingestion of new edge data. It leverages our Seek MSJ framework to efficiently incorporate new data into existing edge stores.
  • Path Computation as a Service: We provide a service to other engineering teams for finding specific types of paths within our massive graph. It handles 20,000 requests a day, which is possible because it uses caching and intelligently batches similar requests.
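
For readers new to the vertex-centric model behind the Pregel Path Computer, here is a minimal single-process sketch in Python. It is not Giraph's API and not our production algorithm: it swaps in a simpler technique, minimum-label propagation, which groups identifiers into connected components instead of finding typed paths. The function name `pregel_min_label` and the identifier strings are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def pregel_min_label(edges):
    """Toy Pregel-style loop: each vertex ends up labelled with the
    smallest vertex id in its connected component."""
    # Build an undirected adjacency list from the edge pairs.
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Every vertex starts with its own id as its label.
    label = {v: v for v in neighbors}

    # Superstep 0: every vertex sends its label to all of its neighbors.
    inbox = defaultdict(list)
    for v in neighbors:
        for n in neighbors[v]:
            inbox[n].append(label[v])

    # Later supersteps: a vertex that receives a smaller label adopts it
    # and forwards it; otherwise it stays silent (votes to halt).
    while inbox:  # stop once no messages are in flight
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            smallest = min(messages)
            if smallest < label[v]:
                label[v] = smallest
                for n in neighbors[v]:
                    outbox[n].append(smallest)
        inbox = outbox
    return label


# Hypothetical identifiers: two consumers that share no device.
edges = [
    ("email:a", "device:1"),
    ("device:1", "phone:x"),
    ("email:b", "device:2"),
]
labels = pregel_min_label(edges)
```

In this toy run, `email:a` and `phone:x` receive the same label because both connect through `device:1`, while `email:b` lands in a separate component. A real Giraph job distributes the vertices across workers and exchanges messages over the network between supersteps, which is where the scaling challenges mentioned above come from.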

We currently use a 79,800-core Hadoop cluster with 90 PB of disk space and 256 TB of RAM (shared across all of our data engineering) to power our systems. We’re also exploring moving everything to AWS. We develop in Java and use MapReduce, Giraph, and Spark. We’re open to new technologies and languages if they help us better solve a problem.

We take pride in operating as a high-performance team while maintaining our kindness and humility. We find feedback important in helping us grow as individuals and as a team, and we’re always looking for chances to share positive and constructive feedback.

You may be a good fit if you

  • Have 3+ years of experience writing and deploying production code.
  • Have a passion for building large-scale distributed systems and are comfortable writing high-performance code.
  • Have a startup personality: smart, ethical, friendly, hard-working and productive.
  • Are a data enthusiast who wants to be surrounded by brilliant teammates and huge challenges.

Benefits

  • People. Work with talented, collaborative, and friendly people who love what they do.
  • Take our engineering, business development, and culture in a whole new direction during one of our four Hackweeks every year.
  • Stock.
  • Unlimited paid time off
  • Competitive medical, dental, & vision insurance
  • 401(k) matching
  • Employee Stock Purchase Plan (ESPP)
  • Commuter benefits
  • Catered meals & stocked kitchen
  • Events including game nights, happy hours, camping trips, and sports leagues

More about us

We are the leader in data connectivity, helping the world’s largest brands use their data to improve customer interactions on any channel and device. We thrive on mind-bending technical challenges and value entrepreneurship, humility, and constant personal growth.

There is so much more that we want to build and that we could continue to improve. We value strong engineers who are agile enough to hit the ground running and tackle challenges.