Do you get excited about working on a product that can impact health outcomes for students and their families in America’s K-12 school system? Do you like to solve problems by writing great code that will be used in the flagship product of the company, solving unique challenges at the intersection of Healthcare and Edtech? We are looking for someone who can build data pipelines that scale and integrate disparate data sets, who cares about the end users, and who wants to work across teams to support the product alongside talented colleagues at an NYC startup.
We are growing rapidly and seeking a Data Engineer who can work with the latest technology stacks and take ownership of complex projects involving system integrations, big data analytics, machine learning, and visualizations. This data-focused role reports directly to the CTO and works with our Director of Data Science to empower their team on projects like predictive alerts for patients with negative clinical event trends. Become a thought leader, working with the Product and Customer Success teams on our use of data-driven decisions and insights. You’ll have the opportunity to research and build exciting new features that change accessibility and improve healthcare outcomes!
Beyond your technical expertise and a strong understanding of what our business does, you understand the importance of HIPAA and FERPA when working with real production data, and the security and care that such data requires.
You're a great fit for our Data Engineering role if you’ve succeeded in challenges similar to the ones below:
Bachelor's degree in computer science, data science, bioinformatics, or related field
Built a data pipeline that consumed events; validated, denormalized, cleaned, and enriched them; and moved them to a data warehouse. Full ETL at scale.
Ability to shift gears and work effectively in an agile environment
Ability to take ownership and see a project through the full SDLC and production maintenance
Have an opinion about what software and tools we use, and work directly with the CTO and lead engineers on technology roadmap planning
Experience with Scala, Java, or Python in a Spark or AWS EMR environment
Experience with relational database systems that use SQL (and NoSQL a plus)
Experience with Hadoop, MapReduce, HDFS, AWS Kinesis and Firehose or Kafka, and AWS Redshift
Experience with Unix, shell scripting, DevOps, and building web services and APIs
Experience with data visualization libraries like D3 or Google Charts
Experience with Big Data (dealing with the 4 Vs: volume, velocity, variety, and veracity) and/or machine learning algorithms and libraries
Bonus: experience with caching and search systems like Redis and Elasticsearch