Software Engineer (Big Data and ML Infrastructure)
A Paradigm-Shifting Company
Dandelion is an early-stage company focused on creating a platform for AI algorithmic development in healthcare. We were founded by experienced executives from health tech, hospital systems, and academia (Oscar Health, Mass General Brigham, MIT, UChicago, Berkeley), and we are focused on providing a solution that enables AI to scale across healthcare. Our partnerships with health systems enable us to provide the high-fidelity, clinically detailed data that many AI companies are searching for in order to build representative, equitable solutions for patients and communities. We believe this company will transform the way healthcare and AI intersect for many years to come.

The Role

Dandelion works with healthcare data in the petabyte range, so world-class data architecture and stewardship are critical to our shared success. You will work closely with Technical Product Management, Data Science, and Health System Partners to develop scalable solutions for processing and storing terabytes of structured and unstructured data, moving it from legacy systems to a cutting-edge machine learning platform. You should have deep expertise in data platforms on AWS and/or GCP, along with excellent business and interpersonal skills for working with internal and external stakeholders to understand data requirements and implement efficient, scalable ETL solutions.
- Design, implement, and manage data pipelines between health systems and Dandelion’s data platform;
- Lead data extraction teams to surface data, process, regulatory, and technology issues through identification, measurement, and monitoring of our operations;
- Become a trusted partner to health system stakeholders, enabling the movement of up to 25 PB of clinical data from legacy healthcare systems into a secure cloud environment;
- Own relationships with our technical counterparts at multiple health systems;
- Develop and improve existing ETL infrastructure, focusing on data quality, efficiency, and security;
- Collaborate with health system partners, technical product management, and clinical informatics to develop and implement data pipelines that allow for transparency and auditability;
- Summarize the complexity of these data pipelines and operations into clear explanations and documentation for internal and external audiences.
- 3+ years of Python development experience
- 3+ years of SQL, including experience working with OLAP databases (e.g., AWS Redshift, Google BigQuery, Snowflake, or similar)
- Multiple years of experience in System Administration, preferably in a cloud environment
- Experience using virtualization and containerization technologies (e.g. Docker, VMware)
- Experience using OLAP databases to efficiently extract and transform large volumes of data
- Experience interacting with business users to determine optimal infrastructure and deploy software solutions
- Proficiency with one or more scripting languages (e.g. Bash, Python)
- 2+ years of work experience with non-technical stakeholders
- Experience designing and improving workflows and standing up the accompanying operating and technical procedures
- Strong analytical decision-making and organizational skills
- Startup experience
- Proficiency in data harmonization, database architecture, and/or cloud computing
- Experience with healthcare data (familiar with HIPAA PHI elements)
- Experience with designing and implementing real-time pipelines
- Experience with data quality, QA, and validation
- Experience with E2E data pipeline optimization
Resumes, questions, and requests for assistance or an accommodation due to a disability may be directed to email@example.com.