Informatics Data Engineer

Winston-Salem, NC, United States
Job ID: 131019
Job Family: Information Services
Status: Full Time
Shift: Day
Job Type: Regular
Department Name: 55811085045501-Clinical and Translational Science Institute (CTSI)

Overview

BACKGROUND 

The Office of Informatics in the Clinical and Translational Science Institute serves the research community of the Wake Forest School of Medicine by providing the analytic services and resources necessary to support our academic learning health system.

Our data engineers are the foundation of everything we do: they make sure that ‘big data’ from across our enterprise is accurate, clean, and readily available in common data models and specifications. They work closely with the rest of our tight-knit data team, who rely on their work to transform clinical and educational data into research-ready data marts and reporting tools that drive operational efficiency and improved patient care.

The Office of Informatics is based in newly renovated offices at the Innovation Quarter in downtown Winston-Salem and operates on a hybrid model that supports remote work.

JOB SUMMARY 

Designs, builds, implements, and maintains data processing pipelines for the extraction, transformation, and loading (ETL) of data from various sources. Develops robust, scalable solutions that transform data into useful formats for analysis and sharing, improve data flow, and enable end users to consume, analyze, and share data more quickly and easily. Writes complex SQL queries to support analytics needs. Evaluates and recommends tools and technologies for data infrastructure and processing. Collaborates with statisticians, data scientists, programmers, data analysts, product teams, and other stakeholders to translate business requirements into technical specifications and coded data pipelines. Works with structured and unstructured data from a variety of data stores, such as data lakes, relational database management systems, and data warehouses.
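
To make the ETL shape concrete for candidates, here is a minimal, illustrative Python sketch (not code from our systems); the database files, table names, and columns such as raw_labs and labs_clean are hypothetical stand-ins for real clinical sources:

import sqlite3

def extract(conn):
    # Pull raw rows from a source system (simplified: the whole table).
    # Assumes source.db already contains a raw_labs table.
    return conn.execute(
        "SELECT patient_id, visit_date, lab_value FROM raw_labs"
    ).fetchall()

def transform(rows):
    # Drop incomplete records and coerce values into a common format.
    return [
        (pid, date, float(value))
        for pid, date, value in rows
        if pid is not None and value is not None
    ]

def load(conn, rows):
    # Write the cleaned rows to a reporting table in the warehouse.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS labs_clean "
        "(patient_id TEXT, visit_date TEXT, lab_value REAL)"
    )
    conn.executemany("INSERT INTO labs_clean VALUES (?, ?, ?)", rows)
    conn.commit()

if __name__ == "__main__":
    source = sqlite3.connect("source.db")        # stand-in for a source RDBMS
    warehouse = sqlite3.connect("warehouse.db")  # stand-in for the warehouse
    load(warehouse, transform(extract(source)))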

EDUCATION/EXPERIENCE/CERTIFICATIONS: 

  • Bachelor’s degree and 4+ years of experience, or an equivalent combination of education and experience in computer programming.

ESSENTIAL FUNCTIONS 

  • Builds custom ingestion pipelines to incorporate data from novel sources, such as outputs from machine learning models.
  • Ensures visibility into the status of automated data tasks so that mistakes are caught before they become problems (see the sketch following this list).
  • Collaborates with other members of the data team to improve the performance and stability of transformation tasks.
  • Participates in design conversations about improving the architecture of our data infrastructure.
  • Supports the team in identifying and implementing data integration and quality control strategies to improve data quality and availability.
  • Prepares research data for ingestion and conversion to a unified data standard using ETL and automation tools.
  • Assists the team in maintaining the database environment by creating views/queries, documentation, and data pipelines.
  • Owns data integrity, availability, documentation, and efficient access to data.
  • Identifies opportunities for process improvement across the end-to-end data development and delivery lifecycle.
  • Incorporates automation wherever possible to improve access to data and analyses.
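
As a rough illustration of the status-visibility point above (a sketch, not our actual tooling; the step names and callables are placeholders), a pipeline runner that reports each task's outcome might look like this:

import logging

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline")

def run_step(name, fn):
    # Run one pipeline step and record its outcome so failures are visible.
    log.info("step %s: started", name)
    try:
        fn()
    except Exception:
        log.exception("step %s: FAILED", name)
        raise  # halt the run so bad data never reaches downstream consumers
    log.info("step %s: succeeded", name)

def run_pipeline(steps):
    for name, fn in steps:
        run_step(name, fn)

if __name__ == "__main__":
    run_pipeline([
        ("extract", lambda: None),    # placeholder callables
        ("transform", lambda: None),
        ("load", lambda: None),
    ])

In practice, orchestrators such as Airflow or Dagster (see Skills below) provide this bookkeeping, plus retries and alerting, out of the box.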

SKILLS/QUALIFICATIONS 

  • Strong initiative and proven ability to work independently 
  • Requires a moderate skill set and proficiency in the discipline; conducts work assignments of increasing complexity under moderate supervision, with some latitude for independent judgment.
  • Experience with data replication tools or services such as Meltano, Airbyte, Fivetran, or Stitch  
  • Experience with orchestration tools such as Airflow, Luigi, Prefect, or Dagster  
  • Experience using scalable and distributed compute, storage, and networking resources such as those provided by Azure, especially in the context of Microsoft Fabric  
  • Experience with code versioning systems such as Git 
  • Knowledge of common file formats for analytic data workloads, such as Parquet, ORC, or Avro (see the sketch at the end of this section)
  • Knowledge of high-performance table formats such as Apache Iceberg or Delta Lake 
  • Additional consideration given for experience with tools, languages, data processing frameworks, and databases such as R, Python, SQL, MongoDB, Redis, Hadoop, Spark, Hive, Scala, Bigtable, Cassandra, Presto, or Storm.
  • Experience with healthcare and/or biomedical research operations and systems is a plus
  • Ability to communicate on a professional level with customers and staff 
  • Superior problem-solving skills
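
For candidates less familiar with the file formats named above, here is a minimal, illustrative round trip through Parquet using the pyarrow library (the column names and values are invented; table formats such as Iceberg or Delta Lake layer transactional metadata on top of files like these):

import pyarrow as pa
import pyarrow.parquet as pq

# Build a small in-memory table and persist it as a columnar Parquet file.
table = pa.table({
    "patient_id": ["a1", "b2", "c3"],
    "lab_value": [4.2, 5.1, 3.9],
})
pq.write_table(table, "labs.parquet")

# Reading it back preserves the schema and column types.
roundtrip = pq.read_table("labs.parquet")
print(roundtrip.schema)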