
Data Engineer Programmer – NCCT CTMC

Winston-Salem, NC, United States
Job ID: 135115
Job Family: Information Services
Status: Full Time
Shift: Day
Job Type: Regular
Department Name: 55811085045501-Clinical and Translational Science Institute (CTSI)

Overview

JOB SUMMARY 

The National Center for Clinical Trials (NCCT) is designed to serve as an innovative platform to revolutionize and catalyze the conduct of clinical trials, greatly accelerating the translation of scientific findings into improvements in the prevention, diagnosis, and treatment of disease for our communities and patients. The NCCT will offer core services for patient recruitment and enrollment, trial administration and follow-up, and the gathering of real-world data and evidence.

 

The Clinical Trial Methods Center (CTMC) has been established within the Wake Forest University School of Medicine to provide the necessary tools and expertise that the NCCT will access and apply to deliver many of its core services.  

 

The Data Engineer Programmer will be part of a team that provides informatics expertise, including integrating and normalizing data from disparate sources, creating data ELT pipelines, and developing APIs; the position reports to the Informatics Lead Programmer, who oversees the team of application and data programmers in the CTMC. This position will also work with Investigators, Evaluation staff, Industry Sponsors, and NCCT Leadership to identify and evaluate new data sources, automate data ingestion, and create data management processes and tools. This data infrastructure enables informational insight across the clinical trial lifecycle, from site and study feasibility, through patient recruitment and data collection, to follow-up outcomes and process assessment.

 

ESSENTIAL FUNCTIONS 

  • Attend project and departmental meetings and contribute to project design concerning data management needs
  • Collaborate with faculty, team leads, and stakeholders to anticipate, define, and satisfy data needs
  • Build custom ingestion pipelines to incorporate data from novel sources
  • Ensure visibility into the status of automated data tasks so that mistakes are caught before they become problems
  • Collaborate with other members of the data team to improve the performance and stability of transformation tasks
  • Participate in design conversations on improving the architecture of our data infrastructure
  • Support the team in identifying and implementing data integration and quality control strategies to improve data quality and availability
  • Prepare research data for ingestion and conversion to a unified data standard using ETL and automation tools
  • Assist the team in maintaining the database environment by creating views/queries, documentation, and data pipelines
  • Own data integrity, availability, documentation, and efficient access to data
  • Identify opportunities for process improvement in the end-to-end data development and delivery lifecycle
  • Incorporate automation wherever possible to improve access to data and analyses
  • Perform other related duties as needed
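To make the ingestion and transformation responsibilities above concrete, here is a minimal, hypothetical sketch of the load-then-transform pattern such pipelines typically follow (all table and column names are invented for illustration; this is not the CTMC's actual stack): raw source rows are landed unchanged in the database, then normalized there with SQL.

```python
# Hypothetical ELT sketch: land raw CSV rows in SQLite as-is,
# then transform inside the database -- the same pattern scales
# up to warehouse platforms such as Snowflake.
import csv
import io
import sqlite3

# Invented example extract from a source system.
RAW_CSV = """subject_id,site,enrolled
001,WFU,2024-01-15
002,WFU,2024-02-03
003,DUKE,2024-02-10
"""

def run_pipeline(raw_csv: str) -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    # Load: land the extract unchanged in a raw table.
    conn.execute(
        "CREATE TABLE raw_enrollment (subject_id TEXT, site TEXT, enrolled TEXT)"
    )
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    conn.executemany(
        "INSERT INTO raw_enrollment VALUES (:subject_id, :site, :enrolled)", rows
    )
    # Transform: derive a normalized per-site summary with SQL.
    return conn.execute(
        "SELECT site, COUNT(*) FROM raw_enrollment GROUP BY site ORDER BY site"
    ).fetchall()

print(run_pipeline(RAW_CSV))  # [('DUKE', 1), ('WFU', 2)]
```

In production this extract/load step would be handled by a replication tool and scheduled by an orchestrator, as listed under the qualifications below.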

 

EDUCATION/EXPERIENCE 

Bachelor's Degree in an applicable field with 3 years of experience working on full-cycle data analytics and visualization projects; or an equivalent combination of education and experience.   

 

SKILLS/QUALIFICATIONS 

  • Strong initiative and proven ability to work independently  
  • Moderate proficiency in the discipline; conducts work assignments of increasing complexity under moderate supervision, with some latitude for independent judgment
  • Experience with data replication tools or services such as Meltano, Airbyte, Fivetran, or Stitch   
  • Experience with orchestration tools such as Airflow, Luigi, Prefect, or Dagster   
  • Experience using scalable and distributed compute, storage, and networking resources such as those provided by Azure, especially in the context of the Snowflake data stack   
  • Experience with code versioning systems such as Git  
  • Knowledge of common file formats for analytic data workloads like Parquet, ORC, or Avro   
  • Knowledge of high-performance table formats such as Apache Iceberg or Delta Lake  
  • Additional consideration given for experience with tools, languages, data processing frameworks, and databases such as R, Python, SQL, MongoDB, Redis, Hadoop, Spark, Hive, Scala, Bigtable, Cassandra, Presto, and Storm
  • Experience with healthcare and/or biomedical research operations and systems a plus  
  • Ability to communicate on a professional level with customers and staff  
  • Superior problem-solving skills 
  • Familiarity with the clinical trial lifecycle a plus  

 

Position is located in Winston-Salem, NC; may be eligible for remote employment.