With synthetic data - data which does not contain confidential patient information - the NIH will be able to open up the largest available repository of patient-level COVID-19 electronic medical records to researchers around the world working on treatments and vaccines.
The US National COVID Cohort Collaborative (N3C) represents more than 2.7 million screened individuals, including over 413,000 COVID-19 positive patients, and 2.6 billion rows of data. Syntegra is currently synthesizing entire data sets from almost 100 sites.
“The N3C Enclave contains all relevant data about COVID including the care trajectories of all treatments, vaccinations, etc. And it is refreshed as new data is collected,” Syntegra told this publication. “With the support of the Bill and Melinda Gates Foundation, our role is to create synthetic versions of any data in the Enclave, to provide rapid, widespread access without violating privacy. We have successfully created synthetic versions of test sets, as we prepare to roll out large scale COVID synthetic data.”
“The NIH and N3C are responsible for gathering this data in a workstream broadly covering data ingestion, normalization, quality controls, and harmonization, ensuring a clean and high quality source of data at massive scale. Syntegra takes this high quality curated data as its input. The synthetic output precisely mimics the quality of the input.
“Syntegra’s ability to create a synthetic version of this entire dataset while maintaining this full level of clinical detail will be a key element in unlocking the potential benefits for the treatment and understanding of COVID, accelerating treatments and care for patients. This impact is not only a key need as the vaccine rolls out, but will be an ongoing need as the medical community begins to understand and develop treatments for ‘Long COVID’.”
Data access for physicians, scientists, and researchers will help accelerate work in key focus areas for the N3C, such as racial and ethnic disparities in spread and risk, predictors of hospitalization, long-term adverse effects, and the impact of COVID-19 on hospitals.
Syntegra says the promise of ‘big data’ and precision medicine depends on access to data, while guaranteeing patient privacy: “With the COVID-19 pandemic, there has never been a time when rapid, low burden access to patient-level data, at scale, was more urgent.”
The synthetic data will be accessible once the NIH gives the green light.
Potential beyond COVID-19
Syntegra has also engaged with the US Food and Drug Administration (FDA) to evaluate the role of synthetic data in regulatory decisions for COVID-19 and beyond.
“Syntegra’s novel approach enables patient level research and innovation without the need to access patient level data. This opportunity includes currently available real world data as well as expansion to new data such as social determinants of health or genomic data which are often excluded from research due to privacy concerns,” explains Syntegra.
“These expansive real world datasets are of direct interest to the FDA, if privacy can be assured. That is where synthetic data is key. The FDA is exploring with us several aspects of drug approval, including synthetic control arms, improved trial design and ‘what if’ analysis, ongoing drug safety monitoring, and approval of new indications for small sub-populations and rare diseases.”