Research · 5 min read

Diffusion Models for Healthcare: Transforming Medical Time Series Data

Discover how Synthefy's diffusion models are revolutionizing healthcare analytics through synthetic ECG and PPG data generation for privacy, stress-testing, and data augmentation.

#Diffusion Models, #AI for Healthcare, #Generative AI Use Cases, #Wearables, #Medical AI

Synthefy's Diffusion Models are Transforming Healthcare

"Generate an electrocardiogram (ECG) signal that looks like a female non-smoker with a pacemaker and atrial fibrillation to stress-test my disease classifier."

"Simulate a photoplethysmogram (PPG) signal for a male smoker with dark skin, running on a treadmill, to augment my wearable sensor dataset."

These examples illustrate how diffusion models, a class of advanced generative AI tools, are reshaping healthcare analytics. Recent breakthroughs such as AlphaFold (predicting protein structures with atomic accuracy) and DiffDock (molecular docking for drug discovery) showcase the potential of generative AI models across the life sciences. Synthefy's diffusion models now apply this approach to medical time series data, with applications ranging from anonymizing patient data to stress-testing diagnostic systems and augmenting datasets with rare cases.

Synthefy improves disease classifiers with synthetic data. Collecting real patient data, especially for rare phenotypes, is costly, yet a lack of representative data leads to biased, poorly performing models. Synthefy can create high-quality synthetic time series for rare cases in imbalanced datasets, which significantly improves overall model accuracy.

Why Synthetic Medical Time Series Data?

Use Cases

Anonymization and Privacy: Synthetic data replicates real trends without compromising patient confidentiality, enabling secure collaboration and research.

Stress-Testing Systems: Rare conditions or extreme scenarios can be simulated to evaluate diagnostic algorithms and improve robustness.

Data Augmentation: Synthetic data reduces bias, enriches datasets with underrepresented populations or scenarios, and enhances model robustness for edge cases.

Reducing Costly Data Collection: Data collection in the field is often extremely expensive, for example when recruiting patients with a rare phenotype and administering an ECG. Collecting ECG signals with a Holter monitor for 48 hours alone can cost $606 per patient on average, according to Turquoise Health. By contrast, augmenting a dataset with Synthefy's computational models costs only a few cents per sample.

Why Medical Data Needs Key Patient Context

Medical data is highly contextual. Multi-modal approaches like Synthefy's incorporate patient metadata (such as age, gender, health conditions, and activity levels) into the generation process, so that the synthetic data reflects real-world complexity. For example:

  • ECG Signals: Conditioned by demographics, health history, and device configurations.
  • PPG Data: Influenced by skin pigmentation, physical activity, and environmental factors.
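One way to picture metadata conditioning is as a fixed-length numeric vector handed to the generator alongside its noise input. The sketch below is purely illustrative: the `PatientContext` fields, the category vocabularies, and the `encode_context` helper are hypothetical names chosen for this example, not Synthefy's schema or API.

```python
from dataclasses import dataclass

# Hypothetical category vocabularies, chosen only for illustration.
SEXES = ["female", "male"]
ACTIVITIES = ["resting", "walking", "running", "cycling"]


@dataclass(frozen=True)
class PatientContext:
    """Hypothetical metadata record used to condition signal generation."""
    age_years: float
    sex: str
    smoker: bool
    pacemaker: bool
    activity: str


def one_hot(value, vocabulary):
    """Return a one-hot list marking the position of value in vocabulary."""
    if value not in vocabulary:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if item == value else 0.0 for item in vocabulary]


def encode_context(ctx):
    """Flatten a PatientContext into a numeric conditioning vector."""
    return (
        [ctx.age_years / 100.0]  # crude age scaling to roughly [0, 1]
        + one_hot(ctx.sex, SEXES)
        + [float(ctx.smoker), float(ctx.pacemaker)]
        + one_hot(ctx.activity, ACTIVITIES)
    )


# Mirrors the "female non-smoker with a pacemaker" prompt from the introduction.
ctx = PatientContext(age_years=62, sex="female", smoker=False,
                     pacemaker=True, activity="resting")
condition = encode_context(ctx)
print(condition)
```

A real system would likely learn embeddings for these fields rather than use hand-built one-hot codes; the point here is only that every generated signal is tied to an explicit description of the patient.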

ECG Case Study: Leveraging PTB-XL+ for Advanced Diagnostics

Synthefy utilized the PTB-XL+ dataset, a comprehensive feature dataset supplementing the PTB-XL ECG dataset, to generate realistic synthetic ECG signals. The dataset includes harmonized ECG features from commercial algorithms like Uni-G and Marquette™ 12SL™, as well as the open-source ECGDeli. It also provides ground truth diagnostic statements for machine learning tasks.

Goals

  • High-Quality Classifiers: Train on synthetic data and test on real data using the Train on Synthetic, Test on Real (TSTR) methodology.
  • Realistic Samples: Generate signals that experts cannot distinguish from real samples. Synthefy's models outperform today's state-of-the-art generative adversarial networks (GANs) by roughly 5x, since GANs do not adequately factor in metadata.
  • Statistical Fidelity: Align synthetic data with real data in both time and frequency domains.
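The TSTR idea can be sketched in a few lines: fit a classifier on synthetic samples only, then score it on held-out real samples and compare against a classifier fit on real samples. Everything below is a toy assumption: the two-dimensional Gaussian "features", the nearest-centroid classifier, and the small `shift` that stands in for generator error are illustrative and are not the models or data used in the case study.

```python
import random


def make_toy_set(rng, n_per_class, shift=0.0):
    """Draw 2-D feature vectors for two classes (toy stand-in for ECG features)."""
    samples = []
    for label, center in ((0, 0.0), (1, 4.0)):
        for _ in range(n_per_class):
            point = (rng.gauss(center + shift, 1.0), rng.gauss(center + shift, 1.0))
            samples.append((point, label))
    return samples


def fit_centroids(samples):
    """Nearest-centroid 'training': average the feature vectors of each class."""
    sums, counts = {}, {}
    for (x, y), label in samples:
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {label: (sx / counts[label], sy / counts[label])
            for label, (sx, sy) in sums.items()}


def squared_distance(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def predict(centroids, point):
    """Assign the label of the closest centroid."""
    return min(centroids, key=lambda label: squared_distance(point, centroids[label]))


def accuracy(centroids, samples):
    hits = sum(predict(centroids, point) == label for point, label in samples)
    return hits / len(samples)


rng = random.Random(0)
real_train = make_toy_set(rng, 200)
real_test = make_toy_set(rng, 200)
synthetic_train = make_toy_set(rng, 200, shift=0.2)  # slightly off-distribution

trtr = accuracy(fit_centroids(real_train), real_test)       # train real, test real
tstr = accuracy(fit_centroids(synthetic_train), real_test)  # train synthetic, test real
print(f"TRTR accuracy: {trtr:.3f}  TSTR accuracy: {tstr:.3f}")
```

The gap between the two numbers is the quantity of interest: the closer the TSTR score is to the real-data baseline, the more of the task-relevant structure the synthetic data has captured.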

Achievements

  • TSTR Accuracy: Classifiers trained on synthetic ECG data achieved 93% accuracy when tested on real data, compared to 95% accuracy for models trained on real data alone.
  • Statistical Fidelity: Synthetic signals closely matched real data in Fourier coefficients and temporal features. Synthefy's models also matched the real data distribution roughly 5x more closely than generative adversarial networks (GANs).
  • Stress Testing: Scenarios like "a male smoker with a pacemaker experiencing bradycardia" were simulated to test robustness of disease classifiers.
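A frequency-domain fidelity check like the one described above can be approximated by comparing the average magnitude spectrum of real and synthetic signals. The following is a minimal sketch under stated assumptions: the relative L2 distance is one possible metric chosen for illustration (the post does not say which metric was used), and the noisy sinusoids are toy stand-ins for ECG windows.

```python
import cmath
import math
import random


def magnitude_spectrum(signal):
    """Magnitudes of the DFT coefficients up to the Nyquist bin (naive O(n^2) DFT)."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))) / n
            for k in range(n // 2 + 1)]


def mean_spectrum(signals):
    """Average the magnitude spectra of a collection of equal-length signals."""
    spectra = [magnitude_spectrum(s) for s in signals]
    return [sum(column) / len(column) for column in zip(*spectra)]


def spectral_distance(real_signals, synthetic_signals):
    """Relative L2 distance between mean magnitude spectra (0 means identical)."""
    real = mean_spectrum(real_signals)
    synth = mean_spectrum(synthetic_signals)
    numerator = math.sqrt(sum((r - s) ** 2 for r, s in zip(real, synth)))
    denominator = math.sqrt(sum(r ** 2 for r in real))
    return numerator / denominator


def toy_signals(rng, count, cycles, length=64, noise=0.1):
    """Noisy sinusoids with `cycles` periods per window (toy stand-in for ECG)."""
    return [[math.sin(2 * math.pi * cycles * t / length) + rng.gauss(0.0, noise)
             for t in range(length)]
            for _ in range(count)]


rng = random.Random(0)
real = toy_signals(rng, 20, cycles=4)
good_synth = toy_signals(rng, 20, cycles=4)  # same dominant frequency as real
poor_synth = toy_signals(rng, 20, cycles=9)  # wrong dominant frequency

good_distance = spectral_distance(real, good_synth)
poor_distance = spectral_distance(real, poor_synth)
print(f"good generator: {good_distance:.3f}  poor generator: {poor_distance:.3f}")
```

A generator that places its energy in the wrong frequency band scores a much larger distance than one that reproduces the dominant rhythm, which is the kind of gap a time- and frequency-domain comparison is meant to expose.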

Synthefy's diffusion model accurately captures the distribution of time series data.

GANs fail to reproduce the data distribution accurately because they suffer from mode collapse and their older architectures handle contextual metadata poorly.

Synthefy creates high-quality synthetic ECG signals. Training on synthetic and real data significantly boosts the accuracy of disease classifiers.

GANs create poor samples because of mode collapse, their lack of sophistication compared to diffusion models, and their inability to incorporate key contextual metadata.

These results underline the utility of diffusion models in developing robust diagnostic tools for edge cases.

PPG Case Study: Enabling Robust Wearable Algorithms with PPG-DaLiA

Synthefy used the PPG-DaLiA dataset for this case study. This publicly available dataset features multimodal physiological data, including PPG, 3D-accelerometer, and ECG signals recorded from wrist- and chest-worn devices. Data was collected from 15 subjects during real-life activities, providing a rich testbed for wearable algorithms.

Collecting real sensor data for wearables

Collecting real sensor data, especially for rare scenarios or demographics, is costly. However, a lack of balanced datasets leads to biased, poorly performing ML models. Synthefy can train highly accurate models on purely synthetic data at a much lower cost than collecting real data. This is broadly applicable to wearables.

Goals

  • Enhance Algorithm Robustness: Train on synthetic PPG data to address variability in skin type, activity, and environmental factors.
  • Augment Dataset Diversity: Generate realistic PPG signals for underrepresented scenarios.
  • Compensate for Motion Artifacts: Improve motion artifact resilience in wearable algorithms.

Achievements

  • Binary Classifier Accuracy: Classifiers trained on synthetic blood volume pulse (BVP) signals achieved 91% AUC on real test data when distinguishing between two skin types.
  • Significantly Beat GANs: Samples from Synthefy's diffusion models match the distribution of real data in the time and frequency domains much more closely than samples from GANs.
  • Task-Specific Synthesis: Synthetic signals realistically captured cardiovascular activity variations across age, skin type, and activity level.
  • Insights from Model Correlations: Demonstrated correlation learning in synthetic data. For example, simulating a subject transitioning from sitting to cycling resulted in increased cardiovascular variation.
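For readers less familiar with the AUC figure quoted above: it is the probability that the classifier scores a randomly chosen positive example higher than a randomly chosen negative one. The sketch below computes it directly from that definition. The labels and scores are made-up values for illustration only, and the class names "skin type A" and "skin type B" are placeholders.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores on six real test signals
# (1 = skin type A, 0 = skin type B).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

# 8 of the 9 positive/negative pairs are ranked correctly.
auc = roc_auc(labels, scores)
print(f"AUC: {auc:.3f}")
```

In a TSTR setting, the scores would come from a classifier trained only on synthetic signals and evaluated on real recordings.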

The results showcase the capability of Synthefy's diffusion models to support wearable device manufacturers in validating and expanding their datasets.

Multi-Modal Patient Survival Analysis

Diffusion models extend beyond time series data to multi-modal applications. For instance, they can forecast patient survival based on demographic factors, disease progression, and hospital records. This approach offers transformative potential for healthcare providers, insurers, and researchers.

Applications

  • Healthcare Providers: Predict survival rates and optimize care plans.
  • Insurance: Assess risk profiles based on integrated patient data.
  • Research: Explore correlations between demographics, disease type, and outcomes.

Contact Us

Synthefy's diffusion models are revolutionizing medical time series data with applications in privacy, stress-testing, and augmentation. Explore how our solutions can empower your healthcare initiatives. Contact us today to learn more.

