Samsung × Synthefy: Deploying Real-Time Heart Rate AI on Microcontrollers with Synthetic Data
Learn how Synthefy partnered with Samsung Semiconductor to develop microcontroller-scale heart rate detection algorithms using synthetic physiological signals, achieving real-time inference on ultra-low-power devices.
Overview
Heart rate sensors in wearables often underperform during intense activity or for users with darker skin tones—resulting in degraded accuracy in real-world settings. In this article, you'll learn how Synthefy and Samsung partnered to address this challenge using synthetic physiological signals generated by a foundation model.
These synthetic waveforms filled critical gaps in training data, enabling Samsung to develop microcontroller-scale heart rate detection algorithms with real-time inference capability. The result: more accurate, inclusive, and efficient health monitoring—without additional hardware complexity or cost.
1. Why Generate Synthetic PPG Sensor Data?
Photoplethysmography (PPG) is the primary modality for heart rate sensing in wearables. Yet its performance suffers during high-motion activity and for individuals with darker skin, where optical noise and signal attenuation distort readings. These issues introduce fairness concerns and reduce model generalization—especially when training data lacks adequate diversity.
Samsung and Synthefy collaborated to solve this through a hardware-software co-design approach. Synthefy developed TimeGenV1, a conditional generative model that produces realistic PPG signals conditioned on metadata such as:
- Heart rate
- Activity type
- Skin tone (Fitzpatrick scale)
- Subject identity
Samsung then used these synthetic signals to train models that generalize across rare and underrepresented scenarios—without changing hardware or collecting new data.
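The article doesn't specify how TimeGenV1 consumes this metadata internally. As a rough illustration only, here is a minimal sketch of packing such a record into a flat conditioning vector; the field names, bin counts, and the `encode_metadata` helper are hypothetical, not Synthefy's API:

```python
import numpy as np

# Hypothetical metadata schema; values chosen to mirror the list above.
ACTIVITIES = ["sitting", "walking", "cycling", "stairs"]
N_SUBJECTS = 15        # PPG-DaLiA has 15 subjects
SKIN_TONE_BINS = 6     # Fitzpatrick scale I-VI

def encode_metadata(heart_rate_bpm, activity, skin_tone, subject_id):
    """Pack conditioning metadata into one flat vector for a generative model."""
    hr = np.array([heart_rate_bpm / 200.0])           # normalize HR to roughly [0, 1]
    act = np.eye(len(ACTIVITIES))[ACTIVITIES.index(activity)]
    tone = np.eye(SKIN_TONE_BINS)[skin_tone - 1]      # Fitzpatrick I..VI -> index 0..5
    subj = np.eye(N_SUBJECTS)[subject_id]
    return np.concatenate([hr, act, tone, subj])

cond = encode_metadata(125, "cycling", 4, 10)
print(cond.shape)  # (26,) = 1 + 4 + 6 + 15
```

In a conditional generator, a vector like this is what lets you dial heart rate or skin tone up or down and get a correspondingly different waveform out.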
As shown in Figure 1, TimeGenV1's synthetic samples closely match the distribution of real PPG in both the time and frequency domains. In contrast, previous state-of-the-art Generative Adversarial Network (GAN) models fail to capture the high-frequency and multi-modal characteristics typical of motion-heavy or high-HR signals. This validated the use of diffusion-based generation to address physiological and demographic edge cases with high fidelity.

Figure 1: Synthefy's TimeGenV1 generates realistic PPG signals (orange) that closely match ground-truth, unseen test data (blue), whereas GANs (green) struggle to generate meaningful signals.
2. The Dataset and Preprocessing
To benchmark TimeGenV1's utility, the team used the PPG-DaLiA dataset—a publicly available, multi-modal dataset that includes PPG, accelerometer, gyroscope, and ECG-derived heart rate data from 15 subjects. Each subject completed a series of scripted activities including sitting, walking, cycling, and climbing stairs.
Signals were segmented into 512-sample (8-second) non-overlapping windows at 64 Hz. Metadata for each window included:
- Subject ID
- Activity type
- Heart rate
- Skin tone (binned)
- Session time
Per-subject normalization ensured consistency across samples. The dataset was split into 90% training, 5% validation, and 5% testing, with temporally disjoint splits to prevent overlap between sets.
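The preprocessing described above can be sketched in a few lines. This is a minimal illustration of the windowing, per-subject z-scoring, and chronological 90/5/5 split; the helper names are ours, and the exact normalization used in the study may differ:

```python
import numpy as np

FS = 64    # sampling rate (Hz), per PPG-DaLiA
WIN = 512  # 512 samples = 8 seconds at 64 Hz

def segment(signal, win=WIN):
    """Split a 1-D signal into non-overlapping windows, dropping the remainder."""
    n = len(signal) // win
    return signal[: n * win].reshape(n, win)

def normalize_per_subject(windows):
    """Z-score all of one subject's windows using subject-level statistics."""
    mu, sigma = windows.mean(), windows.std()
    return (windows - mu) / (sigma + 1e-8)

def temporal_split(windows, train=0.90, val=0.05):
    """Chronological 90/5/5 split so test windows come strictly after training."""
    n = len(windows)
    i, j = int(n * train), int(n * (train + val))
    return windows[:i], windows[i:j], windows[j:]

ppg = np.random.randn(FS * 60 * 10)  # 10 minutes of stand-in PPG
w = normalize_per_subject(segment(ppg))
tr, va, te = temporal_split(w)
print(len(tr), len(va), len(te))  # 67 4 4
```

Keeping the split chronological is what gives the "temporally disjoint" guarantee: a test window can never share samples with, or precede, a training window from the same recording.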
Importantly, test metadata combinations were held out during training to evaluate TimeGenV1's ability to generalize to new physiological and behavioral contexts—simulating deployment conditions in the field.
3. Synthefy's Diffusion Model for PPG Time Series
TimeGenV1 is a conditional diffusion model that generates time series by learning to reverse a noise process. For this task, it was configured to generate PPG signals from metadata inputs.
Conditional diffusion models enable fine-grained control over signal generation. Changing heart rate, skin tone, or activity metadata results in predictable changes in waveform morphology, enabling targeted synthesis of rare physiological patterns.
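TimeGenV1's architecture is not detailed here, so as a generic illustration of the reverse-noise process a conditional diffusion model runs at sampling time, below is a minimal DDPM ancestral-sampling loop with a stand-in denoiser. The noise schedule, step count, and `toy_denoiser` are illustrative assumptions, not Synthefy's implementation:

```python
import numpy as np

def toy_denoiser(x, t, cond):
    """Stand-in for the learned noise predictor eps_theta(x_t, t, cond).

    A real model would condition on the metadata `cond` (HR, activity, ...)."""
    return 0.1 * x

def ddpm_sample(shape, cond, steps=50, rng=None):
    """Minimal DDPM ancestral sampling with a linear beta schedule."""
    if rng is None:
        rng = np.random.default_rng(0)
    betas = np.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)                 # start from pure Gaussian noise
    for t in reversed(range(steps)):
        eps = toy_denoiser(x, t, cond)             # predicted noise at step t
        # Posterior mean of x_{t-1} given x_t and the predicted noise
        x = (x - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                                  # add noise except at the last step
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

sample = ddpm_sample((512,), cond={"heart_rate": 125, "activity": "cycling"})
print(sample.shape)  # (512,)
```

The key property the article relies on is that the same loop, run with different `cond` inputs, yields waveforms whose morphology tracks the metadata.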
4. Synthefy's Model is State-of-the-Art
Qualitative and statistical evaluations showed TimeGenV1 consistently outperforms GAN-based baselines. As shown in Figure 1, TimeGenV1-generated signals (orange) align closely with ground-truth PPG (blue), capturing realistic morphology and periodicity. GANs (green) exhibit unstable or over-smoothed output, especially under motion.
Moreover, TimeGenV1 closely matches the statistical distribution of real PPG signals in both the time and frequency domain compared to GANs (Figure 2).

Figure 2: Time- and frequency-domain distributions. Synthefy's TimeGenV1 (orange) closely matches the ground-truth distribution (blue) in the time (left) and frequency (right) domains, all on unseen test samples.
To assess metadata fidelity, the team synthesized high-HR waveforms. In Figure 3, synthetic data from Subject S11 conditioned on HR ≈125 BPM matched real data from high-HR Subject S5, preserving oscillation and amplitude structure.

Figure 3: Comparison of real PPG data from Subject S5 (blue) and synthetic data generated by TimeGenV1 by increasing the heart rate of another subject (S11). The synthesized waveform (red) preserves periodic structure and amplitude variations at high heart rate.
A second test, shown in Figure 4, demonstrates robustness under repeated generation: small variability is present, but core waveform features remain intact.

Figure 4: Multiple seeds for Subject S11 show consistent waveform morphology, strong periodicity, and stable peak-to-trough amplitudes at lower heart rate.
These results confirm that TimeGenV1 can simulate realistic and physiologically plausible PPG waveforms in previously unseen conditions.
5. Synthefy Synthetic Data Improves Samsung's Heart Rate Detection Algorithms
Samsung developed a family of compact Temporal Convolutional Networks (TCNs) to test whether TimeGenV1's synthetic data improved downstream heart rate estimation.
Two training regimes were tested:
- TRTR: Trained on real data only, tested on unseen real data
- TRSTR: Trained on both real and synthetic data, tested on unseen real data
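The two regimes amount to a simple difference in how the training set is assembled, while the held-out test set is always real. A minimal sketch (function and variable names are ours, not the study's):

```python
def build_training_set(real_train, synthetic, regime):
    """Assemble training data for the two evaluation regimes.

    TRTR:  train on real data only.
    TRSTR: train on real data plus TimeGenV1 synthetic data.
    In both cases, evaluation uses unseen real data."""
    if regime == "TRTR":
        return list(real_train)
    if regime == "TRSTR":
        return list(real_train) + list(synthetic)
    raise ValueError(f"unknown regime: {regime}")

real = ["r1", "r2"]
synth = ["s1", "s2", "s3"]
print(len(build_training_set(real, synth, "TRTR")))   # 2
print(len(build_training_set(real, synth, "TRSTR")))  # 5
```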
As shown in Figure 5, TRSTR consistently yielded the lowest MAE across activity types, especially for underrepresented motion states like stair climbing. Models trained with synthetic data were more robust to demographic and physiological variation.

Figure 5: Mean Absolute Error (MAE) across activities for different training regimes: real-only (TRTR) and combined training (TRSTR). Each bar represents the performance under different data augmentation conditions, showing consistent improvements for underrepresented states like 'Stairs'. Clearly, adding synthetic data improves performance on real, held-out test conditions.
To meet deployment constraints, Samsung applied structured pruning, reducing model size from 512k to just 1.56k parameters—over 300× compression. These pruned models retained or exceeded baseline performance when trained with synthetic data.
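Samsung's exact pruning procedure is proprietary. As one common form of structured pruning, here is a sketch that keeps the convolution channels with the largest L1 norm; the criterion and the `keep_frac` value are illustrative assumptions:

```python
import numpy as np

def prune_channels(weights, keep_frac):
    """Structured-pruning sketch: keep output channels with the largest L1 norm.

    `weights` has shape (out_channels, in_channels, kernel_size). Removing
    whole channels, rather than individual weights, shrinks the actual
    compute and memory footprint on an MCU."""
    scores = np.abs(weights).sum(axis=(1, 2))   # per-channel L1 norm
    k = max(1, int(len(scores) * keep_frac))
    keep = np.sort(np.argsort(scores)[-k:])     # indices of channels to keep
    return weights[keep], keep

w = np.random.default_rng(0).standard_normal((64, 4, 9))
pruned, kept = prune_channels(w, keep_frac=0.25)
print(pruned.shape)  # (16, 4, 9)
```

In practice, pruning of this kind is interleaved with fine-tuning so the smaller network recovers accuracy, which is where the synthetic training data reportedly helped.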
As shown in Figure 6, synthetic augmentation pushed the Pareto frontier, achieving better accuracy at significantly smaller sizes.

Figure 6: The combination of TimeGenV1 and Samsung's proprietary HR detection algorithm establishes a new Pareto frontier for PPG-based HR estimation, achieving lower error at significantly smaller model sizes. Models deployed on MCUs are annotated with reported runtime latencies.
6. Samsung's Deployment on Edge Microcontrollers
A key objective of this collaboration was to demonstrate that synthetic data could enable not only accurate heart rate models, but also models compact enough to run on the lowest-power hardware used in commercial wearables.
Samsung deployed the smallest model, PPGNet-1.56k, on an ARM Cortex-M4F microcontroller—commonly found in smartwatches, fitness trackers, and IoT health patches. The device used was the Arduino Nano 33 BLE Rev2, featuring a 64 MHz Cortex-M4F core with 256 KB SRAM and 1 MB flash.
The model was converted to TensorFlow Lite (TFLite) in full-precision FP32, avoiding quantization entirely.
Real-Time Performance
Inference was executed using TFLite Micro on-device, with inputs consisting of 8-second, 4-channel windows (PPG + IMU). End-to-end latency was consistently under 40 ms per window, and memory usage remained below 20 KB—well within budget for commercial-grade firmware deployment.
As shown in Figure 7, this model achieved:
- R² = 0.86 correlation between predicted and ECG-derived heart rate
- Accuracy matching the original TensorFlow model
- Stable real-time inference with no hardware-specific tuning

Figure 7: a) Correlation (R² = 0.86) between MCU-predicted HR from PPGNet-1.56k and ground-truth HR (color-coded by subject). b) MCU accuracy matches parent TensorFlow model via full-precision deployment. c) Real-time inference on Subject S7 with <40 ms per-window latency.
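An R² figure like the one above comes from the standard coefficient-of-determination formula between predicted and ECG-derived heart rate. A minimal sketch with illustrative HR values (not the study's measurements):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination between reference and predicted HR."""
    ss_res = np.sum((y_true - y_pred) ** 2)        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

hr_ecg = np.array([60.0, 80.0, 100.0, 120.0])  # illustrative ECG-derived HR
hr_mcu = np.array([62.0, 78.0, 103.0, 118.0])  # illustrative MCU predictions
print(r_squared(hr_ecg, hr_mcu))
```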
A New Benchmark in Edge AI
This result pushes the Pareto frontier: the boundary of best-possible trade-offs between model size and performance. In edge AI, moving the Pareto frontier means delivering the same or better accuracy with smaller, faster, and cheaper models.
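Concretely, the frontier is the set of (model size, error) points not dominated by any other point. A minimal sketch of extracting that set; the example points are illustrative, not the benchmark's numbers:

```python
def pareto_frontier(models):
    """Return the (size, error) points not dominated by any other point.

    Point A dominates point B if A is no larger and no worse in error,
    and strictly better in at least one of the two."""
    frontier = []
    for size, err in models:
        dominated = any(
            s <= size and e <= err and (s < size or e < err)
            for s, e in models
        )
        if not dominated:
            frontier.append((size, err))
    return sorted(frontier)

# Illustrative (parameter count, MAE) points only.
models = [(512_000, 5.0), (100_000, 6.5), (1_560, 6.0), (50_000, 7.0)]
print(pareto_frontier(models))  # [(1560, 6.0), (512000, 5.0)]
```

"Pushing" the frontier means adding a point, like the 1.56k-parameter model, that removes previously non-dominated points from this set.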
Previous benchmarked models for PPG-based heart rate estimation often required:
- 100k+ parameters to achieve sub-7 BPM MAE
- Dedicated DSP blocks or floating-point acceleration
- Quantization and retraining steps for 8-bit inference
- Inference times >100 ms or reliance on external SoCs
In contrast, Samsung's TimeGenV1-trained TCN runs in under 40 ms on bare-metal firmware, with no special hardware or post-training quantization needed. This makes it viable for always-on, ultra-low-power health monitoring, opening new possibilities for battery-limited devices like wearables, earbuds, and remote care sensors.
In short, synthetic data didn't just help match performance—it enabled a deployment breakthrough on the most constrained platforms.
7. Future Steps and Work With Us
This project demonstrates that synthetic data is production-ready for real-world edge AI applications. TimeGenV1 enabled Samsung to train compact, accurate, and equitable heart rate models using metadata-aware synthetic augmentation—delivered on a 64 MHz MCU with kilobytes of memory.
What's Next
Synthefy is now scaling this approach to new use cases:
- Expanding TimeGenV1 to WESAD, MIMIC-III, and proprietary wearables datasets
- Building domain-general foundation models for time series across health, networking, and energy
- Integrating privacy safeguards (e.g., differential privacy) for safe data sharing
- Enabling interpretability to link metadata and signal features
- Offering TimeGenV1 as a synthetic data service for model developers and device makers
Ready to Build with Synthetic Data?
For teams building wearable, clinical, or edge-AI health products, TimeGenV1 offers a scalable, cost-effective way to simulate rare, underrepresented, or sensitive scenarios—without real-world data collection. It accelerates development, improves generalization, and supports fairer AI.
Research Paper
For more technical details about this work, read the full research paper:
Read the Full Paper on OpenReview →
Interested in leveraging synthetic time series data for your health AI applications? Get in touch with our team to explore custom solutions for your organization.