Why LLMs Can't Solve Time Series
Discover why Large Language Models struggle with time series forecasting and what the industry needs instead.
Large Language Models (LLMs) have captured the imagination of the world. And for good reason — they're powerful, flexible, and general-purpose. But somewhere along the way, we started treating them as the answer to every problem.
Let's be clear: LLMs are not a universal solution. And when it comes to time series modeling, they're the wrong tool for the job.
A Motivating Example
We run a simple experiment to show how poorly LLMs forecast. We take the daily closing price of oil, ask ChatGPT and Claude to forecast it, and compare their outputs against the actual series. We report the Mean Squared Error (MSE), a standard error metric for forecasting tasks.

Both models completely miss the spike in the actual price and instead produce a relatively smooth line that extends the previous trend.
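For readers who want to run a similar check, here is a minimal sketch of the scoring step. The prices and the forecast below are placeholder values, not the figures from our experiment; in practice the forecast would be parsed out of the chat response.

```python
import numpy as np

def mean_squared_error(actual, forecast):
    """Average of squared differences between the forecast and the actual values."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean((actual - forecast) ** 2))

# Hypothetical held-out daily closing prices and an LLM's forecast for the same horizon.
actual_prices = [78.2, 79.1, 80.4, 92.7, 91.3]   # contains a sharp spike
llm_forecast  = [78.0, 78.3, 78.6, 78.9, 79.2]   # smooth continuation of the previous trend

print(f"MSE: {mean_squared_error(actual_prices, llm_forecast):.2f}")
```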
LLMs Are Transformers — and Transformers Have Limits
LLMs are built on transformer architectures. Transformers were a major breakthrough in deep learning, unlocking new capabilities in natural language processing, image generation, and even molecular modeling. Their power lies in self-attention — the mechanism that lets the model dynamically decide which parts of an input to focus on.
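As a quick illustration of that mechanism, here is a minimal numpy sketch of scaled dot-product self-attention; the shapes and random weights are placeholders.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of token vectors."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])          # how much each token attends to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ v                               # attention-weighted mix of values

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                          # 5 tokens, 8-dim embeddings
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)        # (5, 8)
```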
But transformers come with assumptions. Chief among them:
- The input data is a sequence of discrete tokens
- The only way to condition the model is to include more tokens in context
For language, these assumptions are perfect. Text is inherently tokenized — words, subwords, punctuation. Conditioning with additional context ("Summarize this article…", "Translate this sentence…") fits naturally into the token stream.
Time series? Not so much.
Why Transformers Struggle with Time Series
Time series data is continuous. It's made up of values like 93.4, 71.2, 108.0 — sequences of real numbers sampled over time. To use a transformer, we'd need to discretize these values into tokens. But discretization is lossy, arbitrary, and ultimately unnatural for most real-world signals.
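To make the discretization point concrete, here is a minimal sketch assuming a simple uniform binning scheme; the value range and the number of bins are arbitrary choices, which is part of the problem.

```python
import numpy as np

# Continuous observations we would like a transformer to consume.
values = np.array([93.4, 71.2, 108.0])

# Uniform binning into a fixed "vocabulary" of 256 tokens over an assumed range.
low, high, n_bins = 0.0, 200.0, 256
bins = np.linspace(low, high, n_bins + 1)

tokens = np.digitize(values, bins) - 1       # continuous value -> token id
centers = (bins[:-1] + bins[1:]) / 2
reconstructed = centers[tokens]              # token id -> approximate value

print(tokens)                   # [119  91 138]
print(reconstructed - values)   # quantization error introduced by tokenization
```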
And there's another issue: conditioning.
Let's say we're trying to forecast a person's heart rate over time. We may want to condition the model on metadata — like age, gender, medical history, and lab results. But this metadata comes in many formats:
- Binary (e.g. smoker/non-smoker)
- Categorical (e.g. sex)
- Continuous (e.g. hemoglobin level)
- Unstructured text (doctor's notes)
Transformers require all of this to be serialized into the same token stream. That's awkward at best and destructive at worst.
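Here is a minimal sketch of what that serialization looks like in practice, using hypothetical patient fields; every flag, number, and note gets flattened into one prompt string.

```python
# Hypothetical patient metadata of mixed types.
patient = {
    "smoker": False,                              # binary
    "sex": "female",                              # categorical
    "hemoglobin_g_dl": 13.2,                      # continuous
    "notes": "Reports fatigue after exercise.",   # unstructured text
}
heart_rate = [72, 74, 71, 69, 75]                 # the signal we want to forecast

# The only way to condition an LLM: flatten everything into one token stream.
prompt = (
    "Patient metadata: "
    + "; ".join(f"{k} = {v}" for k, v in patient.items())
    + ". Recent heart rate readings: "
    + ", ".join(str(x) for x in heart_rate)
    + ". Forecast the next 5 readings."
)
print(prompt)
```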
Put simply:
Transformers force us to contort time series into a format that loses the very information we want to preserve.
What We Need Instead
Time series problems demand models that are:
- Natively continuous
- Able to incorporate arbitrary metadata — regardless of type
- Capable of generating and reasoning over dense, multivariate signals over time
That's exactly what we've built at Synthefy.
Our Approach: Diffusion Models for Time Series
Our core technology is a diffusion model purpose-built for time series. If you've seen DALL·E or Midjourney generate stunning images from text prompts, you've seen diffusion in action. We do the same — but for time series. Our models can generate or forecast signals conditioned on any metadata, no matter the format or domain.
To enable this, we developed a universal metadata encoder — an architecture that lets us condition time series predictions on text, tabular data, categorical variables, and continuous signals. It's like CLIP, but general-purpose for real-world forecasting tasks.
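To illustrate the general shape of the idea (this is a generic sketch, not Synthefy's actual architecture), here is a minimal PyTorch example of encoding mixed metadata into a single conditioning vector that a denoising network consumes; every layer size and field name is an assumption made for the example.

```python
import torch
import torch.nn as nn

class MixedMetadataEncoder(nn.Module):
    """Illustrative encoder: maps categorical, numeric, and text features
    into a single conditioning vector. Not Synthefy's actual architecture."""
    def __init__(self, n_categories=2, text_dim=384, cond_dim=64):
        super().__init__()
        self.categorical = nn.Embedding(n_categories, 16)
        self.numeric = nn.Linear(2, 16)       # binary flag + continuous value
        self.text = nn.Linear(text_dim, 16)   # e.g. a sentence embedding of the notes
        self.merge = nn.Linear(48, cond_dim)

    def forward(self, category_id, numeric_feats, text_embedding):
        parts = [
            self.categorical(category_id),
            self.numeric(numeric_feats),
            self.text(text_embedding),
        ]
        return self.merge(torch.cat(parts, dim=-1))

class ConditionalDenoiser(nn.Module):
    """Illustrative denoiser: predicts the noise added to a time series,
    conditioned on the metadata vector (one step of a diffusion model)."""
    def __init__(self, series_len=128, cond_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(series_len + cond_dim, 256),
            nn.ReLU(),
            nn.Linear(256, series_len),
        )

    def forward(self, noisy_series, cond):
        return self.net(torch.cat([noisy_series, cond], dim=-1))

# Toy usage with random tensors standing in for real metadata and signals.
encoder, denoiser = MixedMetadataEncoder(), ConditionalDenoiser()
cond = encoder(
    torch.tensor([1]),              # categorical: e.g. sex
    torch.tensor([[0.0, 13.2]]),    # binary smoker flag + hemoglobin level
    torch.randn(1, 384),            # embedding of the unstructured notes
)
noise_estimate = denoiser(torch.randn(1, 128), cond)
print(noise_estimate.shape)         # torch.Size([1, 128])
```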
Real-World Results
Our models have shown state-of-the-art results across domains:
- Energy demand forecasting
- Retail sales and inventory simulation
- Medical time series (ECG, PPG, heart rate)
Synthefy models (top row, red) produce samples that match the ground truth samples (blue) much more closely than previous methods like GANs (bottom row, red).
Generative AI ≠ Just LLMs
Generative AI is more than just chatbots and code completion. It's a new paradigm for modeling and generating structured data — of all kinds. And it demands architectures that match the structure of the data itself.
LLMs are great at language. But time series is its own domain, with its own rules. That's why we're not trying to bend transformers to fit time series — we're building the right tools from the ground up.
The Bottom Line
If you're trying to understand, forecast, or simulate complex time-dependent behavior, LLMs won't get you there.
Synthefy will.
— Team Synthefy
Originally published on Medium