Why LLMs Can't Solve Time Series
Discover why Large Language Models struggle with time series forecasting and what the industry needs instead.
Large Language Models (LLMs) have captured the imagination of the world. And for good reason — they're powerful, flexible, and general-purpose. But somewhere along the way, we started treating them as the answer to every problem.
Let's be clear: LLMs are not a universal solution. And when it comes to time series modeling, they're the wrong tool for the job.
A Motivating Example
We run a simple experiment to show how poorly LLMs forecast. We take the daily closing price of oil, ask ChatGPT and Claude to forecast it, and compare their outputs against the actual series. We report the Mean Squared Error (MSE), a standard error metric for forecasting tasks.

Both models completely miss the spike in the actual price and instead produce a relatively smooth line that extends the previous trend.
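For readers who want to run a similar check, here is a minimal sketch of the scoring step. The prices and the forecast below are placeholder values, not the figures from our experiment; in practice the forecast would be parsed out of the chat response.

```python
import numpy as np

def mean_squared_error(actual, forecast):
    """Average of squared differences between the forecast and the actual values."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean((actual - forecast) ** 2))

# Hypothetical held-out daily closing prices and an LLM's forecast for the same horizon.
actual_prices = [78.2, 79.1, 80.4, 92.7, 91.3]   # contains a sharp spike
llm_forecast  = [78.0, 78.3, 78.6, 78.9, 79.2]   # smooth continuation of the previous trend

print(f"MSE: {mean_squared_error(actual_prices, llm_forecast):.2f}")
```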
LLMs Are Transformers — and Transformers Have Limits
LLMs are built on transformer architectures. Transformers were a major breakthrough in deep learning, unlocking new capabilities in natural language processing, image generation, and even molecular modeling. Their power lies in self-attention — the mechanism that lets the model dynamically decide which parts of an input to focus on.
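As a quick illustration of that mechanism, here is a minimal numpy sketch of scaled dot-product self-attention; the shapes and random weights are placeholders.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of token vectors."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])          # how much each token attends to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ v                               # attention-weighted mix of values

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                          # 5 tokens, 8-dim embeddings
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)        # (5, 8)
```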
But transformers come with assumptions. Chief among them:
- The input data is a sequence of discrete tokens
- The only way to condition the model is to include more tokens in context
For language, these assumptions are perfect. Text is inherently tokenized — words, subwords, punctuation. Conditioning with additional context ("Summarize this article…", "Translate this sentence…") fits naturally into the token stream.
Time series? Not so much.
Why Transformers Struggle with Time Series
Time series data is continuous. It's made up of values like 93.4, 71.2, 108.0 — sequences of real numbers sampled over time. To use a transformer, we'd need to discretize these values into tokens. But discretization is lossy, arbitrary, and ultimately unnatural for most real-world signals.
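To make the discretization point concrete, here is a minimal sketch assuming a simple uniform binning scheme; the value range and the number of bins are arbitrary choices, which is part of the problem.

```python
import numpy as np

# Continuous observations we would like a transformer to consume.
values = np.array([93.4, 71.2, 108.0])

# Uniform binning into a fixed "vocabulary" of 256 tokens over an assumed range.
low, high, n_bins = 0.0, 200.0, 256
bins = np.linspace(low, high, n_bins + 1)

tokens = np.digitize(values, bins) - 1       # continuous value -> token id
centers = (bins[:-1] + bins[1:]) / 2
reconstructed = centers[tokens]              # token id -> approximate value

print(tokens)                   # [119  91 138]
print(reconstructed - values)   # quantization error introduced by tokenization
```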
And there's another issue: conditioning.
Let's say we're trying to forecast a person's heart rate over time. We may want to condition the model on metadata — like age, gender, medical history, and lab results. But this metadata comes in many formats:
- Binary (e.g. smoker/non-smoker)
- Categorical (e.g. sex)
- Continuous (e.g. hemoglobin level)
- Unstructured text (doctor's notes)
Transformers require all of this to be serialized into the same token stream. That's awkward at best and destructive at worst.
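Here is a minimal sketch of what that serialization looks like in practice, using hypothetical patient fields; every flag, number, and note gets flattened into one prompt string.

```python
# Hypothetical patient metadata of mixed types.
patient = {
    "smoker": False,                              # binary
    "sex": "female",                              # categorical
    "hemoglobin_g_dl": 13.2,                      # continuous
    "notes": "Reports fatigue after exercise.",   # unstructured text
}
heart_rate = [72, 74, 71, 69, 75]                 # the signal we want to forecast

# The only way to condition an LLM: flatten everything into one token stream.
prompt = (
    "Patient metadata: "
    + "; ".join(f"{k} = {v}" for k, v in patient.items())
    + ". Recent heart rate readings: "
    + ", ".join(str(x) for x in heart_rate)
    + ". Forecast the next 5 readings."
)
print(prompt)
```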
Put simply:
Transformers force us to contort time series into a format that loses the very information we want to preserve.
What We Need Instead
Time series problems demand models that are:
- Natively continuous
- Able to incorporate arbitrary metadata — regardless of type
- Capable of generating and reasoning over dense, multivariate signals over time
That's exactly what we've built at Synthefy.
Our Approach: Diffusion Models for Time Series
Our core technology is a diffusion model purpose-built for time series. If you've seen DALL·E or Midjourney generate stunning images from text prompts, you've seen diffusion in action. We do the same — but for time series. Our models can generate or forecast signals conditioned on any metadata, no matter the format or domain.
To enable this, we developed a universal metadata encoder — an architecture that lets us condition time series predictions on text, tabular data, categorical variables, and continuous signals. It's like CLIP, but general-purpose for real-world forecasting tasks.
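To illustrate the general shape of the idea (this is a generic sketch, not Synthefy's actual architecture), here is a minimal PyTorch example of encoding mixed metadata into a single conditioning vector that a denoising network consumes; every layer size and field name is an assumption made for the example.

```python
import torch
import torch.nn as nn

class MixedMetadataEncoder(nn.Module):
    """Illustrative encoder: maps categorical, numeric, and text features
    into a single conditioning vector. Not Synthefy's actual architecture."""
    def __init__(self, n_categories=2, text_dim=384, cond_dim=64):
        super().__init__()
        self.categorical = nn.Embedding(n_categories, 16)
        self.numeric = nn.Linear(2, 16)       # binary flag + continuous value
        self.text = nn.Linear(text_dim, 16)   # e.g. a sentence embedding of the notes
        self.merge = nn.Linear(48, cond_dim)

    def forward(self, category_id, numeric_feats, text_embedding):
        parts = [
            self.categorical(category_id),
            self.numeric(numeric_feats),
            self.text(text_embedding),
        ]
        return self.merge(torch.cat(parts, dim=-1))

class ConditionalDenoiser(nn.Module):
    """Illustrative denoiser: predicts the noise added to a time series,
    conditioned on the metadata vector (one step of a diffusion model)."""
    def __init__(self, series_len=128, cond_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(series_len + cond_dim, 256),
            nn.ReLU(),
            nn.Linear(256, series_len),
        )

    def forward(self, noisy_series, cond):
        return self.net(torch.cat([noisy_series, cond], dim=-1))

# Toy usage with random tensors standing in for real metadata and signals.
encoder, denoiser = MixedMetadataEncoder(), ConditionalDenoiser()
cond = encoder(
    torch.tensor([1]),              # categorical: e.g. sex
    torch.tensor([[0.0, 13.2]]),    # binary smoker flag + hemoglobin level
    torch.randn(1, 384),            # embedding of the unstructured notes
)
noise_estimate = denoiser(torch.randn(1, 128), cond)
print(noise_estimate.shape)         # torch.Size([1, 128])
```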
Real-World Results
Our models have shown state-of-the-art results across domains:
- Energy demand forecasting
- Retail sales and inventory simulation
- Medical time series (ECG, PPG, heart rate)
Synthefy models (top row, red) produce samples that match the ground truth samples (blue) much more closely than previous methods like GANs (bottom row, red).
Generative AI ≠ Just LLMs
Generative AI is more than just chatbots and code completion. It's a new paradigm for modeling and generating structured data — of all kinds. And it demands architectures that match the structure of the data itself.
LLMs are great at language. But time series is its own domain, with its own rules. That's why we're not trying to bend transformers to fit time series — we're building the right tools from the ground up.
The Bottom Line
If you're trying to understand, forecast, or simulate complex time-dependent behavior, LLMs won't get you there.
Synthefy will.
— Team Synthefy
Originally published on Medium