AI & Machine Learning | June 12, 2026 | 7 min read

Synthefy-Nori: The Foundation Model That Replaces XGBoost

Train nothing, predict anything. Synthefy-Nori-V1 is the only fully open-source tabular foundation model — 6M parameters, zero training, and #1 mean R² across a 96-dataset regression benchmark.

Tags: Tabular Data, Foundation Models, In-Context Learning, XGBoost, Benchmarks, Open Source

Train Nothing, Predict Anything

For more than a decade, the workflow for any tabular machine learning problem has been identical. Collect data. Label it. Run XGBoost or CatBoost with a couple hundred trees. Grid-search the hyperparameters. Cross-validate. Ship the model. Problem solved, right?

Not exactly. After you ship, you watch performance drift in production. Retrain monthly. Version the model. Deploy again. Rinse and repeat for every new dataset, every new business question, every exogenous event. The model isn't the expensive part — the loop around it is.

Today, we're saying goodbye to that never-ending cycle. We're releasing Synthefy-Nori, a foundation model that handles regression with zero training. You pass your labeled rows as context and it returns predictions on your unlabeled rows. No gradient updates. No hyperparameter sweeps. No retraining.

Across 96 real-world regression datasets, Synthefy-Nori posts the highest mean R² while being roughly a tenth the size of its closest competitor: 6M parameters versus 58.3M for TabPFN-3 from Prior Labs, the strongest tabular foundation model before this release.

The best part? It's the only fully open-source tabular foundation model. Code on GitHub, model weights on Hugging Face. Download it today, point it at your data, and build with us.

What Are Tabular Foundation Models?

Every domain of machine learning has had its foundation model moment. Vision had ImageNet and then VLMs; language had BERT and then GPT. But tabular data — the Excel spreadsheets and SQL tables that actually drive business — is still waiting for its moment. The state of the art is XGBoost, fit and tuned from scratch on every problem.

A tabular foundation model is a single pretrained neural network that solves a brand-new tabular problem with zero training on that table. You never call .fit(). You show the model your labeled rows as examples and ask it to label the rest, the same way you show a language model a few examples in a prompt. This is in-context learning (ICL), applied to tables.

The high-level mechanism is the same one behind LLMs: a transformer trained on a vast and diverse distribution of tasks. The difference is what it is trained on. While an LLM pretrains on real-world text, a tabular foundation model pretrains on millions of synthetic datasets. Each dataset is generated from a random process such as a causal graph or a regression tree. Many tables may share a generating process, but every generated table is unique, and the model is scored on predicting held-out rows from each one. Memorizing is useless when no table ever repeats, so the only winning strategy is to infer each table's rule from its context rows at inference time. A tabular foundation model therefore learns more than a single way to do regression: it learns to learn from context.
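
To make the pretraining recipe concrete, here is a minimal sketch of one synthetic task drawn from a random structural causal model. It is an illustration only, not the generator used to train Synthefy-Nori: the function names (sample_scm_table, make_task), the tanh nonlinearity, the 50% edge probability, and the table sizes are all hypothetical choices.

```python
import numpy as np


def sample_scm_table(n_rows=256, n_features=6, seed=0):
    """Draw one synthetic regression table from a random structural causal model.

    Illustrative only: edge density, nonlinearity, and noise are arbitrary choices.
    """
    rng = np.random.default_rng(seed)
    n_nodes = n_features + 1  # every feature plus one target node

    # Keep weights[i, j] only for i < j, so node j depends on earlier nodes only.
    # That ordering makes the graph acyclic by construction.
    edge_mask = rng.random((n_nodes, n_nodes)) < 0.5
    weights = np.triu(rng.normal(size=(n_nodes, n_nodes)) * edge_mask, k=1)

    values = np.zeros((n_rows, n_nodes))
    for j in range(n_nodes):
        parent_signal = values @ weights[:, j]  # only nodes before j contribute
        values[:, j] = np.tanh(parent_signal) + rng.normal(size=n_rows)

    target = int(rng.integers(n_nodes))  # any node may play the target
    X = np.delete(values, target, axis=1)
    y = values[:, target]
    return X, y


def make_task(seed, n_context=192):
    """Split one synthetic table into labeled context rows and held-out query rows."""
    X, y = sample_scm_table(seed=seed)
    return (X[:n_context], y[:n_context]), (X[n_context:], y[n_context:])


# Every seed yields a different table, so memorizing any one of them is useless.
(context_X, context_y), (query_X, query_y) = make_task(seed=1)
```

During pretraining, a model would see the context rows with labels and be scored on the query rows, one fresh table per step.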

None of this is hypothetical. Prior Labs proved that this approach works with TabPFN, with their latest TabPFN-3 reaching state-of-the-art performance. Synthefy-Nori brings stronger synthetic training data and a stronger architecture, leading the pack on a 96-dataset regression benchmark.

Without Synthefy: pick a model, train and tune, validate, then retrain as the data drifts. With Synthefy: one forward pass from your data to predictions

What's Special About Synthefy-Nori?

Synthefy-Nori is a 6M-parameter transformer that excels at regression tasks through in-context learning. On the benchmark below, it posts a higher mean R² than TabPFN-3 at roughly a tenth of the size (6M versus 58.3M parameters).

You hand it one table; internally it treats the labeled rows as context and the unlabeled rows as queries, and returns predictions with calibrated uncertainty in a single inference call. All of that capability comes from pretraining, not from your data. Synthefy's synthetic pretraining data is built from structural causal models, dedicated regression priors, and a deliberately broad set of real-world nuisances like class imbalance, label noise, missingness, correlated/redundant columns, heavy tails, and categorical-heavy tables. It's a harder, broader mix than what other leading models train on.

The architecture is purpose-built for tables. Every block interleaves two attentions. Sample attention runs across rows: query rows attend to the labeled context rows, literally looking up how similar labeled examples were labeled. Feature attention runs across columns: the model relates features to one another, discovering hidden interactions and ignoring irrelevant ones. Stacked over many blocks, this builds a dataset-specific "fit on the fly", purely from attention.
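
The two attention directions are easier to see in code. The sketch below is a toy under stated assumptions, not Nori's actual block: it uses a single head with no learned projections, leaves out the label embedding, and the names attention and block are hypothetical. It only shows which axis each attention runs over and that query rows read from context rows.

```python
import numpy as np


def attention(queries, keys, values):
    """Scaled dot-product attention; softmax runs over the keys axis."""
    scores = queries @ np.swapaxes(keys, -1, -2) / np.sqrt(queries.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values


def block(cells, n_context):
    """One interleaved block over cells of shape (n_rows, n_features, dim).

    The first n_context rows are treated as labeled context, the rest as queries.
    """
    # Feature attention: within each row, every column embedding attends to all columns.
    cells = cells + attention(cells, cells, cells)

    # Sample attention: within each column, every row attends to the context rows only.
    by_column = np.swapaxes(cells, 0, 1)      # (n_features, n_rows, dim)
    context = by_column[:, :n_context]        # (n_features, n_context, dim)
    by_column = by_column + attention(by_column, context, context)
    return np.swapaxes(by_column, 0, 1)       # back to (n_rows, n_features, dim)


rng = np.random.default_rng(0)
cells = rng.normal(size=(10, 4, 8))           # 7 context rows + 3 query rows
out = block(cells, n_context=7)
```

Because keys and values come only from the context rows, changing one query row changes that row's output and nothing else, which is what lets a single forward pass score any number of unlabeled rows independently.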

Synthefy-Nori doesn't emit just one number — it predicts a full set of quantiles, and a quantile-distribution decoder turns them into a point estimate plus calibrated prediction intervals, with proper heavy-tail handling.
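
As a rough picture of what "quantiles in, point estimate and interval out" means, here is a minimal sketch. It is not Nori's quantile-distribution decoder: it uses plain linear interpolation between quantile levels, does no heavy-tail handling, and the function name summarize_quantiles and the example numbers are hypothetical.

```python
import numpy as np


def summarize_quantiles(levels, quantiles, coverage=0.9):
    """Turn predicted quantiles for one row into a median and a central interval.

    levels must be increasing values in (0, 1); quantiles are the predicted
    target values at those levels.
    """
    levels = np.asarray(levels, dtype=float)
    # Sorting repairs any quantile crossing so the curve is monotone.
    quantiles = np.sort(np.asarray(quantiles, dtype=float))

    lower_level = (1.0 - coverage) / 2.0
    upper_level = 1.0 - lower_level

    # Linear interpolation between the predicted quantile levels.
    median = np.interp(0.5, levels, quantiles)
    lower = np.interp(lower_level, levels, quantiles)
    upper = np.interp(upper_level, levels, quantiles)
    return median, (lower, upper)


levels = [0.05, 0.25, 0.5, 0.75, 0.95]
predicted = [1.0, 2.0, 3.0, 5.0, 9.0]          # a right-skewed prediction
point, (low, high) = summarize_quantiles(levels, predicted, coverage=0.5)
```

Note that the interval is asymmetric around the point estimate when the predicted quantiles are skewed, which a single number plus a symmetric error bar cannot express.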

The result is a model that memorizes nothing and predicts everything: one frozen network, one forward pass, regression, uncertainty included.

The Benchmark

We evaluated Synthefy-Nori-V1 on 96 regression datasets from three independent sources: TabArena (the modern tabular leaderboard), TALENT (a 72-dataset diversity suite), and OpenML-Reg (the classic open-data benchmark). Same train/test splits, same preprocessing, same hardware for every model.

Against the Model it Replaces

On TabArena's 13 regression datasets, we put Synthefy-Nori-V1 head-to-head against tuned XGBoost and LightGBM, specifically AutoGluon's best-quality preset, a 4-hour AutoML budget that tunes and ensembles those models along with others. With zero training and zero tuning, our tabular foundation model wins on 9 of the 13 datasets, and on the four it loses, the gap is under 2%. TabArena skews toward larger, more modern datasets, the regime where gradient boosting is strongest, so this is a conservative place to draw the comparison, not a favorable one.

Synthefy-Nori-V1 wins 9 of 13 TabArena regression datasets against tuned XGBoost and LightGBM (AutoGluon best-quality, 4h), with every loss under 2%

Against the State of the Art

We're not the first to show that a foundation model can beat gradient boosting. Prior Labs pioneered tabular foundation models with TabPFN and has led the field. Their latest release, TabPFN-3, is the strongest baseline we compare against.

Aggregated across all 96 datasets, Synthefy-Nori-V1 is #1 based on mean R², winning two of the three benchmark sources outright and trailing slightly on TabArena. Our current training curriculum is directed at closing that gap.
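
For readers who want to reproduce the headline metric on their own tables, mean R² is the per-dataset coefficient of determination averaged with equal weight per dataset. The sketch below assumes exactly that and nothing more: whether the benchmark clips negative scores or weights datasets differently is not stated here, and the helper names are hypothetical.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def mean_r2(results):
    """results maps a dataset name to (y_true, y_pred); each dataset counts once."""
    return float(np.mean([r2_score(t, p) for t, p in results.values()]))


y = np.array([1.0, 2.0, 3.0, 4.0])
toy_results = {
    "perfect": (y, y),                           # R² = 1
    "mean_only": (y, np.full(4, y.mean())),      # R² = 0
}
toy_mean = mean_r2(toy_results)
```

An unweighted mean treats a 500-row table and a 50K-row table the same, so a handful of large wins or losses can move the aggregate.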

The lead isn't limited to point accuracy. On ScoringBench, the independent leaderboard for probabilistic forecasting, Synthefy-Nori-V1 ranks #1 in CRPS and across the beta-energy family of proper scoring rules — confirming that its predictions aren't just accurate on average, but well-calibrated across the full distribution.
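
CRPS can be estimated directly from a set of predicted quantiles, since it equals twice the pinball loss integrated over all quantile levels. The sketch below is a generic approximation assuming evenly spaced levels. It is not ScoringBench's implementation, it does not cover the beta-energy scores, and the function names are hypothetical.

```python
import numpy as np


def pinball_loss(y, quantile, level):
    """Pinball (quantile) loss of predicting `quantile` at `level` when `y` occurs."""
    diff = y - quantile
    return np.maximum(level * diff, (level - 1.0) * diff)


def crps_from_quantiles(y, levels, quantiles):
    """Approximate CRPS as twice the average pinball loss over evenly spaced levels."""
    levels = np.asarray(levels, dtype=float)
    quantiles = np.asarray(quantiles, dtype=float)
    return 2.0 * float(np.mean(pinball_loss(y, quantiles, levels)))


n_levels = 9
levels = (np.arange(n_levels) + 0.5) / n_levels    # evenly spaced midpoints in (0, 1)

centered = np.linspace(-1.0, 1.0, n_levels)        # forecast centered on the outcome
shifted = centered + 5.0                           # same spread, wrong location
crps_centered = crps_from_quantiles(0.0, levels, centered)
crps_shifted = crps_from_quantiles(0.0, levels, shifted)
```

Lower is better. When every quantile collapses to a single value, the score reduces to the absolute error, which is why CRPS is often described as a distribution-aware generalization of MAE.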

Highest aggregate R² of any tabular foundation model: Synthefy-Nori-V1 at 0.7507 leads TabPFN-3, TabPFN-2.6, TabPFN-2.5, TabICLv2, and LimiX-2M on mean R² across 96 regression evaluations

Speed, Size, and Hardware

No model in our benchmark is both faster and more accurate. Synthefy-Nori-V1 has 6 million parameters, which makes it smaller than many XGBoost ensembles, while its nearest competitors carry 2–10× the parameters for the same or lower R². It runs on a single GPU: anything with ≥ 8 GB VRAM handles typical workloads (around 10K rows × 100 features), and only the largest contexts (50K rows × 1K features) call for an A100-class card.

Inference takes seconds, not hours. For example, on one H100, the diamonds regression (16K rows, 9 features) takes ~2.8 seconds end to end, and the heaviest benchmark dataset (4K rows, 1,024 features) about 14 seconds. The equivalent XGBoost cycle of training, tuning, validation, and drift-driven retraining is measured in hours per iteration, repeated over the model's lifetime. And because it ships pretrained, there's nothing to stand up: no cluster, no training pipelines, no experiment tracking. You just need one library call from one machine.

A tenth of the size of the model it beats: Synthefy-Nori-V1 at 6M parameters versus TabPFN-3 at 58.3M, with higher mean R²

Teaser: Thinking Mode

The benchmark numbers above come from our base model alone, but they aren't the ceiling. Thinking Mode is an inference-time capability that decides how to process each individual dataset before predicting on it: picking the right augmentations, normalizations, and preprocessing on the fly, with no human in the loop. The model effectively auto-tunes its own inference pipeline, per dataset.

The gains land where you need them — on large and hard datasets — and they compound in aggregate. For scale: Synthefy-Nori-V1's lead over TabPFN-3 (0.7507 vs 0.7477) already exceeds the entire jump from TabPFN-2.6 to TabPFN-3, and Thinking Mode pushes mean R² to 0.7531 — widening that lead by roughly 80%.
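
The "roughly 80%" figure follows from the three mean R² values quoted above. A short check, using only those numbers:

```python
# Mean R² values as quoted in the text.
tabpfn3 = 0.7477
nori_base = 0.7507
nori_thinking = 0.7531

base_lead = nori_base - tabpfn3              # lead of the base model
thinking_lead = nori_thinking - tabpfn3      # lead with Thinking Mode
relative_widening = thinking_lead / base_lead - 1.0

print(f"base lead:     {base_lead:.4f}")
print(f"thinking lead: {thinking_lead:.4f}")
print(f"widening:      {relative_widening:.0%}")
```

The lead grows from about 0.0030 to about 0.0054 R², a ratio of 1.8.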

Thinking extends the lead: Synthefy-Nori-V1 + Thinking reaches 0.7531 mean R², above the base model and well ahead of TabPFN-3

We're not quite ready to share the architectural details. It's the part of the system that turns a strong foundation model into one that thinks before it predicts. There's more coming soon in a follow-up release.

What This Changes for You

The real cost of gradient boosting is not accuracy. It's the workflow and loop around the model: cleaning data to keep it happy, tuning hyperparameters, validating, monitoring drift, retraining, versioning, redeploying. In-context learning deletes that loop altogether.

The mental shift is the same one that happened with language models. Three years ago, you fine-tuned a separate model for every task — one for named entity recognition, another for intent detection, another for question answering. Today you call one LLM API and it handles all of them zero-shot. We're applying that approach to structured data. Here's what that means for each stage of the loop:

You stop cleaning data to keep the model happy. Missing values, imbalanced classes, noisy labels, redundant columns, heavy-tailed targets, tangled nonlinear interactions: Synthefy-Nori was pretrained on synthetic data deliberately built to contain all of it. It tolerates the mess that normally eats days of imputing, encoding, and feature engineering before XGBoost will behave. With Synthefy-Nori, you just hand over the raw rows.
The training-and-tuning ritual evaporates. No model.fit(), no learning-rate sweeps, no early-stopping callbacks. You call .predict(X_train, y_train, X_test). The labeled rows become in-context examples, and the unlabeled rows get predictions back in seconds on a single GPU. And there are no knobs to turn: the model detects high-dimensional inputs and projects them down, applies the right transform when a target is skewed, and configures its own preprocessing per dataset. The hyperparameter search that dominates an XGBoost project simply isn't part of the workflow.
Data drift becomes a non-event. Drift used to mean "spin up a training run." With in-context learning it means "send the new rows in as context." There's no model to retrain, no hyperparameters to revisit, no offline/online gap, no model-versioning sprawl.
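
The workflow in the three points above can be sketched end to end. The code below does not use the Synthefy-Nori library. StandInRegressor is a hypothetical kernel-weighted average that only borrows the predict(X_train, y_train, X_test) call shape quoted in the text, and refresh_context, the bandwidth, and the toy data are likewise assumptions made for illustration. The point is the shape of the loop: the model object never changes, only the context rows do.

```python
import numpy as np


class StandInRegressor:
    """Hypothetical stand-in, NOT Synthefy-Nori: a kernel-weighted average of context labels."""

    def predict(self, X_train, y_train, X_test, bandwidth=0.1):
        X_train = np.asarray(X_train, dtype=float)
        y_train = np.asarray(y_train, dtype=float)
        X_test = np.asarray(X_test, dtype=float)
        # Squared distance from every query row to every context row.
        sq_dist = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
        # Subtract the row minimum before exponentiating for numerical stability.
        sq_dist = sq_dist - sq_dist.min(axis=1, keepdims=True)
        weights = np.exp(-sq_dist / (2.0 * bandwidth**2))
        return (weights @ y_train) / weights.sum(axis=1)


def refresh_context(X_ctx, y_ctx, X_new, y_new, max_rows=1000):
    """Append freshly labeled rows and keep only the most recent max_rows."""
    X = np.vstack([X_ctx, X_new])[-max_rows:]
    y = np.concatenate([y_ctx, y_new])[-max_rows:]
    return X, y


model = StandInRegressor()

X_old = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y_old = X_old[:, 0]                       # old regime: y = x
X_new = X_old.copy()
y_new = X_new[:, 0] + 10.0                # drifted regime: y = x + 10

query = np.array([[0.5]])
before = model.predict(X_old, y_old, query)

# Drift response: no retraining, just swap in the recent labeled rows.
X_ctx, y_ctx = refresh_context(X_old, y_old, X_new, y_new, max_rows=50)
after = model.predict(X_ctx, y_ctx, query)
```

How many recent rows to keep as context is still a choice you make, and it plays the role that a retraining schedule used to.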

Most of the time spent on data science projects doesn't go to designing and training models. It goes to exploratory analysis, cleaning data, hyperparameter searching, and cross-validating endlessly: all the setup you do to make sure classical models can work. For a decade, we've adjusted our workflows to accommodate the classical models. Now, we've built a model that doesn't need that hand-holding. And that's the simple reason to pick up Synthefy-Nori today: less data wrangling, honest uncertainty, and zero tuning, all from a 6M-parameter model sitting at the top of a 96-dataset benchmark we didn't curate.

A Closer Look: When It Wins, It Wins Big

Across all 96 datasets, Synthefy-Nori-V1 and TabPFN-3 usually tie, which is why the average margin reads as modest. But that average hides where it counts: of the 11 datasets where either model has a decisive edge (>0.02 R²), Synthefy-Nori-V1 takes 8, and by the widest margins. The standout is Job Profitability, where R² climbs from 0.14 to 0.41, nearly tripling the explained variance. The wins on socmob and sulfur also hold under a second independent benchmark suite, so these aren't harness flukes. They're real, public datasets anyone can check.

Where Nori wins big: the datasets where it beats TabPFN-3 by more than noise — Job Profitability 0.14→0.41, socmob 0.78→0.89, SAT11-HAND 0.70→0.78, WLAN RSSI 0.89→0.94, sulfur 0.88→0.91 and more, 8 of the 11 decisive matchups
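
The margins behind those five named datasets are quick to check against the 0.02 R² threshold. The snippet below uses only the figures quoted above, reading each pair as TabPFN-3 followed by Synthefy-Nori-V1. The remaining decisive datasets are not listed in the text, so they are not included.

```python
# (TabPFN-3 R², Synthefy-Nori-V1 R²) for the datasets named in the text.
quoted = {
    "Job Profitability": (0.14, 0.41),
    "socmob": (0.78, 0.89),
    "SAT11-HAND": (0.70, 0.78),
    "WLAN RSSI": (0.89, 0.94),
    "sulfur": (0.88, 0.91),
}

DECISIVE_MARGIN = 0.02  # threshold stated in the text

margins = {name: nori - baseline for name, (baseline, nori) in quoted.items()}
decisive = {name: gap for name, gap in margins.items() if gap > DECISIVE_MARGIN}

# Ratio of explained variance on the standout dataset.
job_ratio = quoted["Job Profitability"][1] / quoted["Job Profitability"][0]

for name, gap in sorted(margins.items(), key=lambda item: -item[1]):
    print(f"{name:<18} +{gap:.2f} R²")
```

All five clear the threshold, and the Job Profitability ratio comes out just under 3.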

And it gets there fast. On the small-to-mid tables in-context models are built for — up to ~100k cells, about half the benchmark — Nori returns predictions in roughly a second, faster than TabPFN-3 in every size band, at 6M parameters versus 58.3M. There's no training run and no cluster to stand up: one library call on a single GPU. (Very large tables, past ~100k cells, still favor a quick gradient-boosted model — we'd rather be straight with you about that.)

Median wall-clock latency on the 48 benchmark datasets up to 100k cells: Synthefy-Nori-V1 at 6M parameters is faster than TabPFN-3 at 58.3M in every table-size band

Get Started

Synthefy-Nori is available now.

Code on GitHub: github.com/Synthefy/synthefy-nori
Model weights on Hugging Face: huggingface.co/Synthefy/Nori
Documentation: docs.synthefy.com

If you've spent the last decade tuning XGBoost on tabular data, this is the upgrade. Download it, point it at the dataset you're retraining right now, and compare.

If you have any questions or want to collaborate, join our Discord community or reach out at insights@synthefy.com.