GitPedia

Tabular Survey

Awesome Tabular Deep Learning for "Representation Learning for Tabular Data: A Comprehensive Survey"

From LAMDA-Tabular · Updated June 13, 2026

Awesome Tabular Deep Learning for "[Representation Learning for Tabular Data: A Comprehensive Survey](https://arxiv.org/abs/2504.16109)". The project is distributed under the MIT License and was first published in 2025. Key topics include: tabular, tabular-data, tabular-data-machine-learning, tabular-deep-learning, tabular-methods.

Representation Learning for Tabular Data: A Comprehensive Survey

If you use any content of this repo in your work, please cite the following BibTeX entries:

@article{jiang2026tabularsurvey,
         journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 
         title={Representation Learning for Tabular Data: A Comprehensive Survey},
         author={Jun-Peng Jiang and
                 Si-Yang Liu and
                 Hao-Run Cai and
                 Qi-Le Zhou and
                 Han-Jia Ye},
         year={2026},
         volume={48},
         number={6},
         pages={6488-6508}
}


@article{jiang2025tabularsurvey,
         title={Representation Learning for Tabular Data: A Comprehensive Survey}, 
         author={Jun-Peng Jiang and
                 Si-Yang Liu and
                 Hao-Run Cai and
                 Qile Zhou and
                 Han-Jia Ye},
         journal={arXiv preprint arXiv:2504.16109},
         year={2025}
}

Feel free to open a new issue or email us if you find any interesting papers missing from our survey; we will include them in the next version.

Updates

[01/2026] Accepted to TPAMI.

[04/2025] arXiv paper has been released.

[04/2025] The repository has been released.

Introduction

Tabular data, structured as rows and columns, is among the most prevalent data types in machine learning classification and regression applications. Models for learning from tabular data have continuously evolved, with Deep Neural Networks (DNNs) recently demonstrating promising results through their capability of representation learning.
In this survey, we systematically introduce the field of tabular representation learning, covering the background, challenges, and benchmarks, along with the pros and cons of using DNNs.
We organize existing methods into three main categories according to their generalization capabilities: specialized, transferable, and general models. Specialized models focus on tasks where training and evaluation occur within the same data distribution. We introduce a hierarchical taxonomy for specialized models based on the key aspects of tabular data—features, samples, and objectives—and delve into detailed strategies for obtaining high-quality feature- and sample-level representations.
Transferable models are pre-trained on one or more datasets and subsequently fine-tuned on downstream tasks, leveraging knowledge acquired from homogeneous or heterogeneous sources, or even cross-modalities such as vision and language.
General models, also known as tabular foundation models, extend this concept further, allowing direct application to downstream tasks without additional fine-tuning. We group these general models based on the strategies used to adapt across heterogeneous datasets.
Additionally, we explore ensemble methods, which integrate the strengths of multiple tabular models. Finally, we discuss representative extensions of tabular learning, including open-environment tabular machine learning, multimodal learning with tabular data, and tabular understanding tasks.

<div align="center"> <img src="resources/taxo.png" width="90%"> </div> <div align="center"> <img src="resources/Tabular_Deep_Learning.png" width="90%"> </div>

Some Basic Resources

Benchmarks

| Date | Name | Paper | Publication | Code |
| --- | --- | --- | --- | --- |
| 2026 | TaR-ViR | Beyond Text-Only: Towards Multimodal Table Retrieval in Open-World | ICLR | Code |
| 2026 | TABLET | TABLET: A Large-Scale Dataset for Robust Visual Table Understanding | ICLR | Code |
| 2026 | SPARTA | SPARTA: Scalable and Principled Benchmark of Tree-Structured Multi-hop QA over Text and Tables | ICLR | Code |
| 2026 | TabStruct | TabStruct: Measuring Structural Fidelity of Tabular Data | ICLR | Code |
| 2026 | TopBench | TopBench: A Benchmark for Implicit Prediction and Reasoning over Tabular Question Answering | ICML | Code |
| 2026 | TEmBed | Towards Universal Tabular Embeddings: A Benchmark Across Data Tasks | CoRR | Code |
| 2026 | Tabular DL Optimizer Benchmark | Benchmarking Optimizers for MLPs in Tabular Deep Learning | CoRR | Code |
| 2025 | TabArena | TabArena: A Living Benchmark for Machine Learning on Tabular Data | NeurIPS | Code |
| 2025 | MLE-Bench | MLE-Bench: Evaluating Machine Learning Agents on Machine Learning Engineering | ICLR | Code |
| 2025 | TabReD | TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks | ICLR | Code |
| 2024 | Data-Centric Benchmark | A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data | NeurIPS | Code |
| 2024 | Better-by-Default | Better by Default: Strong Pre-Tuned MLPs and Boosted Trees on Tabular Data | NeurIPS | Code |
| 2024 | LAMDA-Tabular-Bench | A Closer Look at Deep Learning Methods on Tabular Datasets | CoRR | Code |
| 2024 | DMLR-ICLR24-Datasets-for-Benchmarking | Towards Quantifying the Effect of Datasets for Benchmarking: A Look at Tabular Machine Learning | DMLR | Code |
| 2023 | TableShift | Benchmarking Distribution Shift in Tabular Data with TableShift | NeurIPS | Code |
| 2023 | TabZilla | When Do Neural Nets Outperform Boosted Trees on Tabular Data? | NeurIPS | Code |
| 2023 | EncoderBenchmarking | A benchmark of categorical encoders for binary classification | NeurIPS | Code |
| 2022 | Grinsztajn et al. Benchmark | Why do tree-based models still outperform deep learning on tabular data? | NeurIPS | Code |
| 2021 | RTDL | Revisiting Deep Learning Models for Tabular Data | NeurIPS | Code |
| 2021 | WellTunedSimpleNets | Well-tuned Simple Nets Excel on Tabular Datasets | NeurIPS | Code |

Awesome Deep Tabular Toolboxes

  • RTDL: A collection of papers and packages on deep learning for tabular data.
  • TALENT: A comprehensive toolkit and benchmark for tabular data learning, featuring 30 deep methods, more than 10 classical methods, and 300 diverse tabular datasets.
  • pytorch_tabular: A standard framework for modelling Deep Learning Models for tabular data.
  • pytorch-frame: A modular deep learning framework for building neural network models on heterogeneous tabular data.
  • DeepTables: An easy-to-use toolkit that enables deep learning to unleash great power on tabular data.
  • AutoGluon: A toolbox that automates machine learning tasks and makes it easy to achieve strong predictive performance.
  • ...

Other Awesome Repositories

TabPFN and its extensions

Some summary repositories

Specialized Methods

| Date | Name | Paper | Publication | Code |
| --- | --- | --- | --- | --- |
| 2025 | xRFM | xRFM: Accurate, scalable, and interpretable feature learning models for tabular data | CoRR | Code |
| 2025 | RFM | Mechanism for feature learning in neural networks and backpropagation-free machine learning models | Science | Code |
| 2025 | TabAutoPNPNet | Leveraging Periodicity for Tabular Deep Learning | Electronics | Code |
| 2025 | ModernNCA | Revisiting Nearest Neighbor for Tabular Data: A Deep Tabular Baseline Two Decades Later | ICLR | Code |
| 2025 | TabM | TabM: Advancing Tabular Deep Learning with Parameter-Efficient Ensembling | ICLR | Code |
| 2024 | ExcelFormer | Can a deep learning model be a sure bet for tabular prediction? | KDD | Code |
| 2024 | AMFormer | Arithmetic feature interaction is necessary for deep tabular learning | AAAI | Code |
| 2024 | GRANDE | GRANDE: gradient-based decision tree ensembles for tabular data | ICLR | Code |
| 2024 | DOFEN | DOFEN: Deep Oblivious Forest ENsemble | NeurIPS | Code |
| 2024 | RealMLP | Better by default: Strong pre-tuned mlps and boosted trees on tabular data | NeurIPS | Code |
| 2024 | BiSHop | Bishop: Bi-directional cellular learning for tabular data with generalized sparse modern hopfield model | ICML | Code |
| 2024 | SwitchTab | Switchtab: Switched autoencoders are effective tabular learners | AAAI | |
| 2024 | PTaRL | Ptarl: Prototype-based tabular representation learning via space calibration | ICLR | Code |
| 2024 | TabR | Tabr: Tabular deep learning meets nearest neighbors in 2023 | ICLR | Code |
| 2023 | | An inductive bias for tabular deep learning | NeurIPS | |
| 2023 | TabRet | Tabret: Pre-training transformer-based tabular models for unseen columns | CoRR | Code |
| 2023 | Trompt | Trompt: Towards a better deep neural network for tabular data | ICML | |
| 2023 | TANGOS | Tangos: Regularizing tabular neural networks through gradient orthogonalization and specialization | ICLR | Code |
| 2022 | MLP-PLR | On embeddings for numerical features in tabular deep learning | NeurIPS | Code |
| 2022 | SAINT | SAINT: Improved neural networks for tabular data via row attention and contrastive pre-training | NeurIPS WS | Code |
| 2022 | DANets | Danets: Deep abstract networks for tabular data classification and regression | AAAI | Code |
| 2022 | DNNR | DNNR: differential nearest neighbors regression | ICML | Code |
| 2022 | Hopular | Hopular: Modern hopfield networks for tabular data | CoRR | Code |
| 2022 | LSPIN | Locally Sparse Neural Networks for Tabular Biomedical Data | ICML | Code |
| 2021 | Net-DNF | Net-DNF: Effective Deep Modeling of Tabular Data | ICLR | |
| 2021 | FT-Transformer | Revisiting deep learning models for tabular data | NeurIPS | Code |
| 2021 | TabNet | Tabnet: Attentive interpretable tabular learning | AAAI | Code |
| 2021 | DCNv2 | DCN V2: improved deep & cross network and practical lessons for web-scale learning to rank systems | WWW | Code |
| 2021 | | Well-tuned simple nets excel on tabular datasets | NeurIPS | Code |
| 2021 | NPT | Self-attention between datapoints: Going beyond individual input-output pairs in deep learning | NeurIPS | Code |
| 2020 | | Survey on categorical data for neural networks | Journal of big data | |
| 2020 | TabTransformer | Tabtransformer: Tabular data modeling using contextual embeddings | CoRR | Code |
| 2020 | GrowNet | Gradient boosting neural networks: Grownet | CoRR | Code |
| 2020 | NODE | Neural oblivious decision ensembles for deep learning on tabular data | ICLR | Code |
| 2020 | STG | Feature Selection using Stochastic Gates | ICML | Code |
| 2019 | AutoInt | Autoint: Automatic feature interaction learning via self-attentive neural networks | CIKM | Code |
| 2018 | RLNs | Regularization learning networks: deep learning for tabular datasets | NeurIPS | Code |
| 2017 | SNN | Self-normalizing neural networks | NIPS | Code |

Transferable Methods

| Date | Name | Paper | Publication | Code |
| --- | --- | --- | --- | --- |
| 2026 | | Schema-Adaptive Tabular Representation Learning with LLMs for Generalizable Multimodal Clinical Reasoning | CoRR | |
| 2025 | | A survey on self-supervised learning for non-sequential tabular data | Machine Learning | Code |
| 2025 | Tab2Visual | Tab2Visual: Overcoming Limited Data in Tabular Data Classification Using Deep Learning with Visual Representations | CoRR | |
| 2024 | LFR | Self-supervised representation learning from random data projectors | ICLR | |
| 2024 | UniTabE | UniTabE: A Universal Pretraining Protocol for Tabular Foundation Model in Data Science | ICLR | |
| 2024 | CM2 | Towards cross-table masked pretraining for web data mining | WWW | Code |
| 2024 | TP-BERTa | Making pre-trained language models great on tabular prediction | ICLR | Code |
| 2024 | CARTE | CARTE: pretraining and transfer for tabular learning | ICML | Code |
| 2024 | FeatLLM | Large language models can automatically engineer features for few-shot tabular learning | ICML | Code |
| 2024 | LM-IGTD | LM-IGTD: a 2d image generator for low-dimensional and mixed-type tabular data to leverage the potential of convolutional neural networks | CoRR | |
| 2023 | DoRA | Dora: Domain-based self-supervised learning framework for low-resource real estate appraisal | CIKM | Code |
| 2023 | | Transfer learning with deep tabular models | ICLR | |
| 2023 | ReConTab | Recontab: Regularized contrastive representation learning for tabular data | CoRR | |
| 2023 | TabRet | Tabret: Pre-training transformer-based tabular models for unseen columns | CoRR | Code |
| 2023 | ORCA | Cross-modal fine-tuning: Align then refine | ICML | Code |
| 2023 | TabToken | Unlocking the transferability of tokens in deep models for tabular data | CoRR | |
| 2023 | Xtab | Xtab: Cross-table pretraining for tabular transformers | ICML | Code |
| 2023 | Meta-Transformer | Meta-transformer: A unified framework for multimodal learning | CoRR | Code |
| 2023 | Binder | Binding language models in symbolic languages | ICLR | Code |
| 2023 | CAAFE | Large language models for automated data science: Introducing caafe for context-aware automated feature engineering | NeurIPS | Code |
| 2023 | TaPTaP | Generative table pre-training empowers models for tabular prediction | EMNLP | Code |
| 2023 | TabLLM | Tabllm: few-shot classification of tabular data with large language models | AISTATS | Code |
| 2023 | UniPredict | Unipredict: Large language models are universal tabular predictors | CoRR | |
| 2023 | TablEye | Tableye: Seeing small tables through the lens of images | CoRR | |
| 2022 | | Revisiting pretraining objectives for tabular deep learning | CoRR | Code |
| 2022 | SEFS | Self-supervision enhanced feature selection with correlated gates | ICLR | Code |
| 2022 | MET | MET: masked encoding for tabular data | CoRR | |
| 2022 | SAINT | SAINT: Improved neural networks for tabular data via row attention and contrastive pre-training | NeurIPS WS | Code |
| 2022 | SCARF | Scarf: Self-supervised contrastive learning using random feature corruption | ICLR | |
| 2022 | Stab | Stab: Self-supervised learning for tabular data | NeurIPS WS | |
| 2022 | DEN | Distribution embedding networks for generalization from a diverse set of classification tasks | | |
| 2022 | TransTab | Transtab: Learning transferable tabular transformers across tables | NeurIPS | Code |
| 2022 | Ptab | Ptab: Using the pre-trained language model for modeling tabular data | CoRR | |
| 2022 | LIFT | LIFT: language-interfaced fine-tuning for non-language machine learning tasks | NeurIPS | Code |
| 2021 | SubTab | Subtab: Subsetting features of tabular data for self-supervised representation learning | NeurIPS | Code |
| 2021 | DACL | Towards domain-agnostic contrastive learning | ICML | |
| 2021 | IGTD | Converting tabular data into images for deep learning with convolutional neural networks | Scientific reports | Code |
| 2020 | VIME | VIME: extending the success of self- and semi-supervised learning to tabular domain | NeurIPS | Code |
| 2020 | | Meta-learning from tasks with heterogeneous attribute spaces | NeurIPS | |
| 2020 | TAC | A novel method for classification of tabular data using convolutional neural networks | Biorxiv | |
| 2019 | Super-TML | Supertml: Two-dimensional word embedding for the precognition on structured tabular data | CVPR WS | |

General Methods

| Date | Name | Paper | Publication | Code |
| --- | --- | --- | --- | --- |
| 2026 | Relatron | Relatron: Automating Relational Machine Learning over Relational Databases | ICLR | Code |
| 2026 | CausalFM | Foundation Models for Causal Inference via Prior-Data Fitted Networks | ICLR | Code |
| 2025 | TabSTAR* | TabSTAR: A Tabular Foundation Model for Tabular Data with Text Fields | NeurIPS | Code |
| 2025 | Mitra* | Mitra: Mixed Synthetic Priors for Enhancing Tabular Foundation Models | NeurIPS | Code |
| 2025 | TabDPT* | Tabdpt: Scaling tabular foundation models | NeurIPS | Code |
| 2025 | EquiTabPFN* | Equitabpfn: A target-permutation equivariant prior fitted networks | NeurIPS | Code |
| 2025 | LimiX | LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence | CoRR | Code |
| 2025 | Beta* | Tabpfn unleashed: A scalable and effective solution to tabular classification problems | ICML | Code |
| 2025 | MotherNet | MotherNet: Fast Training and Inference via Hyper-Network Transformers | ICLR | Code |
| 2025 | TabPFN v2 | Accurate predictions on small data with a tabular foundation model | Nature | Code |
| 2025 | TabForestPFN* | Fine-tuned in-context learning transformers are excellent tabular data classifiers | CoRR | |
| 2025 | APT* | Zero-shot meta-learning for tabular prediction tasks with adversarially pre-trained transformer | CoRR | Code |
| 2025 | TabICL* | Tabicl: A tabular foundation model for in-context learning on large data | ICML | Code |
| 2025 | * | Scalable in-context learning on tabular data via retrieval-augmented large language models | CoRR | |
| 2024 | HyperFast | Hyperfast: Instant classification for tabular data | AAAI | Code |
| 2024 | MIXTUREPFN* | Mixture of in-context prompters for tabular pfns | CoRR | |
| 2024 | LoCalPFN* | Retrieval & fine-tuning for in-context tabular models | NeurIPS | Code |
| 2024 | LE-TabPFN* | Towards localization via data embedding for tabPFN | NeurIPS WS | |
| 2024 | TabFlex* | Tabflex: Scaling tabular learning to millions with linear attention | NeurIPS WS | |
| 2024 | * | Exploration of autoregressive models for in-context learning on tabular data | NeurIPS WS | |
| 2024 | TabuLa-8B | Large scale transfer learning for tabular data via language modeling | NeurIPS | Code |
| 2024 | GTL | From supervised to generative: A novel paradigm for tabular deep learning with large language models | KDD | Code |
| 2024 | MediTab | Meditab: Scaling medical tabular data predictors via data consolidation, enrichment, and refinement | IJCAI | |
| 2023 | TabPTM | Training-free generalization on heterogeneous tabular data via meta-representation | CoRR | |
| 2023 | TabPFN | Tabpfn: A transformer that solves small tabular classification problems in a second | ICLR | Code |

* denotes that the method is a variation of TabPFN; some of these variations require fine-tuning for downstream tasks.
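The introduction describes general models as applicable to downstream tasks without additional fine-tuning. The sketch below illustrates only that interface: labeled rows are supplied as context at prediction time and no parameters are updated. It is a toy distance-weighted vote written for illustration, not TabPFN or any method listed above; the function name `in_context_predict`, the `temperature` parameter, and the example rows are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def in_context_predict(
    context: Sequence[Tuple[Sequence[float], int]],
    query: Sequence[float],
    n_classes: int,
    temperature: float = 1.0,
) -> List[float]:
    """Predict class probabilities for `query` from labeled context rows only.

    Nothing is trained here: each context row votes for its own label with a
    softmax weight derived from its negative squared distance to the query.
    """
    if not context:
        raise ValueError("context must contain at least one labeled row")
    scores = [
        -sum((a - b) ** 2 for a, b in zip(features, query)) / temperature
        for features, _ in context
    ]
    top = max(scores)  # subtract the max before exponentiating, for stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    probs = [0.0] * n_classes
    for (_, label), weight in zip(context, weights):
        probs[label] += weight / total
    return probs


# Hypothetical toy dataset: two rows per class.
context_rows = [([0.0, 0.0], 0), ([0.1, 0.2], 0), ([1.0, 1.0], 1), ([0.9, 1.1], 1)]
query_probs = in_context_predict(context_rows, [0.95, 1.0], n_classes=2)
```

Swapping in a different `context_rows` changes the prediction without any retraining step, which is the property the taxonomy uses to separate general models from transferable ones.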

Ensemble Methods

| Date | Name | Paper | Publication | Code |
| --- | --- | --- | --- | --- |
| 2025 | TabM | TabM: Advancing Tabular Deep Learning with Parameter-Efficient Ensembling | ICLR | Code |
| 2025 | TabPFN v2 | Accurate predictions on small data with a tabular foundation model | Nature | Code |
| 2025 | Beta | Tabpfn unleashed: A scalable and effective solution to tabular classification problems | CoRR | |
| 2025 | LLM-Boost, PFN-Boost | Transformers Boost the Performance of Decision Trees on Tabular Data across Sample Sizes | CoRR | Code |
| 2024 | HyperFast | Hyperfast: Instant classification for tabular data | AAAI | Code |
| 2024 | GRANDE | GRANDE: gradient-based decision tree ensembles for tabular data | ICLR | Code |
| 2023 | TabPTM | Training-free generalization on heterogeneous tabular data via meta-representation | CoRR | |
| 2023 | TabPFN | Tabpfn: A transformer that solves small tabular classification problems in a second | ICLR | Code |
| 2020 | TabTransformer | Tabtransformer: Tabular data modeling using contextual embeddings | CoRR | Code |
| 2020 | GrowNet | Gradient boosting neural networks: Grownet | CoRR | Code |
| 2020 | NODE | Neural oblivious decision ensembles for deep learning on tabular data | ICLR | Code |
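The introduction describes ensemble methods as integrating the strengths of multiple tabular models. One simple form of this idea, averaging the class probabilities predicted by several models, can be sketched as follows. This is a generic illustration and not the ensembling scheme of any method listed above; `average_ensemble`, `tree_like`, and `mlp_like` are hypothetical names, and the two stand-in models return hard-coded probabilities instead of being trained.

```python
from typing import Callable, List, Sequence

# A "model" here is any function mapping a feature row to class probabilities.
Model = Callable[[Sequence[float]], List[float]]


def average_ensemble(models: Sequence[Model], row: Sequence[float]) -> List[float]:
    """Average the class-probability vectors predicted by each model."""
    if not models:
        raise ValueError("need at least one model")
    predictions = [model(row) for model in models]
    n_classes = len(predictions[0])
    if any(len(p) != n_classes for p in predictions):
        raise ValueError("all models must predict the same number of classes")
    return [
        sum(p[c] for p in predictions) / len(predictions)
        for c in range(n_classes)
    ]


# Hypothetical stand-ins for two trained tabular models.
def tree_like(row: Sequence[float]) -> List[float]:
    return [0.9, 0.1] if row[0] > 0.5 else [0.2, 0.8]


def mlp_like(row: Sequence[float]) -> List[float]:
    return [0.6, 0.4]


probs = average_ensemble([tree_like, mlp_like], [0.7, 1.0])
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

Uniform averaging is the simplest choice; weighted averaging or stacking would replace the mean with a learned combination.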

Extensions

Clustering

Anomaly Detection

Tabular Generation

Interpretability

Open-Environment Tabular Machine Learning

Multi-modal Learning with Tabular Data

Tabular Understanding

Please refer to Awesome-Tabular-LLMs for more information.

Workshops

Acknowledgment

This repo is modified from TALENT.

Correspondence

This repo is developed and maintained by Jun-Peng Jiang, Si-Yang Liu, Hao-Run Cai, Qile Zhou, and Han-Jia Ye. If you have any questions, please feel free to contact us by opening a new issue or by email.

This article is auto-generated from LAMDA-Tabular/Tabular-Survey via the GitHub API. Last fetched: 6/24/2026