GitPedia

CTGAN

Conditional GAN for generating synthetic tabular data.

From sdv-dev · Updated June 8, 2026

This repository is part of The Synthetic Data Vault Project, a project from DataCebo. The project is written primarily in Python, is distributed under a license listed as "Other", and was first published in 2019. It has 1,560 stars and 329 forks on GitHub. Key topics include data-generation, generative-adversarial-network, synthetic-data, synthetic-data-generation, and tabular-data.

Latest release: v0.12.1 (February 13, 2026)

Overview

CTGAN is a collection of deep learning-based synthetic data generators for single-table data. The models learn from real data and generate synthetic data with high fidelity.

Important Links
  • :computer: Website: Check out the SDV Website for more information about our overall synthetic data ecosystem.
  • :orange_book: Blog: A deeper look at open source, synthetic data creation and evaluation.
  • :book: Documentation: Quickstarts, User and Development Guides, and API Reference.
  • :octocat: Repository: The link to the GitHub Repository of this library.
  • :keyboard: Development Status: This software is in its Pre-Alpha stage.
  • :busts_in_silhouette: DataCebo Forum: Discuss CTGAN features, ask questions, and receive help.

Currently, this library implements the CTGAN and TVAE models described in the Modeling Tabular data using Conditional GAN paper, presented at the 2019 NeurIPS conference.

Install

Use CTGAN through the SDV library

:warning: If you're just getting started with synthetic data, we recommend installing the SDV library which provides user-friendly APIs for accessing CTGAN. :warning:

The SDV library provides wrappers for preprocessing your data as well as additional usability features like constraints. See the SDV documentation to get started.

Use the CTGAN standalone library

Alternatively, you can install and use CTGAN directly as a standalone library:

Using pip:

```bash
pip install ctgan
```

Using conda:

```bash
conda install -c pytorch -c conda-forge ctgan
```
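
After installing with either command, one quick way to confirm that the package is visible to the current Python environment is to query its installed version. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of CTGAN.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is absent.

    Hypothetical helper for illustration; not part of the CTGAN API.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version('ctgan')
if version is None:
    print('ctgan is not installed in this environment')
else:
    print(f'ctgan {version} is installed')
```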

When using the CTGAN library directly, you may need to manually preprocess your data into the correct format, for example:

  • Continuous data must be represented as floats
  • Discrete data must be represented as ints or strings
  • The data should not contain any missing values
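
The requirements above can be sketched with pandas as follows. This is only an illustration under stated assumptions: the helper `prepare_for_ctgan` and the small example frame are hypothetical, rows with missing values are simply dropped (imputation may suit your data better), and discrete columns are cast to strings rather than ints.

```python
import pandas as pd


def prepare_for_ctgan(df, continuous_columns, discrete_columns):
    """Return a copy of df in the format listed above.

    Hypothetical helper for illustration; not part of the CTGAN API.
    """
    # Remove rows that contain any missing value.
    prepared = df.dropna().reset_index(drop=True)
    # Continuous columns are represented as floats.
    for column in continuous_columns:
        prepared[column] = prepared[column].astype(float)
    # Discrete columns are represented as strings here (ints are also allowed).
    for column in discrete_columns:
        prepared[column] = prepared[column].astype(str)
    return prepared


# Small made-up frame with missing values and integer-typed numbers.
raw = pd.DataFrame({
    'age': [39, 50, None, 28],
    'hours-per-week': [40, 13, 40, 40],
    'workclass': ['State-gov', 'Private', 'Private', None],
})

clean = prepare_for_ctgan(
    raw,
    continuous_columns=['age', 'hours-per-week'],
    discrete_columns=['workclass'],
)
print(clean.dtypes)
```

The list passed as `discrete_columns` here is the same kind of list that is later given to `fit` in the usage example.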

Usage Example

In this example, we load the Adult Census Dataset*, a built-in demo dataset. We use CTGAN to learn from the real data and then generate some synthetic data.

```python
from ctgan import CTGAN
from ctgan import load_demo

real_data = load_demo()

# Names of the columns that are discrete
discrete_columns = [
    'workclass',
    'education',
    'marital-status',
    'occupation',
    'relationship',
    'race',
    'sex',
    'native-country',
    'income',
]

ctgan = CTGAN(epochs=10)
ctgan.fit(real_data, discrete_columns)

# Create synthetic data
synthetic_data = ctgan.sample(1000)
```

*For more information about the dataset see:
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml].
Irvine, CA: University of California, School of Information and Computer Science.
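
Once synthetic rows have been sampled, a simple first sanity check is to compare how often each category appears in a discrete column of the real and synthetic tables. The sketch below assumes both tables are pandas DataFrames; the helper `compare_category_shares` is hypothetical, and the tiny stand-in frames are made up so the snippet runs without training a model. It is a rough check, not a full quality evaluation.

```python
import pandas as pd


def compare_category_shares(real, synthetic, column):
    """Return side-by-side category proportions for one discrete column.

    Hypothetical helper for illustration; not part of the CTGAN API.
    """
    shares = pd.DataFrame({
        'real': real[column].value_counts(normalize=True),
        'synthetic': synthetic[column].value_counts(normalize=True),
    }).fillna(0.0)  # A category missing from one table gets a share of 0.
    shares['abs_diff'] = (shares['real'] - shares['synthetic']).abs()
    return shares.sort_values('abs_diff', ascending=False)


# Made-up stand-ins; in practice pass real_data and synthetic_data.
real = pd.DataFrame({'sex': ['Male', 'Male', 'Female', 'Female']})
synthetic = pd.DataFrame({'sex': ['Male', 'Male', 'Male', 'Female']})

report = compare_category_shares(real, synthetic, 'sex')
print(report)
```

Large values in `abs_diff` point at categories whose frequency the model has not reproduced well, which can suggest training for more epochs or reviewing the preprocessing.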

Join our community

Join our forum to discuss CTGAN, ask questions, and receive help.

Interested in contributing to CTGAN? Read our Contribution Guide to get started.

Citing CTGAN

If you use CTGAN, please cite the following work:

Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, Kalyan Veeramachaneni. Modeling Tabular data using Conditional GAN. NeurIPS, 2019.

```latex
@inproceedings{ctgan,
  title={Modeling Tabular data using Conditional GAN},
  author={Xu, Lei and Skoularidou, Maria and Cuesta-Infante, Alfredo and Veeramachaneni, Kalyan},
  booktitle={Advances in Neural Information Processing Systems},
  year={2019}
}
```

Related Projects

Please note that these projects are external to the SDV Ecosystem. They are not affiliated with or maintained by DataCebo.



The Synthetic Data Vault Project was first created at MIT's Data to AI Lab in 2016. After 4 years of research and traction with enterprise, we
created DataCebo in 2020 with the goal of growing the project.
Today, DataCebo is the proud developer of SDV, the largest ecosystem for
synthetic data generation & evaluation. It is home to multiple libraries that support synthetic
data, including:

  • 🔄 Data discovery & transformation. Reverse the transforms to reproduce realistic data.
  • 🧠 Multiple machine learning models -- ranging from Copulas to Deep Learning -- to create tabular,
    multi table and time series data.
  • 📊 Measuring quality and privacy of synthetic data, and comparing different synthetic data
    generation models.

Get started using the SDV package -- a fully
integrated solution and your one-stop shop for synthetic data. Or, use the standalone libraries
for specific needs.


This article is auto-generated from sdv-dev/CTGAN via the GitHub API. Last fetched: 6/22/2026