GitPedia

Predictive maintenance

Datasets for Predictive Maintenance

From kokikwbt · Updated June 3, 2026

This repository provides quick access to datasets for predictive maintenance (PM) tasks and is still under development. The project is written primarily in Jupyter Notebook, distributed under the MIT License, and first published in 2021. Key topics include: ai-engineering, anomaly, anomaly-detection, automation, condition-based-maintenance.

Predictive Maintenance

This repository, still under development, provides quick access to datasets for predictive maintenance (PM) tasks.
The following table summarizes the available features.
An asterisk (*) after a dataset name marks a dataset with especially rich attributes,
which you may want to check first.
Note that RUL means remaining useful life.

| Dataset | Timestamp | #Sensor | #Alarm | RUL | License |
| :--- | :---: | :---: | :---: | :---: | :--- |
| ALPI* | x | | 140 | | CC-BY |
| CBM | x | 16 | 3 | | Other |
| CMAPSS | x | 26 | 2-6 | x | CC0: Public Domain |
| GDD | x | 5 | (1)3 | | CC-BY-NC-SA |
| GFD | x | 4 | 2 | | CC-BY-SA |
| HydSys* | x | 17 | 2-4 | | Other |
| MAPM* | x | 4 | 5 | x | Other |
| PPD | x | 25 | | x | CC-BY-SA |
| UFD | | 37-52 | 4 | | Other |
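RUL (remaining useful life) labels are commonly derived from run-to-failure logs by counting how many cycles remain before the last recorded cycle of each unit. The following is a minimal sketch of that idea, not the repository's own code. The column names `unit` and `cycle`, the helper `add_rul`, and the toy data are all assumptions made for illustration, and the sketch assumes every unit was observed until it failed.

```python
import pandas as pd

# Toy run-to-failure log (hypothetical): two units, each observed until failure.
log = pd.DataFrame({
    "unit":  [1, 1, 1, 2, 2],
    "cycle": [1, 2, 3, 1, 2],
})


def add_rul(df, unit_col="unit", time_col="cycle"):
    """Return a copy of df with an 'RUL' column.

    Assumes the last recorded cycle of each unit is its failure time,
    so RUL = (last cycle of the unit) - (current cycle).
    """
    out = df.copy()
    last_cycle = out.groupby(unit_col)[time_col].transform("max")
    out["RUL"] = last_cycle - out[time_col]
    return out


labeled = add_rul(log)
# RUL per row: 2, 1, 0 for unit 1 and 1, 0 for unit 2
print(labeled)
```

If some units are still running at the end of the log (censored), this simple subtraction underestimates their RUL, so such units would need separate handling.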

Installation

  • Python=3.7
  • pandas=1.1.2

Usage

Place the datasets directory in your workspace and import it as follows:

```python
import datasets

# Dataset-specific values will be returned
datasets.ufd.load_data()

# A visualization PDF will be generated
datasets.ufd.gen_summary()
```

Each dataset class has the following functions:

  • load_data(index):
    Loads the dataset specified by 'index'.
    Please see README.md in each dataset directory for more details.
  • gen_summary(outdir):
    Generates a PDF file that visualizes the full dataset.

Features

Run-to-Failure

Run-to-Failure data require:

  • time column
  • event/censoring column (categorical)
  • numerical/categorical feature columns (optional)
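The column requirements above can be illustrated with a small pandas table. This is a sketch under stated assumptions rather than the format the repository's loaders return: the column names `time`, `event`, and `load`, the 0/1 encoding of the event column, and the helpers `check_run_to_failure` and `kaplan_meier` are all hypothetical. The second helper shows one thing the time and event columns make possible, a hand-rolled Kaplan-Meier survival estimate.

```python
import pandas as pd

# Hypothetical run-to-failure records; the column names are assumptions.
records = pd.DataFrame({
    "time":  [5, 8, 8, 12],          # time column (e.g. cycles or hours)
    "event": [1, 0, 1, 1],           # 1 = failure observed, 0 = censored
    "load":  [0.7, 0.4, 0.9, 0.5],   # optional numerical feature
})


def check_run_to_failure(df, time_col="time", event_col="event"):
    """Raise ValueError if the required columns are missing or malformed."""
    missing = [c for c in (time_col, event_col) if c not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    if (df[time_col] < 0).any():
        raise ValueError("time values must be non-negative")
    if not df[event_col].isin([0, 1]).all():
        raise ValueError("event column must contain only 0 or 1")
    return True


def kaplan_meier(df, time_col="time", event_col="event"):
    """Kaplan-Meier survival estimate at each distinct failure time."""
    grouped = df.groupby(time_col)[event_col].agg(failures="sum", leaving="count")
    # Number of units still at risk just before each time point.
    at_risk = len(df) - grouped["leaving"].cumsum().shift(1, fill_value=0)
    survival = (1 - grouped["failures"] / at_risk).cumprod()
    # Report only the times at which at least one failure was observed.
    return survival[grouped["failures"] > 0]


check_run_to_failure(records)
survival = kaplan_meier(records)
print(survival)
```

For real analyses a maintained survival-analysis package is likely a better choice than this hand-written estimator; the sketch is only meant to show why both the time and the event/censoring columns are needed.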

Notebooks

Jupyter notebooks are provided for all datasets,
which may help with interactive data processing and visualization.

References

Introduction to Predictive Maintenance

  1. Wikipedia:
    https://en.wikipedia.org/wiki/Predictive_maintenance
  2. Azure AI guide for predictive maintenance solutions:
    https://docs.microsoft.com/en-us/azure/architecture/data-science-process/predictive-maintenance-playbook
  3. Open source python package for Survival Analysis modeling:
    https://square.github.io/pysurvival/index.html
  4. Types of proactive maintenance:
    https://solutions.borderstates.com/types-of-proactive-maintenance/
  5. Common license types for datasets:
    https://www.kaggle.com/general/116302

Dataset Sources

  1. ALPI: Diego Tosato, Davide Dalle Pezze, Chiara Masiero, Gian Antonio Susto, Alessandro Beghi, 2020. Alarm Logs in Packaging Industry (ALPI).
    https://ieee-dataport.org/open-access/alarm-logs-packaging-industry-alpi
  2. CBM: Condition Based Maintenance of Naval Propulsion Plants Data Set
    http://archive.ics.uci.edu/ml/datasets/condition+based+maintenance+of+naval+propulsion+plants
  3. CMAPSS: NASA Turbofan Jet Engine Data Set:
    https://www.kaggle.com/behrad3d/nasa-cmaps
  4. GDD: Genesis demonstrator data for machine learning:
    https://www.kaggle.com/inIT-OWL/genesis-demonstrator-data-for-machine-learning
  5. GFD: Gearbox Fault Diagnosis:
    https://www.kaggle.com/brjapon/gearbox-fault-diagnosis
  6. HydSys: Predictive Maintenance Of Hydraulics System:
    https://archive.ics.uci.edu/ml/datasets/Condition+monitoring+of+hydraulic+systems
  7. MAPM: Microsoft Azure Predictive Maintenance:
    https://www.kaggle.com/arnabbiswas1/microsoft-azure-predictive-maintenance
  8. PPD: Production Plant Data for Condition Monitoring:
    https://www.kaggle.com/inIT-OWL/production-plant-data-for-condition-monitoring
  9. UFD: Ultrasonic flowmeter diagnostics Data Set:
    https://archive.ics.uci.edu/ml/datasets/Ultrasonic+flowmeter+diagnostics

TODO

  1. Birkl, Christoph. Oxford Battery Degradation Dataset 1. University of Oxford, 2017.
    https://ora.ox.ac.uk/objects/uuid:03ba4b01-cfed-46d3-9b1a-7d4a7bdf6fac
  2. Lu, Jiahuan; Xiong, Rui; Tian, Jinpeng; Wang, Chenxu; Hsu, Chia-Wei; Tsou, Nien-Ti; Sun, Fengchun; Li, Ju (2021), “Battery Degradation Dataset (Fixed Current Profiles&Arbitrary Uses Profiles)”, Mendeley Data, V2.
    https://data.mendeley.com/datasets/kw34hhw7xg/2
  3. One Year Industrial Component Degradation
    https://www.kaggle.com/inIT-OWL/one-year-industrial-component-degradation
  4. Vega shrink-wrapper component degradation
    https://www.kaggle.com/inIT-OWL/vega-shrinkwrapper-runtofailure-data
  5. NASA Bearing Dataset:
    https://www.kaggle.com/vinayak123tyagi/bearing-dataset
  6. CWRU Bearing Dataset:
    https://www.kaggle.com/brjapon/cwru-bearing-datasets

License

All materials except the datasets are available under the MIT license.
I preserve all raw data but attach data loading and preprocessing tools
to each dataset directory so that the data can be used quickly in Python.
Each dataset should be used under its own license.

This article is auto-generated from kokikwbt/predictive-maintenance via the GitHub API. Last fetched: 6/24/2026