TSCV
Time Series Cross-Validation -- an extension for scikit-learn
This repository is a [scikit-learn](https://scikit-learn.org) extension for time series cross-validation. It introduces **gaps** between the training set and the test set, which mitigates the temporal dependence of time series and prevents information leakage. The project is written primarily in Python, is distributed under the BSD 3-Clause "New" or "Revised" License, and was first published in 2019. Key topics include backtesting, cross-validation, data science, hyperparameter optimization, and machine learning.
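To make the idea of a gap concrete, here is a minimal sketch in plain Python. The helper `gapped_kfold_indices` is hypothetical and is not the tscv implementation; it assumes contiguous folds and simply drops `gap_before` samples ahead of each test fold and `gap_after` samples behind it from the training set.

```python
def gapped_kfold_indices(n_samples, n_splits, gap_before=0, gap_after=0):
    """Yield (train, test) index lists for contiguous folds with gaps.

    Hypothetical helper for illustration; not the tscv implementation.
    """
    # Split range(n_samples) into n_splits contiguous folds of near-equal size.
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1

    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        # Exclude the test fold plus gap_before samples ahead of it
        # and gap_after samples behind it.
        excluded_lo = max(0, start - gap_before)
        excluded_hi = min(n_samples, stop + gap_after)
        train = list(range(0, excluded_lo)) + list(range(excluded_hi, n_samples))
        yield train, test
        start = stop


for train, test in gapped_kfold_indices(10, n_splits=5, gap_before=1, gap_after=1):
    print("train:", train, "test:", test)
```

With a gap of 1 on each side, the samples adjacent to every test fold never appear in the corresponding training set, which is the property that limits leakage between neighbouring observations.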
Installation
```bash
pip install tscv
```

or

```bash
conda install -c conda-forge tscv
```
Usage
This extension defines 3 cross-validator classes and 1 function:
- GapLeavePOut
- GapKFold
- GapRollForward
- gap_train_test_split
The three classes can all be passed, as the cv argument, to
scikit-learn functions such as cross_validate, cross_val_score,
and cross_val_predict, just like the native cross-validator classes.
The one function is an alternative to the train_test_split function in scikit-learn.
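The reason such classes plug into scikit-learn is that its `cv` argument accepts any object exposing `split` and `get_n_splits`. The sketch below illustrates that protocol with a hypothetical `SimpleGapRollForward` class. It is not part of tscv, its parameter names are assumptions chosen for illustration, and it only approximates the roll-forward idea.

```python
class SimpleGapRollForward:
    """Hypothetical roll-forward splitter with a gap; not part of tscv."""

    def __init__(self, min_train_size=3, test_size=1, gap_size=1):
        self.min_train_size = min_train_size
        self.test_size = test_size
        self.gap_size = gap_size

    def split(self, X, y=None, groups=None):
        n_samples = len(X)
        train_end = self.min_train_size
        # Grow the training window forward; the test window always starts
        # gap_size samples after the end of the training window.
        while train_end + self.gap_size + self.test_size <= n_samples:
            test_start = train_end + self.gap_size
            train = list(range(train_end))
            test = list(range(test_start, test_start + self.test_size))
            yield train, test
            train_end += self.test_size

    def get_n_splits(self, X, y=None, groups=None):
        # Count the splits that split() would generate for this data.
        return sum(1 for _ in self.split(X))


cv = SimpleGapRollForward(min_train_size=3, test_size=2, gap_size=1)
for train, test in cv.split(list(range(10))):
    print("train:", train, "test:", test)
```

An object of this shape could be passed wherever scikit-learn expects a cross-validator; in practice the classes shipped with this extension are the ones to use.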
Examples
The following example uses GapKFold instead of KFold as the cross-validator.
```python
import numpy as np
from sklearn import datasets
from sklearn import svm
from sklearn.model_selection import cross_val_score
from tscv import GapKFold

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)

# use GapKFold as the cross-validator
cv = GapKFold(n_splits=5, gap_before=5, gap_after=5)
scores = cross_val_score(clf, iris.data, iris.target, cv=cv)
```
The following example uses gap_train_test_split to split the data set into the training set and the test set.
```python
import numpy as np
from tscv import gap_train_test_split

X, y = np.arange(20).reshape((10, 2)), np.arange(10)
X_train, X_test, y_train, y_test = gap_train_test_split(X, y, test_size=2, gap_size=2)
```
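As a rough picture of what a gapped hold-out split does, the following sketch uses a hypothetical helper, `gapped_holdout`. It assumes that the training samples come first, the test samples sit at the end, and `gap_size` samples between them are discarded. The library's exact behaviour, for example when a training size is also specified, may differ.

```python
def gapped_holdout(n_samples, test_size, gap_size):
    """Return (train_slice, test_slice) with a gap in between.

    Hypothetical helper; assumes training data comes first, the test set
    sits at the end, and gap_size samples between them are discarded.
    """
    train_stop = n_samples - test_size - gap_size
    if train_stop <= 0:
        raise ValueError("test_size + gap_size leaves no training samples")
    return slice(0, train_stop), slice(n_samples - test_size, n_samples)


y = list(range(10))
train_idx, test_idx = gapped_holdout(len(y), test_size=2, gap_size=2)
y_train, y_test = y[train_idx], y[test_idx]
print(y_train)  # [0, 1, 2, 3, 4, 5]
print(y_test)   # [8, 9]
```

Under these assumptions, samples 6 and 7 belong to neither set, so the last training observation is separated from the first test observation by two time steps.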
Contributing
- Report bugs in the issue tracker
- Express your use cases in the issue tracker
Documentation
Acknowledgments
- I would like to thank Jeffrey Racine and Christoph Bergmeir for the helpful discussion.
License
BSD-3-Clause
Citation
Wenjie Zheng. (2021). Time Series Cross-Validation (TSCV): an extension for scikit-learn. Zenodo. http://doi.org/10.5281/zenodo.4707309
```latex
@software{zheng_2021_4707309,
  title={{Time Series Cross-Validation (TSCV): an extension for scikit-learn}},
  author={Zheng, Wenjie},
  month={april},
  year={2021},
  publisher={Zenodo},
  doi={10.5281/zenodo.4707309},
  url={http://doi.org/10.5281/zenodo.4707309}
}
```
