raphaelvallat/pingouin
Statistical package in Python based on Pandas
📦 Summary
- Minor release with one new feature, several bugfixes, and internal improvements.
✨ New features
- `compute_effsize`: added `eftype='cohen_dz'` for paired-samples designs ($d_z = \overline{X - Y} / \sigma_{X - Y}$); `y` now also accepts a scalar for one-sample effect sizes (#508)
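The $d_z$ formula above can be sketched in plain Python. This is an illustration of the formula only, not Pingouin's implementation; the helper name `cohen_dz` and the use of the sample standard deviation (n - 1 denominator) are assumptions, since the release note does not state which estimator is used.

```python
from statistics import mean, stdev


def cohen_dz(x, y):
    """Paired-samples effect size: mean of the differences over their SD."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length for a paired design")
    diff = [a - b for a, b in zip(x, y)]
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return mean(diff) / stdev(diff)


pre = [5.0, 6.0, 7.0, 8.0]
post = [4.0, 4.0, 6.0, 5.0]
dz = cohen_dz(pre, post)
```

Because $d_z$ is computed on the differences, it depends on how strongly the two measurements are correlated, unlike effect sizes that standardize by the spread of the raw scores.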
📦 Improvements
- `compute_bootci`: replaced custom bootstrap implementation with `scipy.stats.bootstrap`; default CI method upgraded to BCa; minimum SciPy bumped to 1.10 (#505)
- `intraclass_corr`: updated ICC type labels in output dataframe and documentation (#501)
🐛 Bugfixes
- `partial_corr` / `pcorr`: fixed numerical instability when variables differ by many orders of magnitude (#510)
- `partial_corr`: raise `ValueError` on identical covariates; warn on rank-deficient covariance matrix (#500)
- `bayesfactor_pearson`: fixed catastrophic float64 cancellation in one-sided tests for strongly negative `r` (#503)
- `logistic_regression`: fixed compatibility with scikit-learn >= 1.8 (#504)
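The release note for #503 does not show the patched code. As a generic illustration of float64 cancellation (not the expression used in `bayesfactor_pearson`), the sketch below shows how a small term disappears when it is added to 1 before taking a logarithm, and how a dedicated function avoids forming that sum.

```python
import math

x = 1e-17

# 1.0 + 1e-17 rounds to exactly 1.0 in float64, so the information in x is lost.
naive = math.log(1.0 + x)

# log1p evaluates log(1 + x) without forming the intermediate sum.
stable = math.log1p(x)
```

Here `naive` is exactly zero, while `stable` keeps the magnitude of `x`.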
💥 Breaking changes
- Removed `plot_shift` function (#502)
📋 What's Changed
- Fix broken links in doc; add uv install instructions; use uv in ruff workflow; do not trigger pytest/coverage workflow for doc-only PRs by @raphaelvallat in https://github.com/raphaelvallat/pingouin/pull/496
- Use relative imports for intra-package dependencies by @raphaelvallat in https://github.com/raphaelvallat/pingouin/pull/499
- fix(partial_corr): raise on identical covariates, warn on rank-defici… by @raphaelvallat in https://github.com/raphaelvallat/pingouin/pull/500
- Update ICC types in output dataframe and documentation by @raphaelvallat in https://github.com/raphaelvallat/pingouin/pull/501
- Fix catastrophic float64 cancellation in `bayesfactor_pearson` one-si… by @raphaelvallat in https://github.com/raphaelvallat/pingouin/pull/503
- Fix test logistic regression by @raphaelvallat in https://github.com/raphaelvallat/pingouin/pull/504
- Remove plot_shift function by @raphaelvallat in https://github.com/raphaelvallat/pingouin/pull/502
- Add pre-commit hooks and consolidate dev dependencies by @raphaelvallat in https://github.com/raphaelvallat/pingouin/pull/506
- + 5 more
📋 Changes
- Disable one-sided Bayes Factor for T-tests, which were ill-defined ([PR487](https://github.com/raphaelvallat/pingouin/pull/487))
- Update RBC calculation for Wilcoxon signed-rank test to be dependent on the alternative ([PR457](https://github.com/raphaelvallat/pingouin/pull/457))
- Sphericity fix with very low eigenvalues ([PR482](https://github.com/raphaelvallat/pingouin/pull/482))
- Fix divide-by-zero in internal _correl_pvalue when r == 1 ([PR474](https://github.com/raphaelvallat/pingouin/pull/474))
- Fix boxplot z-order in [pingouin.plot_paired()](https://pingouin-stats.org/generated/pingouin.plot_paired.html#pingouin.plot_paired) ([PR442](https://github.com/raphaelvallat/pingouin/pull/442))
- Column names update ([PR443](https://github.com/raphaelvallat/pingouin/pull/443)): removed characters that restrict column access to the bracket format (`df["p_val"]`) rather than the dot method (`df.p_val`). This includes:
  - Replaced dashes with underscores in column names (e.g., `p-val` -> `p_val`)
  - Replaced parentheses with underscores in column names (e.g., `mean(A)` -> `mean_A`)
- + 9 more
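When migrating code that still refers to the old column labels, a small mapping function can help locate the new names. The helper below is hypothetical and only reproduces the two examples given above (`p-val` and `mean(A)`); the list of changes is truncated here, so other columns may follow different rules and should be checked against the output of the installed version.

```python
import re


def to_attribute_name(column):
    """Hypothetical helper: map an old column label to an underscore form."""
    # Collapse every run of non-alphanumeric characters into one underscore,
    # then drop underscores left dangling at either end.
    return re.sub(r"[^0-9A-Za-z]+", "_", column).strip("_")


old_columns = ["p-val", "mean(A)"]
new_columns = [to_attribute_name(c) for c in old_columns]
```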
📋 What's Changed
- Fix boxplot z-order in `plot_paired` by @sbwiecko in https://github.com/raphaelvallat/pingouin/pull/442
- Fix the Github Action CI for Python tests by @remrama in https://github.com/raphaelvallat/pingouin/pull/445
- Update parametric.py - fix typo by @Petemir in https://github.com/raphaelvallat/pingouin/pull/448
- Clean dependencies; add extras feature by @getzze in https://github.com/raphaelvallat/pingouin/pull/451
- Pandas-friendly column names by @remrama in https://github.com/raphaelvallat/pingouin/pull/443
- replaces black/flake8 formatting/linting with ruff and ensures numpy 2.0 compatibility by @remrama in https://github.com/raphaelvallat/pingouin/pull/446
- Updated deprecated Seaborn function by @sjg2203 in https://github.com/raphaelvallat/pingouin/pull/459
- Fix ruff + CI by @raphaelvallat in https://github.com/raphaelvallat/pingouin/pull/460
- + 11 more
✨ New Contributors
- @sbwiecko made their first contribution in https://github.com/raphaelvallat/pingouin/pull/442
- @Petemir made their first contribution in https://github.com/raphaelvallat/pingouin/pull/448
- @rhazn made their first contribution in https://github.com/raphaelvallat/pingouin/pull/457
- @AlexanderJCS made their first contribution in https://github.com/raphaelvallat/pingouin/pull/474
- Full Changelog: https://github.com/raphaelvallat/pingouin/compare/v0.5.5...v0.6.0
📋 What's Changed
- Fix penalty for LogisticRegression by @raphaelvallat in https://github.com/raphaelvallat/pingouin/pull/403
- Switch to modern python packaging by @getzze in https://github.com/raphaelvallat/pingouin/pull/406
- Remove call to sns.despine by @raphaelvallat in https://github.com/raphaelvallat/pingouin/pull/410
- Updated deprecated function by @sjg2203 in https://github.com/raphaelvallat/pingouin/pull/414
- Add errstate(divide="ignore") in Bayes Factor calculation by @raphaelvallat in https://github.com/raphaelvallat/pingouin/pull/415
- Remove inplace on single column by @raphaelvallat in https://github.com/raphaelvallat/pingouin/pull/423
- Fix RBC sign in mwu by @raphaelvallat in https://github.com/raphaelvallat/pingouin/pull/424
- Overhaul documentation (pydata_sphinx_theme) by @yann1cks in https://github.com/raphaelvallat/pingouin/pull/432
- + 1 more
✨ New Contributors
- @getzze made their first contribution in https://github.com/raphaelvallat/pingouin/pull/406
- @sjg2203 made their first contribution in https://github.com/raphaelvallat/pingouin/pull/414
- @yann1cks made their first contribution in https://github.com/raphaelvallat/pingouin/pull/432
- Full Changelog: https://github.com/raphaelvallat/pingouin/compare/v0.5.4...v0.5.5
📋 What's Changed
- Minor typo fix in docs by @musicinmybrain in https://github.com/raphaelvallat/pingouin/pull/329
- clip r values by @remrama in https://github.com/raphaelvallat/pingouin/pull/342
- fix: deprecated parameter by @bitsnaps in https://github.com/raphaelvallat/pingouin/pull/341
- hotfix: CI crash in test_power_chi2 [WIP] by @raphaelvallat in https://github.com/raphaelvallat/pingouin/pull/344
- hotfix: plot_rm_corr crash with specific column names by @remrama in https://github.com/raphaelvallat/pingouin/pull/351
- Add check for noncentrality parameters. by @agkphysics in https://github.com/raphaelvallat/pingouin/pull/347
- Use pyupgrade by @raphaelvallat in https://github.com/raphaelvallat/pingouin/pull/364
- fix groupby.mean for only numeric values by @jajcayn in https://github.com/raphaelvallat/pingouin/pull/363
- + 11 more
✨ New Contributors
- @musicinmybrain made their first contribution in https://github.com/raphaelvallat/pingouin/pull/329
- @bitsnaps made their first contribution in https://github.com/raphaelvallat/pingouin/pull/341
- @agkphysics made their first contribution in https://github.com/raphaelvallat/pingouin/pull/347
- @jajcayn made their first contribution in https://github.com/raphaelvallat/pingouin/pull/363
- @kraktus made their first contribution in https://github.com/raphaelvallat/pingouin/pull/382
- Full Changelog: https://github.com/raphaelvallat/pingouin/compare/v0.5.3...v0.5.4
📋 What's Changed
- Fix numerical stability issue in multivariate_normality by @gkanwar in https://github.com/raphaelvallat/pingouin/pull/292
- Add new function for pairwise T-tests between columns of a dataframe (pingouin.ptests) by @raphaelvallat in https://github.com/raphaelvallat/pingouin/pull/291
- Handle single-sample comparison in pairwise_test by @George3d6 in https://github.com/raphaelvallat/pingouin/pull/299
- Change TestRegression class test methods to fix victim flakiness by @blazyy in https://github.com/raphaelvallat/pingouin/pull/303
- Add aesthetic flexibility to plot_rm_corr by @remrama in https://github.com/raphaelvallat/pingouin/pull/312
- Update distribution.py by @ALL-SPACE-Rob in https://github.com/raphaelvallat/pingouin/pull/310
- Plotting seaborn.FacetGrid compatibility by @remrama in https://github.com/raphaelvallat/pingouin/pull/314
- Use scikit-learn>=1.1.2 by @raphaelvallat in https://github.com/raphaelvallat/pingouin/pull/300
- + 5 more
✨ New Contributors
- @gkanwar made their first contribution in https://github.com/raphaelvallat/pingouin/pull/292
- @George3d6 made their first contribution in https://github.com/raphaelvallat/pingouin/pull/299
- @blazyy made their first contribution in https://github.com/raphaelvallat/pingouin/pull/303
- @remrama made their first contribution in https://github.com/raphaelvallat/pingouin/pull/312
- @ALL-SPACE-Rob made their first contribution in https://github.com/raphaelvallat/pingouin/pull/310
- @turkalpmd made their first contribution in https://github.com/raphaelvallat/pingouin/pull/320
**Bugfixes**

- The eta-squared (``n2``) effect size was not properly calculated in one-way and two-way repeated measures ANOVAs. Specifically, Pingouin followed the same behavior as JASP, i.e. the eta-squared was the same as the partial eta-squared. However, as explained in #251, this behavior is not valid. In a one-way ANOVA design, the eta-squared should be equal to the generalized eta-squared. As of March 2022, this bug is also present in JASP. We have therefore updated the unit tests to use JAMOVI instead. _Please double check any effect sizes previously obtained with the `pingouin.rm_anova` function!_
- Fixed invalid resampling behavior for bivariate functions in `pingouin.compute_bootci` when x and y were not paired. #281
- Fixed a bug where ``confidence`` (previously ``ci``) was ignored when calculating the bootstrapped confidence intervals in `pingouin.plot_shift`. #282

**Enhancements**

- `pingouin.pairwise_ttests` has been renamed to `pingouin.pairwise_tests`. Non-parametric tests are also supported in this function with the `parametric=False` argument, so the name "ttests" was misleading. #209
- Allow `pingouin.bayesfactor_binom` to take a Beta alternative model. #252
- Allow keyword arguments for logistic regression in `pingouin.mediation_analysis`. #245
- Speed improvements for the Holm and FDR correction in `pingouin.multicomp`. #271
- Speed improvements for univariate functions in `pingouin.compute_bootci` (e.g. ``func="mean"`` is now vectorized).
- Rename ``eta`` to ``eta_squared`` in `pingouin.power_anova` and `pingouin.power_rm_anova` to avoid any confusion. #280
- Add support for [DataMatrix](https://pydatamatrix.eu/) objects. #286
- Use [black](https://black.readthedocs.io/en/stable/) for code formatting.
📦 Pingouin 0.5.1
- This is a minor release, with several bugfixes and improvements. This release is compatible with SciPy 1.8 and Pandas 1.4.
- Bugfixes
- Added support for SciPy 1.8 and Pandas 1.4. https://github.com/raphaelvallat/pingouin/pull/234
- Fixed bug where [pingouin.rm_anova()](https://pingouin-stats.org/generated/pingouin.rm_anova.html#pingouin.rm_anova) and [pingouin.mixed_anova()](https://pingouin-stats.org/generated/pingouin.mixed_anova.html#pingouin.mixed_anova) changed the dtypes of categorical columns in-place https://github.com/raphaelvallat/pingouin/issues/224
- Enhancements
- Faster implementation of [pingouin.gzscore()](https://pingouin-stats.org/generated/pingouin.gzscore.html#pingouin.gzscore), adding all options available in zscore: axis, ddof and nan_policy. Warning: this function is deprecated and will be removed in the next version of Pingouin (use [scipy.stats.gzscore()](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gzscore.html#scipy.stats.gzscore) instead). https://github.com/raphaelvallat/pingouin/pull/210.
- Replace use of statsmodels’ studentized range distribution functions with SciPy’s more accurate scipy.stats.studentized_range(). https://github.com/raphaelvallat/pingouin/pull/229.
- Add support for optional keywords argument in the [pingouin.homoscedasticity()](https://pingouin-stats.org/generated/pingouin.homoscedasticity.html#pingouin.homoscedasticity) function https://github.com/raphaelvallat/pingouin/issues/218
- + 1 more
This is a **major release** with several important bugfixes. We recommend that all users upgrade to this new version. See the full changelog at: https://pingouin-stats.org/changelog.html#v0-5-0-october-2021
This is a **major release** with an important upgrade of the dependencies (requires **Python 3.7+ and SciPy 1.7+**), several enhancements to existing functions, and a new function to test the equality of covariance matrices ([pingouin.box_m](https://pingouin-stats.org/generated/pingouin.box_m.html#pingouin.box_m)). We recommend that all users upgrade to the latest version of Pingouin. See the full changelog at: https://pingouin-stats.org/changelog.html#v0-4-0-august-2021
This release fixes a critical error in [pingouin.partial_corr](https://pingouin-stats.org/generated/pingouin.partial_corr.html#pingouin.partial_corr): the number of covariates was not taken into account when calculating the degrees of freedom of the partial correlation, thus leading to incorrect results (except for the correlation coefficient which remained unaffected). For more details, please see https://github.com/raphaelvallat/pingouin/issues/171. For the full changelog, please see https://pingouin-stats.org/changelog.html
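To see why the degrees of freedom matter, here is a minimal sketch based on the standard textbook t-statistic for a partial correlation, where each of the `k` covariates removes one degree of freedom. The function name and example values are hypothetical, and the code is not taken from Pingouin.

```python
import math


def partial_corr_tstat(r, n, k):
    """Return (t, dof) for a partial correlation r, n observations, k covariates."""
    dof = n - 2 - k  # each covariate costs one degree of freedom
    if dof <= 0:
        raise ValueError("need n > k + 2 observations")
    t = r * math.sqrt(dof / (1.0 - r ** 2))
    return t, dof


# Same coefficient and sample size, with and without counting three covariates.
t_with_cov, dof_with_cov = partial_corr_tstat(r=0.5, n=30, k=3)
t_ignored, dof_ignored = partial_corr_tstat(r=0.5, n=30, k=0)
```

Ignoring the covariates inflates both the degrees of freedom and the t-statistic, so the resulting p-value is too small even though `r` itself is unchanged.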
This is a minor release with several bug fixes in [pingouin.corr](https://pingouin-stats.org/generated/pingouin.corr.html#pingouin.corr). The full changelog can be found [here](https://pingouin-stats.org/changelog.html).
This release fixes an error in the calculation of the p-values in the [pg.pairwise_tukey()](https://pingouin-stats.org/generated/pingouin.pairwise_tukey.html#pingouin.pairwise_tukey) and [pg.pairwise_gameshowell()](https://pingouin-stats.org/generated/pingouin.pairwise_gameshowell.html#pingouin.pairwise_gameshowell) functions (https://github.com/raphaelvallat/pingouin/pull/156). Old versions of Pingouin used an incorrect algorithm for the studentized range approximation, which resulted in (slightly) incorrect p-values. In most cases, the error did not seem to affect the significance of the p-values. The new version of Pingouin uses statsmodels to estimate the p-values.
See changelog at: https://pingouin-stats.org/changelog.html
📋 Changes
- Important bugfix in pingouin.ttest() in which the 95% confidence intervals for one-sample T-test with `y` != 0 were invalid.
- Added an "options" module to control global rounding/display behavior.
- Several enhancements / new features in existing functions.
Hotfix release. See full changelog at: https://pingouin-stats.org/changelog.html
See full changelog at: https://pingouin-stats.org/changelog.html
Minor release. See full changelog at: https://pingouin-stats.org/changelog.html
See full changelog at https://pingouin-stats.org/changelog.html
📋 Changes
- Fixed a bug in pingouin.pairwise_corr caused by the deprecation of ``pandas.core.index`` in the new version of Pandas (1.0). For now, both Pandas 0.25 and Pandas 1.0 are supported.
- The standard deviation in pingouin.pairwise_ttests when using ``return_desc=True`` is now calculated with ``np.nanstd(ddof=1)`` to be consistent with Pingouin/Pandas default unbiased standard deviation.
- Added the pingouin.plot_circmean function to plot the circular mean and circular vector length of a set of angles (in radians) on the unit circle. Note that this function is still in beta and some parameters may change without warnings in the next releases.
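The difference between the two standard deviation conventions mentioned above can be illustrated with the standard library alone. This is a sketch with made-up values; dropping `NaN` entries by hand stands in for the NaN-skipping behavior of `np.nanstd`.

```python
import math
from statistics import pstdev, stdev

values = [2.0, 4.0, float("nan"), 6.0]
clean = [v for v in values if not math.isnan(v)]  # mimic NaN-skipping

sd_population = pstdev(clean)  # ddof=0: divide by n (NumPy's default)
sd_sample = stdev(clean)       # ddof=1: divide by n - 1 (pandas' default)
```

With only three valid observations, the unbiased estimate (`sd_sample`) is noticeably larger than the population formula (`sd_population`).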
📋 Changes
- MAJOR: Fixed a bug in [pingouin.pairwise_ttests()](https://pingouin-stats.org/generated/pingouin.pairwise_ttests.html#pingouin.pairwise_ttests) when using mixed or two-way repeated measures design. Specifically, the T-tests were performed without averaging over repeated measurements first (i.e. without calculating the marginal means). Note that for mixed design, this only impacts the between-subject T-test(s). Practically speaking, this led to higher degrees of freedom (because they were conflated with the number of repeated measurements) and ultimately incorrect T and p-values because the assumption of independence was violated. Pingouin now averages over repeated measurements in mixed and two-way repeated measures design, which is the same behavior as [JASP](https://jasp-stats.org/) or [JAMOVI](https://www.jamovi.org/). As a consequence, and when the data has only two groups, the between-subject p-value of the pairwise T-test should be (almost) equal to the p-value of the same factor in the [pingouin.mixed_anova()](https://pingouin-stats.org/generated/pingouin.mixed_anova.html#pingouin.mixed_anova) function. The old behavior of Pingouin can still be obtained using the ``marginal=False`` argument.
- Minor: Added a check in [pingouin.mixed_anova()](https://pingouin-stats.org/generated/pingouin.mixed_anova.html#pingouin.mixed_anova) to ensure that the ``subject`` variable has a unique set of values for each between-subject group defined in the ``between`` variable. For instance, the subject IDs for group1 are [1, 2, 3, 4, 5] and for group2 [6, 7, 8, 9, 10]. The function will throw an error if there are one or more overlapping subject IDs between groups (e.g. the subject IDs for group1 AND group2 are both [1, 2, 3, 4, 5]).
- Minor: Fixed a bug which caused the [pingouin.plot_rm_corr()](https://pingouin-stats.org/generated/pingouin.plot_rm_corr.html#pingouin.plot_rm_corr) and [pingouin.ancova()](https://pingouin-stats.org/generated/pingouin.ancova.html#pingouin.ancova) (with >1 covariates) to throw an error if any of the input variables started with a number (because of statsmodels / [Patsy formula formatting](https://patsy.readthedocs.io/en/latest/builtins-reference.html)).
- Upon loading, Pingouin will now use the [outdated](https://github.com/alexmojaki/outdated) package to check and warn the user if a newer stable version is available.
- Globally removed the ``export_filename`` parameter, which allowed exporting the output table to a .csv file. This helps simplify the API and testing. As an alternative, one can simply use [pandas.to_csv()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html) to export the output dataframe generated by Pingouin.
- Added the ``correction`` argument to [pingouin.pairwise_ttests()](https://pingouin-stats.org/generated/pingouin.pairwise_ttests.html#pingouin.pairwise_ttests) to enable or disable Welch’s correction for independent T-tests.
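The marginal-means behavior described in the first item of this list can be sketched on toy data. The data, variable names, and the `n - 2` degrees of freedom (which assume a pooled-variance independent T-test with two groups) are illustrative assumptions, not Pingouin's code.

```python
from collections import defaultdict
from statistics import mean

# Long-format toy data: (subject, group, score), three repeated measures each.
rows = [
    ("s1", "A", 4.0), ("s1", "A", 5.0), ("s1", "A", 6.0),
    ("s2", "A", 6.0), ("s2", "A", 7.0), ("s2", "A", 8.0),
    ("s3", "B", 1.0), ("s3", "B", 2.0), ("s3", "B", 3.0),
    ("s4", "B", 3.0), ("s4", "B", 4.0), ("s4", "B", 5.0),
]

# Step 1: collapse repeated measurements into one marginal mean per subject.
scores = defaultdict(list)
group_of = {}
for subject, group, score in rows:
    scores[subject].append(score)
    group_of[subject] = group
marginal = {subject: mean(vals) for subject, vals in scores.items()}

# Step 2: the between-subject comparison uses one value per subject.
by_group = defaultdict(list)
for subject, value in marginal.items():
    by_group[group_of[subject]].append(value)

dof_marginal = len(marginal) - 2  # 4 subjects -> 2 degrees of freedom
dof_pooled = len(rows) - 2        # 12 rows -> 10, treats repeats as independent
```

Comparing `dof_marginal` with `dof_pooled` shows how skipping the averaging step conflates the number of repeated measurements with the number of independent subjects.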
📋 Changes
- Fixed a bug in which missing values were removed from all columns in the dataframe in [pingouin.kruskal()](https://pingouin-stats.org/generated/pingouin.kruskal.html#pingouin.kruskal), even columns that were unrelated. See https://github.com/raphaelvallat/pingouin/issues/74.
- The [pingouin.power_corr()](https://pingouin-stats.org/generated/pingouin.power_corr.html#pingouin.power_corr) function now throws a warning and returns np.nan when the sample size is too low (instead of raising an error as in previous versions). This improves compatibility with the pingouin.pairwise_corr() function.
- Fixed quantile direction in the [pingouin.plot_shift()](https://pingouin-stats.org/generated/pingouin.plot_shift.html#pingouin.plot_shift) function. In v0.3.0, the quantile subplot was incorrectly labelled as Y - X, but it was in fact calculating X - Y. See https://github.com/raphaelvallat/pingouin/issues/73
📋 Changes
- Added [pingouin.plot_rm_corr()](https://pingouin-stats.org/generated/pingouin.plot_rm_corr.html#pingouin.plot_rm_corr) to plot a repeated measures correlation
- Added the `relimp` argument to [pingouin.linear_regression()](https://pingouin-stats.org/generated/pingouin.linear_regression.html#pingouin.linear_regression) to return the relative importance (= contribution) of each individual predictor to the R^2 of the full model.
- Complete refactoring of [pingouin.intraclass_corr()](https://pingouin-stats.org/generated/pingouin.intraclass_corr.html#pingouin.intraclass_corr) to closely match the R implementation in the [psych](https://cran.r-project.org/web/packages/psych/psych.pdf) package. Pingouin now returns the 6 types of ICC, together with F values, p-values, degrees of freedom and confidence intervals.
- The [pingouin.plot_shift()](https://pingouin-stats.org/generated/pingouin.plot_shift.html#pingouin.plot_shift) function now 1) uses the Harrell-Davis robust quantile estimator in conjunction with bias-corrected bootstrap confidence intervals, and 2) supports paired samples.
- Added the axis argument to [pingouin.harrelldavis()](https://pingouin-stats.org/generated/pingouin.harrelldavis.html#pingouin.harrelldavis) to support 2D arrays.
Minor release with mostly internal code refactoring. See full changelog at: https://pingouin-stats.org/changelog.html#v0-2-9-september-2019
See full changelog at: https://pingouin-stats.org/changelog.html#v0-2-8-july-2019
This is a minor release, mainly to fix dependency issues between SciPy and statsmodels.

**Dependencies**

- Pingouin now requires statsmodels>=0.10.0 (latest release June 2019) and is compatible with SciPy 1.3.0.

**Enhancements**

- Added support for long-format dataframes in `pingouin.sphericity` and `pingouin.epsilon`.
- Added support for two within-factors interaction in `pingouin.sphericity` and `pingouin.epsilon` (for the former, provided that at least one of them has no more than two levels).

**New functions**

- Added `pingouin.power_rm_anova` function.
📋 Changes
- Fixed an error in the two-sided p-value of the Wilcoxon test (`pingouin.wilcoxon()`): the p-values were accidentally squared and therefore too small. Make sure to always use the latest release of Pingouin.
- `pingouin.wilcoxon()` now uses the continuity correction by default (the documentation stated that the correction was applied, but the code did not apply it).
- The show_median argument of the `pingouin.plot_shift()` function did not work properly when the percentiles differed from the default parameters.
- The current release of statsmodels (0.9.0) is not compatible with the newest release of SciPy (1.3.0). In order to avoid compatibility issues in the `pingouin.ancova()` and `pingouin.anova()` functions (which rely on statsmodels for certain cases), Pingouin will require SciPy < 1.3.0 until a new stable version of statsmodels is released.
- Added `pingouin.chi2_independence()` tests.
- Added `pingouin.chi2_mcnemar()` tests.
- Added `pingouin.power_chi2()` function.
- Added `pingouin.bayesfactor_binom()` function.
- + 9 more
📋 Changes
- Fixed an error in the p-values of the one-sample one-sided T-test (pingouin.ttest()): the two-sided p-value was divided by 4 instead of 2, resulting in inaccurate (smaller) one-sided p-values.
- Fixed a global error in the unbalanced two-way ANOVA (pingouin.anova()): the sums of squares were wrong, and consequently so were the F and p-values. For unbalanced designs, Pingouin now computes type II sums of squares via a call to the statsmodels package.
- The epsilon factor for the interaction term in two-way repeated measures ANOVA (pingouin.rm_anova()) is now computed using the lower bound approach. This is more conservative than the Greenhouse-Geisser approach and therefore gives (slightly) higher p-values. The reason for this choice is that the Greenhouse-Geisser values for the interaction term differ from the ones returned by R and JASP. This will hopefully be fixed in future releases.
- Added pingouin.multivariate_ttest() (Hotelling T-squared) test.
- Added pingouin.cronbach_alpha() function.
- Added pingouin.plot_shift() function.
- Several functions of Pingouin can now be directly used as pandas.DataFrame methods.
- Added pingouin.pcorr() method to compute the partial Pearson correlation matrix of a pandas.DataFrame (similar to the pcor function in the ppcor package).
- + 17 more
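For the lower-bound epsilon mentioned above, the usual textbook definition is the reciprocal of the effect's degrees of freedom. The sketch below applies that definition to the interaction of two within-subject factors. The function names are hypothetical, and the formula is assumed from the standard definition rather than taken from Pingouin's source.

```python
def lower_bound_epsilon(levels_a, levels_b):
    """Lower-bound epsilon for the A x B interaction of two within factors."""
    dof_effect = (levels_a - 1) * (levels_b - 1)
    if dof_effect < 1:
        raise ValueError("each factor needs at least two levels")
    return 1.0 / dof_effect


def corrected_dof(levels_a, levels_b, n_subjects):
    """Scale both F-test degrees of freedom by the lower-bound epsilon."""
    eps = lower_bound_epsilon(levels_a, levels_b)
    dof_num = (levels_a - 1) * (levels_b - 1)
    dof_den = dof_num * (n_subjects - 1)
    return eps * dof_num, eps * dof_den


eps = lower_bound_epsilon(3, 4)
dof1, dof2 = corrected_dof(3, 4, n_subjects=10)
```

Under this definition the corrected numerator degrees of freedom always reduce to 1, which is why the lower bound is the most conservative of the sphericity corrections.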
📋 Changes
- Added pingouin.distance_corr() (distance correlation) function.
- pingouin.rm_corr() now requires at least 3 unique subjects (same behavior as the original R package).
- The pingouin.pairwise_corr() function is faster and returns the number of outliers if a robust correlation is used.
- Added support for 2D level in the pingouin.pairwise_corr() function. See Jupyter notebooks for examples.
- Added support for partial correlation in the pingouin.pairwise_corr() function.
- Greatly improved execution speed of pingouin.correlation.skipped() function.
- Added default random state to compute the Min Covariance Determinant in the pingouin.correlation.skipped() function.
- The default number of bootstrap samples for the pingouin.correlation.shepherd() function is now set to 200 (previously 2000) to increase computation speed.
- + 18 more
See full changelog at: https://pingouin-stats.org/changelog.html
See full changelog: https://pingouin-stats.org/changelog.html#v0-2-2-december-2018
