capitalone/DataProfiler
What's in your data? Extract schema, statistics and entities from datasets
Changes
- add architecture.rst for algorithm rationale, testing, versioning (https://github.com/capitalone/DataProfiler/pull/1181)
- refactored docs workflow (https://github.com/capitalone/DataProfiler/pull/1182)
Changes
- refactored documentation release process
Changes
- fixed test script in release process
Changes
- added versioneer
- removed dask from pre-commit requirements
- refactored release process
Changes
- staging/main/0.13.0 #1165
- Python 3.8 removed from tox environments (#1146)
- Python 3.11 added to GitHub Actions (#1090)
- PR #1162 updates the `requests` dependency to resolve vulnerabilities caused by urllib3 and certifi.
What's Changed
- staging/main/0.13.0 by @armaan-dhillon in https://github.com/capitalone/DataProfiler/pull/1165
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.12.0...0.13.0
Changes
- staging/main/0.12.0 #1145
- Update Documentation v0.12.0 #1152
- Remove py38 from tox envlist #1146
- Fix Tox #1143
What's Changed
- staging/main/0.12.0 by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/1145
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.11.0...0.12.0
Changes
- Version.py update 0.11.0 #1139
- Update: black version #1131
- Update Documentation #1141
- docs: update test link to latest version #1114
- Quick fix for dependency max pins #1120
- Fix memray version max #1132
What's Changed
- Version.py update 0.11.0 by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/1139
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.10.9...0.11.0
Changes
- Version.py update 0.10.9 #1107
- Staging into main from dev #1106
- Hot fix json bug #1105
- Docs update 0.10.9 #1108
- Add downloads tile to README #1085
What's Changed
- Staging into `main` from `dev` by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/1106
- Version.py update 0.10.9 by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/1107
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.10.8...0.10.9
Changes
- Staging/main/0.10.8 #1081
- Dependency: matplotlib version bump #1072
- Make _assimilate_histogram() not use self #1071
- Feature: added parquet sampling #1070
- Update: Documentation 0.10.8 #1084
- Docs update to include option for sample_nrows for parquet files #1082
- Bump actions/setup-python from 4 to 5 #1078
What's Changed
- Staging/main/0.10.8 by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/1081
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.10.7...0.10.8
Changes
- Staging/main/0.10.7 #1068
- Hot Fix: Plugin Testing #1067
- Update: Documentation 0.10.7 #1069
What's Changed
- Staging/main/0.10.7 by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/1068
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.10.6...0.10.7
Changes
- Staging/main/0.10.6 #1065
- Update: Version 0.10.6 #1064
- Feature: Plugins #1060
- Hot Fix: Contribution Doc #1059
- Rename references to degree of freedom from df to deg_of_free #1056
- add_s3_connection_remote_loading_s3uri_feature #1054
- feat: add null ratio to column stats #1052
- Delay transforming priority_order into ndarray #1045
- + 4 more
What's Changed
- Fix Codeowners List by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/1044
- Staging/main/0.10.6 by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/1065
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.10.5...0.10.6
Changes
- Categorical PSI #1040
- Categorical PSI #1039
- Update docs 0.10.5 #1042
- Update docs 0.10.5 #1041
What's Changed
- Categorical PSI by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/1040
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.10.4...0.10.5
Changes
- version bump (#1032) #1036
- Staging/main/0.10.4 #1029
- added psi calculation to categorical columns #1027
- Bump actions/checkout from 3 to 4 #1024
- Minor: Profiler Path Fix in Example Notebook #1021
- modified the assignees for issue creation #1016
- Make sure random_state is a list before indexed assignment #968
- Update docs 0.10.4 #1038
- + 2 more
What's Changed
- Staging/main/0.10.4 by @ksneab7 in https://github.com/capitalone/DataProfiler/pull/1029
- version bump (#1032) by @ksneab7 in https://github.com/capitalone/DataProfiler/pull/1036
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.10.3...0.10.4
Changes
- Staging: main 0.10.3 #1004
- Fix ProfilerOptions() documentation #1002
Feature: Multiprocess
- Staging: into dev feature/multiprocess #998
- Multiprocess automation feature into staging/dev. #997
- Syncing feature/multiprocess into staging/dev/multiprocess #992
- Automate multiprocess option #984
Feature: `num_quantiles` option
- Staging: into dev feature/num-quantiles #990
- Fix Scipy Mend Issue #988
- HistogramAndQuantilesOption sync with dev branch #987
- Update docs to 0.10.3 #1012
- Update docs to 0.10.3 #1011
- fixed snappy install issue on Mac #1010
- Staging: into dev-gh-pages the docs for multiprocess. #1001
- Add docs to multiprocess option in StructuredOptions. #999
- + 3 more
What's Changed
- Staging: main `0.10.3` by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/1004
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.10.2...0.10.3
Changes
- hotfix[0.10.2]: cat vs float bug #973
- Staging: Update docs to 0.10.2 #978
- Update docs to 0.10.2 #979
What's Changed
- hotfix[0.10.2]: cat vs float bug by @JGSweets in https://github.com/capitalone/DataProfiler/pull/973
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.10.1...0.10.2
Changes
- Hot Fix: .astype("bool") #960
- Staging: Update docs 0.10.1 #961
- Update docs 0.10.1 #962
What's Changed
- Hot Fix: `.astype("bool")` by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/960
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.10.0...0.10.1
Changes
- Forking workflow directions CONTRIBUTING.md #857
- Fixing diagram rendering in CONTRIBUTING.md #862
- Fix initial value of processor_type #863
- fix: test bug due to bad mocks #878
- added differences section to unstructured data example #877
- Reservoir sampling refactor #910
- feat: add dev to workflow for testing #897
- Cms for categorical #892
- + 3 more
Profiler: Profile Serialization
- Staging/dev/profile serialization #940
- fix: order bug #939
- fix: null_rep mat should calculate even if datetime #933
- Profiler: load_method hotfix #932
- Top level hotfix: save / load .lower() #931
- Notebook Example save/load Profile #930
- refactor: use seed for sample for consistency #927
- Profile Builder load() serialization #925
- + 25 more
Profiler: Options
- staging/dev/options #909
- RowStatisticsOptions: Implementing option #871
- New preset implementation and test #867
- RowStatisticsOptions: Add option #865
- Staging update docs 0.10.0 #945
- Documentation: Fix Req #922
- Documentation: Update for Reservoir Sampling #919
- documentation update for cms specific options to category #917
- + 1 more
Documentation: Profile Serialization
- Merge staging/dev-gh-pages/profile-serialization into dev-gh-pages #937
- Docs: Profiler Serialization Clean Up #936
- Docs: Profiler Serialization #928
Documentation: Options
- Documentation: feature/options branch docs updates #921
- Row statistics option documentation #883
- updating docs for preset name #882
- Add documentation for median_abs_deviation option #881
- Preset test updated w new names and different toggles #880
- reset ignore, update .gitignore, update documentation on presets #874
- Fixed documentation for sampling_ratio option #873
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.9.0...0.10.0
What's Changed
- Sampling ratio implement by @joshuart in https://github.com/capitalone/DataProfiler/pull/845
- StructuredOptions: `hhl_row_hashing` by @micdavis in https://github.com/capitalone/DataProfiler/pull/841
- Forking workflow directions CONTRIBUTING.md by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/857
- Fixing diagram rendering in `CONTRIBUTING.md` by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/862
- StructuredProfiler: HLLRowHashing by @micdavis in https://github.com/capitalone/DataProfiler/pull/842
- added differences section to unstructured data example by @lizlouise1335 in https://github.com/capitalone/DataProfiler/pull/877
- fix: test bug due to bad mocks by @JGSweets in https://github.com/capitalone/DataProfiler/pull/878
- Fix initial value of processor_type by @junholee6a in https://github.com/capitalone/DataProfiler/pull/863
- + 2 more
Changes
- Encode int column #780
- Decode categorical #786
- Encode update format #789
- Optimization for text column profile ksneab #791
- Remove unnecessary cast() in csv_data.py (1) #796
- Remove unnecessary cast() in csv_data.py (2) #798
- Update main with change in memory-optimization #799
- Remove unnecessary cast() in data.py #800
- + 38 more
What's Changed
- Create method to serialize NumericalStatsMixin and functions by @kshitijavis in https://github.com/capitalone/DataProfiler/pull/776
- Memory testing and data gen scripts by @ksneab7 in https://github.com/capitalone/DataProfiler/pull/781
- Update for new Dask version in Validator test by @JGSweets in https://github.com/capitalone/DataProfiler/pull/784
- Encode int column by @kshitijavis in https://github.com/capitalone/DataProfiler/pull/780
- Fix minor typo by @junholee6a in https://github.com/capitalone/DataProfiler/pull/788
- Space analysis dataset sampling addition by @ksneab7 in https://github.com/capitalone/DataProfiler/pull/787
- fix bug in dataset generation by @ksneab7 in https://github.com/capitalone/DataProfiler/pull/790
- Optimization for text column profile ksneab by @ksneab7 in https://github.com/capitalone/DataProfiler/pull/791
- + 20 more
New Contributors
- @junholee6a made their first contribution in https://github.com/capitalone/DataProfiler/pull/788
- @joshuart made their first contribution in https://github.com/capitalone/DataProfiler/pull/825
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.8.9...0.9.0
Changes
- Create BaseColumnProfiler.to_dict to make JSONable #766
- Chi2 docs update #767
- Create Profile Encoder to JSONify BaseColumnProfiler #769
- Encode categorical column #770
- Encode order column #772
- Add and test JSONify DateTimeColumn #774
- Update docs 0.8.9 #779
- fix: update ml reqs #777
- + 1 more
What's Changed
- Create BaseColumnProfiler.to_dict to make JSONable by @kshitijavis in https://github.com/capitalone/DataProfiler/pull/766
- Create Profile Encoder to JSONify BaseColumnProfiler by @kshitijavis in https://github.com/capitalone/DataProfiler/pull/769
- Encode categorical column by @kshitijavis in https://github.com/capitalone/DataProfiler/pull/770
- Encode order column by @kshitijavis in https://github.com/capitalone/DataProfiler/pull/772
- Add and test JSONify DateTimeColumn by @kshitijavis in https://github.com/capitalone/DataProfiler/pull/774
- fix: update ml reqs by @JGSweets in https://github.com/capitalone/DataProfiler/pull/777
- Update to version 0.8.9 by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/778
New Contributors
- @kshitijavis made their first contribution in https://github.com/capitalone/DataProfiler/pull/766
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.8.8...0.8.9
Changes
- Quick chi2 test fix #763
- Update docs 0.8.8 #765
- Chi2 docs update #767
- Update to version 0.8.8 #764
- PyPi image rendering issue #761
- [BUG] update isort version pin #760
- [BUG] isort version change #759
What's Changed
- [BUG] isort version change by @micdavis in https://github.com/capitalone/DataProfiler/pull/759
- [BUG] update isort version pin by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/760
- PyPi image rendering issue by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/761
- Quick chi2 test fix by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/763
- Update to version 0.8.8 by @micdavis in https://github.com/capitalone/DataProfiler/pull/764
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.8.7.post1...0.8.8
Changes
- Bug: requirements-ml fix #754
- Update to version 0.8.7.post1 #755
What's Changed
- Bug: `requirements-ml` fix by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/754
- Update to version 0.8.7.post1 by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/755
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.8.7...0.8.7.post1
Changes
- relax requests and networkx dependencies #750
- Generate docs for v0.8.7 #752
- Update version to 0.8.7 #751
What's Changed
- relax requests and networkx dependencies by @neilkg in https://github.com/capitalone/DataProfiler/pull/750
- Update to version 0.8.7 by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/751
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.8.6...0.8.7
Changes
- Removes futures from required libs #746
- Generate Docs for v0.8.6 #749
- Update version to 0.8.6 #748
Changes
- Rework Graph Test for mocking missing imports #736
- Windows Install error - Path Resources - Fixes 738 #739
- Generate Docs for v0.8.5 #744
- adding pyupgrade & autoflake #734
- loosening pin on typing-extensions #735
- removing six #740
- Use tensorflow-macos and clean up some test running warning noise #741
- Update version to 0.8.5 #742
What's Changed
- loosening pin on typing-extensions by @leos in https://github.com/capitalone/DataProfiler/pull/735
- Rework Graph Test for mocking missing imports by @JGSweets in https://github.com/capitalone/DataProfiler/pull/736
- Windows Install error - Path Resources - Fixes #738 by @rxm7706 in https://github.com/capitalone/DataProfiler/pull/739
- adding pyupgrade & autoflake by @leos in https://github.com/capitalone/DataProfiler/pull/734
- removing six by @leos in https://github.com/capitalone/DataProfiler/pull/740
- Use tensorflow-macos and clean up some test running warning noise by @leos in https://github.com/capitalone/DataProfiler/pull/741
- Update version to 0.8.5 by @JGSweets in https://github.com/capitalone/DataProfiler/pull/742
New Contributors
- @leos made their first contribution in https://github.com/capitalone/DataProfiler/pull/735
- @rxm7706 made their first contribution in https://github.com/capitalone/DataProfiler/pull/739
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.8.4...0.8.5
Changes
- Replaces Merging dict func #731
- Great Expectations Examples Fix #726
- WIP Generate Docs for v0.8.4 #729
- Generate Docs for v0.8.4 #732
- Fix numpy version and drop python 3.7 in checks #725
- Updating the version to v0.8.4 #728
What's Changed
- Fix numpy version and drop python 3.7 in checks by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/725
- Great Expectations Examples Fix by @micdavis in https://github.com/capitalone/DataProfiler/pull/726
- Updating the version to v0.8.4 by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/728
- Replaces Merging dict func by @JGSweets in https://github.com/capitalone/DataProfiler/pull/731
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.8.3...0.8.4
Changes
- Fix req missing for typing_extensions #698
- Add profiler option for column level invalid values #704
- Updated setup.cfg mypy flags and resolved related errors. #703
- Add Makefile to auto setup repo for developers #699
- Add PSI documentation in README.md #709
- Fix bug with null replication metrics #702
- PSI diff() #708
- PSI diff() bug #707
- + 10 more
What's Changed
- Fix req missing for typing_extensions by @JGSweets in https://github.com/capitalone/DataProfiler/pull/698
- Pre-Commit: Default `setup.cfg` flags by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/701
- Add Makefile to auto setup repo for developers by @tonywu315 in https://github.com/capitalone/DataProfiler/pull/699
- Quick Fix: Oxford Comma in README by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/697
- Adding `PSI` to `diff` report by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/688
- Updated setup.cfg with check-manifest by @Sanketh7 in https://github.com/capitalone/DataProfiler/pull/705
- Fix bug with null replication metrics by @tonywu315 in https://github.com/capitalone/DataProfiler/pull/702
- Add profiler option for column level invalid values by @tonywu315 in https://github.com/capitalone/DataProfiler/pull/704
- + 9 more
Changes
- Fix req missing for typing_extensions #698
What's Changed
- Fix req missing for typing_extensions by @JGSweets in https://github.com/capitalone/DataProfiler/pull/698
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.8.2...0.8.2.post1
Changes
- added static typing to data_utils.py #662
- Added static typing to *_data classes in data_readers #677
- Adding the types of parameters and returns of functions #681
- Added static typing to data.py and filepath_or_buffer.py #682
- Fix typing and missing types #684
- Fix typing errors and missing return types #692
- Move contribute info to CONTRIBUTING.md #683
- Fix typos, remove (unintended?) indentation #690
- + 7 more
What's Changed
- Add static typing to labeler models by @tonywu315 in https://github.com/capitalone/DataProfiler/pull/672
- Quick Fix by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/680
- Add static typing to labelers/data_processing.py by @tonywu315 in https://github.com/capitalone/DataProfiler/pull/673
- Added static typing to data_readers/base_data.py and data_readers/json_data.py by @Sanketh7 in https://github.com/capitalone/DataProfiler/pull/666
- Added static typing to data.py and filepath_or_buffer.py by @Sanketh7 in https://github.com/capitalone/DataProfiler/pull/682
- Move contribute info to CONTRIBUTING.md by @tonywu315 in https://github.com/capitalone/DataProfiler/pull/683
- Fix typing and missing types by @tonywu315 in https://github.com/capitalone/DataProfiler/pull/684
- Fix `matplotlib` version requirements param by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/686
- + 9 more
New Contributors
- @bencomp made their first contribution in https://github.com/capitalone/DataProfiler/pull/690
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.8.1...0.8.2
Changes
- Added static typing to data_readers/avro_data.py #657
- Added static typing to data_readers/structured_mixins.py #659
- Static Typing profiler #660
- Static Typing profilers/column profile #661
- Static typing for profilers #663
- Add static typing to data labeler and abstract classes #664
- Add static typing to labeler utils #668
- Allow diff to set format options for prepare report #669
- + 9 more
What's Changed
- Static Typing profilers/utils.py by @tonywu315 in https://github.com/capitalone/DataProfiler/pull/630
- Static Typing for Base Column Primitive Type Profilers by @tonywu315 in https://github.com/capitalone/DataProfiler/pull/645
- Added static typing to data_readers/structured_mixins.py by @Sanketh7 in https://github.com/capitalone/DataProfiler/pull/659
- Static Typing profilers/profile_builder.py by @tonywu315 in https://github.com/capitalone/DataProfiler/pull/643
- Static Typing profilers/numerical_column_stats.py by @tonywu315 in https://github.com/capitalone/DataProfiler/pull/648
- Static Typing profiler by @tonywu315 in https://github.com/capitalone/DataProfiler/pull/660
- Static Typing profilers/column profile by @tonywu315 in https://github.com/capitalone/DataProfiler/pull/661
- Added static typing to data_readers/avro_data.py by @Sanketh7 in https://github.com/capitalone/DataProfiler/pull/657
- + 10 more
New Contributors
- @Sanketh7 made their first contribution in https://github.com/capitalone/DataProfiler/pull/659
- @boneyag made their first contribution in https://github.com/capitalone/DataProfiler/pull/676
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.8.0...0.8.1
Changes
- DataProfiler: hotfix for handling nan values in diff #647
- Static Typing profilers/profiler_options.py #644
- refactor: validate parameters and the returns of functions #640
- Preset option in ProfileOptions #638
- GraphProfiler: add() NotImplementedError #636
- ColumnNameLabeler Setup #635
- Fix for issue #605 #634
- GraphProfiler: diff() functionality #631
- + 17 more
What's Changed
- Update TF / numpy reqs, drop py3.6 by @JGSweets in https://github.com/capitalone/DataProfiler/pull/614
- Notebook Examples for DP + GE: expect_column_value_confidence by @micdavis in https://github.com/capitalone/DataProfiler/pull/622
- Notebook Examples for DP + GE: expect_column_values_vs_profile by @micdavis in https://github.com/capitalone/DataProfiler/pull/623
- Notebook Examples for DP + GE: expect_profile_numeric_columns_diff by @micdavis in https://github.com/capitalone/DataProfiler/pull/624
- Notebook Examples for DP + GE: expect_profile_numeric_columns_percent by @micdavis in https://github.com/capitalone/DataProfiler/pull/625
- New Data Labeler: ColumnNameModel Build by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/626
- Quick Add: `require_module` in `ColumnNameModel` test by @taylorfturner in https://github.com/capitalone/DataProfiler/pull/627
- Graph Profiler: save() and load() Functionality by @micdavis in https://github.com/capitalone/DataProfiler/pull/628
- + 16 more
New Contributors
- @vindhyanairlj made their first contribution in https://github.com/capitalone/DataProfiler/pull/634
- @lovleen3112 made their first contribution in https://github.com/capitalone/DataProfiler/pull/638
- @stefanycoimbra made their first contribution in https://github.com/capitalone/DataProfiler/pull/640
- @tonywu315 made their first contribution in https://github.com/capitalone/DataProfiler/pull/644
- Full Changelog: https://github.com/capitalone/DataProfiler/compare/0.7.11...0.8.0
