obss/jury
Comprehensive NLP Evaluation System
23 Releases
Latest: 2y ago
v2.3.1 (Latest)
📋 What's Changed
- Update CI actions versions. by @devrimcavusoglu in https://github.com/obss/jury/pull/134
- Update dev installation to allow for e.g. Zsh by @KennethEnevoldsen in https://github.com/obss/jury/pull/136
- Update README.md by @devrimcavusoglu in https://github.com/obss/jury/pull/137
✨ New Contributors
- @KennethEnevoldsen made their first contribution in https://github.com/obss/jury/pull/136
- Full Changelog: https://github.com/obss/jury/compare/2.3...2.3.1
v2.3
📋 What's Changed
- Comet version updated, with corresponding changes made. by @devrimcavusoglu in https://github.com/obss/jury/pull/129
- Update README.md by @eltociear in https://github.com/obss/jury/pull/130
- Drop py3.7 support, change CI. by @devrimcavusoglu in https://github.com/obss/jury/pull/132
- README.md updated. Jury paper added. by @devrimcavusoglu in https://github.com/obss/jury/pull/133
✨ New Contributors
- @eltociear made their first contribution in https://github.com/obss/jury/pull/130
- Full Changelog: https://github.com/obss/jury/compare/2.2.4...2.3
v2.2.4
📋 What's Changed
- datasets dependency added with constraint. by @devrimcavusoglu in https://github.com/obss/jury/pull/126
- Add try/except block around ZeroDivisionError for AccuracyForLanguageGeneration._compute_single_pred_single_ref by @NISH1001 in https://github.com/obss/jury/pull/123
- Package `evaluate` updated to 0.4 (from <0.3). by @devrimcavusoglu in https://github.com/obss/jury/pull/128
✨ New Contributors
- @NISH1001 made their first contribution in https://github.com/obss/jury/pull/123
- Full Changelog: https://github.com/obss/jury/compare/2.2.3...2.2.4
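The ZeroDivisionError guard mentioned in this release can be sketched roughly as follows. This is a minimal illustration only: `token_accuracy` and its scoring rule (matched reference tokens over reference token count) are assumptions made for the example, not the actual body of `AccuracyForLanguageGeneration._compute_single_pred_single_ref`.

```python
def token_accuracy(prediction: str, reference: str) -> float:
    """Share of reference tokens that also occur in the prediction.

    Hypothetical scoring rule, used only to show the guard.
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    matched = sum(1 for token in ref_tokens if token in pred_tokens)
    try:
        return matched / len(ref_tokens)
    except ZeroDivisionError:
        # An empty reference has no tokens to match; score it as 0.0
        # instead of letting the whole evaluation crash.
        return 0.0


print(token_accuracy("the cat sat", "the cat"))
print(token_accuracy("the cat sat", ""))
```

Returning 0.0 for the empty case is one possible convention; a caller might equally prefer to skip such pairs.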
v2.2.3
📋 What's Changed
- `flake8` error on python3.7 by @devrimcavusoglu in https://github.com/obss/jury/pull/118
- Seqeval typo fix by @devrimcavusoglu in https://github.com/obss/jury/pull/117
- Refactored requirements (sklearn). by @devrimcavusoglu in https://github.com/obss/jury/pull/121
- Full Changelog: https://github.com/obss/jury/compare/2.2.2...2.2.3
v2.2.2
📋 What's Changed
- Migrating to `evaluate` package (from `datasets`). by @devrimcavusoglu in https://github.com/obss/jury/pull/116
- Full Changelog: https://github.com/obss/jury/compare/2.2.1...2.2.2
v2.2.1
📋 What's Changed
- Fixed warning message in BLEURT default initialization by @zafercavdar in https://github.com/obss/jury/pull/110
- `ZeroDivisionError` on precision and recall values. by @devrimcavusoglu in https://github.com/obss/jury/pull/112
- validators added to the requirements. by @devrimcavusoglu in https://github.com/obss/jury/pull/113
- Intermediate patch, fixes, updates. by @devrimcavusoglu in https://github.com/obss/jury/pull/114
✨ New Contributors
- @zafercavdar made their first contribution in https://github.com/obss/jury/pull/110
- Full Changelog: https://github.com/obss/jury/compare/2.2...2.2.1
v2.2
📋 What's Changed
- Fix Reference Structure for Basic BLEU calculation by @Sophylax in https://github.com/obss/jury/pull/74
- Added BLEURT. by @devrimcavusoglu in https://github.com/obss/jury/pull/78
- README.md updated with doi badge and citation information. by @devrimcavusoglu in https://github.com/obss/jury/pull/81
- Add VSCode Folder to Gitignore by @Sophylax in https://github.com/obss/jury/pull/82
- Change one BERTScore test Device to CPU by @Sophylax in https://github.com/obss/jury/pull/84
- Add Prism metric by @devrimcavusoglu in https://github.com/obss/jury/pull/79
- Update issue templates by @devrimcavusoglu in https://github.com/obss/jury/pull/85
- Dl manager rework by @devrimcavusoglu in https://github.com/obss/jury/pull/86
- + 13 more
✨ New Contributors
- @Sophylax made their first contribution in https://github.com/obss/jury/pull/74
- Full Changelog: https://github.com/obss/jury/compare/2.1.5...2.2
v2.1.5
📋 What's Changed
- Bug fix: Typo corrected in _remove_empty() in core.py. by @devrimcavusoglu in https://github.com/obss/jury/pull/67
- Metric name path bug fix. by @devrimcavusoglu in https://github.com/obss/jury/pull/69
- Full Changelog: https://github.com/obss/jury/compare/2.1.4...2.1.5
v2.1.4
📋 What's Changed
- Handle for empty predictions & references on Jury (skipping empty). by @devrimcavusoglu in https://github.com/obss/jury/pull/65
- Full Changelog: https://github.com/obss/jury/compare/2.1.3...2.1.4
v2.1.3
📋 What's Changed
- Bug fix: Bleu reshape error fixed. by @devrimcavusoglu in https://github.com/obss/jury/pull/63
- Full Changelog: https://github.com/obss/jury/compare/2.1.2...2.1.3
v2.1.2
📋 What's Changed
- Bug fix: bleu returning same score with different max_order is fixed. by @devrimcavusoglu in https://github.com/obss/jury/pull/59
- nltk version upgraded as >=3.6.4 (from >=3.6.2). by @devrimcavusoglu in https://github.com/obss/jury/pull/61
- Full Changelog: https://github.com/obss/jury/compare/2.1.1...2.1.2
v2.1.1
📋 What's Changed
- Seqeval: json normalization added. by @devrimcavusoglu in https://github.com/obss/jury/pull/55
- Read support from folders by @devrimcavusoglu in https://github.com/obss/jury/pull/57
- Full Changelog: https://github.com/obss/jury/compare/2.1.0...2.1.1
v2.1.0
📦 AutoMetric ✨
- AutoMetric is introduced as the main factory class for automatically loading metrics. As a side note, `load_metric` is still available for backward compatibility and is preferred (it uses AutoMetric under the hood).
- Tasks are now distinguished within metrics. For example, precision can be used for the `language-generation` or the `sequence-classification` task, where the former evaluates strings (generated text) while the latter evaluates integers (class labels).
- In the configuration file, metrics can now be specified with the initialization parameters of HuggingFace datasets' metrics. The keyword arguments that metrics use at computation time are now separated under the `"compute_kwargs"` key.
- Full Changelog: https://github.com/obss/jury/compare/2.0.0...2.1.0
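The separation of computation-time arguments under `"compute_kwargs"` can be pictured with the sketch below. It does not use jury itself: `split_metric_config` is a hypothetical helper, and the `"path"` and `"resulting_name"` keys are assumed names for illustration; only `"compute_kwargs"` and `max_order` come from these notes.

```python
from typing import Any, Dict, Tuple


def split_metric_config(entry: Dict[str, Any]) -> Tuple[str, Dict[str, Any], Dict[str, Any]]:
    """Separate a metric entry into (name, init params, compute kwargs)."""
    entry = dict(entry)  # shallow copy so the caller's dict is untouched
    name = entry.pop("path")  # assumed key holding the metric name
    compute_kwargs = entry.pop("compute_kwargs", {})
    # Whatever remains is treated as initialization parameters.
    return name, entry, compute_kwargs


config_entry = {
    "path": "bleu",
    "resulting_name": "bleu_1",  # assumed initialization parameter
    "compute_kwargs": {"max_order": 1},
}
name, init_params, compute_kwargs = split_metric_config(config_entry)
print(name, init_params, compute_kwargs)
```

Keeping the two groups apart means the same metric can be initialized once and then computed with different settings.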
v2.0.0
✨ New Metric System
- The datasets package's Metric implementation is adopted (and extended) to provide high performance 💯 and a more unified interface 🤗.
- Custom metric implementation changed accordingly (it now requires 3 abstract methods to be implemented).
- The Jury class is now callable (it implements the `__call__()` method), though the `evaluate()` method is still available for backward compatibility.
- When calling Jury's `evaluate`, the `predictions` and `references` parameters must be passed as keyword arguments to prevent confusion or wrong computations (as with datasets' metrics).
- MetricCollator is removed; the methods for metrics are attached directly to the Jury class. Metric addition and removal can now be performed directly from a Jury instance.
- Jury now supports reading metrics from strings, lists and dictionaries. It is more flexible about the input type of the metrics given along with their parameters.
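The callable interface with keyword-only `predictions` and `references` can be illustrated with a small stand-in class. `TinyScorer`, `exact_match`, and the method names used here are hypothetical and written only for this sketch; they are not Jury's implementation.

```python
from typing import Callable, Dict, List

MetricFn = Callable[[List[str], List[str]], float]


def exact_match(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions identical to their reference."""
    pairs = list(zip(predictions, references))
    return sum(p == r for p, r in pairs) / len(pairs) if pairs else 0.0


class TinyScorer:
    """Hypothetical stand-in for a callable evaluator."""

    def __init__(self) -> None:
        self.metrics: Dict[str, MetricFn] = {}

    def add_metric(self, name: str, fn: MetricFn) -> None:
        self.metrics[name] = fn

    def remove_metric(self, name: str) -> None:
        self.metrics.pop(name, None)

    def __call__(self, *, predictions: List[str], references: List[str]) -> Dict[str, float]:
        # The bare `*` makes both parameters keyword-only, so the two
        # lists cannot be swapped by accident in a positional call.
        return {name: fn(predictions, references) for name, fn in self.metrics.items()}

    def evaluate(self, *, predictions: List[str], references: List[str]) -> Dict[str, float]:
        # Kept as an alias so older call sites continue to work.
        return self(predictions=predictions, references=references)


scorer = TinyScorer()
scorer.add_metric("exact_match", exact_match)
print(scorer(predictions=["a", "b"], references=["a", "c"]))
```

With this shape, a positional call such as `scorer(["a"], ["a"])` raises a `TypeError`.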
✨ New metrics
- Accuracy, F1, Precision, Recall are added to Jury metrics.
- All metrics in the datasets package are still available in jury through `jury.load_metric()`.
📦 Development
- Test cases are improved with fixtures, and test structure is enhanced.
- Expected outputs are now required for tests as a json with proper name.
v1.1.2
📋 Changes
- SQuAD bug fixed for evaluating with multiple references.
- Test design & cases revised with fixtures (improvement).
v1.1.1
📋 Changes
- Malfunctioning multiple prediction calculation caused by multiple reference input for BLEU and SacreBLEU is fixed.
- CLI Implementation is completed. 🎉
v1.0.1
📋 Changes
- Fix for nltk version (Colab is fixed as well).
v1.0.0
📦 Release Notes
- New metric structure is completed.
- Custom metric support is improved: custom metrics are no longer required to extend `datasets.Metric` and instead use `jury.metrics.Metric`.
- Metric usage is unified with the `compute`, `preprocess` and `postprocess` functions, of which `compute` is the only one a custom metric is required to implement.
- Both strings and `Metric` objects can now be passed to `Jury(metrics=metrics)` in a mixed fashion.
- `load_metric` function was rearranged to capture end score results, and several metrics were added accordingly (e.g. `load_metric("squad_f1")` will load the squad metric which returns F1-score).
- An example notebook has been added to the examples.
- MT and QA tasks were illustrated.
- Custom metric creation added as example.
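The `compute` / `preprocess` / `postprocess` split described in these v1.0.0 notes, where only `compute` must be implemented, can be sketched with a plain abstract base class. `BaseMetric` and `LengthRatio` are hypothetical names for this illustration; the real base class is `jury.metrics.Metric`, whose exact signatures are not shown in these notes.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class BaseMetric(ABC):
    """Hypothetical base class, a stand-in and not `jury.metrics.Metric`."""

    def preprocess(self, predictions: List[str], references: List[str]) -> Tuple[List[str], List[str]]:
        # Default: pass the inputs through unchanged.
        return predictions, references

    @abstractmethod
    def compute(self, predictions: List[str], references: List[str]) -> float:
        """The only method a subclass must implement."""

    def postprocess(self, score: float) -> dict:
        # Default: wrap the raw score in a result dict.
        return {"score": score}

    def evaluate(self, predictions: List[str], references: List[str]) -> dict:
        predictions, references = self.preprocess(predictions, references)
        return self.postprocess(self.compute(predictions, references))


class LengthRatio(BaseMetric):
    """Toy metric: total predicted tokens over total reference tokens."""

    def compute(self, predictions: List[str], references: List[str]) -> float:
        pred_len = sum(len(p.split()) for p in predictions)
        ref_len = sum(len(r.split()) for r in references)
        return pred_len / ref_len if ref_len else 0.0


print(LengthRatio().evaluate(["a b c"], ["a b c d"]))
```

A subclass that needs tokenization or score formatting would override `preprocess` or `postprocess`; otherwise the defaults apply.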
📦 Acknowledgments
- @fcakyon @cemilcengiz @devrimcavusoglu
v0.0.6
v0.0.5
v0.0.4
v0.0.3
Multiple predictions and multiple references support
v0.0.2 (Pre-release)
first pypi release
