WorksApplications/SudachiPy
Python version of Sudachi, a Japanese tokenizer.
📋 Changes
- When multiple user dictionaries with user-defined parts of speech are used, the user-defined POS IDs of the second and subsequent user dictionaries become invalid (`IndexError: list index out of range`)
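The class of bug behind this fix can be illustrated with a minimal sketch. It is not SudachiPy's actual implementation; it assumes each user dictionary numbers its own user-defined POS entries from zero, so a local ID must be shifted by the number of entries registered before it. The helper names (`merge_pos_tables`, `lookup`) and the sample POS tuples are hypothetical.

```python
from typing import List, Tuple

Pos = Tuple[str, ...]


def merge_pos_tables(system_pos: List[Pos], user_pos_tables: List[List[Pos]]):
    """Append each user POS table to the system table.

    Returns the merged table and, for every user dictionary, the offset at
    which its entries start inside the merged table.
    """
    merged = list(system_pos)
    offsets = []
    for table in user_pos_tables:
        offsets.append(len(merged))  # entries of this dictionary start here
        merged.extend(table)
    return merged, offsets


def lookup(merged: List[Pos], offsets: List[int], dict_index: int, local_pos_id: int) -> Pos:
    """Translate a dictionary-local POS ID into an index of the merged table."""
    return merged[offsets[dict_index] + local_pos_id]


# Hypothetical sample data: two system POS entries and two user dictionaries.
system_pos = [("名詞", "普通名詞"), ("動詞", "一般")]
user_a = [("名詞", "固有名詞", "製品")]
user_b = [("名詞", "固有名詞", "社内用語")]

merged, offsets = merge_pos_tables(system_pos, [user_a, user_b])
```

If every user dictionary reused the first dictionary's offset, IDs from the second dictionary would point at the wrong entry or past the end of the list, which is the kind of failure an `IndexError` would signal.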
📋 Changes
- Words containing digits cannot be properly registered in split information
- Slow to build user dictionary
- Some katakana words are analyzed as OOV
📋 Changes
- Added option -s to specify dictionary type
- Added argument to Dictionary class to specify dictionary type
- Removed the option to create a link
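A hedged sketch of choosing the dictionary type from Python follows. The keyword name `dict_type` and the accepted values `small`, `core`, and `full` are assumptions based on the SudachiPy README of this era, not something these notes state; `create_tokenizer` is a hypothetical wrapper. Check the README of your installed version before relying on it.

```python
# Assumption: these are the dictionary editions that can be selected.
VALID_DICT_TYPES = ("small", "core", "full")


def create_tokenizer(dict_type="core"):
    """Create a tokenizer backed by the requested dictionary edition."""
    if dict_type not in VALID_DICT_TYPES:
        raise ValueError(
            "dict_type must be one of %s, got %r" % (VALID_DICT_TYPES, dict_type)
        )
    # Imported lazily so the validation above works without SudachiPy installed.
    from sudachipy import dictionary

    # Assumption: the new Dictionary argument is the keyword `dict_type`.
    return dictionary.Dictionary(dict_type=dict_type).create()
```

Calling `create_tokenizer("full")` requires both SudachiPy and the matching dictionary package to be installed. On the command line, the same choice is made with the `-s` option.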
📋 Changes
- https://github.com/WorksApplications/SudachiPy/pull/151 Fix `-a` option (`print all of the fields`) (Error reported in https://github.com/WorksApplications/SudachiPy/issues/150)
Support for new dictionary format with synonym group IDs
📋 Changes
- #134 Fix `MorphemeList` split (Error reported in #133)
📋 Changes
- #131 Fix connection cost lookup (spaCy accuracy degradation reported in #129)
- #130 Fix `Lattice.dump` error (Reported in #128)
📋 Changes
- #127 Don't explicitly release the memoryview in Grammar
📋 Changes
- #123 Cython based optimization
- #124 Add `__str__` methods for `Morpheme` and `MorphemeList`
📋 Changes
- #121 Fix a bug that caused `…` to be converted to `""`, `""`, `"…"`
- #122 Improve error messages related to dictionary setup
Speed up execution by re-using unk info #117
Upgrade a dependent Python library `dartsclone` ([rixwew/darts-clone-python](https://github.com/rixwew/darts-clone-python)) to `v0.9`.
Fix the runtime error `EOS isn't connected to BOS` when input contains `阿Q[a-zA-Z]+`. Equivalent to [Fix #45 by kazuma-t · Pull Request #113 · WorksApplications/Sudachi](https://github.com/WorksApplications/Sudachi/pull/113) in the original Java Sudachi.
📋 Changes
- Proper reading forms for Hiragana and Katakana words #104
- Proper OOV flag #106
Cythonize `dartsclone` using https://github.com/rixwew/darts-clone-python
The following instruction from README.md did not work properly and has been fixed:

> If you need to apply customized `system.dic`, place [sudachi.json](https://github.com/WorksApplications/Sudachi/blob/develop/src/main/resources/sudachi.json) to anywhere you like, and overwrite `systemDict` value with the relative path from `sudachi.json` to your `system.dic`.

This change may be **incompatible** if you use your own `sudachi.json`.
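The relative-path rule for `systemDict` can be sketched as follows. This is an illustration only, not SudachiPy's own loading code: `resolve_system_dict` is a hypothetical helper, and the directory layout is made up for the demonstration.

```python
import json
import os
import tempfile


def resolve_system_dict(config_path):
    """Resolve `systemDict` relative to the directory containing sudachi.json."""
    with open(config_path, encoding="utf-8") as f:
        settings = json.load(f)
    config_dir = os.path.dirname(os.path.abspath(config_path))
    return os.path.normpath(os.path.join(config_dir, settings["systemDict"]))


# Demonstration with a throwaway layout: <tmp>/conf/sudachi.json points at
# <tmp>/dict/system.dic through a relative path.
with tempfile.TemporaryDirectory() as tmp:
    conf_dir = os.path.join(tmp, "conf")
    os.makedirs(conf_dir)
    config_path = os.path.join(conf_dir, "sudachi.json")
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump({"systemDict": "../dict/system.dic"}, f)

    resolved = resolve_system_dict(config_path)
    expected = os.path.normpath(os.path.join(tmp, "dict", "system.dic"))
```

The point of the design is that the path is anchored to the location of `sudachi.json`, not to the current working directory, so the same config file works regardless of where the process is started.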
📋 Changes
- Delete the default dictionary utility because of a PyPI dependency
📋 Changes
- link command, utility to manage dictionary
- default dictionary, improve utility to install SudachiPy
- `/resources` moved into `/sudachipy`
- `tests` module independent from `/resources`
This release is just for fixing the packaging issue. It does not contain any additional bug fixes from 0.2.0.
📋 Changes
- Interface changed! See the README
- User dictionaries can now be used
- Dictionary build command
- Tests no longer need a real `system.dic`
- Apply some formatting rules
📋 Changes
- parsing character definition file
- parsing dictionary file
- other minor bugs
- add tests following Sudachi-Java
- add CI system
- introduce formatter
