WorksApplications/SudachiPy

Python version of Sudachi, a Japanese tokenizer.

21 Releases
v0.5.4 (Latest)
kazuma-t · September 27, 2021

📋 Changes

  • Fix: when multiple user dictionaries with user-defined parts of speech were used, the user-defined POS IDs of the second and subsequent user dictionaries became invalid (`IndexError: list index out of range`)
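The entry above describes an ID-numbering bug across several user dictionaries. The Python sketch below shows one general way to keep such IDs valid. It is not SudachiPy's actual code. Each user dictionary gets an offset equal to the size of the merged POS table at the moment that dictionary is added. The names `build_pos_table` and `global_pos_id`, and the idea that each user dictionary numbers its own POS from 0, are assumptions made for this illustration.

```python
# Hypothetical sketch: merging the user-defined POS tables of several user
# dictionaries into one global list. The names are illustrative and are not
# SudachiPy's API; in this sketch each user dictionary numbers its own
# user-defined POS from 0.


def build_pos_table(system_pos, user_pos_lists):
    """Return the merged POS table and one ID offset per user dictionary."""
    table = list(system_pos)
    offsets = []
    for user_pos in user_pos_lists:
        # Shift by the table size at the moment this dictionary is added,
        # so the second and later dictionaries do not reuse earlier IDs.
        offsets.append(len(table))
        table.extend(user_pos)
    return table, offsets


def global_pos_id(local_id, dict_index, offsets):
    """Translate a user dictionary's local POS ID into a table index."""
    return offsets[dict_index] + local_id


system_pos = [("名詞", "普通名詞"), ("動詞", "一般")]
user_pos_lists = [
    [("名詞", "固有名詞")],               # first user dictionary
    [("名詞", "人名"), ("名詞", "地名")],  # second user dictionary
]
table, offsets = build_pos_table(system_pos, user_pos_lists)
print(offsets)                              # [2, 3]
print(table[global_pos_id(1, 1, offsets)])  # ('名詞', '地名')
```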
v0.5.3
kazuma-t · September 10, 2021

📋 Changes

  • Fix: words containing digits could not be properly registered in split information
  • Fix: building a user dictionary was slow
  • Fix: some katakana words were analyzed as OOV (out-of-vocabulary)
v0.5.2
kazuma-t · March 26, 2021

📋 Changes

  • Added the option `-s` to specify the dictionary type
  • Added an argument to the `Dictionary` class to specify the dictionary type
  • Removed the option to create a link
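Selecting a dictionary by type amounts to choosing which installed dictionary package to load. The sketch below is a hypothetical illustration of that choice and does not describe SudachiPy's internals. It assumes the types `small`, `core`, and `full` map to the SudachiDict packages `sudachidict_small`, `sudachidict_core`, and `sudachidict_full`, and that `core` is the default. `dict_package_name` and `is_dict_installed` are invented for this example. Only `importlib.util.find_spec` comes from the standard library.

```python
# Hypothetical sketch: choosing the SudachiDict package for a dictionary type.
# The "sudachidict_<type>" naming and the "core" default are assumptions made
# for this illustration, not a description of SudachiPy's internals.
import importlib.util

DICT_TYPES = ("small", "core", "full")


def dict_package_name(dict_type="core"):
    """Map a dictionary type to the name of the package expected to provide it."""
    if dict_type not in DICT_TYPES:
        raise ValueError(
            f"unknown dictionary type {dict_type!r}; expected one of {DICT_TYPES}"
        )
    return f"sudachidict_{dict_type}"


def is_dict_installed(dict_type="core"):
    """Return True if the package for dict_type can be found on sys.path."""
    return importlib.util.find_spec(dict_package_name(dict_type)) is not None


print(dict_package_name("full"))  # sudachidict_full
print(is_dict_installed("small"))
```

Validating the type up front produces a clear error for a typo such as `"ful"`, rather than a confusing import failure later.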
v0.5.1
t-yamamura · January 4, 2021

📋 Changes

  • https://github.com/WorksApplications/SudachiPy/pull/151 Fix `-a` option (`print all of the fields`) (Error reported in https://github.com/WorksApplications/SudachiPy/issues/150)
v0.5.0 (Pre-release)
kazuma-t · December 25, 2020

Support for new dictionary format with synonym group IDs

v0.4.9
sorami · June 19, 2020

📋 Changes

  • #134 Fix `MorphemeList` split (error reported in #133)
v0.4.8
sorami · June 18, 2020

📋 Changes

  • #131 Fix connection cost lookup (spaCy accuracy degradation reported in #129)
  • #130 Fix `Lattice.dump` error (Reported in #128)
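#131 concerns the connection-cost lookup. In general terms, such a lookup can be a flat table of costs indexed by a pair of connection IDs. The sketch below shows that idea. The layout `index = left_id + left_size * right_id` is an assumption of this sketch and does not describe SudachiPy's binary format. `ConnectionMatrix` is a hypothetical class.

```python
# Sketch of a connection-cost lookup over a flat array. The layout
# index = left_id + left_size * right_id is an assumption of this sketch.
from array import array


class ConnectionMatrix:
    def __init__(self, left_size, right_size, costs):
        if len(costs) != left_size * right_size:
            raise ValueError("expected left_size * right_size costs")
        self.left_size = left_size
        self.right_size = right_size
        self.costs = array("h", costs)  # signed 16-bit cost values

    def cost(self, left_id, right_id):
        """Look up the cost stored for the (left_id, right_id) pair."""
        return self.costs[left_id + self.left_size * right_id]


m = ConnectionMatrix(2, 3, [0, 1, 10, 11, 20, 21])
print(m.cost(1, 0))  # 1
print(m.cost(0, 2))  # 20
```

If the index is computed with the wrong stride, for example `left_id * right_size + right_id`, `cost(1, 0)` reads 11 instead of 1. The lookup still returns plausible-looking numbers, so the only visible symptom is degraded accuracy.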
v0.4.7
sorami · June 15, 2020

📋 Changes

  • #127 Don't explicitly release the memoryview in Grammar
v0.4.6
sorami · June 10, 2020

📋 Changes

  • #123 Cython-based optimization
  • #124 Add `__str__` methods for `Morpheme` and `MorphemeList`
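A `__str__` method controls what `print()` and `str()` show for an object. The classes below are hypothetical, heavily simplified stand-ins for `Morpheme` and `MorphemeList`, not SudachiPy's real classes. Making `str()` of a list return its surfaces joined by spaces is also an assumption of this sketch.

```python
# Hypothetical, heavily simplified stand-ins for Morpheme and MorphemeList,
# showing what __str__ methods add. Joining surfaces with a space is an
# assumption of this sketch, not necessarily SudachiPy's output format.


class Morpheme:
    def __init__(self, surface, part_of_speech):
        self._surface = surface
        self._part_of_speech = part_of_speech

    def surface(self):
        return self._surface

    def __str__(self):
        # print(m) shows the surface form instead of <Morpheme object at ...>
        return self._surface


class MorphemeList:
    def __init__(self, morphemes):
        self._morphemes = list(morphemes)

    def __len__(self):
        return len(self._morphemes)

    def __getitem__(self, index):
        return self._morphemes[index]

    def __str__(self):
        return " ".join(str(m) for m in self._morphemes)


ms = MorphemeList([Morpheme("東京", "名詞"), Morpheme("都", "接尾辞")])
print(ms[0])  # 東京
print(ms)     # 東京 都
```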
v0.4.5
sorami · June 2, 2020

📋 Changes

  • #121 Fix a bug that caused … to be converted to `"", "", "…"`
  • #122 Improve error messages related to dictionary setup
v0.4.4
sorami · April 30, 2020

Speed up execution by re-using unk info #117

v0.4.3
sorami · February 26, 2020

Upgrade a dependent Python library `dartsclone` ([rixwew/darts-clone-python](https://github.com/rixwew/darts-clone-python)) to `v0.9`.
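`dartsclone` wraps darts-clone, a double-array trie. A dictionary lookup typically uses it for common-prefix search: it finds every stored word that starts at the current position of the input. The sketch below substitutes a plain nested-dict trie for the double array, so it shows only the query, not darts-clone's data layout or `dartsclone`'s API. `Trie` and `common_prefix_search` are names chosen for this illustration.

```python
# A dict-of-dicts trie used as a simple stand-in for dartsclone's
# double-array trie. It answers the same kind of query the dictionary needs:
# which stored words begin at a given position of the input?


class Trie:
    def __init__(self):
        self.root = {}

    def insert(self, word, value):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = value  # None never collides with a one-character key

    def common_prefix_search(self, text, start=0):
        """Return (value, end) for every stored word that is a prefix of text[start:]."""
        results = []
        node = self.root
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if None in node:
                results.append((node[None], i + 1))
        return results


trie = Trie()
for word_id, word in enumerate(["東", "東京", "東京都", "京都"]):
    trie.insert(word, word_id)
print(trie.common_prefix_search("東京都に行く"))     # [(0, 1), (1, 2), (2, 3)]
print(trie.common_prefix_search("東京都に行く", 1))  # [(3, 3)]
```

A double array answers the same query with compact integer arrays instead of per-node dicts. That is the main reason to use darts-clone for a large system dictionary.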

v0.4.2: Bug fix
sorami · December 6, 2019

Fix the runtime error `EOS isn't connected to BOS` when input contains `阿Q[a-zA-Z]+`. Equivalent to [Fix #45 by kazuma-t · Pull Request #113 · WorksApplications/Sudachi](https://github.com/WorksApplications/Sudachi/pull/113) in the original Java Sudachi.
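The error message means that the lattice contains no path from the beginning-of-sentence node (BOS) to the end-of-sentence node (EOS). The sketch below reduces a lattice to `(begin, end)` character spans and checks only that connectivity condition. It does not reproduce how the fix works. A real Sudachi lattice also carries word IDs and costs. `eos_reachable` is a hypothetical helper.

```python
# Sketch: a lattice reduced to (begin, end) character spans. "EOS isn't
# connected to BOS" means no chain of nodes leads from position 0 (BOS) to
# position len(text) (EOS).


def eos_reachable(text_len, nodes):
    """Return True if some path of (begin, end) nodes covers 0..text_len."""
    reachable = [False] * (text_len + 1)
    reachable[0] = True
    # Every node has end > begin, so visiting nodes in order of begin
    # finalizes reachable[begin] before any node starting there is used.
    for begin, end in sorted(nodes):
        if not 0 <= begin < end <= text_len:
            raise ValueError(f"invalid node span: {(begin, end)}")
        if reachable[begin]:
            reachable[end] = True
    return reachable[text_len]


print(eos_reachable(3, [(0, 2), (2, 3)]))  # True
print(eos_reachable(3, [(0, 1), (2, 3)]))  # False: nothing covers 1..2
```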

v0.4.1: Bug fix
sorami · November 26, 2019

📋 Changes

  • Proper reading forms for Hiragana and Katakana words #104
  • Proper OOV flag #106
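One plausible ingredient of a reading form for a kana word is converting hiragana to katakana, assuming reading forms are written in katakana. The helper below, `to_katakana`, is hypothetical and is not the code from #104. It relies on a fixed Unicode fact: the hiragana letters U+3041 to U+3096 sit exactly 0x60 code points below their katakana counterparts.

```python
# Sketch, assuming reading forms are written in katakana: hiragana letters
# in U+3041..U+3096 map to katakana by adding 0x60 to the code point; all
# other characters (katakana, kanji, ASCII) are left unchanged.


def to_katakana(text):
    return "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


print(to_katakana("すだち"))    # スダチ
print(to_katakana("カタカナ"))  # カタカナ
```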
v0.4.0
izziiyt · September 7, 2019

Cythonize dartsclone using https://github.com/rixwew/darts-clone-python

v0.3.7
izziiyt · July 21, 2019

The README.md procedure for applying a customized `system.dic` did not work properly and has been fixed: place [sudachi.json](https://github.com/WorksApplications/Sudachi/blob/develop/src/main/resources/sudachi.json) anywhere you like, and set the `systemDict` value to the relative path from `sudachi.json` to your `system.dic`. This change may be **incompatible** if you use your own `sudachi.json`.
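With this fix, a relative `systemDict` path is interpreted from the location of `sudachi.json`, not from the current working directory. The sketch below shows that resolution rule. `resolve_dict_path` and `system_dict_from_config` are hypothetical names, and SudachiPy's own loading code may differ.

```python
# Sketch of the resolution rule described above: a relative "systemDict"
# value is resolved against the directory containing sudachi.json, not
# against the current working directory. Function names are hypothetical.
import json
import os


def resolve_dict_path(config_path, system_dict):
    """Resolve system_dict relative to the directory of config_path."""
    if os.path.isabs(system_dict):
        return system_dict
    base_dir = os.path.dirname(os.path.abspath(config_path))
    return os.path.normpath(os.path.join(base_dir, system_dict))


def system_dict_from_config(config_path):
    """Read sudachi.json and return the resolved systemDict path."""
    with open(config_path, encoding="utf-8") as f:
        settings = json.load(f)
    return resolve_dict_path(config_path, settings["systemDict"])


print(resolve_dict_path("/opt/sudachi/sudachi.json", "dict/system.dic"))
# /opt/sudachi/dict/system.dic (on POSIX systems)
```

Anchoring the path to the config file means the same `sudachi.json` works no matter which directory the tokenizer is launched from. A configuration written for the old behavior may therefore need its path adjusted.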

v0.3.1
izziiyt · July 7, 2019

📋 Changes

  • Deleted the default dictionary utility because of the PyPI dependency
v0.3.0
izziiyt · July 7, 2019

📋 Changes

  • `link` command, a utility to manage dictionaries
  • Default dictionary, improving the utility to install SudachiPy
  • `/resources` moved into `/sudachipy`
  • `tests` module made independent of `/resources`
v0.2.1
kazuma-t · July 5, 2019

This release is just for fixing the packaging issue. It does not contain any additional bug fixes from 0.2.0.

v0.2.0
izziiyt · July 4, 2019

📋 Changes

  • Interface changed! See README
  • User dictionaries can be used
  • Dictionary build command
  • Tests no longer need a real `system.dic`
  • Applied some formatting rules
v0.1.1
izziiyt · June 19, 2019

📋 Changes

  • Parsing of the character definition file
  • Parsing of the dictionary file
  • Other minor bug fixes
  • Added tests following Sudachi-Java
  • Added a CI system
  • Introduced a formatter