ku-nlp/jumanpp

Juman++ (a Morphological Analyzer Toolkit)

4 Releases

v2.0.0-rc4 (Latest, Pre-release)

eiennohito · October 3, 2023

📋 Changes

v2.0.0-rc3 (Pre-release)

eiennohito · August 19, 2019

📋 Changes

WARNING: models are not compatible with binaries of previous versions. On the other hand, they are compatible with the master branch now.
Check that statically-generated inference code uses a compatible model
Protobuf-based output formats (optional, requires protobuf 3.0+ installed)
Use https://github.com/s-yata/darts-clone as the trie implementation; the trie index is now half its previous size
Can now write definitions for models using text files, not just the C++ DSL

📦 Jumandic-specific

Escape bad characters for JUMAN/lattice output formats
Fix kaomoji problem breaking brackets (#97)
Corpus fixes
Analysis fixes by partial annotations
Added the reading field to the aliasing set (but don't trust the reading results in analysis very much; our corpora are not clean for those annotations)
For the replaced characters, we output the 元半角 tag in the feature field.
Lattice output format escapes only tabs. Protobuf output formats don't escape anything.
Example:
+ 8 more

v2.0.0-rc2 (Pre-release)

eiennohito · March 14, 2018

✨ New Features

Windows support! Big thanks to @DoumanAsh! Windows Vista and later are supported; XP is NOT supported. Builds with MSVC 2017 and gcc-mingw64 (we test those platforms on the internal CI). It should probably build with MSVC 2015 as well, but I haven't tried. No binaries yet, but you can help us by [creating an installer](https://github.com/ku-nlp/jumanpp/issues/81).
Can now output to file with `-o` or `--output`.
`--segment` now outputs a space-delimited segmentation result without other information. You can also change the delimiter with the `--segment-separator` flag.
`--partial-input` treats the input as partially annotated and tries to produce an analysis result that respects the restrictions specified by the partial annotation.
`--auto-nbest` automatically changes beam widths (local, global left) and lattice output size depending on the input length.
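
As a rough illustration of how the flags above might be combined from a script, here is a minimal Python sketch. It only assembles a command line and splits a segmented line into tokens; it does not run Juman++. The helper names `build_jumanpp_command` and `split_segmented_line` are hypothetical. It also assumes that the executable is called `jumanpp` and is on `PATH`, and that `--segment-separator` and `--output` take their value as the following argument. Check `jumanpp --help` for the exact syntax.

```python
import shlex
from typing import List, Optional


def build_jumanpp_command(
    output_path: Optional[str] = None,
    segment: bool = False,
    segment_separator: Optional[str] = None,
    partial_input: bool = False,
    auto_nbest: bool = False,
    executable: str = "jumanpp",  # assumption: binary name, found on PATH
) -> List[str]:
    """Assemble an argv list from the flags named in these release notes."""
    argv = [executable]
    if segment:
        argv.append("--segment")
        # The separator only makes sense together with --segment,
        # so it is ignored when segment is False.
        if segment_separator is not None:
            # Assumption: the value is passed as the next argument.
            argv += ["--segment-separator", segment_separator]
    if partial_input:
        argv.append("--partial-input")
    if auto_nbest:
        argv.append("--auto-nbest")
    if output_path is not None:
        # Assumption: the value is passed as the next argument.
        argv += ["--output", output_path]
    return argv


def split_segmented_line(line: str, separator: str = " ") -> List[str]:
    """Split one line of --segment output into tokens, dropping empty pieces."""
    return [tok for tok in line.rstrip("\n").split(separator) if tok]


if __name__ == "__main__":
    cmd = build_jumanpp_command(
        output_path="result.txt", segment=True, segment_separator="|"
    )
    # Print a shell-quoted form of the command for inspection.
    print(shlex.join(cmd))
    print(split_segmented_line("これ は 例 です\n"))
```

The resulting list can be handed to `subprocess.run` once the flag syntax has been confirmed against the installed version.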

📦 Model Stability

Models should be significantly more robust for analyzing random web text than earlier.

v2.0.0-rc1 (First preview, Pre-release)

eiennohito · December 2, 2017

📋 Changes

Complete rewrite of Juman++
Improved analysis speed (>100x) versus v1; RNN models should take about 1.8 times as long as plain JUMAN.
Improved model accuracy on Kyoto Corpus and [KWDLC](http://nlp.ist.i.kyoto-u.ac.jp/EN/index.php?KWDLC)
Reduced model size
Reduced memory usage at analysis time
Juman++ can now be used as a library (examples will come later)
Improved emoji support
Improved kaomoji support (thanks to neologd/unidic for this)
+ 7 more