ku-nlp/jumanpp
Juman++ (a Morphological Analyzer Toolkit)
4 Releases
Latest: 2y ago
v2.0.0-rc4 (Latest, Pre-release)
📋 Changes
- Improved hash function which has better IPC
- Fixes for modern compilers/distributions
v2.0.0-rc3 (Pre-release)
📋 Changes
- WARNING: models are not compatible with binaries of previous versions. On the other hand, they are compatible with the master branch now.
- Check that statically-generated inference code uses a compatible model
- Protobuf-based output formats (optional, requires protobuf 3.0+ installed)
- Use https://github.com/s-yata/darts-clone as the trie implementation; the trie index is now half its previous size
- Can now write definitions for models using text files, not just the C++ DSL
📦 Jumandic-specific
- Escape bad characters for JUMAN/lattice output formats
- Fix kaomoji problem breaking brackets (#97)
- Corpus fixes
- Analysis fixes by partial annotations
- Added reading field to aliasing set (but don't trust the reading results in analysis very much, our corpora are not clean for those annotations)
- For replaced characters, we output the 元半角 tag in the feature field.
- Lattice output format escapes only tabs. Protobuf output formats don't escape anything.
- Example:
- + 8 more
v2.0.0-rc2 (Pre-release)
✨ New Features
- Windows support! Big thanks to @DoumanAsh! Vista and later are supported; XP is NOT supported. Builds with MSVC 2017 and gcc-mingw64 (we test those platforms on the internal CI). It should probably build with MSVC 2015 as well, but I haven't tried. No binaries yet, but you can help us by [creating an installer](https://github.com/ku-nlp/jumanpp/issues/81).
- Can now output to file with `-o` or `--output`.
- `--segment` now outputs a space-delimited segmentation result without other information. You can also change the delimiter with the `--segment-separator` flag.
- `--partial-input` treats input as partially annotated and tries to produce an analysis result that respects the restrictions specified by the partial annotations.
- `--auto-nbest` automatically changes beam widths (local, global left) and lattice output size depending on the input length.
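As a rough illustration of how the `--segment`, `--segment-separator`, and `--output` flags listed above might be combined from a script, here is a minimal Python sketch. It rests on several assumptions that are not stated in these notes: that the executable is named `jumanpp` and is on `PATH`, that the separator value is passed as the argument following `--segment-separator`, and that the analyzer reads UTF-8 text from standard input. The helper names (`build_segment_command`, `split_segmented_line`, `segment`) are hypothetical and are not part of Juman++.

```python
import subprocess
from typing import List, Optional


def build_segment_command(
    separator: Optional[str] = None,
    output_path: Optional[str] = None,
    binary: str = "jumanpp",  # assumption: executable name on PATH
) -> List[str]:
    """Build an argument list that requests segmentation-only output."""
    cmd = [binary, "--segment"]
    if separator is not None:
        # Assumption: the separator value follows the flag as its own argument.
        cmd += ["--segment-separator", separator]
    if output_path is not None:
        # `--output` (or `-o`) writes the result to a file instead of stdout.
        cmd += ["--output", output_path]
    return cmd


def split_segmented_line(line: str, separator: str = " ") -> List[str]:
    """Split one line of delimiter-separated output into tokens."""
    return [tok for tok in line.rstrip("\n").split(separator) if tok]


def segment(text: str, separator: str = " ") -> List[List[str]]:
    """Run the analyzer and return the tokens of each output line.

    Requires an installed binary and model; assumes input is read from stdin.
    """
    sep_arg = None if separator == " " else separator
    result = subprocess.run(
        build_segment_command(separator=sep_arg),
        input=text,
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return [split_segmented_line(l, separator) for l in result.stdout.splitlines()]
```

Only `segment` actually invokes the analyzer; the other two helpers can be used on their own, for example to post-process a file written with `--output`. A separator that cannot occur in the input text (such as a tab) may be easier to split on than the default space.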
📦 Model Stability
- Models should be significantly more robust than before when analyzing random web text.
v2.0.0-rc1 (First preview, Pre-release)
📋 Changes
- Complete rewrite of Juman++
- Improved analysis speed (>100x versus v1); RNN models should take about 1.8 times as long as plain juman.
- Improved model accuracy on Kyoto Corpus and [KWDLC](http://nlp.ist.i.kyoto-u.ac.jp/EN/index.php?KWDLC)
- Reduced model size
- Reduced memory usage at analysis time
- Juman++ can now be used as a library (examples will come later)
- Improved emoji support
- Improved kaomoji support (thanks to neologd/unidic for this)
- + 7 more
