ku-nlp/jumanpp

Juman++ (a Morphological Analyzer Toolkit)

4 Releases
Latest: 2y ago
v2.0.0-rc4 (Latest, Pre-release)
eiennohito · October 3, 2023

📋 Changes

  • Improved hash function which has better IPC
  • Fixes for modern compilers/distributions
v2.0.0-rc3 (Pre-release)
eiennohito · August 19, 2019

📋 Changes

  • WARNING: models are not compatible with binaries of previous versions. On the other hand, they are compatible with the master branch now.
  • Check that statically-generated inference code uses a compatible model
  • Protobuf-based output formats (optional, requires protobuf 3.0+ installed)
  • Use https://github.com/s-yata/darts-clone as the trie implementation; the trie index is now half its previous size
  • Can now write definitions for models using text files, not just the C++ DSL

📦 Jumandic-specific

  • Escape bad characters for JUMAN/lattice output formats
  • Fix kaomoji problem breaking brackets (#97)
  • Corpus fixes
  • Analysis fixes by partial annotations
  • Added reading field to aliasing set (but don't trust the reading results in analysis very much, our corpora are not clean for those annotations)
  • For replaced characters, we output the 元半角 tag in the feature field.
  • Lattice output format escapes only tabs. Protobuf output formats don't escape anything.
  • Example:
  • + 8 more
v2.0.0-rc2 (Pre-release)
eiennohito · March 14, 2018

New Features

  • Windows support! Big thanks to @DoumanAsh! Vista and later are supported; XP is NOT supported. Builds with MSVC 2017 and gcc-mingw64 (we test those platforms on the internal CI). It should probably build with MSVC 2015 too, but I haven't tried. No binaries yet, but you can help us by [creating an installer](https://github.com/ku-nlp/jumanpp/issues/81).
  • Can now output to file with `-o` or `--output`.
  • `--segment` now outputs a space-delimited segmentation result without other information. You can also change the delimiter with `--segment-separator` flag.
  • `--partial-input` treats input as partially annotated and tries to produce analysis result with restrictions specified by partial annotation.
  • `--auto-nbest` automatically changes beam widths (local, global left) and lattice output size depending on the input length.
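
The flags above can be combined from a script. The following is only a minimal sketch, not part of Juman++ itself: it assumes the executable is installed as `jumanpp` on the PATH with a default model configured, and that `--segment-separator` and `--output` take their value as the following argument. The helper names (`build_jumanpp_args`, `split_segmented`, `run_segment`) are hypothetical.

```python
import subprocess
from typing import List, Optional


def build_jumanpp_args(
    output_path: Optional[str] = None,
    segment: bool = False,
    separator: Optional[str] = None,
    partial_input: bool = False,
    auto_nbest: bool = False,
    executable: str = "jumanpp",  # assumption: binary name on PATH
) -> List[str]:
    """Assemble a command line using only the flags listed in these notes."""
    args = [executable]
    if segment:
        args.append("--segment")
        if separator is not None:
            # Assumption: the separator is passed as the next argument.
            args += ["--segment-separator", separator]
    if partial_input:
        args.append("--partial-input")
    if auto_nbest:
        args.append("--auto-nbest")
    if output_path is not None:
        args += ["--output", output_path]
    return args


def split_segmented(line: str, separator: str = " ") -> List[str]:
    """Split one line of --segment output into tokens, dropping empty pieces."""
    return [tok for tok in line.rstrip("\n").split(separator) if tok]


def run_segment(text: str) -> List[List[str]]:
    """Run the analyzer in --segment mode; needs a working Juman++ install."""
    result = subprocess.run(
        build_jumanpp_args(segment=True),
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return [split_segmented(line) for line in result.stdout.splitlines()]
```

Only `run_segment` actually invokes the analyzer; the other two helpers are pure functions, so the argument handling can be checked without Juman++ installed.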

📦 Model Stability

  • Models should be significantly more robust for analyzing random web text than before.
v2.0.0-rc1 (First preview, Pre-release)
eiennohito · December 2, 2017

📋 Changes

  • Complete rewrite of Juman++
  • Improved analysis speed (>100x) versus v1; RNN models should take about 1.8 times as long as plain JUMAN.
  • Improved model accuracy on Kyoto Corpus and [KWDLC](http://nlp.ist.i.kyoto-u.ac.jp/EN/index.php?KWDLC)
  • Reduced model size
  • Reduced memory usage at analysis time
  • Juman++ can now be used as a library (examples will come later)
  • Improved emoji support
  • Improved kaomoji support (thanks to neologd/unidic for this)
  • + 7 more