aadsm/jschardet
Character encoding auto-detection in JavaScript (port of Python's chardet)
13 Releases
Version 4.0.0 (RC0), v4.0.0-rc.0 (latest, pre-release)
📦 Comparison
- 99.2% accuracy on 2,517 test files, up from 42.0% in jschardet 3, with ~6× the throughput and ~9× lower peak memory. Language detection for every result. MIME type detection for binary files.
| Metric | jschardet 4.0.0 | jschardet 3.1.4 | chardet 7.4.3 (Python) |
|---|---|---|---|
| Accuracy (2,517 files) | 99.2% | 42.0% | 99.2% |
| Speed | 945 files/s | 154 files/s | 187 files/s |
| Language detection | 97.4% | n/a | 97.4% |
| Peak memory | 84.5 MiB | 751.4 MiB | 50.7 MiB |
| Bundle size (min / gzip) | 1,043 / 676 KiB | 334 / 120 KiB | n/a |
Version 3.1.0 (v3.1.0)
📋 Changes
- dist/jschardet.js +3135 (465888 -> 469023)
- dist/jschardet.min.js +3460 +1.03% (335803 -> 339263)
Version 3.0.0 (v3.0.0)
📋 Changes
- maccyrillic -> x-mac-cyrillic
- Fixed a bug, introduced a few months earlier, in Unicode detection of streams shorter than 6 characters.
Version 2.3.0 (v2.3.0)
📋 Changes
- New API function: detectAll, returns a list of all confidences found.
- npm audit fix
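One way the `detectAll` output might be consumed is sketched below. It assumes each entry has the shape `{encoding, confidence}`, which these notes do not spell out; `rankCandidates`, the `0.2` cutoff, and the sample data are hypothetical and stand in for a real `jschardet.detectAll(bytes)` call.

```javascript
// Hypothetical helper: keeps candidates at or above a confidence cutoff and
// returns their encoding names, best first.
// Assumed input shape: [{ encoding: "UTF-8", confidence: 0.9 }, ...]
function rankCandidates(results, minConfidence = 0.2) {
  return results
    .filter((r) => r.encoding && r.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence)
    .map((r) => r.encoding);
}

// Made-up sample standing in for the result of jschardet.detectAll(bytes).
const sampleResults = [
  { encoding: "ISO-8859-2", confidence: 0.1 },
  { encoding: "windows-1252", confidence: 0.95 },
  { encoding: "UTF-8", confidence: 0.5 },
];

const ranked = rankCandidates(sampleResults);
console.log(ranked); // [ 'windows-1252', 'UTF-8' ]
```

Because the helper only touches plain objects, it works on any list with that shape and leaves the input array unmodified.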
Version 2.2.1 (v2.2.1)
📋 Changes
- Fix UTF-8 prober full len calculation, ignores basic ASCII characters
Version 2.2.0 (v2.2.0)
📋 Changes
- Improved UTF-8 detection for smaller streams of data
Version 2.1.1 (v2.1.1)
📋 Changes
- Add TypeScript types file
Version 2.1.0 (v2.1.0)
📋 Changes
- Add support for UTF-8 emoji
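For context on why emoji can need dedicated handling, here is a small illustration that does not use jschardet itself: many emoji, such as U+1F600, sit outside the Basic Multilingual Plane and encode to four bytes in UTF-8. Whether this is exactly the case the release addressed is an assumption; the variable names are illustrative.

```javascript
// U+1F600 (grinning face) lies outside the Basic Multilingual Plane, so its
// UTF-8 form is a four-byte sequence with a lead byte in the 0xF0-0xF4 range.
const emojiBytes = Array.from(new TextEncoder().encode("\u{1F600}"));
const emojiHex = emojiBytes
  .map((b) => b.toString(16).padStart(2, "0"))
  .join(" ");

console.log(emojiHex); // f0 9f 98 80
```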
Version 2.0.0 (v2.0.0)
📋 Changes
- Reorganize the code to use proper modules and allow for composability (e.g.: create a package for only one language).
- This breaks backwards compatibility for the Constants (hence the major version bump), but new methods were added to serve the same purpose: enableDebug and .detect(<bytes>, {minimumThreshold: 0.2}).
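A minimal sketch of the `minimumThreshold` option in use follows. The detector is passed in as a parameter so the snippet runs without the package installed; in real code it would presumably be `require("jschardet")`. `detectOrFallback`, the stub, and the assumption that a below-threshold result carries a null `encoding` are all illustrative and not taken from these notes.

```javascript
// `detector` is assumed to expose detect(bytes, options), as the notes
// describe for jschardet.
function detectOrFallback(detector, bytes, fallback = "UTF-8") {
  const result = detector.detect(bytes, { minimumThreshold: 0.2 });
  // Assumption: a below-threshold result has a null or empty encoding.
  return result && result.encoding ? result.encoding : fallback;
}

// Stub standing in for jschardet, for illustration only: it reports high
// confidence for inputs of four or more characters and low confidence otherwise.
const stubDetector = {
  detect(bytes, options) {
    const confidence = bytes.length >= 4 ? 0.9 : 0.1;
    return confidence >= options.minimumThreshold
      ? { encoding: "windows-1252", confidence }
      : { encoding: null, confidence: 0 };
  },
};

const longInputEncoding = detectOrFallback(stubDetector, "abcdef");
const shortInputEncoding = detectOrFallback(stubDetector, "ab");
console.log(longInputEncoding, shortInputEncoding); // windows-1252 UTF-8
```

Passing the threshold per call, rather than mutating a shared constant, keeps different callers from affecting each other's detection settings.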
Version 1.6.0 (v1.6.0)
📋 Changes
- Improve CharSet prober by filtering English characters (@tarnelope)
Version 1.5.0 (v1.5.0)
📋 Changes
- Fix short windows-1252 text misdetected as EUC-JP
- Fixed incorrect character range check for SJIS
- Fixed SJIS character class table
Version 1.4.2 (v1.4.2)
v1.4.1
