Jschardet
Character encoding auto-detection in JavaScript (port of python's chardet)
jschardet is a character encoding detector for JavaScript. Runs in Node.js and browsers with zero runtime dependencies. The project is written primarily in TypeScript, distributed under the BSD Zero Clause License license, first published in 2010. Key topics include: character-encoding, charset.
jschardet
jschardet is a character encoding detector for JavaScript. Runs in Node.js and browsers with zero runtime dependencies.
jschardet 4 is a ground-up TypeScript port of chardet 7. It's much faster and more accurate than jschardet 3, and a drop-in replacement for its documented API.
The API is detect() and detectAll(), returning encoding, confidence, language, and mimeType.
Features
99.2% accuracy on 2,517 test files, up from 42.0% in jschardet 3, with ~6× the throughput and ~9× lower peak memory. Language detection for every result. MIME type detection for binary files.
| jschardet 4.0.0 | jschardet 3.1.4 | chardet 7.4.3 (Python) | |
|---|---|---|---|
| Accuracy (2,517 files) | 99.2% | 42.0% | 99.2% |
| Speed | 945 files/s | 154 files/s | 187 files/s |
| Language detection | 97.4% | — | 97.4% |
| Peak memory | 84.5 MiB | 751.4 MiB | 50.7 MiB |
| Bundle size (min / gzip) | 1,043 / 676 KiB | 334 / 120 KiB | — |
| Cold start (import + first detect) | 80.3 ms | 25.7 ms | 94.9 ms |
| Runs in browsers | yes | yes | — |
| MIME type detection | yes | no | yes |
| License | 0BSD | LGPL | 0BSD |
Compared to jschardet 3, v4 has a larger bundle and a ~80 ms first-call cost. Both come from shipping a larger detection model; the model decompresses once on first detect() and stays in memory afterwards, so subsequent calls run at full speed.
See docs/performance.md for the full benchmark methodology and per-encoding accuracy.
Installation
npm
bashnpm install jschardet
Browser
Copy and include jschardet.min.js in your page (IIFE, attaches a global jschardet). For ESM, use jschardet.esm.min.js instead. Unminified builds and source maps are in dist/.
The library is also available via jsDelivr:
| Format | URL |
|---|---|
IIFE (<script src>) | https://cdn.jsdelivr.net/npm/jschardet |
ESM (<script type="module">) | https://cdn.jsdelivr.net/npm/jschardet/dist/jschardet.esm.min.js |
Classic script tag (after copying jschardet.min.js next to your HTML):
html<script src="jschardet.min.js"></script> <script> console.log(jschardet.detect("\xc3\xa0\xc3\xad\xc3\xa0\xc3\xa7\xc3\xa3")); </script>
ESM (after copying jschardet.esm.min.js):
html<script type="module"> import { detect } from './jschardet.esm.min.js'; console.log(detect("\xc3\xa0\xc3\xad\xc3\xa0\xc3\xa7\xc3\xa3")); </script>
Quick start
jsimport { detect, detectAll } from 'jschardet'; // string — ASCII detect("Python is a great programming language for beginners and experts alike.") // { encoding: 'ascii', confidence: 1, language: 'en', mimeType: 'text/plain' } // Uint8Array — "The naïve approach doesn't always work in complex systems." in UTF-8 detect(new TextEncoder().encode("The naïve approach doesn't always work in complex systems.")) // { encoding: 'utf-8', confidence: 0.84, language: 'en', mimeType: 'text/plain' } // Uint8Array — "日本語の文字コード検出テストです。" in EUC-JP detect(new Uint8Array([ 0xc6, 0xfc, 0xcb, 0xdc, 0xb8, 0xec, 0xa4, 0xce, 0xca, 0xb8, 0xbb, 0xfa, 0xa5, 0xb3, 0xa1, 0xbc, 0xa5, 0xc9, 0xb8, 0xa1, 0xbd, 0xd0, 0xa5, 0xc6, 0xa5, 0xb9, 0xa5, 0xc8, 0xa4, 0xc7, 0xa4, 0xb9, 0xa1, 0xa3, ])) // { encoding: 'EUC-JP', confidence: 0.56, language: 'ja', mimeType: 'text/plain' } // Buffer (Node.js) — "Le café est une boisson très populaire en France et dans le monde entier." in windows-1252 const results = detectAll(Buffer.from([ 76, 101, 32, 99, 97, 102, 233, 32, 101, 115, 116, 32, 117, 110, 101, 32, 98, 111, 105, 115, 115, 111, 110, 32, 116, 114, 232, 115, 32, 112, 111, 112, 117, 108, 97, 105, 114, 101, 32, 101, 110, 32, 70, 114, 97, 110, 99, 101, 32, 101, 116, 32, 100, 97, 110, 115, 32, 108, 101, 32, 109, 111, 110, 100, 101, 32, 101, 110, 116, 105, 101, 114, 46, ])) for (const r of results.slice(0, 4)) { console.log(r.encoding, r.confidence.toFixed(2)); } // Windows-1252 0.32 // iso8859-15 0.32 // ISO-8859-1 0.32 // MacRoman 0.31
API
detect(buffer, options?)
Accepts a string or Uint8Array (Node Buffer works too). Returns the best match as an object:
| Field | Type | Description |
|---|---|---|
encoding | string | null | Detected encoding name, or null if unknown |
confidence | number | Score from 0.0 to 1.0 |
language | string | null | Language hint when available |
mimeType | string | null | MIME type hint when available |
detectAll(buffer, options?)
Same input types as detect. Returns all candidates above the confidence threshold (default 0.20), sorted by confidence. At least one result is always returned when the built-in threshold applies.
Options
tsinterface IOptionsMap { minimumThreshold?: number; // override default 0.20 for detectAll filtering detectEncodings?: string[]; // allowlist of encoding names to consider excludeEncodings?: string[]; // blocklist of encoding names }
enableDebug()
Logs full candidate lists to the console from detect / detectAll (useful when tuning thresholds or allowlists).
CLI
bashjschardet somefile.txt # somefile.txt: utf-8 with confidence 1 jschardet --minimal somefile.txt # utf-8 # Include detected language jschardet -l somefile.txt # somefile.txt: utf-8 en (English) with confidence 1 # Only consider specific encodings jschardet -i utf-8,windows-1252 somefile.txt # somefile.txt: utf-8 with confidence 1 # Pipe from stdin cat somefile.txt | jschardet # stdin: utf-8 with confidence 1
Supported encodings
Same encodings as chardet (aliases and encoding-era filters are documented there).
Modern Web
ascii, big5hkscs, cp874, cp932, cp949, euc-jis-2004, euc-kr, gb18030, hz-gb-2312, iso-2022-kr, iso2022-jp-2, iso2022-jp-2004, iso2022-jp-ext, koi8-r, koi8-u, shift_jis_2004, tis-620, utf-16, utf-16-be, utf-16-le, utf-32, utf-32-be, utf-32-le, utf-7, utf-8, utf-8-sig, windows-1250, windows-1251, windows-1252, windows-1253, windows-1254, windows-1255, windows-1256, windows-1257, windows-1258
Legacy ISO
iso-8859-1, iso-8859-2, iso-8859-3, iso-8859-4, iso-8859-5, iso-8859-6, iso-8859-7, iso-8859-8, iso-8859-9, iso-8859-10, iso-8859-13, iso-8859-14, iso-8859-15, iso-8859-16, johab
Legacy Mac
mac-cyrillic, mac-greek, mac-iceland, mac-latin2, mac-roman, mac-turkish
Legacy Regional
cp1006, cp1125, cp720, hp-roman8, koi8-t, kz-1048, ptcp154
DOS
cp437, cp737, cp775, cp850, cp852, cp855, cp856, cp857, cp858, cp860, cp861, cp862, cp863, cp864, cp865, cp866, cp869
Mainframe (EBCDIC)
cp1026, cp1140, cp273, cp424, cp500, cp875
chardet module
The upstream chardet API is available as-is via the chardet named export.
Use UniversalDetector for streaming detection over large files or network streams:
jsimport { chardet } from 'jschardet'; import { createReadStream } from 'node:fs'; const detector = new chardet.UniversalDetector(); for await (const chunk of createReadStream('unknown.txt')) { detector.feed(chunk); if (detector.done) break; } console.log(detector.close()); // { encoding: 'utf-8', confidence: 1, language: 'en', mimeType: 'text/plain' }
License
Contributors
Showing top 12 contributors by commit count.
