fb55/htmlparser2

The fast & forgiving HTML and XML parser

22 Releases
Latest: 3mo ago
v12.0.0 (Latest)
fb55 · 3mo ago · March 20, 2026

📋 What's Changed

  • This release aligns HTML parsing with the WHATWG spec. Almost all changes affect HTML mode only; XML mode is unaffected unless noted.
  • Raw-text & RCDATA tags
  • `<iframe>`, `<noembed>`, `<noframes>`, and `<plaintext>` are now raw-text tags; their content is no longer parsed as HTML
  • `<textarea>` now decodes entities like `<title>` already did
  • Self-closing `<script/>`, `<style/>`, etc. now enter their raw-text state (the `/` is ignored per spec) unless `recognizeSelfClosing` is enabled
  • SVG & MathML
  • Tag names inside `<svg>` are case-adjusted per spec (`foreignObject`, `clipPath`, etc.)
  • CDATA sections inside foreign content are treated as text
  • + 17 more
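The raw-text and RCDATA changes can be observed with a minimal sketch, assuming htmlparser2 v12 is installed and the file runs as an ES module. The helper `inspect` is hypothetical; it only records what the parser reports through the standard `onopentag` and `ontext` callbacks, and the inputs are made up.

```typescript
import { Parser } from "htmlparser2";

// Hypothetical helper: collect the tag names and the concatenated text
// reported for an HTML fragment (HTML mode, default options).
function inspect(html: string): { tags: string[]; text: string } {
  const tags: string[] = [];
  let text = "";
  const parser = new Parser({
    onopentag(name) {
      tags.push(name);
    },
    ontext(data) {
      // Text may arrive in several chunks, so concatenate them.
      text += data;
    },
  });
  parser.end(html);
  return { tags, text };
}

// Per the v12 notes, <iframe> content is raw text, so <b> is not reported as a tag.
const iframe = inspect("<iframe><b>hi</b></iframe>");
console.log(iframe.tags, iframe.text);

// Per the v12 notes, <textarea> content now has its entities decoded.
const textarea = inspect("<textarea>a &amp; b</textarea>");
console.log(textarea.text);
```

On earlier versions the same script would report `b` as a tag inside the iframe, which makes it a quick check when upgrading.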
v11.0.0
fb55 · 3mo ago · March 19, 2026

💥 Breaking Changes

  • The module is now ESM-only https://github.com/fb55/htmlparser2/pull/2381
  • CommonJS `require()` is no longer supported in legacy environments. Use `import` instead.
  • The minimum Node.js version is now 20.19.0.
  • Dependencies have been bumped to their latest major versions: `domhandler` v6, `domutils` v4, `domelementtype` v3, `entities` v8.
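For code that previously used `require()`, migrating is usually a change of import syntax. A minimal sketch, assuming htmlparser2 v11 is installed and the file runs as an ES module (for example with `"type": "module"` in `package.json`); `parseDocument` and the re-exported `DomUtils` namespace are existing exports, and the HTML is made up.

```typescript
// Before (CommonJS):
// const { parseDocument, DomUtils } = require("htmlparser2");

// After (ESM):
import { parseDocument, DomUtils } from "htmlparser2";

const doc = parseDocument("<ul><li>one</li><li>two</li></ul>");

// Find every <li> in the document and read its text content.
const items = DomUtils.getElementsByTagName("li", doc).map((li) =>
  DomUtils.textContent(li),
);
console.log(items);
```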

Features

  • Added `WebWritableStream` for the Web Streams API, enabling direct piping from `fetch()` response bodies into the parser https://github.com/fb55/htmlparser2/pull/2376
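The notes do not spell out `WebWritableStream`'s import path or constructor, so it is not reproduced here. Instead, this is a hypothetical sketch of the same idea built only from the standard Web Streams API and the existing `Parser` class: a `WritableStream` sink that forwards decoded text chunks to `Parser.write()` and calls `Parser.end()` when the stream closes. `createParserSink` and `collectLinks` are illustrative names, and a local `Response` stands in for a real `fetch()` result. It assumes Node.js 20+ (global `Response`, `WritableStream`, `TextDecoderStream`) and an ES module context for top-level `await`.

```typescript
import { Parser, type Handler } from "htmlparser2";

// Hypothetical helper, NOT the library's WebWritableStream: wraps a Parser
// in a standard Web Streams WritableStream that accepts string chunks.
function createParserSink(handler: Partial<Handler>): WritableStream<string> {
  const parser = new Parser(handler);
  return new WritableStream<string>({
    write(chunk) {
      // Parser.write() only accepts strings.
      parser.write(chunk);
    },
    close() {
      // Flush buffered input and fire the end-of-document callbacks.
      parser.end();
    },
  });
}

// Hypothetical helper: collect every link target in a response body.
async function collectLinks(response: Response): Promise<string[]> {
  const links: string[] = [];
  if (!response.body) return links;
  const sink = createParserSink({
    onopentag(name, attribs) {
      if (name === "a" && attribs.href) links.push(attribs.href);
    },
  });
  // Decode the byte stream to text before it reaches the parser.
  await response.body.pipeThrough(new TextDecoderStream()).pipeTo(sink);
  return links;
}

const links = await collectLinks(
  new Response('<a href="/one">1</a><a href="/two">2</a>'),
);
console.log(links);
```

In real code the built-in `WebWritableStream` is the intended tool; the sketch only shows the mechanics of piping a response body into the parser.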

🐛 Bug Fixes

  • Comments now accept `--!>` as a closing sequence per the HTML spec, and `<!-->` is recognized as an empty comment in HTML mode https://github.com/fb55/htmlparser2/pull/2382
  • XML processing instructions (`<?xml ... ?>`) now require the full `?>` closing sequence instead of just `>` https://github.com/fb55/htmlparser2/pull/2382
  • Fixed `reset()` not clearing `isSpecial` and `sequenceIndex` state, which could cause incorrect parsing after reuse https://github.com/fb55/htmlparser2/pull/2382
  • Fixed XML comment parsing: `<!-->` is no longer treated as a complete comment in `xmlMode` https://github.com/fb55/htmlparser2/pull/2383
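A minimal sketch of the new `--!>` comment closing and of reusing one parser instance via `reset()`, assuming htmlparser2 v11 or later in its default HTML mode; the input strings are made up.

```typescript
import { Parser } from "htmlparser2";

const comments: string[] = [];
let text = "";
const parser = new Parser({
  oncomment(data) {
    comments.push(data);
  },
  ontext(data) {
    text += data;
  },
});

// HTML mode: per the v11 notes, "--!>" closes the comment, so "after" is text.
parser.write("<!--note--!>after");
parser.end();

// reset() clears the parser and tokenizer state so the instance can be reused.
parser.reset();
parser.write("<!--second-->");
parser.end();

console.log(comments, text);
```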

📋 Other Changes

  • Expanded README with full API reference, parser options, events, and practical examples https://github.com/fb55/htmlparser2/pull/2384

New Contributors

  • @vimzh made their first contribution in https://github.com/fb55/htmlparser2/pull/2376
  • Full Changelog: https://github.com/fb55/htmlparser2/compare/v10.1.0...v11.0.0
v10.1.0
fb55 · 5mo ago · January 21, 2026

📋 What's Changed

  • `entities` was bumped from 6.0.1 to 7.0.1, bringing size & speed improvements https://github.com/fb55/htmlparser2/pull/2215
  • Test files are no longer shipped in the published module https://github.com/fb55/htmlparser2/commit/72da67183174d6a7e981f4eb5cbff4a4c0bf8ddf

New Contributors

  • @KTibow made their first contribution, bumping us to eslint 9 in https://github.com/fb55/htmlparser2/pull/2204
  • Full Changelog: https://github.com/fb55/htmlparser2/compare/v10.0.0...v10.1.0
v10.0.0
fb55 · 1y ago · December 24, 2024

📋 Changes

  • Breaking: Simplify writable stream import path d5882db
  • feat: Support `xmp` tag parsing (#1790 by @nati-elmaliach) ecdb071
  • Run tests with vitest (#1845) aa0c781
  • Dependency upgrades
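With the simplified path, the Node.js stream wrapper is imported from `htmlparser2/WritableStream`. A minimal sketch, assuming htmlparser2 v10 or later, Node.js, and an ES module context; it collects `<h1>` text from a stream whose chunk boundaries deliberately split both text and a closing tag. The input chunks are made up.

```typescript
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";
import { WritableStream } from "htmlparser2/WritableStream";

const headings: string[] = [];
let current: string | null = null;

// WritableStream takes the same handler object as Parser.
const sink = new WritableStream({
  onopentag(name) {
    if (name === "h1") current = "";
  },
  ontext(data) {
    // Text can be split across chunks, so accumulate it.
    if (current !== null) current += data;
  },
  onclosetag(name) {
    if (name === "h1" && current !== null) {
      headings.push(current);
      current = null;
    }
  },
});

// The parser buffers across writes, so mid-tag chunk boundaries are fine.
await pipeline(
  Readable.from(["<h1>Hel", "lo</h", "1><p>x</p><h1>World</h1>"]),
  sink,
);
console.log(headings);
```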
v9.1.0
fb55 · 2y ago · January 5, 2024

🐛 Fixes

  • Fixed `onattribend`'s `endIndex` (#1540 by @DimaIT)
  • Treat textarea as special tag (#1719 by @DimaIT)

Features

  • Export `QuoteType` (#1543 by @DimaIT) and `Handler` interface (#1690 by @benkroeger)
v9.0.0
fb55 · 3y ago · May 10, 2023

💥 Breaking Changes

  • The tokenizer now uses the `EntityDecoder` from the `entities` module https://github.com/fb55/htmlparser2/pull/1480
  • Parsing of entities in attributes is now aligned with the HTML spec, and some inputs will produce different results. E.g., in `<a href='&amp=boo'>` the attribute value is no longer modified.
  • The `ontextentity` tokenizer callback now has an `endIndex` argument; if you use the tokenizer directly, check that the indices you compute are still correct.
  • Stacks inside the parser have been reversed. https://github.com/fb55/htmlparser2/pull/1511
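The attribute example from these notes can be checked with a minimal sketch, assuming htmlparser2 v9 or later with default options. `firstAttribs` is a hypothetical helper that returns the attributes of the first tag in its input.

```typescript
import { Parser } from "htmlparser2";

// Hypothetical helper: attributes of the first opening tag in `html`.
function firstAttribs(html: string): Record<string, string> {
  let result: Record<string, string> = {};
  let seen = false;
  const parser = new Parser({
    onopentag(_name, attribs) {
      if (!seen) {
        result = attribs;
        seen = true;
      }
    },
  });
  parser.end(html);
  return result;
}

// A reference without ";" followed by "=" is left untouched in attributes.
const legacy = firstAttribs("<a href='&amp=boo'>").href;
// A properly terminated reference is still decoded.
const terminated = firstAttribs("<a href='a&amp;b'>").href;
console.log(legacy, terminated);
```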

Features

  • Added a `createDocumentStream` function, analogous to `createDomStream` (which is now deprecated) https://github.com/fb55/htmlparser2/pull/1510
  • Full Changelog: https://github.com/fb55/htmlparser2/compare/v8.0.2...v9.0.0
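A minimal sketch of `createDocumentStream`, assuming htmlparser2 v9 or later: the returned parser accepts input in chunks, and the callback receives the finished document once the stream ends. The input is made up.

```typescript
import { createDocumentStream, DomUtils } from "htmlparser2";

let paragraphs: string[] = [];

// The callback fires when the stream has ended and the document is complete.
const parser = createDocumentStream((error, doc) => {
  if (error) throw error;
  paragraphs = DomUtils.getElementsByTagName("p", doc).map((p) =>
    DomUtils.textContent(p),
  );
});

// Feed the input in arbitrary chunks, then end the stream.
parser.write("<p>first</p><p>sec");
parser.end("ond</p>");
console.log(paragraphs);
```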
v8.0.2
fb55 · 3y ago · March 22, 2023

🐛 Bug Fixes

  • Reset tokenizer baseState after closing tag name by @KillyMXI in https://github.com/fb55/htmlparser2/pull/1460

📋 Other changes

  • Dependency version bumps
  • GitHub Workflows security hardening by @sashashura in https://github.com/fb55/htmlparser2/pull/1365
  • refactor(lint): Add `eslint-plugin-n` and `-unicorn` by @fb55 in https://github.com/fb55/htmlparser2/pull/1352
  • chore(test): Move from JSON tests to specs by @fb55 in https://github.com/fb55/htmlparser2/pull/1354
  • docs(readme): Use GitHub Actions CI badge by @fb55 in https://github.com/fb55/htmlparser2/pull/1374

New Contributors

  • @sashashura made their first contribution in https://github.com/fb55/htmlparser2/pull/1365
  • @KillyMXI made their first contribution in https://github.com/fb55/htmlparser2/pull/1460
  • Full Changelog: https://github.com/fb55/htmlparser2/compare/v8.0.1...v8.0.2
v8.0.1
fb55 · 4y ago · April 29, 2022

📋 Changes

  • Added missing `WritableStream` export in the `package.json` 6923fca
v8.0.0
fb55 · 4y ago · April 23, 2022

💥 Breaking

  • The deprecated `FeedHandler` class has been removed https://github.com/fb55/htmlparser2/pull/1166
  • See https://github.com/fb55/htmlparser2/pull/1166 for how to migrate.
  • TypeScript >= 4.5 is now required; see https://github.com/fb55/htmlparser2/issues/1242
  • The types from [`domhandler`](https://github.com/fb55/domhandler/releases/tag/v5.0.0) and [`domutils`](https://github.com/fb55/domutils/releases/tag/v3.0.0) have changed, and the deprecated `normalizeWhitespace` option was removed https://github.com/fb55/htmlparser2/pull/1164
  • The parser was updated to no longer concatenate strings. This led to several changes of internal interfaces. https://github.com/fb55/htmlparser2/pull/1045
  • This reduces the memory overhead when parsing streams, and avoids copying memory.
  • Breaking if you were previously extending internals.
  • `Parser.write()` and `Parser.end()` now only accept string arguments. If you were previously
  • + 2 more
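Former `FeedHandler` users can switch to the `parseFeed` helper, and byte input now has to be decoded to a string before it reaches the parser. A minimal sketch, assuming htmlparser2 v8 or later; the feed content is made up, and `xmlMode` is passed explicitly so that RSS elements such as `<link>` are not treated as HTML void elements.

```typescript
import { parseFeed } from "htmlparser2";

// Raw UTF-8 bytes standing in for a file or HTTP body.
const bytes = new TextEncoder().encode(
  "<rss><channel><title>Example</title>" +
    "<item><title>First post</title><link>https://example.com/1</link></item>" +
    "</channel></rss>",
);

// The parser only accepts strings, so decode the bytes first.
const xml = new TextDecoder("utf-8").decode(bytes);

// parseFeed() parses the input and extracts feed metadata and items.
const feed = parseFeed(xml, { xmlMode: true });
console.log(feed?.title, feed?.items.map((item) => item.title));
```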

Features

  • `htmlparser2` is now a dual CommonJS & ESM module https://github.com/fb55/htmlparser2/pull/1165

📋 Other changes

  • Updated for `entities`' updated decoding tree structure https://github.com/fb55/htmlparser2/pull/1146
  • Highlight special close-implies-open logic by @vassudanagunta in https://github.com/fb55/htmlparser2/pull/1047
  • Update Events/07 test to clarify interpretation of tag end slashes by @vassudanagunta in https://github.com/fb55/htmlparser2/pull/1046
  • Suggest `parse5` for HTML compliance by @vassudanagunta in https://github.com/fb55/htmlparser2/pull/1147

New Contributors

  • @vassudanagunta made their first contribution in https://github.com/fb55/htmlparser2/pull/1047
  • Full Changelog: https://github.com/fb55/htmlparser2/compare/v7.2.0...v8.0.0
v7.2.0
fb55 · 4y ago · November 11, 2021

📋 What's Changed

  • Fixes:
  • Decode entities after < by @fb55 in https://github.com/fb55/htmlparser2/pull/1008
  • Stringify non-string chunks by @fb55 in https://github.com/fb55/htmlparser2/pull/1010
  • Docs:
  • docs(readme): make `parseDocument()` example clearer by @cameronsteele in https://github.com/fb55/htmlparser2/pull/998
  • Refactors:
  • Introduce sequences & fast forwarding by @fb55 in https://github.com/fb55/htmlparser2/pull/1007
  • Emit text before entities once entity is confirmed by @fb55 in https://github.com/fb55/htmlparser2/pull/1009
  • + 1 more

New Contributors

  • @cameronsteele made their first contribution in https://github.com/fb55/htmlparser2/pull/998
  • Full Changelog: https://github.com/fb55/htmlparser2/compare/v7.1.2...v7.2.0
v7.1.2
fb55 · 4y ago · September 11, 2021

📋 Changes

  • Fix indices of self-closing tags in XML (#949, reported in #941) 3287ef2
  • Bump domhandler from 4.2.0 to 4.2.2 (#935) 45b2cfe
v7.1.1
fb55 · 4y ago · August 29, 2021

📋 Changes

  • Fixed a bug where implied close tags would be misreported (#933) 903fb43
  • Fixed `endIndex` of text events being off by 1 (#932) 78ef1b7
v7.1.0
fb55 · 4y ago · August 27, 2021

📋 Changes

  • Added an `isImplied` flag to the `onopentag`/`onclosetag` events (#930) f917004
  • This allows consumers to set start/end indices more correctly. Inspired by https://github.com/posthtml/posthtml-parser/pull/80.
  • It is now possible to get indices for attributes (#929) 28c162b
  • `htmlparser2@7.0.0` changed how indices were computed. Unfortunately, many edge cases weren't handled correctly; this version fixes them.
  • refactor: Fix how indices are computed, add attrib indices (#929) 28c162b
  • fix(parser): Fix indices for end, CDATA, add indices to tests (#928) 4e25252
  • fix(parser): Don't override position for implied opening tags (#917) fac221d
  • fix(parser): Index of closing tag was misaligned (#913) 04c411c
  • + 10 more
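A minimal sketch of the `isImplied` flag, assuming htmlparser2 v7.1 or later in HTML mode: the second `<p>` implicitly closes the first one, while the final `</p>` is explicit. The input is made up.

```typescript
import { Parser } from "htmlparser2";

const events: string[] = [];
const parser = new Parser({
  onopentag(name, _attribs, isImplied) {
    events.push(`open ${name}${isImplied ? " (implied)" : ""}`);
  },
  onclosetag(name, isImplied) {
    events.push(`close ${name}${isImplied ? " (implied)" : ""}`);
  },
});

parser.end("<p>one<p>two</p>");
console.log(events);
```

Consumers that build source maps can use the flag to skip implied tags when assigning end positions, since no closing markup exists in the input for them.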
v7.0.0
fb55 · 4y ago · August 20, 2021

📋 Changes

  • Fixed how start & end index positions are calculated (#910) 5ab080e
  • Some indices, especially end indices, will now have changed. Most importantly, end indices will now always be greater than or equal to start indices (whoops!).
  • Added an `isVoidElement` method to the parser (#785) 00ce57a
  • Use a trie to decode HTML & XML entities in the tokenizer (#863) 9a47a55
  • Leads to large speed-ups when dealing with entities.
  • Iterate over char codes in the tokenizer (#894) f5aed75
  • Improved tokenizer performance by ~40%.
  • Use `Map` for `openImpliesClose` in the parser (#911) 39a8109
  • + 1 more
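A minimal sketch of using the parser's `startIndex` and `endIndex` to recover the source text of each opening tag, assuming htmlparser2 v7.1 or later and that `endIndex` points at the tag's final `>` (inclusive), which is why the slice adds 1. The input is made up.

```typescript
import { Parser } from "htmlparser2";

const html = '<div class="box"><br></div>';
const tagSources: string[] = [];

const parser = new Parser({
  onopentag() {
    // startIndex/endIndex delimit the current tag; endIndex is inclusive.
    tagSources.push(html.slice(parser.startIndex, parser.endIndex + 1));
  },
});
parser.end(html);
console.log(tagSources);
```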
v6.1.0
fb55 · 5y ago · April 8, 2021

📋 Changes

  • Export tokenizer callback interface from main module (#751) ab0b3fc f59473a
  • Allow XML tags to start with any character (#778) 0b94ab5
  • Bump domhandler from 4.0.0 to 4.1.0 e64e8e5
  • Bump domelementtype from 2.1.0 to 2.2.0 8bc1719
  • Bump domutils from 2.4.4 to 2.5.2 8b91d97 cf77476 7c233de
v6.0.1
fb55 · 5y ago · March 7, 2021

📋 Changes

  • Fix parsing special closing tags (#746) 214ab08
  • Thanks to @BenoitZugmeyer for the report (#745)!
v6.0.0
fb55 · 5y ago · December 8, 2020

📋 Changes

  • Bump domhandler, domutils 4dd4233 0d278fd
  • The new version of domhandler now comes with an actual root element for the document. This might break tests in a few cases. See [the domhandler release notes](https://github.com/fb55/domhandler/releases/tag/v4.0.0) for more details.
  • Make some private properties actually private 1c71e60
  • Add a `parseDocument` method 4653f23
  • This returns the root node of the document, instead of an array of the first nodes. You likely want to use this instead of the now deprecated `getDOM` method.
  • Improve docs df7ea98 1ce1d3b 0437d9c
  • FeedHandler: Slightly restructure code b6b4382
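A minimal sketch of the difference `parseDocument` makes, assuming a current htmlparser2 release: it returns a single root node, and the top-level nodes that the deprecated helpers returned as an array are that root's `children`. The input is made up.

```typescript
import { parseDocument, DomUtils, ElementType } from "htmlparser2";

const doc = parseDocument("<h1>Title</h1><p>Body</p>");

// The returned node is the document root, not the first element.
console.log(doc.type === ElementType.Root);

// Top-level elements live in doc.children.
const names = doc.children.filter(DomUtils.isTag).map((el) => el.name);
console.log(names);
```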
v5.0.1
fb55 · 5y ago · October 26, 2020

📋 Changes

  • Fix: Parse entities in `<title>` tags (#614, #615 by @billneff79) 3295a8b
  • Fix: Remove @types/node as a peer dependency 1ace384
v5.0.0
fb55 · 5y ago · October 3, 2020

📋 Changes

  • Default the `decodeEntities` option to `true` 8ac01e0
  • Removes underscores in front of many private properties & methods. 6e296d2
  • Removes `EVENTS`, `WritableStream` and `CollectingHandler` exports from module import. The latter two are still part of the module, but now have to be imported explicitly. 6e296d2
  • The parser no longer extends `EventEmitter` f30f13c
  • HTML `<title>` tag content is now processed as text (#483 by @billneff79) 0189e56
  • Add media content parsing to FeedHandler (#560 by @gcandal) a85e4e0
  • Expose the quotes that were used in the `onattribute` event 3c86256
  • Add "sideEffects: false" to package.json (#474 by @ericjeney) d90dd64
  • + 7 more
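A minimal sketch of two of these changes, assuming a current htmlparser2 release with default options: attribute values arrive with entities decoded, and the `onattribute` event receives the quote character that was used. The input is made up.

```typescript
import { Parser } from "htmlparser2";

const seen: Array<{ name: string; value: string; quote: string | null | undefined }> = [];
const parser = new Parser({
  onattribute(name, value, quote) {
    seen.push({ name, value, quote });
  },
});

// decodeEntities defaults to true, so "&amp;" arrives as "&".
parser.end(`<a title="Tom &amp; Jerry" data-x='1'>`);
console.log(seen);
```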
v4.1.0
fb55 · 6y ago · February 23, 2020

📋 Changes

  • Don't fail when parsing `<__proto__>` (fixes #387)
  • Add `types` field to package.json
  • Update dependencies
v4.0.0
fb55 · 6y ago · August 3, 2019

📋 Changes

  • Port to TypeScript, Jest
  • Remove the `Stream` and `ProxyHandler` exports
  • Order some conditionals in Tokenizer by their likelihood to be hit
  • Fix implicit closing of certain tags — @voithos
  • Fix: options.Tokenizer modified outer scope — @thorn0
3.3.0
fb55 · 12y ago · September 11, 2013