fb55/htmlparser2
The fast & forgiving HTML and XML parser
22 Releases
v12.0.0 (Latest)
📋 What's Changed
- This release aligns HTML parsing with the WHATWG spec. Almost all changes are to HTML mode only; XML mode is unaffected unless noted.
- Raw-text & RCDATA tags
- `<iframe>`, `<noembed>`, `<noframes>`, and `<plaintext>` are now raw-text tags; their content is no longer parsed as HTML
- `<textarea>` now decodes entities, as `<title>` already did
- Self-closing `<script/>`, `<style/>`, etc. now enter their raw-text state (the `/` is ignored per spec) unless `recognizeSelfClosing` is enabled
- SVG & MathML
- Tag names inside `<svg>` are case-adjusted per spec (`foreignObject`, `clipPath`, etc.)
- CDATA sections inside foreign content are treated as text
- + 17 more
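The SVG tag-name case adjustment listed above can be pictured as a lookup from the lowercased name to the casing the spec prescribes. This is a minimal sketch, not htmlparser2's implementation: the `adjustSvgTagName` helper and the `SVG_TAG_NAMES` table are hypothetical names, and the table holds only a few of the entries from the WHATWG list.

```typescript
// Partial lookup table: lowercased SVG tag name -> spec-cased name.
// Only a handful of entries are listed here for illustration.
const SVG_TAG_NAMES: ReadonlyMap<string, string> = new Map([
    ["clippath", "clipPath"],
    ["foreignobject", "foreignObject"],
    ["lineargradient", "linearGradient"],
    ["radialgradient", "radialGradient"],
    ["textpath", "textPath"],
]);

/** Hypothetical helper: case-adjusted name for a tag found inside <svg>. */
function adjustSvgTagName(rawName: string): string {
    const lower = rawName.toLowerCase();
    // Names without a table entry simply stay lowercase.
    return SVG_TAG_NAMES.get(lower) ?? lower;
}

console.log(adjustSvgTagName("FOREIGNOBJECT")); // foreignObject
console.log(adjustSvgTagName("circle")); // circle
```

A table lookup keeps the adjustment a constant-time step per tag, which is why it is a natural fit for a streaming parser.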
v11.0.0
💥 Breaking Changes
- The module is now ESM only https://github.com/fb55/htmlparser2/pull/2381
- CommonJS `require()` is no longer supported in legacy environments. Use `import` instead.
- The minimum Node.js version is now 20.19.0.
- Dependencies have been bumped to their latest major versions: `domhandler` v6, `domutils` v4, `domelementtype` v3, `entities` v8.
✨ Features
- Added `WebWritableStream` for the Web Streams API, enabling direct piping from `fetch()` response bodies into the parser https://github.com/fb55/htmlparser2/pull/2376
🐛 Bug Fixes
- Comments now accept `--!>` as a closing sequence per the HTML spec, and `<!-->` is recognized as an empty comment in HTML mode https://github.com/fb55/htmlparser2/pull/2382
- XML processing instructions (`<?xml ... ?>`) now require the full `?>` closing sequence instead of just `>` https://github.com/fb55/htmlparser2/pull/2382
- Fixed `reset()` not clearing `isSpecial` and `sequenceIndex` state, which could cause incorrect parsing after reuse https://github.com/fb55/htmlparser2/pull/2382
- Fixed XML comment parsing: `<!-->` is no longer treated as a complete comment in `xmlMode` https://github.com/fb55/htmlparser2/pull/2383
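To make the HTML-mode comment rules above concrete, here is a small sketch that accepts both `-->` and `--!>` as closing sequences and treats `<!-->` as an empty comment. It is a simplified illustration rather than the tokenizer's actual code: `readHtmlComment` and `CommentToken` are hypothetical names, and XML mode is deliberately not modelled.

```typescript
interface CommentToken {
    data: string; // text between the opening and closing sequences
    end: number; // index just past the closing sequence
}

/** Hypothetical helper: reads one HTML-mode comment starting at `start`. */
function readHtmlComment(input: string, start = 0): CommentToken | null {
    if (!input.startsWith("<!--", start)) return null;
    const bodyStart = start + 4;

    // `<!-->` is an empty comment in HTML mode.
    if (input.charAt(bodyStart) === ">") {
        return { data: "", end: bodyStart + 1 };
    }

    // Both `-->` and `--!>` close a comment; take whichever comes first.
    const plain = input.indexOf("-->", bodyStart);
    const bang = input.indexOf("--!>", bodyStart);
    if (plain === -1 && bang === -1) return null; // unterminated comment

    const useBang = bang !== -1 && (plain === -1 || bang < plain);
    const closeAt = useBang ? bang : plain;
    return {
        data: input.slice(bodyStart, closeAt),
        end: closeAt + (useBang ? 4 : 3),
    };
}
```

For `<!-- a --!> b -->`, this sketch ends the comment at `--!>`, so ` b -->` is left over as ordinary text.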
📋 Other Changes
- Expanded README with full API reference, parser options, events, and practical examples https://github.com/fb55/htmlparser2/pull/2384
✨ New Contributors
- @vimzh made their first contribution in https://github.com/fb55/htmlparser2/pull/2376
- Full Changelog: https://github.com/fb55/htmlparser2/compare/v10.1.0...v11.0.0
v10.1.0
📋 What's Changed
- `entities` was bumped from 6.0.1 to 7.0.1, bringing size & speed improvements https://github.com/fb55/htmlparser2/pull/2215
- Test files are no longer shipped in the published module https://github.com/fb55/htmlparser2/commit/72da67183174d6a7e981f4eb5cbff4a4c0bf8ddf
✨ New Contributors
- @KTibow made their first contribution, bumping us to eslint 9 in https://github.com/fb55/htmlparser2/pull/2204
- Full Changelog: https://github.com/fb55/htmlparser2/compare/v10.0.0...v10.1.0
v10.0.0
📋 Changes
- Breaking: Simplify writable stream import path d5882db
- feat: Support `xmp` tag parsing (#1790 by @nati-elmaliach) ecdb071
- Run tests with vitest (#1845) aa0c781
- Dependency upgrades
v9.1.0
🐛 Fixes
- Fixed `onattribend`'s `endIndex` (#1540 by @DimaIT)
- Treat textarea as special tag (#1719 by @DimaIT)
✨ Features
- Export `QuoteType` (#1543 by @DimaIT) and `Handler` interface (#1690 by @benkroeger)
v9.0.0
💥 Breaking Changes
- The tokenizer now uses the `EntityDecoder` from the `entities` module https://github.com/fb55/htmlparser2/pull/1480
- Parsing of entities in attributes is now aligned with the HTML spec, and some inputs will produce different results. For example, in `<a href='&=boo'>` the attribute value is no longer modified.
- The `ontextentity` tokenizer callback now has an `endIndex` argument; if you use the tokenizer directly, make sure indices are still the same.
- Stacks inside the parser have been reversed. https://github.com/fb55/htmlparser2/pull/1511
✨ Features
- Added a `createDocumentStream` function, analogous to `createDomStream` (which is now deprecated) https://github.com/fb55/htmlparser2/pull/1510
- Full Changelog: https://github.com/fb55/htmlparser2/compare/v8.0.2...v9.0.0
v8.0.2
🐛 Bug Fixes
- Reset tokenizer baseState after closing tag name by @KillyMXI in https://github.com/fb55/htmlparser2/pull/1460
📋 Other changes
- Dependency version bumps
- GitHub Workflows security hardening by @sashashura in https://github.com/fb55/htmlparser2/pull/1365
- refactor(lint): Add `eslint-plugin-n` and `-unicorn` by @fb55 in https://github.com/fb55/htmlparser2/pull/1352
- chore(test): Move from JSON tests to specs by @fb55 in https://github.com/fb55/htmlparser2/pull/1354
- docs(readme): Use GitHub Actions CI badge by @fb55 in https://github.com/fb55/htmlparser2/pull/1374
✨ New Contributors
- @sashashura made their first contribution in https://github.com/fb55/htmlparser2/pull/1365
- @KillyMXI made their first contribution in https://github.com/fb55/htmlparser2/pull/1460
- Full Changelog: https://github.com/fb55/htmlparser2/compare/v8.0.1...v8.0.2
v8.0.1
📋 Changes
- Added missing `WritableStream` export in the `package.json` 6923fca
v8.0.0
💥 Breaking
- The deprecated `FeedHandler` class has been removed https://github.com/fb55/htmlparser2/pull/1166
- See https://github.com/fb55/htmlparser2/pull/1166 for how to migrate.
- TypeScript >= 4.5 is now required; see https://github.com/fb55/htmlparser2/issues/1242
- The types from [`domhandler`](https://github.com/fb55/domhandler/releases/tag/v5.0.0) and [`domutils`](https://github.com/fb55/domutils/releases/tag/v3.0.0) have changed, and the deprecated `normalizeWhitespace` option was removed https://github.com/fb55/htmlparser2/pull/1164
- The parser was updated to no longer concatenate strings. This led to several changes of internal interfaces. https://github.com/fb55/htmlparser2/pull/1045
- This reduces the memory overhead when parsing streams, and avoids copying memory.
- Breaking if you were previously extending internals.
- `Parser.write()` and `Parser.end()` now only accept string arguments. If you were previously
- + 2 more
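Because `Parser.write()` and `Parser.end()` accept only strings, callers holding binary chunks have to decode them first. The sketch below shows one possible way to do that with the standard `TextDecoder`. The `StringSink` interface and the `writeBytes` helper are hypothetical stand-ins, not part of htmlparser2, and UTF-8 input is assumed.

```typescript
/** Minimal shape of a string-only sink; a stand-in for a parser instance. */
interface StringSink {
    write(chunk: string): void;
    end(): void;
}

/** Hypothetical helper: decodes UTF-8 byte chunks and forwards them as strings. */
function writeBytes(sink: StringSink, chunks: Iterable<Uint8Array>): void {
    const decoder = new TextDecoder("utf-8");
    for (const chunk of chunks) {
        // `stream: true` holds back an incomplete multi-byte sequence
        // until the following chunk completes it.
        const text = decoder.decode(chunk, { stream: true });
        if (text.length > 0) sink.write(text);
    }
    // Flush whatever is still buffered in the decoder.
    const rest = decoder.decode();
    if (rest.length > 0) sink.write(rest);
    sink.end();
}
```

Decoding in streaming mode matters when a multi-byte character is split across two chunks; decoding each chunk independently would produce replacement characters at the boundary.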
✨ Features
- `htmlparser2` is now a dual CommonJS & ESM module https://github.com/fb55/htmlparser2/pull/1165
📋 Other changes
- Updated for `entities`' updated decoding tree structure https://github.com/fb55/htmlparser2/pull/1146
- Highlight special close-implies-open logic by @vassudanagunta in https://github.com/fb55/htmlparser2/pull/1047
- Update Events/07 test to clarify interpretation of tag end slashes by @vassudanagunta in https://github.com/fb55/htmlparser2/pull/1046
- Suggest `parse5` for HTML compliance by @vassudanagunta in https://github.com/fb55/htmlparser2/pull/1147
✨ New Contributors
- @vassudanagunta made their first contribution in https://github.com/fb55/htmlparser2/pull/1047
- Full Changelog: https://github.com/fb55/htmlparser2/compare/v7.2.0...v8.0.0
v7.2.0
📋 What's Changed
- __Fixes:__
- Decode entities after < by @fb55 in https://github.com/fb55/htmlparser2/pull/1008
- Stringify non-string chunks by @fb55 in https://github.com/fb55/htmlparser2/pull/1010
- __Docs__
- docs(readme): make `parseDocument()` example clearer by @cameronsteele in https://github.com/fb55/htmlparser2/pull/998
- __Refactors:__
- Introduce sequences & fast forwarding by @fb55 in https://github.com/fb55/htmlparser2/pull/1007
- Emit text before entities once entity is confirmed by @fb55 in https://github.com/fb55/htmlparser2/pull/1009
- + 1 more
✨ New Contributors
- @cameronsteele made their first contribution in https://github.com/fb55/htmlparser2/pull/998
- Full Changelog: https://github.com/fb55/htmlparser2/compare/v7.1.2...v7.2.0
v7.1.2
📋 Changes
- Fix indices of self-closing tags in XML (#949, reported in #941) 3287ef2
- Bump domhandler from 4.2.0 to 4.2.2 (#935) 45b2cfe
v7.1.1
📋 Changes
- Fixed a bug where implied close tags would be misreported (#933) 903fb43
- Fixed `endIndex` of text events being off by 1 (#932) 78ef1b7
v7.1.0
📋 Changes
- Added an `isImplied` flag to the `onopentag`/`onclosetag` events (#930) f917004
- This allows consumers to set start/end indices more correctly. Inspired by https://github.com/posthtml/posthtml-parser/pull/80.
- It is now possible to get indices for attributes (#929) 28c162b
- `htmlparser2@7.0.0` changed how indices were computed. Unfortunately, a lot of edge cases weren't handled correctly. This version fixes them.
- refactor: Fix how indices are computed, add attrib indices (#929) 28c162b
- fix(parser): Fix indices for end, CDATA, add indices to tests (#928) 4e25252
- fix(parser): Don't override position for implied opening tags (#917) fac221d
- fix(parser): Index of closing tag was misaligned (#913) 04c411c
- + 10 more
v7.0.0
📋 Changes
- Fixed how start & end index positions are calculated (#910) 5ab080e
- Some indices, especially end indices, will now have changed. Most importantly, end indices will now always be greater than or equal to start indices (whoops!).
- Added an `isVoidElement` method to the parser (#785) 00ce57a
- Use a trie to decode HTML & XML entities in the tokenizer (#863) 9a47a55
- Leads to large speed-ups when dealing with entities.
- Iterate over char codes in the tokenizer (#894) f5aed75
- Improved tokenizer performance by ~40%.
- Use `Map` for `openImpliesClose` in the parser (#911) 39a8109
- + 1 more
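As a rough illustration of trie-based entity decoding over char codes, the toy decoder below walks a small trie one char code at a time and keeps the last complete match. It is a sketch of the general technique only: the real tokenizer does not use this structure, and the names `TrieNode`, `buildTrie`, `ENTITY_TRIE`, and `decodeEntities` as well as the four-entry table are assumptions made for the example.

```typescript
interface TrieNode {
    children: Map<number, TrieNode>; // keyed by char code
    value?: string; // set when the path spells a full entity name
}

function buildTrie(entities: Record<string, string>): TrieNode {
    const root: TrieNode = { children: new Map() };
    for (const [name, value] of Object.entries(entities)) {
        let node = root;
        for (let i = 0; i < name.length; i++) {
            const code = name.charCodeAt(i);
            let next = node.children.get(code);
            if (!next) {
                next = { children: new Map() };
                node.children.set(code, next);
            }
            node = next;
        }
        node.value = value;
    }
    return root;
}

// Tiny illustrative table; names include the trailing semicolon.
const ENTITY_TRIE = buildTrie({ "amp;": "&", "lt;": "<", "gt;": ">", "quot;": '"' });

function decodeEntities(input: string): string {
    let out = "";
    let i = 0;
    while (i < input.length) {
        if (input.charCodeAt(i) !== 38 /* & */) {
            out += input.charAt(i++);
            continue;
        }
        // Walk the trie char code by char code, remembering the last full match.
        let node: TrieNode | undefined = ENTITY_TRIE;
        let j = i + 1;
        let match: { value: string; end: number } | undefined = undefined;
        while (node && j < input.length) {
            node = node.children.get(input.charCodeAt(j));
            j++;
            if (node?.value !== undefined) match = { value: node.value, end: j };
        }
        if (match) {
            out += match.value;
            i = match.end;
        } else {
            out += "&"; // not a known entity: keep the ampersand literally
            i++;
        }
    }
    return out;
}
```

The lookup never allocates substrings while matching, which is the main reason a trie walk tends to beat slicing out a candidate name and looking it up in an object.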
v6.1.0
📋 Changes
- Export tokenizer callback interface from main module (#751) ab0b3fc f59473a
- Allow XML tags to start with any character (#778) 0b94ab5
- Bump domhandler from 4.0.0 to 4.1.0 e64e8e5
- Bump domelementtype from 2.1.0 to 2.2.0 8bc1719
- Bump domutils from 2.4.4 to 2.5.2 8b91d97 cf77476 7c233de
v6.0.1
📋 Changes
- Fix parsing special closing tags (#746) 214ab08
- Thanks to @BenoitZugmeyer for the report (#745)!
v6.0.0
📋 Changes
- Bump domhandler, domutils 4dd4233 0d278fd
- The new version of domhandler now comes with an actual root element for the document. This might break tests in a few cases. See [the domhandler release notes](https://github.com/fb55/domhandler/releases/tag/v4.0.0) for more details.
- Make some private properties actually private 1c71e60
- Add a `parseDocument` method 4653f23
- This returns the root node of the document, instead of an array of the first nodes. You likely want to use this instead of the now deprecated `getDOM` method.
- Improve docs df7ea98 1ce1d3b 0437d9c
- FeedHandler: Slightly restructure code b6b4382
v5.0.1
📋 Changes
- Fix: Parse entities in `<title>` tags (#614, #615 by @billneff79) 3295a8b
- Fix: Remove @types/node as a peer dependency 1ace384
v5.0.0
📋 Changes
- Default the `decodeEntities` option to `true` 8ac01e0
- Removes underscores in front of many private properties & methods. 6e296d2
- Removes `EVENTS`, `WritableStream` and `CollectingHandler` exports from module import. The latter two are still part of the module, but now have to be imported explicitly. 6e296d2
- The parser no longer extends `EventEmitter` f30f13c
- HTML `<title>` tag content is now processed as text (#483 by @billneff79) 0189e56
- Add media content parsing to FeedHandler (#560 by @gcandal) a85e4e0
- Expose the quotes that were used in the `onattribute` event 3c86256
- Add "sideEffects: false" to package.json (#474 by @ericjeney) d90dd64
- + 7 more
v4.1.0
📋 Changes
- Don't fail when parsing `<__proto__>` (fixes #387)
- Add `types` field to package.json
- Update dependencies
v4.0.0
📋 Changes
- Port to TypeScript, Jest
- Remove the `Stream` and `ProxyHandler` exports
- Order some conditionals in Tokenizer by their likelihood to be hit
- Fix implicit closing of certain tags — @voithos
- Fix: options.Tokenizer modified outer scope — @thorn0
3.3.0
