wolfgarbe/SymSpell
SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
📋 Changes
- TargetFrameworks changed from `netstandard2.0;net461;net47;netcoreapp3.0` to `netstandard2.0;net9.0`.
- PackageReferences updated.
- In SymSpell.Test, all `Assert.AreEqual` calls changed to `Assert.That`.
- Incorporates PR #126 that fixes null reference exception in CommitStaged (#139).
- FIX: Exception fixed in WordSegmentation.
- CHANGE: Framework target netcoreapp2.1 removed (end of support).

- CHANGE: Framework target changed from net472 to net47.
- CHANGE: Framework target netcoreapp3.0 added.
- IMPROVEMENT: More common contractions added to frequency_dictionary_en_82_765.txt.

- FIX: WordSegmentation did not work correctly if the input string contained words in uppercase.
- IMPROVEMENT: WordSegmentation now preserves case.
- IMPROVEMENT: WordSegmentation now keeps punctuation or an apostrophe adjacent to the previous word.
- IMPROVEMENT: WordSegmentation now normalizes ligatures: "scientiﬁc" -> "scientific".
- IMPROVEMENT: WordSegmentation now removes hyphens prior to word segmentation (as they might be caused by syllabification).
- IMPROVEMENT: American English word forms added to the dictionary in addition to British English, e.g. favourable -> favorable.

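
The ligature and hyphen handling described above can be illustrated with a short Python sketch. This is not the library's C# implementation. The helper name `normalize_for_segmentation` is hypothetical, and using Unicode NFKC normalization is an assumption (it also folds other compatibility characters, which may be broader than what SymSpell does).

```python
import unicodedata


def normalize_for_segmentation(text: str) -> str:
    """Expand ligatures and drop hyphens (hypothetical helper, illustration only)."""
    # NFKC maps compatibility characters such as U+FB01 (the "fi" ligature)
    # to their plain-letter equivalents.
    expanded = unicodedata.normalize("NFKC", text)
    # Remove hyphens, which may stem from syllabification at line breaks.
    return expanded.replace("-", "")


print(normalize_for_segmentation("scienti\ufb01c"))    # scientific
print(normalize_for_segmentation("seg-men-ta-tion"))   # segmentation
```

Removing every hyphen is a deliberately blunt choice here; it also joins genuinely hyphenated compounds, which segmentation is then expected to split again.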
LoadDictionary and LoadBigramDictionary now have an optional separator parameter, which defines the separator characters (e.g. '\t') between term(s) and count. This allows the dictionaries to contain space separated phrases.
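
To see why a configurable separator matters for phrases, here is a minimal Python sketch of parsing one dictionary line. It is an illustration only, not SymSpell's loader: the function `parse_dictionary_line` is hypothetical, and the rule that a line must split into exactly two fields is an assumption of this sketch.

```python
from typing import Optional, Tuple


def parse_dictionary_line(line: str, separator: Optional[str] = None) -> Optional[Tuple[str, int]]:
    """Return (term, count), or None if the line does not fit (hypothetical helper)."""
    # separator=None splits on any whitespace; an explicit separator such as
    # "\t" leaves spaces inside the term untouched.
    fields = line.rstrip("\n").split(separator)
    if len(fields) != 2:
        return None
    term, count = fields
    try:
        return term, int(count)
    except ValueError:
        return None


print(parse_dictionary_line("the 23135851162"))        # ('the', 23135851162)
print(parse_dictionary_line("new york 1200"))          # None
print(parse_dictionary_line("new york\t1200", "\t"))   # ('new york', 1200)
```

With whitespace splitting, the phrase "new york" is indistinguishable from two fields; with a tab separator the phrase survives as a single term.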
- IMPROVEMENT: Better SymSpell.LookupCompound correction quality with the existing single-term dictionary by using Naive Bayes probability for selecting the best word splitting.
- IMPROVEMENT: Even better SymSpell.LookupCompound correction quality when using the optional bigram dictionary, which provides sentence-level context information for selecting the best spelling correction.
- IMPROVEMENT: English bigram frequency dictionary included.

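
The idea of choosing a word split by Naive Bayes probability can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the LookupCompound implementation: it treats word occurrences as independent, so the probability of a split is the product of unigram probabilities, it only tries two-word splits, and it performs no spelling correction. The names `split_log_probability` and `best_split` and the toy counts are hypothetical.

```python
import math


def split_log_probability(words, counts, total):
    """log10 of P(w1) * P(w2) * ..., assuming the words occur independently."""
    return sum(math.log10(counts[w] / total) for w in words)


def best_split(term, counts, total):
    """Pick the most probable way to read `term` as one or two known words."""
    candidates = [[term]] if term in counts else []
    for i in range(1, len(term)):
        left, right = term[:i], term[i:]
        if left in counts and right in counts:
            candidates.append([left, right])
    # Fall back to the unsplit term when nothing in the dictionary matches.
    return max(candidates,
               key=lambda ws: split_log_probability(ws, counts, total),
               default=[term])


counts = {"the": 1000, "cat": 50, "thec": 1, "at": 300}  # toy frequencies
print(best_split("thecat", counts, total=10000))          # ['the', 'cat']
```

Both "the cat" and "thec at" consist of dictionary words, but the first has the higher product of word probabilities and is therefore selected.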
NEW: Stream support for LoadDictionary() and CreateDictionary() methods added
- NEW: WordSegmentation added: divides a string into words by inserting missing spaces. Misspelled words are corrected and do not prevent segmentation.
- NEW: CommandLine added. Parameter LookupType: lookup, lookupcompound, wordsegment. Allows pipes and redirects for input and output.
- IMPROVEMENT: Lookup with maxEditDistance=0 is faster.
- IMPROVEMENT: DamerauOSA edit distance updated.

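
Inserting missing spaces can be framed as a dynamic program over string positions. The Python sketch below is a generic textbook formulation, not SymSpell's WordSegmentation: it uses unigram log probabilities, handles only exact dictionary words, and does not correct misspellings as the library does. The function name `segment`, the `max_word_len` parameter, and the toy counts are assumptions.

```python
import math


def segment(text, counts, total, max_word_len=20):
    """Split `text` into the most probable sequence of dictionary words."""
    # best[i] holds (log probability, words) for the best split of text[:i].
    best = [(0.0, [])] + [(-math.inf, [])] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word not in counts or best[start][0] == -math.inf:
                continue  # unknown word, or no valid split reaches `start`
            score = best[start][0] + math.log10(counts[word] / total)
            if score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    return best[-1][1]  # empty list if no full segmentation exists


counts = {"the": 500, "quick": 50, "brown": 40, "fox": 30, "he": 200}
print(segment("thequickbrownfox", counts, total=1000))
# ['the', 'quick', 'brown', 'fox']
```

Limiting candidate words to `max_word_len` characters keeps the running time linear in the input length for a fixed limit.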
- IMPROVEMENT: SymSpellCompound integrated into SymSpell.
- IMPROVEMENT: demo, demoCompound, and Benchmark now target .NET Core instead of .NET Framework.
- CHANGE: The testdata directory has been moved from the demo folder into the benchmark folder.
- CHANGE: License changed from LGPL 3.0 to the more permissive MIT license.

IMPROVEMENT: SymSpell internal dictionary has been refactored. 2x faster dictionary precalculation and 2x lower memory consumption.
- IMPROVEMENT: Refactored from static to instantiated class.
- IMPROVEMENT: Added benchmarking project.
- IMPROVEMENT: Added unit test project.
- IMPROVEMENT: Separate maxEditDistance for dictionary precalculation and for lookup.
- CHANGE: Removed language feature; use separate SymSpell instances instead.
- CHANGE: Verbosity parameter changed from Int to Enum.
- FIX: Count overflow protection fixed.
- FIX: Suggestions were not always complete if maxEditDistance=1 AND input.Length>prefixLength.

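
Count overflow protection generally means clamping a sum at the maximum representable value instead of letting it wrap around. The Python sketch below illustrates that general technique only; it is not taken from the SymSpell source. Python integers do not overflow, so the 64-bit limit is modelled explicitly, and the name `saturating_add` is hypothetical.

```python
INT64_MAX = 2**63 - 1  # largest signed 64-bit value, modelled explicitly


def saturating_add(previous: int, increment: int) -> int:
    """Add two non-negative counts, clamping at INT64_MAX instead of wrapping."""
    # Compare against the remaining headroom before adding, so that a
    # fixed-width integer type would never actually overflow.
    if INT64_MAX - previous > increment:
        return previous + increment
    return INT64_MAX


print(saturating_add(5, 7))                           # 12
print(saturating_add(INT64_MAX - 1, 10) == INT64_MAX)  # True
```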
- FIX: Suggestions were not always complete for input.Length <= editDistanceMax.
- FIX: Suggestions were not always complete/best for verbose < 2.
- IMPROVEMENT: Prefix indexing implemented: more than 90% memory reduction.
- IMPROVEMENT: Faster algorithm for Damerau-Levenshtein distance.

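
For reference, the restricted Damerau-Levenshtein distance (optimal string alignment, OSA) can be computed with the standard dynamic program sketched below in Python. This is the plain textbook version, shown only to make the metric concrete; it is not the faster algorithm this release refers to, and the function name `osa_distance` is an assumption.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insert, delete, substitute, adjacent swap."""
    # d[i][j] is the distance between the prefixes a[:i] and b[:j].
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            # Transposition of two adjacent characters counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


print(osa_distance("hte", "the"))         # 1
print(osa_distance("kitten", "sitting"))  # 3
```

Unlike the unrestricted Damerau-Levenshtein distance, OSA never edits a substring more than once, so `osa_distance("ca", "abc")` is 3 rather than 2.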
Generates a SymSpell NuGet package; symspell and symspelldemo split into two separate projects
Bug fixes, improvements & new frequency dictionary
Comments cleaned up
<b>2 to 7 times less memory consumption</b> compared to version 2.0.
While the basic idea of the Symmetric Delete spelling correction algorithm remains unchanged, the implementation has been significantly improved to unleash the full potential of the algorithm. This results in **10 times faster spelling correction**, **5 times faster dictionary generation**, and a dictionary that consumes less memory **compared to version 1.6**. Compared to [Peter Norvig's algorithm](http://norvig.com/spell-correct.html) it is now **1,000,000 times faster** for edit distance=3 and 10,000 times faster for edit distance=2.
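
The basic idea of Symmetric Delete can be sketched in a few lines of Python: only deletions are generated, both for dictionary words (ahead of time) and for the input term (at lookup), and terms that share a delete variant become candidates. This is a minimal illustration, not the optimized C# implementation. All function names are hypothetical, there is no frequency ranking or prefix indexing, and candidates are verified with plain Levenshtein distance (so a transposition counts as two edits), which is an assumption of this sketch.

```python
from itertools import combinations


def deletes(word, max_distance):
    """All strings obtained by deleting up to max_distance characters."""
    result = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        for kept in combinations(range(len(word)), len(word) - k):
            result.add("".join(word[i] for i in kept))
    return result


def build_index(words, max_distance):
    """Map every delete variant to the dictionary words that produce it."""
    index = {}
    for w in words:
        for d in deletes(w, max_distance):
            index.setdefault(d, set()).add(w)
    return index


def levenshtein(a, b):
    """Plain Levenshtein distance, used to verify candidates."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def lookup(term, index, max_distance):
    """Collect words sharing a delete variant, then keep those within distance."""
    candidates = set()
    for d in deletes(term, max_distance):
        candidates |= index.get(d, set())
    return sorted(c for c in candidates if levenshtein(term, c) <= max_distance)


index = build_index(["the", "then", "hello", "help"], max_distance=1)
print(lookup("helo", index, 1))  # ['hello', 'help']
print(lookup("thn", index, 1))   # ['the', 'then']
```

The verification step matters because two strings that share a delete variant can be up to twice the maximum distance apart.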
stable release
