
Stringdistance

A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.

From vickumar1981 · Updated March 29, 2026

The project is written primarily in Scala and was first published in 2017; its GitHub metadata lists the license as "Other". Key topics include: cosine-similarity, cosine-similarity-scores, dice-coefficient, fuzzy-matching, hacktoberfest.


StringDistance



The library also works with generic arrays, not only strings.

For more detailed information, please refer to the API Documentation.

Requires: Java 8+ or Scala 2.11+


Contents

  1. Add it to your project
  2. Using in Scala
  3. Using in Scala with implicits
  4. Using in Java
  5. Using with Arrays
  6. Adding your own algorithm
  7. Reporting an Issue
  8. Contributing
  9. License

1. Add it to your project

Using sbt:

In build.sbt:

```scala
libraryDependencies += "com.github.vickumar1981" %% "stringdistance" % "1.2.7"
```

Using gradle:

In build.gradle:

```groovy
dependencies {
    // "compile" was removed in Gradle 7; "implementation" is its replacement
    implementation 'com.github.vickumar1981:stringdistance_2.13:1.2.7'
}
```

Using Maven:

In pom.xml:

```xml
<dependency>
    <groupId>com.github.vickumar1981</groupId>
    <artifactId>stringdistance_2.13</artifactId>
    <version>1.2.7</version>
</dependency>
```

Notes:

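  • The Gradle and Maven examples use the Scala 2.13 artifact (stringdistance_2.13).
  • For Scala 2.12, use the stringdistance_2.12 artifact instead.
  • For Scala 2.11, use the stringdistance_2.11 artifact instead.
  • With sbt, the %% operator appends your project's Scala binary version to the artifact name, so no change is needed.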

2. Scala Usage

Example.scala:

```scala
// Scala example
import com.github.vickumar1981.stringdistance.StringDistance._
import com.github.vickumar1981.stringdistance.StringSound._
import com.github.vickumar1981.stringdistance.impl.{ConstantGap, LinearGap}

// Cosine Similarity
val cosSimilarity: Double = Cosine.score("hello", "chello") // 0.935

// Damerau-Levenshtein Distance
val damerauDist: Int = Damerau.distance("martha", "marhta") // 1
val damerau: Double = Damerau.score("martha", "marhta") // 0.833

// Dice Coefficient
val diceCoefficient: Double = DiceCoefficient.score("martha", "marhta") // 0.4

// Hamming Distance
val hammingDist: Int = Hamming.distance("martha", "marhta") // 2
val hamming: Double = Hamming.score("martha", "marhta") // 0.667

// Jaccard Similarity
val jaccard: Double = Jaccard.score("karolin", "kathrin", 1)

// Jaro and Jaro Winkler
val jaro: Double = Jaro.score("martha", "marhta") // 0.944
val jaroWinkler: Double = JaroWinkler.score("martha", "marhta", 0.1) // 0.961

// Levenshtein Distance
val levenshteinDist: Int = Levenshtein.distance("martha", "marhta") // 2
val levenshtein: Double = Levenshtein.score("martha", "marhta") // 0.667

// Longest Common Subsequence
val longestCommonSubSeq: Int = LongestCommonSeq.distance("martha", "marhta") // 5

// Needleman Wunsch
val needlemanWunsch: Double = NeedlemanWunsch.score("martha", "marhta", ConstantGap()) // 0.667

// N-Gram Similarity and Distance
val ngramDist: Int = NGram.distance("karolin", "kathrin", 1) // 5
val bigramDist: Int = NGram.distance("karolin", "kathrin", 2) // 2
val ngramSimilarity: Double = NGram.score("karolin", "kathrin", 1) // 0.714
val bigramSimilarity: Double = NGram.score("karolin", "kathrin", 2) // 0.333

// N-Gram tokens, returns a List[String]
val tokens: List[String] = NGram.tokens("martha", 2) // List("ma", "ar", "rt", "th", "ha")

// Overlap Similarity
val overlap: Double = Overlap.score("karolin", "kathrin", 1) // 0.286
val overlapBiGram: Double = Overlap.score("karolin", "kathrin", 2) // 0.667

// Smith Waterman Similarities
val smithWaterman: Double = SmithWaterman.score("martha", "marhta", (LinearGap(gapValue = -1), Integer.MAX_VALUE))
val smithWatermanGotoh: Double = SmithWatermanGotoh.score("martha", "marhta", ConstantGap())

// Tversky Similarity
val tversky: Double = Tversky.score("karolin", "kathrin", 0.5) // 0.333

// Phonetic Similarity
val metaphone: Boolean = Metaphone.score("merci", "mercy") // true
val soundex: Boolean = Soundex.score("merci", "mercy") // true
```
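For "martha" vs "marhta", the Hamming outputs above are a distance of 2 and a score of 0.667. Those values fit a score computed as 1 minus the distance divided by the string length. The Java sketch below assumes that normalization. The formula is inferred from the sample values rather than taken from the library source, and `HammingSketch` is a hypothetical class name.

```java
// Hypothetical helper, not part of stringdistance: Hamming distance and a
// length-normalized score, assuming score = 1 - distance / length.
final class HammingSketch {
    // Counts the positions where two equal-length strings differ.
    static int distance(String s1, String s2) {
        if (s1.length() != s2.length()) {
            throw new IllegalArgumentException("Hamming distance needs equal-length inputs");
        }
        int diff = 0;
        for (int i = 0; i < s1.length(); i++) {
            if (s1.charAt(i) != s2.charAt(i)) {
                diff++;
            }
        }
        return diff;
    }

    // Assumed normalization: 1.0 for identical strings, 0.0 when every position differs.
    static double score(String s1, String s2) {
        if (s1.isEmpty() && s2.isEmpty()) {
            return 1.0;
        }
        return 1.0 - (double) distance(s1, s2) / s1.length();
    }
}
```

The same assumed normalization also reproduces the Levenshtein score (1 - 2/6 ≈ 0.667) and the Damerau score (1 - 1/6 ≈ 0.833) listed above. Throwing on unequal lengths is a choice made in this sketch; the library may handle that case differently.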

3. Scala: Use with Implicits

  • To use implicits and extend the String class: import com.github.vickumar1981.stringdistance.StringConverter._

Example.scala:

```scala
// Scala example using implicits
import com.github.vickumar1981.stringdistance.StringConverter._

// Scores between two strings
val cosSimilarity: Double = "hello".cosine("chello")
val damerau: Double = "martha".damerau("marhta")
val diceCoefficient: Double = "martha".diceCoefficient("marhta")
val hamming: Double = "martha".hamming("marhta")
val jaccard: Double = "karolin".jaccard("kathrin")
val jaro: Double = "martha".jaro("marhta")
val jaroWinkler: Double = "martha".jaroWinkler("marhta")
val levenshtein: Double = "martha".levenshtein("marhta")
val needlemanWunsch: Double = "martha".needlemanWunsch("marhta")
val ngramSimilarity: Double = "karolin".nGram("kathrin")
val bigramSimilarity: Double = "karolin".nGram("kathrin", 2)
val overlap: Double = "karolin".overlap("kathrin")
val overlapBiGram: Double = "karolin".overlap("kathrin", 2)
val smithWaterman: Double = "martha".smithWaterman("marhta")
val smithWatermanGotoh: Double = "martha".smithWatermanGotoh("marhta")
val tversky: Double = "karolin".tversky("kathrin", 0.5)

// Distances between two strings
val damerauDist: Int = "martha".damerauDist("marhta") // 1
val hammingDist: Int = "martha".hammingDist("marhta")
val levenshteinDist: Int = "martha".levenshteinDist("marhta")
val longestCommonSeq: Int = "martha".longestCommonSeq("marhta")
val ngramDist: Int = "karolin".nGramDist("kathrin")
val bigramDist: Int = "karolin".nGramDist("kathrin", 2)

// N-Gram tokens, returns a List[String]
val tokens: List[String] = "martha".tokens(2) // List("ma", "ar", "rt", "th", "ha")

// Phonetic similarity of two strings
val metaphone: Boolean = "merci".metaphone("mercy")
val soundex: Boolean = "merci".soundex("mercy")
```
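The `tokens(2)` call above splits a string into overlapping bigrams, and the Dice coefficient compares two such token sets. The Java sketch below assumes Dice is computed as 2·|A∩B| / (|A| + |B|) over distinct bigrams. That assumption reproduces the 0.4 shown for "martha" vs "marhta" in the Scala usage example. `NGramSketch` is a hypothetical class name, not the library's API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of stringdistance: n-gram tokens and a
// Dice coefficient over distinct bigrams.
final class NGramSketch {
    // Sliding window of width n, e.g. "martha" with n = 2 -> [ma, ar, rt, th, ha].
    static List<String> tokens(String s, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= s.length(); i++) {
            out.add(s.substring(i, i + n));
        }
        return out;
    }

    // Assumed formula: 2 * |A intersect B| / (|A| + |B|) over distinct bigrams.
    static double dice(String s1, String s2) {
        Set<String> a = new HashSet<>(tokens(s1, 2));
        Set<String> b = new HashSet<>(tokens(s2, 2));
        if (a.isEmpty() && b.isEmpty()) {
            // Neither string has a bigram (length < 2): fall back to exact equality.
            return s1.equals(s2) ? 1.0 : 0.0;
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return 2.0 * common.size() / (a.size() + b.size());
    }
}
```

"martha" and "marhta" share only "ma" and "ar" out of five bigrams each, giving 2·2 / (5 + 5) = 0.4. Because this sketch uses sets, a repeated bigram counts once; this document does not say whether the library deduplicates tokens.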

4. Java Usage

  • To use in Java: import com.github.vickumar1981.stringdistance.util.StringDistance

Example.java

```java
// Java example
import java.util.List;

import com.github.vickumar1981.stringdistance.util.StringDistance;
import com.github.vickumar1981.stringdistance.util.StringSound;

// Scores between two strings
Double cosSimilarity = StringDistance.cosine("hello", "chello");
Double damerau = StringDistance.damerau("martha", "marhta");
Double diceCoefficient = StringDistance.diceCoefficient("martha", "marhta");
Double hamming = StringDistance.hamming("martha", "marhta");
Double jaccard = StringDistance.jaccard("karolin", "kathrin");
Double jaro = StringDistance.jaro("martha", "marhta");
Double jaroWinkler = StringDistance.jaroWinkler("martha", "marhta");
Double levenshtein = StringDistance.levenshtein("martha", "marhta");
Double needlemanWunsch = StringDistance.needlemanWunsch("martha", "marhta");
Double ngramSimilarity = StringDistance.nGram("karolin", "kathrin");
Double bigramSimilarity = StringDistance.nGram("karolin", "kathrin", 2);
Double overlap = StringDistance.overlap("karolin", "kathrin");
Double overlapBiGram = StringDistance.overlap("karolin", "kathrin", 2);
Double smithWaterman = StringDistance.smithWaterman("martha", "marhta");
Double smithWatermanGotoh = StringDistance.smithWatermanGotoh("martha", "marhta");
Double tversky = StringDistance.tversky("karolin", "kathrin", 0.5);

// Distances between two strings
Integer damerauDist = StringDistance.damerauDist("martha", "marhta");
Integer hammingDist = StringDistance.hammingDist("martha", "marhta");
Integer levenshteinDist = StringDistance.levenshteinDist("martha", "marhta");
Integer longestCommonSeq = StringDistance.longestCommonSeq("martha", "marhta");
Integer ngramDist = StringDistance.nGramDist("karolin", "kathrin");
Integer bigramDist = StringDistance.nGramDist("karolin", "kathrin", 2);

// N-Gram tokens, returns a List<String>
List<String> tokens = StringDistance.nGramTokens("martha", 2); // ["ma", "ar", "rt", "th", "ha"]

// Phonetic similarity of two strings
Boolean metaphone = StringSound.metaphone("merci", "mercy");
Boolean soundex = StringSound.soundex("merci", "mercy");
```
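The Java example calls `jaroWinkler` without a prefix weight, while the Scala example passes 0.1. The hypothetical `JaroSketch` class below is a minimal, self-contained sketch of the textbook definitions, not the library's implementation. It works in three steps:

- It counts character matches within a sliding window.
- It halves the number of out-of-order matches to get the transposition count.
- It adds Winkler's bonus for a shared prefix of up to four characters.

```java
// Hypothetical helper, not part of stringdistance: textbook Jaro and
// Jaro-Winkler similarity.
final class JaroSketch {
    static double jaro(String s1, String s2) {
        if (s1.isEmpty() && s2.isEmpty()) {
            return 1.0;
        }
        // Characters only "match" if they are within this distance of each other.
        int window = Math.max(0, Math.max(s1.length(), s2.length()) / 2 - 1);
        boolean[] matched1 = new boolean[s1.length()];
        boolean[] matched2 = new boolean[s2.length()];
        int matches = 0;
        for (int i = 0; i < s1.length(); i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(s2.length() - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0;
        }
        // Walk the matched characters in order; each mismatch is half a transposition.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < s1.length(); i++) {
            if (matched1[i]) {
                while (!matched2[k]) {
                    k++;
                }
                if (s1.charAt(i) != s2.charAt(k)) {
                    outOfOrder++;
                }
                k++;
            }
        }
        double m = matches;
        double t = outOfOrder / 2.0;
        return (m / s1.length() + m / s2.length() + (m - t) / m) / 3.0;
    }

    // Boosts the Jaro score by the common prefix length (capped at 4) times prefixWeight.
    static double jaroWinkler(String s1, String s2, double prefixWeight) {
        double j = jaro(s1, s2);
        int prefix = 0;
        int limit = Math.min(4, Math.min(s1.length(), s2.length()));
        while (prefix < limit && s1.charAt(prefix) == s2.charAt(prefix)) {
            prefix++;
        }
        return j + prefix * prefixWeight * (1.0 - j);
    }
}
```

With weight 0.1 this gives about 0.944 (Jaro) and 0.961 (Jaro-Winkler) for "martha" vs "marhta", the values listed in the Scala example. The prefix bonus is applied unconditionally here; some variants apply it only when the Jaro score exceeds 0.7. Weights above 0.25 can push the result past 1.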

5. Using with Arrays

  • You can use the ArrayDistance class just like the StringDistance class, except that it takes a generic array: Array[T] in Scala and T[] in Java.

  • Make sure your element type has meaningful equality, since elements are compared with == in Scala and .equals in Java.
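The Java sketch below shows why element equality matters. It computes Levenshtein distance over a generic `T[]` and compares elements with `equals` (via `Objects.equals`). `ArrayLevenshteinSketch` is a hypothetical name, not the library's ArrayDistance API. Its score assumes a 1 - distance / max length normalization, inferred from the 0.667 in the sample below rather than from the library source.

```java
import java.util.Objects;

// Hypothetical helper, not the library's ArrayDistance API: Levenshtein
// distance over any element type, comparing elements with equals().
final class ArrayLevenshteinSketch {
    static <T> int distance(T[] a, T[] b) {
        // prev is the previous row of the edit-distance table; curr is the row being filled.
        int[] prev = new int[b.length + 1];
        int[] curr = new int[b.length + 1];
        for (int j = 0; j <= b.length; j++) {
            prev[j] = j; // turning an empty prefix into b's first j elements takes j insertions
        }
        for (int i = 1; i <= a.length; i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length; j++) {
                int cost = Objects.equals(a[i - 1], b[j - 1]) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length];
    }

    // Assumed normalization: 1 - distance / max(length).
    static <T> double score(T[] a, T[] b) {
        int longest = Math.max(a.length, b.length);
        return longest == 0 ? 1.0 : 1.0 - (double) distance(a, b) / longest;
    }
}
```

Because elements are compared with `equals`, a class that does not override it is compared by identity. Two logically equal but distinct objects would then count as a substitution.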

Scala Sample Code:

```scala
import com.github.vickumar1981.stringdistance.ArrayDistance._

// Example Levenshtein Distance and Score
val levenshteinDist = Levenshtein.distance(Array("m", "a", "r", "t", "h", "a"), Array("m", "a", "r", "h", "t", "a")) // 2
val levenshtein = Levenshtein.score(Array("m", "a", "r", "t", "h", "a"), Array("m", "a", "r", "h", "t", "a")) // 0.667
```

Java Example Code:


6. Adding your own Distance or Scoring Algorithm

  1. Create a marker trait that extends StringMetricAlgorithm:
```scala
trait CustomAlgorithm extends StringMetricAlgorithm
```
  2. Create an implementation for that algorithm using an implicit object. Override distance if the object extends DistanceAlgorithm, or score if it extends ScoringAlgorithm.
```scala
implicit object CustomDistance extends DistanceAlgorithm[CustomAlgorithm] {
  override def distance(s1: String, s2: String): Int = {
    // Implement distance between s1 and s2
    ???
  }
}

implicit object CustomScore extends ScoringAlgorithm[CustomAlgorithm] {
  override def score(s1: String, s2: String): Double = {
    // Implement fuzzy score between s1 and s2
    ???
  }
}
```
  3. Create an object that extends StringMetric with your algorithm as the type parameter, then call its score and distance methods, which use the implicit objects defined above.
```scala
object CustomMetric extends StringMetric[CustomAlgorithm]

val customScore: Double = CustomMetric.score("hello", "hello2")
val customDist: Int = CustomMetric.distance("hello", "hello2")
```
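Any function of two strings can fill in the `distance` body of a custom algorithm. As an illustration only, the Java sketch below computes the longest-common-subsequence length with the classic dynamic-programming table. This is the quantity that `LongestCommonSeq` reports as 5 for "martha" vs "marhta" in the Scala usage example. `LcsSketch` is a hypothetical name; in a real custom algorithm the same loop would be written in Scala inside the `override def distance`.

```java
// Hypothetical helper, not part of stringdistance: length of the longest
// common subsequence, via the classic O(n * m) dynamic-programming table.
final class LcsSketch {
    static int length(String s1, String s2) {
        // table[i][j] = LCS length of the first i chars of s1 and the first j chars of s2.
        int[][] table = new int[s1.length() + 1][s2.length() + 1];
        for (int i = 1; i <= s1.length(); i++) {
            for (int j = 1; j <= s2.length(); j++) {
                if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                    table[i][j] = table[i - 1][j - 1] + 1;
                } else {
                    table[i][j] = Math.max(table[i - 1][j], table[i][j - 1]);
                }
            }
        }
        return table[s1.length()][s2.length()];
    }
}
```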

7. Reporting an Issue

Please report any issues or bugs on the GitHub issues page.


8. Contributing

Please read the contributing guidelines.


9. License

This project is licensed under the Apache 2 License.


This article is auto-generated from vickumar1981/stringdistance via the GitHub API. Last fetched: 6/17/2026