
Java - check words similarity (fuzzy compare with bigrams)

Created by: Root-ssh

In this article, we show how to check word similarity in Java using bigrams.

The logic below:

  1. calculates the bigrams of each word,
  2. counts bigram hits to find the similarity,
  3. divides the hits by the number of bigrams to calculate the final word similarity.
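
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the article's FuzzyUtils.java: the class name BigramSimilaritySketch and the helper createBigrams() are hypothetical, and the sketch assumes that a bigram is a pair of adjacent characters, that comparison is case-insensitive, that every matching pair of bigrams counts as one hit, and that the hits are divided by the average number of bigrams in the two words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class BigramSimilaritySketch {

    // Step 1: split a word into overlapping two-character pieces,
    // e.g. "chair" -> [ch, ha, ai, ir]. Hypothetical helper name.
    static List<String> createBigrams(String word) {
        String input = word.toLowerCase(Locale.ROOT);
        List<String> bigrams = new ArrayList<>();
        for (int i = 0; i + 1 < input.length(); ++i) {
            bigrams.add(input.substring(i, i + 2));
        }
        return bigrams;
    }

    static double checkSimilarity(String a, String b) {
        List<String> aBigrams = createBigrams(a);
        List<String> bBigrams = createBigrams(b);
        if (aBigrams.isEmpty() || bBigrams.isEmpty()) {
            return 0.0; // assumption: words shorter than 2 characters have no bigrams
        }
        // Step 2: count every pair of equal bigrams as one hit.
        int hits = 0;
        for (String x : aBigrams) {
            for (String y : bBigrams) {
                if (x.equals(y)) {
                    hits++;
                }
            }
        }
        // Step 3: divide the hits by the average bigram count of both words.
        return (2.0 * hits) / (aBigrams.size() + bBigrams.size());
    }
}
```

Under these assumptions, checkSimilarity("chair", "chair") gives 1.0 and checkSimilarity("chair", "table") gives 0.0. Because every matching pair is counted, repeated bigrams can push the score above 1: "banana" compared with itself gives 1.8.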

The result of the checkSimilarity() function below indicates how similar two words are.

Similarity is measured from 0, where:

  • 0 - means: the words are totally different,
  • >=1 - means: the words are the same or contain a similar part.

This kind of approach may be applied in fuzzy search.
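
As an illustration of the fuzzy search use case, the sketch below filters a list of candidate words by a minimum score and returns the best matches first. The class FuzzySearchSketch, the method search() and the minScore parameter are hypothetical names introduced only for this example; the scoring function is passed in, so any two-word similarity function such as checkSimilarity() could be plugged in.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

final class FuzzySearchSketch {

    // Returns the candidates whose score against the query is at least minScore,
    // ordered from the highest score to the lowest.
    static List<String> search(String query, List<String> candidates, double minScore,
                               ToDoubleBiFunction<String, String> similarity) {
        List<String> matches = new ArrayList<>();
        for (String candidate : candidates) {
            if (similarity.applyAsDouble(query, candidate) >= minScore) {
                matches.add(candidate);
            }
        }
        // Sort by score, best match first (scores are recomputed for brevity).
        matches.sort(Comparator.comparingDouble(
                (String candidate) -> similarity.applyAsDouble(query, candidate)).reversed());
        return matches;
    }
}
```

The threshold is a tuning choice: a higher minScore returns fewer, closer matches, while a lower one tolerates more typos at the cost of more false positives.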

Practical example


Program.java file:

Output:

 

FuzzyUtils.java file:

Note: do not use the above function to compare sentences or whole texts - it is meant for single words, and longer input may lead to comparison mistakes.

References

  1. Bigram - Wikipedia
  2. Approximate string matching - Wikipedia
