JavaScript - check words similarity (fuzzy compare with bigrams)
In this article, we would like to show how to check the similarity of two words in JavaScript.
The logic below:
- calculates the bigrams of each word,
- counts bigram hits (pairs of equal bigrams) to find the similarity,
- divides the doubled hit count by the total number of bigrams in both words to calculate the final similarity.
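To make these steps concrete, the minimal sketch below traces them for the words 'Google' and 'Gogle'. The helper name toBigrams is only illustrative, and it assumes, as the practical example in this article does, that the final single character is kept as the last slice.

```javascript
// Illustrative helper (hypothetical name): splits a word into 2-character slices.
// The last slice holds only the final character of the word.
const toBigrams = word => {
    const input = word.toLowerCase();
    return Array.from({ length: input.length }, (_, i) => input.slice(i, i + 2));
};

// Step 1: calculate the bigrams of both words.
const a = toBigrams('Google'); // ['go', 'oo', 'og', 'gl', 'le', 'e']
const b = toBigrams('Gogle');  // ['go', 'og', 'gl', 'le', 'e']

// Step 2: count every pair of equal bigrams.
let hits = 0;
for (const x of a) {
    for (const y of b) {
        if (x === y) hits += 1;
    }
}

// Step 3: divide the doubled hit count by the total number of bigrams.
const similarity = (2 * hits) / (a.length + b.length);

console.log(hits);       // 5
console.log(similarity); // 0.9090909090909091
```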
The result of the checkSimilarity() function below indicates how similar two words are.
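The similarity is measured from 0, where:
- 0 means the words are totally different,
- values between 0 and 1 mean the words partly overlap,
- 1 or more means the words are the same or contain similar parts (results above 1 are possible when a bigram repeats, because every matching pair is counted).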
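Because every pair of equal bigrams counts as a hit, words with repeated bigrams can score above 1. The sketch below restates the same counting logic compactly so it runs on its own; the names bigrams and score are illustrative assumptions, not part of the article's function.

```javascript
// Compact re-statement of the same counting logic, for illustration only.
const bigrams = word => {
    const input = word.toLowerCase();
    return Array.from({ length: input.length }, (_, i) => input.slice(i, i + 2));
};

const score = (a, b) => {
    const x = bigrams(a);
    const y = bigrams(b);
    if (x.length === 0 || y.length === 0) {
        return 0;
    }
    // Every equal pair counts, so a repeated bigram is counted more than once.
    const hits = x.reduce((sum, p) => sum + y.filter(q => q === p).length, 0);
    return (2 * hits) / (x.length + y.length);
};

console.log(score('Chris', 'Chris'));   // 1 (no bigram repeats)
console.log(score('banana', 'banana')); // 1.6666666666666667 ('an' and 'na' repeat)
```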
This kind of approach may be applied in fuzzy search.
Practical example:
    // Splits a word into lowercase 2-character slices.
    // The last slice contains only the final character.
    const createBigram = word => {
        const input = word.toLowerCase();
        const vector = [];
        for (let i = 0; i < input.length; ++i) {
            vector.push(input.slice(i, i + 2));
        }
        return vector;
    };

    const checkSimilarity = (a, b) => {
        if (a.length > 0 && b.length > 0) {
            const aBigram = createBigram(a);
            const bBigram = createBigram(b);
            let hits = 0;
            // Count every pair of equal bigrams.
            for (let x = 0; x < aBigram.length; ++x) {
                for (let y = 0; y < bBigram.length; ++y) {
                    if (aBigram[x] === bBigram[y]) {
                        hits += 1;
                    }
                }
            }
            if (hits > 0) {
                // Combined number of bigrams in both words.
                const union = aBigram.length + bBigram.length;
                return (2.0 * hits) / union;
            }
        }
        return 0;
    };


    // Usage example:

    console.log(checkSimilarity('Chris', 'Chris'));  // 1
    console.log(checkSimilarity('John1', 'John2'));  // 0.6
    console.log(checkSimilarity('Google', 'Gogle')); // 0.9090909090909091
    console.log(checkSimilarity('Ann', 'Matt'));     // 0
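Note: do not compare sentences or whole texts using the above function. Every matching pair of bigrams counts as a hit, so long inputs with repeated bigrams inflate the score and may lead to comparison mistakes.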
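For the fuzzy search use mentioned earlier, a minimal sketch might rank a list of candidate words against a query. The fuzzySearch helper, the 0.5 threshold, and the sample word list are assumptions made for illustration; the similarity logic is restated compactly so that the block runs on its own.

```javascript
// Compact re-statement of the bigram similarity logic (illustrative names).
const bigrams = word => {
    const input = word.toLowerCase();
    return Array.from({ length: input.length }, (_, i) => input.slice(i, i + 2));
};

const similarity = (a, b) => {
    const x = bigrams(a);
    const y = bigrams(b);
    if (x.length === 0 || y.length === 0) {
        return 0;
    }
    const hits = x.reduce((sum, p) => sum + y.filter(q => q === p).length, 0);
    return (2 * hits) / (x.length + y.length);
};

// Hypothetical helper: keeps candidates scoring at least `threshold`, best match first.
// The default threshold of 0.5 is an arbitrary assumption and should be tuned per use case.
const fuzzySearch = (query, candidates, threshold = 0.5) =>
    candidates
        .map(word => ({ word, score: similarity(query, word) }))
        .filter(entry => entry.score >= threshold)
        .sort((first, second) => second.score - first.score)
        .map(entry => entry.word);

const names = ['Goggles', 'Yahoo', 'Google', 'Gopher'];

console.log(fuzzySearch('Gogle', names)); // [ 'Google', 'Goggles' ]
```

Only single words are passed as candidates here, in line with the note above.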