TypeScript - check word similarity (fuzzy comparison with bigrams)
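In this article, we would like to show how to check the similarity of two words in TypeScript.
The logic below:
- splits each word into bigrams (pairs of adjacent characters),
- counts matching bigrams (hits) between the two words,
- divides the doubled number of hits by the total number of bigrams in both words to calculate the final similarity.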
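The three steps above can be summarized as a single formula. The notation is an assumption introduced here for illustration only: B(w) stands for the list of bigrams of word w, and hits(a, b) is the number of matching bigram pairs.

```latex
\mathrm{similarity}(a, b) = \frac{2 \cdot \mathrm{hits}(a, b)}{|B(a)| + |B(b)|}
```

For example, 'Google' gives 6 bigrams and 'Gogle' gives 5 (the implementation in this article also counts the trailing single character as a bigram), with 5 matching pairs, so the similarity is 2 * 5 / (6 + 5) = 10/11, which is about 0.909.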
The result of the checkSimilarity() function below indicates how similar two words are.
The similarity is measured from 0, where:
- 0 means the words are totally different,
- values between 0 and 1 mean the words are partially similar,
- >=1 means the words are the same or contain a similar part.
This kind of approach may be applied in fuzzy search.
Practical example:
const createBigram = (word: string): string[] => {
    const input = word.toLowerCase();
    const vector: string[] = [];
    // The last entry is a single character, because slice() stops at the end of the word.
    for (let i = 0; i < input.length; ++i) {
        vector.push(input.slice(i, i + 2));
    }
    return vector;
};

const checkSimilarity = (a: string, b: string): number => {
    if (a.length > 0 && b.length > 0) {
        const aBigram = createBigram(a);
        const bBigram = createBigram(b);
        let hits = 0;
        // Count every matching pair of bigrams.
        for (let x = 0; x < aBigram.length; ++x) {
            for (let y = 0; y < bBigram.length; ++y) {
                if (aBigram[x] === bBigram[y]) {
                    hits += 1;
                }
            }
        }
        if (hits > 0) {
            // Total number of bigrams in both words (not a set union).
            const union = aBigram.length + bBigram.length;
            return (2.0 * hits) / union;
        }
    }
    return 0;
};


// Usage example:

console.log(checkSimilarity('Chris', 'Chris'));   // 1
console.log(checkSimilarity('John1', 'John2'));   // 0.6
console.log(checkSimilarity('Google', 'Gogle'));  // 0.9090909090909091
console.log(checkSimilarity('Ann', 'Matt'));      // 0
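As mentioned earlier, this kind of scoring may be applied in fuzzy search. The following is a minimal sketch of that idea, not part of the original example: fuzzySearch(), the SearchMatch type and the 0.5 threshold are hypothetical names and values chosen for illustration. The block repeats a compact copy of createBigram() and checkSimilarity() so that it can be run on its own.

```typescript
// Compact copy of the article's functions, repeated so this sketch is self-contained.
const createBigram = (word: string): string[] => {
    const input = word.toLowerCase();
    const vector: string[] = [];
    for (let i = 0; i < input.length; ++i) {
        vector.push(input.slice(i, i + 2));
    }
    return vector;
};

const checkSimilarity = (a: string, b: string): number => {
    const aBigram = createBigram(a);
    const bBigram = createBigram(b);
    let hits = 0;
    for (const x of aBigram) {
        for (const y of bBigram) {
            if (x === y) {
                hits += 1;
            }
        }
    }
    const total = aBigram.length + bBigram.length;
    return total > 0 ? (2.0 * hits) / total : 0;
};

// Hypothetical result type: a candidate word with its similarity score.
interface SearchMatch {
    word: string;
    score: number;
}

// Hypothetical helper: keeps candidates whose score reaches the threshold
// (0.5 is an arbitrary assumption) and sorts them from best to worst.
const fuzzySearch = (query: string, candidates: string[], threshold = 0.5): SearchMatch[] => {
    return candidates
        .map((word) => ({ word, score: checkSimilarity(query, word) }))
        .filter((match) => match.score >= threshold)
        .sort((m1, m2) => m2.score - m1.score);
};

const foundMatches = fuzzySearch('Gogle', ['Google', 'Goggles', 'Apple', 'Mango']);

console.log(foundMatches.map((match) => match.word)); // [ 'Google', 'Goggles' ]
```

The threshold probably needs tuning for real data: a lower value returns more but weaker matches, a higher value returns fewer but closer ones.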
Note:
Don't compare sentences or whole texts using checkSimilarity() - it may lead to comparison mistakes, for example because repeated bigrams are counted multiple times.
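One likely source of such mistakes is that every matching pair of bigrams is counted, so inputs with repeated bigrams can score above 1. The sketch below illustrates this with a short repeated word. It also shows checkSetSimilarity(), a hypothetical variant that is not part of the original example: it swaps in the Sørensen-Dice coefficient over sets of distinct bigrams, which keeps the result between 0 and 1. The block repeats a compact copy of the article's functions so that it can be run on its own.

```typescript
// Compact copy of the article's functions, repeated so this sketch is self-contained.
const createBigram = (word: string): string[] => {
    const input = word.toLowerCase();
    const vector: string[] = [];
    for (let i = 0; i < input.length; ++i) {
        vector.push(input.slice(i, i + 2));
    }
    return vector;
};

const checkSimilarity = (a: string, b: string): number => {
    const aBigram = createBigram(a);
    const bBigram = createBigram(b);
    let hits = 0;
    for (const x of aBigram) {
        for (const y of bBigram) {
            if (x === y) {
                hits += 1;
            }
        }
    }
    const total = aBigram.length + bBigram.length;
    return total > 0 ? (2.0 * hits) / total : 0;
};

// Hypothetical variant: Sørensen-Dice coefficient over sets of distinct bigrams.
// Each distinct bigram is counted at most once, so the result stays in the 0..1 range.
const checkSetSimilarity = (a: string, b: string): number => {
    const aSet = new Set(createBigram(a));
    const bSet = new Set(createBigram(b));
    if (aSet.size === 0 || bSet.size === 0) {
        return 0;
    }
    let shared = 0;
    aSet.forEach((gram) => {
        if (bSet.has(gram)) {
            shared += 1;
        }
    });
    return (2.0 * shared) / (aSet.size + bSet.size);
};

// 'aaa' gives the bigrams 'aa', 'aa', 'a', so the pair-counting version finds 5 hits out of 6 bigrams.
console.log(checkSimilarity('aaa', 'aaa'));         // 1.6666666666666667
console.log(checkSetSimilarity('aaa', 'aaa'));      // 1
console.log(checkSetSimilarity('Google', 'Gogle')); // 0.9090909090909091
```

For words without repeated bigrams both functions return the same value, so the variant only changes the result where repetition would otherwise inflate it.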