Search engines for my application - intro
NOTE: this is a blog post, please do not edit it. Thank you.
Search engines:
Free lightweight search engines:
Databases full text search:
Other:
Search engines on github via topics:
Interesting posts
- How to Implement Fuzzy Search (Google's Autocomplete Search) in Java - DZone Java
- java - Full Text Search like Google - Stack Overflow
- How to Write a Spelling Corrector - Peter Norvig
- Google AI Blog: All Our N-gram are Belong to You
- An Approach To Highly Intuitive Fuzzy Search In Elasticsearch With Typo Handling - Medium
Elastic
More: Search engines for my application - Elasticsearch - Dirask
Solr
The Standard Query Parser | Apache Solr Reference Guide 7.0
Fuzzy Searches
Solr’s standard query parser supports fuzzy searches based on the Damerau-Levenshtein Distance or Edit Distance algorithm. Fuzzy searches discover terms that are similar to a specified term without necessarily being an exact match. To perform a fuzzy search, use the tilde ~ symbol at the end of a single-word term.
lucene - How to configure Solr to use Levenshtein approximate string matching? - Stack Overflow
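The tilde syntax described above can be sketched as a small query builder. This is only a sketch under assumptions: the core name articles, the field title and the helper names fuzzy_term and select_url are made up for illustration, and the optional number after the tilde (maximum edits, 0 to 2) is taken from the Solr reference guide rather than from this post.

```python
from typing import Optional
from urllib.parse import urlencode


def fuzzy_term(term: str, max_edits: Optional[int] = None) -> str:
    """Append the fuzzy operator (~) to a single-word term.

    Note: this sketch does not escape query-parser special characters.
    """
    if not term or any(ch.isspace() for ch in term):
        raise ValueError("the fuzzy operator applies to a single-word term")
    if max_edits is None:
        return f"{term}~"  # let Solr use its default distance
    if max_edits not in (0, 1, 2):
        raise ValueError("max_edits is expected to be 0, 1 or 2")
    return f"{term}~{max_edits}"


def select_url(base_url: str, field: str, term: str,
               max_edits: Optional[int] = None) -> str:
    """Build a /select URL that runs a fuzzy query on one field."""
    params = {"q": f"{field}:{fuzzy_term(term, max_edits)}"}
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local core; adjust host, port and core name to your setup.
    print(select_url("http://localhost:8983/solr/articles", "title", "roam", 1))
```

The URL is only built here, not sent, so the sketch runs without a Solr instance.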
Apache Lucene
todo
Typesense
More: Search engines for my application - Typesense - Dirask
MeiliSearch
Sonic search engine
- Sonic search engine - github
- Sonic, the lightweight search engine backend with a low CPU footprint - jaxenter.com
- Sonic: A Lightweight, Schema-Less Search - infoq.com
Sonic search engine online running examples / live demos:
Algolia
todo
MariaDB - full text search
PostgreSQL - full text search
- PostgreSQL: Documentation: 9.5: Full Text Search
- PostgreSQL(Full Text Search) vs ElasticSearch - Stack Overflow
Compare databases to search engines
- PostgreSQL(Full Text Search) vs ElasticSearch - Stack Overflow
- Is it worth migrating from MySQL to Elasticsearch? - Quora
- Elasticsearch vs. MySQL Comparison - db-engines.com
- performance - Why is Solr so much faster than Postgres? - Stack Overflow
- Solr Vs Elasticsearch: Which Search Engine is Better? - serverguy.com
Wikipedia links:
- Letter frequency - Wikipedia
- Bigram - Wikipedia
- n-gram - Wikipedia
- Soundex - Wikipedia
- Full-text search - Wikipedia
- String-searching algorithm - Wikipedia
Algorithms
- Levenshtein distance - Wikipedia
- Damerau–Levenshtein distance - Wikipedia
- Edit distance - Wikipedia
- Soundex - Wikipedia
- Approximate string matching - Wikipedia
Types of edit distance
Different types of edit distance allow different sets of string operations. For instance:
- The Levenshtein distance allows deletion, insertion and substitution.
- The Longest common subsequence (LCS) distance allows only insertion and deletion, not substitution.
- The Hamming distance allows only substitution, hence, it only applies to strings of the same length.
- The Damerau–Levenshtein distance allows insertion, deletion, substitution, and the transposition of two adjacent characters.
- The Jaro distance allows only transposition.
source: Edit distance - Wikipedia
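To make the differences between these distances concrete, here is a minimal sketch of three of them. The function names are my own, and osa_distance implements the restricted Damerau–Levenshtein variant (optimal string alignment), not the unrestricted one.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def hamming(a: str, b: str) -> int:
    """Number of positions that differ; substitution is the only operation."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def osa_distance(a: str, b: str) -> int:
    """Levenshtein plus transposition of two adjacent characters (restricted)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Swap of two neighbouring characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    print(hamming("coat", "cost"))           # 1
    print(levenshtein("abcd", "abdc"))       # 2 (two substitutions)
    print(osa_distance("abcd", "abdc"))      # 1 (one transposition)
```

The last two lines show why the choice matters for typo handling: a swapped pair of letters costs two edits under Levenshtein but only one when transposition is allowed.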
String metrics
- Levenshtein distance, or its generalization edit distance
- Damerau–Levenshtein distance
- Sørensen–Dice coefficient
- Block distance or L1 distance or City block distance
- Hamming distance
- Jaro–Winkler distance
- Simple matching coefficient (SMC)
- Jaccard similarity or Jaccard coefficient or Tanimoto coefficient
- Tversky index
- Overlap coefficient
- Variational distance
- Hellinger distance or Bhattacharyya distance
- Information radius (Jensen–Shannon divergence)
- Skew divergence
- Confusion probability
- Tau metric, an approximation of the Kullback–Leibler divergence
- Fellegi and Sunters metric (SFS)
- Maximal matches
- Grammar-based distance
- TFIDF distance metric
source: String metric - Wikipedia
- language agnostic - What are some algorithms for comparing how similar two strings are? - Stack Overflow
- compare two string in java result in percentage - Stack Overflow
- string - Java library for free-text diff - Stack Overflow
- GitHub - java-diff-utils/java-diff-utils: Diff Utils library
- NLP/Machine Learning text comparison - Stack Overflow
- GitHub - kpdecker/jsdiff: A javascript text differencing implementation.
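Two of the metrics listed above, the Sørensen–Dice coefficient and Jaccard similarity, are easy to try on character bigrams. The sketch below is one possible reading, assuming case-insensitive comparison and sets of bigrams (repeated bigrams are counted once); the function names are illustrative.

```python
def bigrams(text: str) -> set:
    """Set of adjacent character pairs, lowercased."""
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a: str, b: str) -> float:
    """Sørensen–Dice coefficient: 2 * |X ∩ Y| / (|X| + |Y|)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0  # convention chosen here for two strings without bigrams
    return 2 * len(x & y) / (len(x) + len(y))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: |X ∩ Y| / |X ∪ Y|."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


if __name__ == "__main__":
    # "night" and "nacht" share only the bigram "ht".
    print(dice("night", "nacht"))     # 0.25
    print(jaccard("night", "nacht"))  # 1/7, about 0.143
```

Both return a value between 0 and 1, which makes them convenient when a similarity percentage is wanted instead of an edit count.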
Approximate string matching - fuzzy search
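Approximate string matching (often colloquially referred to as fuzzy string searching) is the technique of finding strings that match a pattern approximately (rather than exactly). The closeness of a match is measured by the number of primitive operations needed to turn the string into an exact match. The usual primitive operations are: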
- insertion: co*t → coat
- deletion: coat → co*t
- substitution: coat → cost
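A minimal sketch of approximate matching inside a longer text, using the three operations above. It follows the classic dynamic-programming approach in which a match may start anywhere in the text (often attributed to Sellers); the function names and the default threshold of one edit are assumptions made for the example.

```python
def best_match_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between pattern and any substring of text."""
    # Row 0 is all zeros: a match may start at any position in text for free.
    previous = [0] * (len(text) + 1)
    for i, pc in enumerate(pattern, start=1):
        current = [i]  # i pattern characters against an empty substring
        for j, tc in enumerate(text, start=1):
            cost = 0 if pc == tc else 1
            current.append(min(previous[j] + 1,          # pattern char missing in text
                               current[j - 1] + 1,       # extra char in text
                               previous[j - 1] + cost))  # substitution or exact match
        previous = current
    # A match may also end anywhere, so take the best value in the last row.
    return min(previous)


def fuzzy_contains(pattern: str, text: str, max_edits: int = 1) -> bool:
    """True if some substring of text is within max_edits of pattern."""
    return best_match_distance(pattern, text) <= max_edits


if __name__ == "__main__":
    print(best_match_distance("coat", "raincoat"))            # 0
    print(best_match_distance("coat", "the cost of living"))  # 1
    print(fuzzy_contains("coat", "hello world"))              # False
```

The running time is proportional to len(pattern) * len(text), which is fine for short fields but is the reason dedicated search engines rely on indexes instead.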
Implementations
In JavaScript:
- JavaScript - calculate Levenshtein distance between strings - Dirask
- JavaScript - check words similarity (fuzzy compare with bigrams) - Dirask
- JavaScript - Soundex algorithm implementation - Dirask
In Java:
- Java - calculate Levenshtein distance between strings - Dirask
- Java - check words similarity (fuzzy compare with bigrams) - Dirask
String metrics / search algorithms - cross technology:
- String - metrics algorithms (Cross technology) - Dirask
- String - search algorithms (Cross technology) - Dirask
Java libs with string metrics algorithms:
Misc
Google search keywords in this area
google for:
java search like google
recommendation
keywords
synonyms
Full Text Search like Google
Implement Fuzzy Search (Google's Autocomplete Search)
Autocomplete Search in java
programming synonyms for search engine
similarity search algorithms string
best fuzzy search alternative to elastic
string matching algorithm
Elastic:
You Complete Me | Elastic Blog
keywords:
suggestions
Improving Relevance
Synonyms
suggest
weights
Use weights with a reasonable logic behind them
Common parts:
- predictions, suggestions and autocomplete, e.g. jav -> Java, JavaScript
- synonyms, e.g. js -> JavaScript, remove -> delete
- recommendation engine based on user preferences / interests
- typo and error correction, e.g. JvaScrit -> JavaScript
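Three of the parts above (autocomplete, synonyms and typo correction) can be sketched with the Python standard library alone. The term list and synonym table are made-up sample data, the function names are illustrative, and the typo fixer relies on difflib's similarity ratio rather than on an edit distance.

```python
import bisect
import difflib

# Hypothetical sample data; a real application would load its own vocabulary.
TERMS = ["Java", "JavaScript", "Python", "TypeScript"]
SYNONYMS = {"js": "JavaScript", "remove": "delete"}

_BY_KEY = {term.lower(): term for term in TERMS}  # lowercase key -> display form
_KEYS = sorted(_BY_KEY)


def autocomplete(prefix: str, limit: int = 5) -> list:
    """Terms starting with prefix, found by binary search in the sorted keys."""
    p = prefix.lower()
    start = bisect.bisect_left(_KEYS, p)
    found = []
    for key in _KEYS[start:]:
        if not key.startswith(p) or len(found) == limit:
            break
        found.append(_BY_KEY[key])
    return found


def expand_synonym(word: str) -> str:
    """Replace a known synonym with its canonical form, else return the word."""
    return SYNONYMS.get(word.lower(), word)


def correct_typo(word: str, cutoff: float = 0.6) -> str:
    """Closest known term by difflib's similarity ratio, else the word unchanged."""
    matches = difflib.get_close_matches(word.lower(), _KEYS, n=1, cutoff=cutoff)
    return _BY_KEY[matches[0]] if matches else word


if __name__ == "__main__":
    print(autocomplete("jav"))       # ['Java', 'JavaScript']
    print(expand_synonym("js"))      # JavaScript
    print(correct_typo("JvaScrit"))  # JavaScript
```

The recommendation part is left out on purpose, since it depends on user data that this post does not describe.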