The Nearest Words Finder service is provided for the CLARIN VLO (Virtual Language Observatory). It takes input consisting of character sequences separated by whitespace characters and compares each sequence with the sequences in a user-supplied dictionary. The result is an HTML table of “word – match” pairs, where the word is an original input sequence and the match is the dictionary sequence closest to it according to the Levenshtein distance.
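The processing steps described above can be sketched as follows. This is a minimal illustration, not the service's actual implementation: the function names `levenshtein`, `nearest_words` and `to_html_table` are hypothetical, input is assumed to be split on whitespace, and ties between equally close dictionary entries are assumed to go to the entry listed first.

```python
import html


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if chars match)
            ))
        prev = curr
    return prev[-1]


def nearest_words(text: str, dictionary: list[str]) -> list[tuple[str, str]]:
    """Pair each whitespace-separated word with its closest dictionary entry.

    Assumption: on ties, min() keeps the entry that appears first in the
    dictionary. An empty dictionary raises ValueError.
    """
    return [(word, min(dictionary, key=lambda d: levenshtein(word, d)))
            for word in text.split()]


def to_html_table(pairs: list[tuple[str, str]]) -> str:
    """Render (word, match) pairs as a simple HTML table, escaping the text."""
    rows = "".join(
        f"<tr><td>{html.escape(w)}</td><td>{html.escape(m)}</td></tr>"
        for w, m in pairs
    )
    return f"<table><tr><th>word</th><th>match</th></tr>{rows}</table>"
```

For example, `nearest_words("helo wrld", ["hello", "world", "help"])` returns `[("helo", "hello"), ("wrld", "world")]`; "helo" is one edit away from both "hello" and "help", and the first-listed entry wins under the assumed tie rule.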
The Levenshtein distance (edit distance) is a metric that measures the difference between two character sequences. It is defined as the minimum number of single-character operations (insertions, deletions and substitutions) required to transform one sequence into the other. In the general case, each type of operation can be assigned a different cost; in this service all operations have equal cost. The measure is widely used in information theory and computational linguistics.
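The definition above, including the general case with per-operation costs, corresponds to the classic Wagner–Fischer dynamic-programming algorithm. The sketch below is an illustration under that assumption; the function name `edit_distance` and its cost parameters are hypothetical, and the default of 1 for every operation matches the equal-cost setting used by the service.

```python
def edit_distance(a: str, b: str, ins_cost: int = 1,
                  del_cost: int = 1, sub_cost: int = 1) -> int:
    """Weighted Levenshtein distance via the Wagner-Fischer algorithm.

    d[i][j] holds the cheapest way to turn a[:i] into b[:j].
    """
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * del_cost      # delete all of a[:i]
    for j in range(1, m + 1):
        d[0][j] = j * ins_cost      # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = a[i - 1] == b[j - 1]
            d[i][j] = min(
                d[i - 1][j] + del_cost,                     # delete a[i-1]
                d[i][j - 1] + ins_cost,                     # insert b[j-1]
                d[i - 1][j - 1] + (0 if same else sub_cost),  # substitute/match
            )
    return d[n][m]
```

With equal costs, `edit_distance("kitten", "sitting")` is 3 (two substitutions and one insertion). Raising `sub_cost` to 2 makes a substitution no cheaper than a deletion plus an insertion, and the same pair then scores 5.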
The practical value of the “Nearest Words Finder” service stems largely from the wide range of applications of the Levenshtein distance. It is actively used:
- in search engines, to find objects or records by name;
- in databases, when searching with an incomplete or misspelled name;
- to correct typing errors in text input;
- to correct errors introduced by automatic recognition of scanned text or speech;
- in other applications involving automatic text processing.
Further details are presented here.
A direct link to the service is here.