Levenshtein Distance

The string column comparisons are made using the Levenshtein distance algorithm. Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.

For example, the Levenshtein distance between "kitten" and "sitting" is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits.

kitten → sitten (substitution of "s" for "k")
sitten → sittin (substitution of "i" for "e")
sittin → sitting (insertion of "g" at the end)
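The definition above can be computed with the standard dynamic-programming recurrence (the Wagner-Fischer algorithm), keeping only two rows of the distance table at a time. This is a minimal Python sketch for illustration. The function name `levenshtein` is hypothetical and is not taken from the tool's own implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn `a` into `b` (Wagner-Fischer DP sketch)."""
    # prev[j] = distance between the first i-1 characters of a
    # and the first j characters of b (starts as the empty-prefix row).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # curr[0] = i: turning a[:i] into "" takes i deletions.
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # substitute ca with cb (free if equal)
            )
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Each cell takes the cheapest of a deletion, an insertion, or a substitution (free when the characters already match), so the last cell holds the minimum edit count. Keeping two rows instead of the full table keeps memory proportional to the length of `b`.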

Consider the following tables:

Tested Frame

ID	Another ID	Yet Another ID	Name	Age
1	101	201	Jhn	20
2	102	202	Mry	21
2	103	203	Jne	22
3	103	203	Jck	23
3	104	205	Jll	24

Expected Frame

ID	Another ID	Yet Another ID	Name	Age
1	101	201	John	28
2	102	202	Mary	31
2	103	203	Jane	32
3	103	203	Jack	33
3	104	205	Jill	34

You can immediately see that each value in the Name column of the Tested frame is missing its second character (for example, "Jhn" instead of "John"). Each row therefore has a Levenshtein distance of 1, and the sum of the distances across all five rows is 5. The comparison report shows this total of all the Levenshtein distances for the column. The comparison fails if the total is greater than zero.
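To make the arithmetic concrete, here is a sketch that totals the distances for the Name column and applies the pass/fail rule described above. It assumes rows are paired by position, since the two tables line up row by row; how the tool actually aligns rows is not described here. `levenshtein` and `column_distance` are hypothetical helpers for illustration, not the tool's API.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum single-character edits turning `a` into `b` (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return prev[-1]


def column_distance(tested: list[str], expected: list[str]) -> int:
    """Hypothetical helper: sum of per-row distances for one string column.
    Assumes row i of the tested column corresponds to row i of the expected one."""
    if len(tested) != len(expected):
        raise ValueError("columns must have the same number of rows")
    return sum(levenshtein(t, e) for t, e in zip(tested, expected))


# Name column values from the Tested and Expected frames above.
tested_names = ["Jhn", "Mry", "Jne", "Jck", "Jll"]
expected_names = ["John", "Mary", "Jane", "Jack", "Jill"]

total = column_distance(tested_names, expected_names)
passed = total == 0  # the comparison fails when the total is greater than zero
print(total, "pass" if passed else "fail")  # 5 fail
```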

Euclidean Distance Keys

ID	Another ID	Yet Another ID	Name	Age
1	101	201	Jhn	20
2	102	202	Mry	21
2	103	203	Jne	22
3	103	203	Jck	23
3	104	205	Jll	24