Euclidean Distance
Numeric columns are compared using Euclidean distance, a measure of how different two collections of numbers are. Imagine you have two sequences of numbers (coordinates) such as (20, 21, 22, 23, 24) and (28, 31, 32, 33, 34). The Euclidean distance between them is the square root of the sum of the squared differences between corresponding coordinates. In this case, it is approximately 21.54.
Formally, Euclidean distance is a metric on a vector space in which the distance between two vectors is the length of the line segment connecting them. It is named after the ancient Greek mathematician Euclid.
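Written out, the definition looks like the following. The symbols $x$, $y$, and $n$ are notation introduced here for illustration: $x$ and $y$ are the two sequences and $n$ is their common length.

$$
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
$$

For the two sequences above, the differences are 8, 10, 10, 10, and 10, so:

$$
d = \sqrt{8^2 + 10^2 + 10^2 + 10^2 + 10^2} = \sqrt{464} \approx 21.54
$$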
Example
Suppose we have the following two tables:
Tested Frame
ID | Another ID | Yet Another ID | Name | Age |
---|---|---|---|---|
1 | 101 | 201 | John | 20 |
2 | 102 | 202 | Mary | 21 |
2 | 103 | 203 | Jane | 22 |
3 | 103 | 203 | Jack | 23 |
3 | 104 | 205 | Jill | 24 |
Expected Frame
ID | Another ID | Yet Another ID | Name | Age |
---|---|---|---|---|
1 | 101 | 201 | John | 28 |
2 | 102 | 202 | Mary | 31 |
2 | 103 | 203 | Jane | 32 |
3 | 103 | 203 | Jack | 33 |
3 | 104 | 205 | Jill | 34 |
In this example, only the Age values differ. The Euclidean distance between the two Age columns is approximately 21.54; a nonzero distance shows that the columns are not identical.
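The Age comparison can be reproduced with a minimal sketch in plain Python. It is not the library's own implementation: the `tested` and `expected` dictionaries are hypothetical stand-ins for the two frames, and `column_distance` is a helper name chosen for illustration.

```python
import math

# Hypothetical stand-ins for the two frames: column name -> list of values.
tested = {
    "Name": ["John", "Mary", "Jane", "Jack", "Jill"],
    "Age": [20, 21, 22, 23, 24],
}
expected = {
    "Name": ["John", "Mary", "Jane", "Jack", "Jill"],
    "Age": [28, 31, 32, 33, 34],
}


def column_distance(a, b):
    """Return the Euclidean distance between two equal-length numeric columns."""
    if len(a) != len(b):
        raise ValueError("columns must have the same length")
    # Square root of the sum of squared element-wise differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


age_distance = column_distance(tested["Age"], expected["Age"])
print(round(age_distance, 2))  # 21.54
```

A distance of 0 would mean the two columns hold identical values; any positive result signals a difference, and larger values mean the columns are further apart.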