ISOMAP algorithm for Nonlinear Dimensionality Reduction

By | June 3, 2023

Elaboration of ISOMAP steps

Step 1 – Neighborhood Graph Construction:

the Neighborhood Graph Construction. I’ll illustrate with an example and break it down with a mathematical interpretation.

Consider a dataset with five 2D data points: A = (1, 1), B = (2, 2), C = (5, 5), D = (6, 7), and E = (8, 8).

We would start by creating a 5×5 matrix of Euclidean distances, which we’ll call “dist_matrix”. This matrix would be defined such that the element in the i-th row and j-th column is the Euclidean distance between the i-th and j-th data points. The Euclidean distance between two points (x1, y1) and (x2, y2) is given by the formula sqrt((x2 – x1)^2 + (y2 – y1)^2).

For example, the distance between A and B is sqrt((2-1)^2 + (2-1)^2) = sqrt(2), so the element in the first row and second column of “dist_matrix” would be sqrt(2). You would do this for every pair of points to fill out the matrix.

Next, you would choose a radius, r, for your neighborhoods. If you’re using the “k nearest neighbors” variant of ISOMAP, you would instead choose a number of neighbors, k. For simplicity, let’s say you’re using the radius variant and choose r = 3.

You would then create a 5×5 adjacency matrix “adj_matrix”, where the element in the i-th row and j-th column is 1 if the distance between the i-th and j-th points is less than or equal to r (meaning they’re “neighbors”), and 0 otherwise.

In our case, A and B would be neighbors (since sqrt(2) <= 3), but A and C would not be (since their distance is greater than 3). You’d do this for every pair of points to fill out “adj_matrix”.

Finally, you would create a 5×5 weight matrix “weight_matrix” that is defined such that the element in the i-th row and j-th column is the Euclidean distance between the i-th and j-th points if they’re neighbors (i.e., if the corresponding element in “adj_matrix” is 1), and infinity otherwise. This represents the “cost” of the “edge” connecting the i-th and j-th points in the graph.

The resulting “weight_matrix” serves as a representation of the neighborhood graph in matrix form. Note that all of the above matrices would be symmetric, since the distance from point i to point j is the same as the distance from point j to point i.

This entire process can be summarized mathematically as follows. Let “dist” denote the Euclidean distance function, let X_i and X_j denote the i-th and j-th data points, and let I(x) be the indicator function that returns 1 if the condition x is true and 0 otherwise. Then the elements of “weight_matrix” are defined by:

weight_matrix[i, j] = I(dist(X_i, X_j) <= r) * dist(X_i, X_j) + I(dist(X_i, X_j) > r) * infinity

The interpretation of this is: if points X_i and X_j are close enough (i.e., their distance is less than or equal to r), then the cost of the edge connecting them is their Euclidean distance; otherwise, the cost is infinity (representing the fact that there is no edge between them).

Leave a Reply

Your email address will not be published. Required fields are marked *