(See original post: http://cnx.org/contents/HV-RsdwL@23/Molecular-Distance-Measures)
Comparing Molecular Conformations
Molecules are not rigid. On the contrary, they are highly flexible objects, capable of changing shape dramatically through the rotation of dihedral angles. Each distinct shape of a given molecule is called a conformation. We need a measure to express how much a molecule changes going from one conformation to another, or alternatively, how different two conformations are from each other. Although one could conceivably measure the shape change by computing the volume of the intersection of the alpha shapes for two conformations (see Molecular Shapes and Surfaces for an explanation of alpha shapes), this is computationally prohibitive. Simpler measures of distance between conformations have been defined, based on variables such as the Cartesian coordinates of each atom, or the bond and torsion angles within the molecule. When working with Cartesian coordinates, one can represent a molecular conformation as a vector whose components are the Cartesian coordinates of the molecule’s atoms. Therefore, a conformation for a molecule with N atoms can be represented as a 3N-dimensional vector of real numbers.
RMSD and lRMSD
One of the most widely accepted difference measures for conformations of a molecule is the least root mean square deviation (lRMSD). To calculate the RMSD of a pair of structures (say x and y), each structure must be represented as a vector of coordinates of length 3N (assuming N atoms). The RMSD is the square root of the average of the squared distances between corresponding atoms of x and y. It is a measure of the average atomic displacement between the two conformations:
$$\mathrm{RMSD}(x, y) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \lVert x_i - y_i \rVert^2}$$
where $x_i$ and $y_i$ denote the three-dimensional positions of atom i in the two conformations.
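As a minimal sketch of the RMSD definition above (no alignment is performed), the following assumes NumPy is available. The function name `rmsd` is a hypothetical helper chosen for illustration; conformations may be passed either as flat 3N-vectors or as N×3 arrays.

```python
import numpy as np

def rmsd(x, y):
    """RMSD between two conformations with corresponding atoms, without alignment."""
    # Accept flat 3N-vectors or (N, 3) arrays; after reshaping, one row per atom.
    x = np.asarray(x, dtype=float).reshape(-1, 3)
    y = np.asarray(y, dtype=float).reshape(-1, 3)
    if x.shape != y.shape:
        raise ValueError("conformations must have the same number of atoms")
    squared_distances = np.sum((x - y) ** 2, axis=1)  # one value per atom
    return float(np.sqrt(squared_distances.mean()))

# Two-atom toy example: only the first atom moves, by a distance of 5.
x = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
y = [3.0, 4.0, 0.0, 1.0, 0.0, 0.0]
print(rmsd(x, y))  # sqrt((25 + 0) / 2), about 3.54
```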
However, when molecular conformations are sampled from molecular dynamics or other forms of sampling, it is often the case that the molecule drifts away from the origin and rotates in an arbitrary way. The lRMSD distance aims at compensating for these facts by representing the minimum RMSD over all possible relative positions and orientations of the two conformations under consideration. Calculating the lRMSD consists of first finding an optimal alignment of the two structures, and then calculating their RMSD. Note that aligning two conformations may require both a translation and rotation. In other words, before computing the RMSD distance, it is necessary to remove the translation of the centroid of both conformations and to perform an “optimal alignment” or “optimal rotation” of them, since these two factors artificially increase the RMSD distance between them.
Finding the optimal rotation to minimize the RMSD between two point sets is a well-studied problem, and several algorithms exist. The Kabsch algorithm, which is implemented in several molecular modeling packages, solves a matrix equation for the three-dimensional rotation matrix corresponding to the optimal rotation. An alternative approach, discussed in detail after the matrix method, uses a compact representation of rotational transformations called quaternions. Quaternions are currently the preferred representation for global rotation in calculating lRMSD, since they require fewer numbers to be stored and are easy to re-normalize. In contrast, re-normalization of orthonormal matrices is quite expensive and potentially numerically unstable. Both quaternions and their application to global alignment of conformations will be presented after the next section.
Optimal Alignment for lRMSD Using Rotation Matrices
This section presents a method for computing the optimal rotation between two data sets as an orthonormal rotation matrix. As stated earlier, this approach is slightly less numerically stable (since guaranteeing the orthonormality of a matrix is harder than guaranteeing the unit length of a quaternion) and requires taking care of the special case in which the resulting matrix is not a proper rotation, as discussed below.
As stated earlier, the optimal alignment requires both a translation and a rotation. The translational part of the alignment is easy to calculate. It can be proven that the optimal alignment is obtained by translating one set so that its centroid coincides with the other set’s centroid (see section 2-C of  for proof). The centroid of a point set A with points $a_1, \dots, a_N$ is simply the average position of all its points:
$$\bar{a} = \frac{1}{N} \sum_{i=1}^{N} a_i$$
We can then redefine each point in two sets A and B as a deviation from the centroid of its own set:
$$\tilde{a}_i = a_i - \bar{a}, \qquad \tilde{b}_i = b_i - \bar{b}$$
Given this notation relative to the centroid, we can explicitly set the centroids to be equal and proceed with the rotational part of the alignment.
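The centering step can be sketched as follows, assuming NumPy and conformations stored as N×3 arrays. The helper name `center` is hypothetical. The example also illustrates why centering removes the translational part of the alignment: a conformation and a translated copy of it become identical once both are centered.

```python
import numpy as np

def center(points):
    """Return the points expressed relative to their centroid, and the centroid itself."""
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)  # average position of all points
    return points - centroid, centroid

a = np.array([[1.0, 2.0, 3.0],
              [3.0, 2.0, 1.0],
              [2.0, 5.0, 2.0]])
a_centered, a_centroid = center(a)

# A translated copy has a different centroid but the same centered coordinates.
shifted_centered, _ = center(a + np.array([10.0, -4.0, 7.0]))
print(a_centroid)
print(np.allclose(a_centered, shifted_centered))
```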
One of the first references to the solution of this problem in matrix form is from Kabsch. The Kabsch method uses Lagrange multipliers to solve a minimization problem to find the optimal rotation. Here, we present a slightly more intuitive method, based on matrix algebra and its properties, that achieves the same result. This formulation can be found in  and . Imagine we wish to align two conformations composed of N atoms each, whose Cartesian coordinates are given by the vectors x and y. The main idea behind this approach is to find a 3×3 orthonormal matrix U such that applying U to the atom positions of one of the data vectors, x, aligns it as well as possible with the other data vector, y. The quantity to minimize is the distance d(Ux, y), where x and y are assumed to be centered, that is, both of their centroids coincide with the origin (centering both conformations is the first step). Mathematically, this problem can be stated as the minimization of the following quantity:
$$E = \frac{1}{N} \sum_{i=1}^{N} \lVert U x_i - y_i \rVert^2$$
When E is a minimum, the square root of its value is the least RMSD (lRMSD) between x and y. Being an orthonormal rotation matrix, U needs to satisfy the orthonormality property $U U^T = I$, where I is the identity matrix. The orthonormality constraint ensures that the rows and columns are mutually orthogonal and that their length (as vectors) is one. Every orthonormal matrix encodes a rigid transformation in space, but if the rows/columns of the matrix do not constitute a right-handed system, the rotation is said to be improper. In an improper rotation, one of the three directions may be “mirrored”. Fortunately, this case can be detected easily by computing the determinant of the matrix U and, if it is negative, correcting the matrix. Denoting $U x_i$ as $x_i'$, and moving the constant factor N to the left, the formula for the error becomes:
$$N E = \sum_{i=1}^{N} \lVert x_i' - y_i \rVert^2$$
An alternative way to represent the two point sets, rather than as a one-dimensional vector or as separate atom coordinates, is to use two 3×N matrices (N atoms, 3 coordinates for each). Using this scheme, x is represented by the matrix X and y is represented by the matrix Y. Note that column i (1 ≤ i ≤ N) of these matrices stands for point (atom) $x_i$ and $y_i$, respectively. Using this new representation, we can write:
$$N E = \sum_{i=1}^{N} \lVert x_i' - y_i \rVert^2 = \mathrm{Tr}\left( (X' - Y)^T (X' - Y) \right)$$
where $X' = UX$ and Tr(A) stands for the trace of matrix A, the sum of its diagonal elements. It is easy to see that the trace of the matrix on the right amounts precisely to the sum on the left (simply carrying out the multiplication of the first row/column should convince the reader). The right-hand side of the equation can be expanded into:
$$\mathrm{Tr}\left( (X' - Y)^T (X' - Y) \right) = \mathrm{Tr}(X'^T X') + \mathrm{Tr}(Y^T Y) - 2\,\mathrm{Tr}(Y^T X')$$
This follows from the properties of the trace operator, namely: $\mathrm{Tr}(A+B) = \mathrm{Tr}(A) + \mathrm{Tr}(B)$, $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$, $\mathrm{Tr}(A^T) = \mathrm{Tr}(A)$, and $\mathrm{Tr}(kA) = k\,\mathrm{Tr}(A)$. Furthermore, the first two terms in the expansion above represent the sum of the squares of the components $x_i$ and $y_i$, so it can be rewritten as:
$$N E = \sum_{i=1}^{N} \left( \lVert x_i \rVert^2 + \lVert y_i \rVert^2 \right) - 2\,\mathrm{Tr}(Y^T X')$$
Note that the x components do not need to be primed (i.e., x’) since the rotation U around the origin does not change the length of $x_i$. Note also that the summation above does not depend on U, so minimizing E is equivalent to maximizing $\mathrm{Tr}(Y^T X')$. For this reason, the rest of the discussion focuses on finding a proper rotation matrix U that maximizes $\mathrm{Tr}(Y^T X')$. Remembering that $X' = UX$, the quantity to maximize is then $\mathrm{Tr}((Y^T U) X)$. From the cyclic property of the trace operator, this is equivalent to $\mathrm{Tr}((X Y^T) U)$. Since $X Y^T$ is a square 3×3 matrix, it can be decomposed through the Singular Value Decomposition (SVD) into $X Y^T = V S W^T$, where V and W are orthonormal matrices whose columns are the left and right singular vectors, respectively, and S is a diagonal 3×3 matrix containing the singular values $s_1 \ge s_2 \ge s_3 \ge 0$ in decreasing order. Again from the properties of the trace operator, we obtain that:
$$\mathrm{Tr}(X Y^T U) = \mathrm{Tr}(V S W^T U) = \mathrm{Tr}(S\, W^T U V)$$
If we introduce the 3×3 matrix T as the product $T = W^T U V$, we can rewrite the above expression as:
$$\mathrm{Tr}(S T) = s_1 T_{11} + s_2 T_{22} + s_3 T_{33} \le s_1 + s_2 + s_3$$
Since T is the product of orthonormal matrices, it is itself an orthonormal matrix and det(T) = ±1. Because each row and column of an orthonormal matrix has unit length, the absolute value of each element of T is no more than one; since the singular values are non-negative, it follows that $s_1 T_{11} + s_2 T_{22} + s_3 T_{33} \le s_1 + s_2 + s_3$. The maximum value of the left-hand side is reached when the diagonal elements of T are all equal to 1, and since T is an orthonormal matrix, all other elements must then be zero. This results in T = I. Moreover, since $T = W^T U V$, we can write $W^T U V = I$, and because W and V are orthonormal, $W W^T = I$ and $V V^T = I$. Multiplying $W^T U V$ by W on the left and by $V^T$ on the right yields a solution for U:
$$U = W V^T$$
where V and W are the matrices of left and right singular vectors, respectively, of the covariance matrix $C = X Y^T$. This formula ensures that U is orthonormal (the reader should carry out the matrix multiplication $U U^T$ and verify this fact).
The only remaining detail to take care of is to make sure that U is a proper rotation, as discussed before. It could indeed happen that det(U) = -1 if its rows/columns do not make up a right-handed system. When this happens, we need to compromise between two goals: maximizing $\mathrm{Tr}(Y^T X')$ and respecting the constraint that det(U) = +1. Therefore, we need to settle for the second largest value of $\mathrm{Tr}(Y^T X')$. It is easy to see what the second largest value is; since:
$$\mathrm{Tr}(Y^T X') = \mathrm{Tr}(S T) = s_1 T_{11} + s_2 T_{22} + s_3 T_{33}, \qquad s_1 \ge s_2 \ge s_3 \ge 0$$
then the second largest value occurs when $T_{11} = T_{22} = +1$ and $T_{33} = -1$. In this case T cannot be the identity matrix as before; instead, it has its lower-right corner set to -1. We now have a unified way to represent the solution. If det(C) > 0, T is the identity; otherwise, it has a -1 as its last diagonal element. Finally, these facts can be expressed in a single formula for the optimal rotation U by stating:
$$U = W \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & d \end{pmatrix} V^T$$
where d=sign(det(C)). In the light of the preceding derivation, all the facts that have been presented as a proof can be succinctly put as an algorithm for computing the optimal rotation to align two data sets x and y:
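The derivation in this section can be summarized in four steps: (1) center both point sets on their centroids, (2) build the covariance matrix C = XYᵀ, (3) compute its SVD, C = VSWᵀ, and (4) form U = W·diag(1, 1, d)·Vᵀ. The sketch below follows these steps, assuming NumPy and N×3 arrays with one atom per row (so the text's 3×N matrix X corresponds to `X.T` in the code). The function names are hypothetical. One deliberate deviation, made for robustness: d is computed as sign(det(V)·det(W)), which agrees with sign(det(C)) whenever C is non-singular but remains defined when det(C) = 0.

```python
import numpy as np

def kabsch_rotation(X, Y):
    """Proper rotation U minimizing sum_i ||U x_i - y_i||^2 for centered (N, 3) arrays."""
    C = X.T @ Y                   # 3x3 covariance matrix, sum_i x_i y_i^T
    V, s, Wt = np.linalg.svd(C)   # C = V diag(s) W^T, singular values in decreasing order
    # d = -1 flags an improper rotation (mirror image) that must be corrected.
    d = 1.0 if np.linalg.det(V) * np.linalg.det(Wt) > 0 else -1.0
    return Wt.T @ np.diag([1.0, 1.0, d]) @ V.T

def lrmsd(x, y):
    """Least RMSD between two (N, 3) conformations after optimal superposition."""
    X = x - x.mean(axis=0)        # remove the translation of each centroid
    Y = y - y.mean(axis=0)
    U = kabsch_rotation(X, Y)
    diff = X @ U.T - Y            # row i holds U x_i - y_i
    return float(np.sqrt(np.sum(diff ** 2) / len(X)))

# Toy check: y is a rotated and translated copy of x, so the lRMSD should vanish.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1.0               # flip one column to obtain a proper rotation
y = x @ Q.T + np.array([5.0, -2.0, 1.0])
print(lrmsd(x, y))                # close to zero, up to floating-point error
```

When y is instead a mirror image of x, the determinant check keeps U a proper rotation, and the resulting lRMSD stays strictly positive rather than collapsing to zero through a reflection.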
Optimal Alignment for lRMSD Using Quaternions
Another way of solving the optimal rotation for the purposes of computing the lRMSD between two conformations is to use quaternions. These provide a very compact way of representing rotations (only 4 numbers as compared to 9 or 16 for a rotation matrix) and are extremely easy to normalize after performing operations on them. Next, a general introduction to quaternions is given, and then they will be used to compute the optimal rotation between two point sets.
Introduction to Quaternions
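Quaternions are an extension of complex numbers. Recall that complex numbers are numbers of the form a + bi, where a and b are real numbers and i is the canonical imaginary number, equal to the square root of -1. Quaternions add two more imaginary numbers, j and k. These numbers are related by the following set of equalities:
$$i^2 = j^2 = k^2 = ijk = -1$$
These equalities give rise to some unusual properties, especially with respect to multiplication. For example, multiplication is not commutative: $ij = k$, but $ji = -k$.
Given this definition of i, j, and k, we can now define a quaternion as a number of the form
$$q = q_0 + q_1 i + q_2 j + q_3 k$$
where $q_0, q_1, q_2, q_3$ are real numbers.
Based on the definitions of i, j and k, we can also derive rules for addition and multiplication of quaternions. Assume we have two quaternions, p and q, defined as follows:
$$p = p_0 + p_1 i + p_2 j + p_3 k, \qquad q = q_0 + q_1 i + q_2 j + q_3 k$$
Addition of p and q is fairly intuitive:
$$p + q = (p_0 + q_0) + (p_1 + q_1)\, i + (p_2 + q_2)\, j + (p_3 + q_3)\, k$$
The dot product and magnitude of a quaternion also closely resemble those operations for vectors. Note that a unit quaternion is a quaternion with magnitude 1 under this definition:
$$p \cdot q = p_0 q_0 + p_1 q_1 + p_2 q_2 + p_3 q_3, \qquad \lVert q \rVert = \sqrt{q \cdot q}$$
Multiplication, however, is less intuitive, because of the definitions of i, j, and k:
$$\begin{aligned} pq = {} & (p_0 q_0 - p_1 q_1 - p_2 q_2 - p_3 q_3) \\ & + (p_0 q_1 + p_1 q_0 + p_2 q_3 - p_3 q_2)\, i \\ & + (p_0 q_2 - p_1 q_3 + p_2 q_0 + p_3 q_1)\, j \\ & + (p_0 q_3 + p_1 q_2 - p_2 q_1 + p_3 q_0)\, k \end{aligned}$$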
Quaternion multiplication also has two equivalent matrix forms which will become relevant later in the derivation of the alignment method:
These useful properties of quaternion multiplication can be derived easily using the matrix form for multiplication, or they can be proved by carrying out the products:
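The operations introduced in this section can be sketched in code as follows, assuming NumPy and quaternions stored as arrays `[q0, q1, q2, q3]` (scalar part first). All function names are hypothetical. The two matrix forms express the product pq either as a matrix built from p acting on q, or as a matrix built from q acting on p. The final checks exercise two standard identities for the quaternion dot product; they are offered as examples of the kind of property referred to above, not necessarily the exact list in the original figure.

```python
import numpy as np

def qmul(p, q):
    """Quaternion product pq, with components ordered (scalar, i, j, k)."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return np.array([
        p0 * q0 - p1 * q1 - p2 * q2 - p3 * q3,
        p0 * q1 + p1 * q0 + p2 * q3 - p3 * q2,
        p0 * q2 - p1 * q3 + p2 * q0 + p3 * q1,
        p0 * q3 + p1 * q2 - p2 * q1 + p3 * q0,
    ])

def qconj(q):
    """Conjugate: same scalar part, negated imaginary part."""
    return np.asarray(q, dtype=float) * np.array([1.0, -1.0, -1.0, -1.0])

def left_matrix(p):
    """4x4 matrix P such that P @ q equals the product pq."""
    p0, p1, p2, p3 = p
    return np.array([[p0, -p1, -p2, -p3],
                     [p1,  p0, -p3,  p2],
                     [p2,  p3,  p0, -p1],
                     [p3, -p2,  p1,  p0]])

def right_matrix(q):
    """4x4 matrix Q such that Q @ p equals the product pq."""
    q0, q1, q2, q3 = q
    return np.array([[q0, -q1, -q2, -q3],
                     [q1,  q0,  q3, -q2],
                     [q2, -q3,  q0,  q1],
                     [q3,  q2, -q1,  q0]])

i, j, k = np.eye(4)[1:]          # the three imaginary units
print(qmul(i, j), qmul(j, i))    # ij = k, but ji = -k

rng = np.random.default_rng(1)
p, q, r = rng.normal(size=(3, 4))
# (pq) . r = p . (r q*)   and   (pq) . (pr) = (p . p)(q . r)
print(np.isclose(qmul(p, q) @ r, p @ qmul(r, qconj(q))))
print(np.isclose(qmul(p, q) @ qmul(p, r), (p @ p) * (q @ r)))
```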
Quaternions and Three-Dimensional Rotations
A number of different methods exist for denoting rotations of rigid objects in three-dimensional space. These are introduced in a module on protein kinematics. Unit quaternions represent a rotation of an angle around an arbitrary axis. A rotation by the angle θ about an axis represented by the unit vector v = [x, y, z] is represented by a unit quaternion:
$$q = \cos\frac{\theta}{2} + \sin\frac{\theta}{2}\,(x\, i + y\, j + z\, k)$$
Like rotation matrices, quaternions may be composed with each other via multiplication. The major advantage of the quaternion representation is that it is more robust to numerical instability than orthonormal matrices. Numerical instability results from the fact that, because computers use a finite number of bits to represent real numbers, most real numbers are actually represented by the nearest number the computer is capable of representing. Over a series of floating point operations, the error caused by this inexact representation accumulates, quite rapidly in the case of repeated multiplications and divisions. In manipulating orthonormal transformation matrices, this can result in matrices that are no longer orthonormal, and therefore not valid rigid transformations. Finding the “nearest” orthonormal matrix to an arbitrary matrix is a comparatively involved problem that typically requires a matrix decomposition such as the SVD. Unit-length quaternions can accumulate the same kind of numerical error as rotation matrices, but in the case of quaternions, finding the nearest unit-length quaternion to an arbitrary quaternion is simple: divide the quaternion by its magnitude. Additionally, because quaternions correspond more directly to the axis-angle representation of three-dimensional rotations, it could be argued that they have a more intuitive interpretation than rotation matrices. Quaternions, with four parameters, are also more memory efficient than 3×3 matrices. For all of these reasons, quaternions are currently the preferred representation for three-dimensional rotations in most modeling applications.
Vectors can be represented as purely imaginary quaternions, that is, quaternions whose scalar component is 0. The quaternion corresponding to the vector v = [x, y, z] is q = xi + yj + zk.
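We can perform rotation of a vector in quaternion notation as follows:
$$v' = q\, v\, q^*$$
where v is the vector written as a purely imaginary quaternion, q is the unit quaternion representing the rotation, and q* is the conjugate of q (the same scalar component, with the imaginary components negated).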
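As an illustration of rotating a vector through the product q v q*, the following sketch assumes NumPy and quaternions stored as `[q0, q1, q2, q3]` with the scalar part first. The helper names `qmul`, `axis_angle_quaternion` and `rotate` are hypothetical. Note how cheap re-normalization is: the quaternion is simply divided by its magnitude before use.

```python
import numpy as np

def qmul(p, q):
    """Quaternion product pq, with components ordered (scalar, i, j, k)."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return np.array([
        p0 * q0 - p1 * q1 - p2 * q2 - p3 * q3,
        p0 * q1 + p1 * q0 + p2 * q3 - p3 * q2,
        p0 * q2 - p1 * q3 + p2 * q0 + p3 * q1,
        p0 * q3 + p1 * q2 - p2 * q1 + p3 * q0,
    ])

def axis_angle_quaternion(axis, theta):
    """Unit quaternion for a rotation by theta (radians) about the given axis."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    return np.concatenate(([np.cos(theta / 2.0)], np.sin(theta / 2.0) * axis))

def rotate(v, q):
    """Rotate the 3-vector v by the quaternion q, computed as q v q*."""
    q = np.asarray(q, dtype=float)
    q = q / np.linalg.norm(q)                    # re-normalize to unit length
    v_quat = np.concatenate(([0.0], v))          # vector as a purely imaginary quaternion
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return qmul(qmul(q, v_quat), q_conj)[1:]     # drop the (zero) scalar part

q_z90 = axis_angle_quaternion([0.0, 0.0, 1.0], np.pi / 2.0)
print(rotate(np.array([1.0, 0.0, 0.0]), q_z90))  # the x axis is carried onto the y axis
```

Rotations compose by multiplication: rotating first by q1 and then by q2 gives the same result as rotating once by the product q2 q1.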
Quaternion-Derived Rotation Matrix (from Wikipedia)
A quaternion rotation can be algebraically manipulated into a quaternion-derived rotation matrix. By simplifying the quaternion multiplications q p q*, they can be rewritten as a rotation matrix given an axis–angle representation:
where s and c are shorthand for sin θ and cos θ, respectively. So to get the rotation of a vector p about an arbitrary axis we get
Although care should be taken (due to degeneracy as the quaternion approaches the identity quaternion (1) or the sine of the angle approaches zero) the axis and angle can be extracted via:
Note that the θ equality holds only when qr is non-negative.
Alternatively, the rotation matrix can be expressed as
As with other schemes to apply rotations, the centre of rotation must be translated to the origin before the rotation is applied and translated back to its original position afterwards.
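The conversion from a unit quaternion to a rotation matrix, and the translate-rotate-translate pattern described above, can be sketched as follows. This assumes NumPy, quaternions stored as `[w, x, y, z]` with the scalar part first, and points stored as rows of an N×3 array. The matrix entries are the standard result of expanding the product q p q* for a unit quaternion; the function names are hypothetical.

```python
import numpy as np

def quaternion_to_matrix(q):
    """3x3 rotation matrix equivalent to the quaternion q (normalized first)."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def rotate_about(points, q, center):
    """Rotate (N, 3) points about an arbitrary center of rotation."""
    R = quaternion_to_matrix(q)
    center = np.asarray(center, dtype=float)
    # Move the center to the origin, rotate, then move back.
    return (np.asarray(points, dtype=float) - center) @ R.T + center

# Rotation by 90 degrees about the z axis.
q_z90 = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
print(quaternion_to_matrix(q_z90).round(6))
print(rotate_about([[2.0, 1.0, 0.0]], q_z90, center=[1.0, 1.0, 0.0]))
```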
Optimal Alignment with Quaternions
The method presented here is from Berthold K. P. Horn, “Closed-form solution of absolute orientation using unit quaternions,” Journal of the Optical Society of America A, 4:629-642, 1987.
The alignment problem may be stated as follows:
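As for the case of rotation matrices, the translational part of the alignment consists of making the centroids of the two data sets coincide. To find the optimal rotation using quaternions, recall that the dot product of two vectors is maximized when the vectors are in the same direction. The same is true when the vectors are represented as quaternions. Using this property, we can define a quantity that we want to maximize over all unit quaternions q:
$$\sum_{i=1}^{n} (q\, a_i\, q^*) \cdot b_i$$
where $a_i$ and $b_i$ are the corresponding centered points of the two sets, written as purely imaginary quaternions, and n is the number of points.
Equivalently, using the last property from the section “Introduction to Quaternions”, we get:
$$\sum_{i=1}^{n} (q\, a_i) \cdot (b_i\, q)$$
Now, recall that quaternion multiplication can be represented by matrices, and that the quaternions a and b have a 0 real component. Writing $a_i = a_x i + a_y j + a_z k$ and $b_i = b_x i + b_y j + b_z k$, we have $q\, a_i = \bar{A}_i\, q$ and $b_i\, q = B_i\, q$, with:
$$\bar{A}_i = \begin{pmatrix} 0 & -a_x & -a_y & -a_z \\ a_x & 0 & a_z & -a_y \\ a_y & -a_z & 0 & a_x \\ a_z & a_y & -a_x & 0 \end{pmatrix}, \qquad B_i = \begin{pmatrix} 0 & -b_x & -b_y & -b_z \\ b_x & 0 & -b_z & b_y \\ b_y & b_z & 0 & -b_x \\ b_z & -b_y & b_x & 0 \end{pmatrix}$$
Using these matrices, we can derive a new form for the objective function:
$$\sum_{i=1}^{n} (\bar{A}_i\, q) \cdot (B_i\, q) = q^T \left( \sum_{i=1}^{n} \bar{A}_i^T B_i \right) q = q^T N q$$
The quaternion that maximizes this product is the eigenvector of the 4×4 matrix N that corresponds to its most positive eigenvalue. The eigenvalues can be found by solving the following equation, which is quartic in λ:
$$\det(N - \lambda I) = 0$$
This quartic equation can be solved by a number of standard approaches. Finally, given the maximum eigenvalue $\lambda_{max}$, the quaternion corresponding to the optimal rotation is the eigenvector v:
$$(N - \lambda_{max} I)\, v = 0$$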
A closed-form solution to this equation for v can be found by applying techniques from linear algebra. One possible algorithm, based on constructing a matrix of cofactors, is presented in appendix A5 of the source paper.
In summary, the alignment algorithm works as follows:
This method appears computationally intensive, but has the major advantage over other approaches of being a closed-form, unique solution.
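The alignment procedure described in this section can be sketched as: center both point sets, accumulate the 3×3 matrix of coordinate products, assemble the symmetric 4×4 matrix N, and take the eigenvector of its largest eigenvalue as the optimal unit quaternion. The sketch below assumes NumPy, N×3 arrays, and quaternions ordered `[w, x, y, z]`; the entries of N follow the layout given in Horn's paper. One substitution is made plainly: instead of solving the quartic in closed form, it calls the general-purpose symmetric eigensolver `np.linalg.eigh`. All function names are hypothetical.

```python
import numpy as np

def horn_quaternion(A, B):
    """Unit quaternion (w, x, y, z) that best rotates centered points A onto B."""
    S = A.T @ B  # S[u, v] = sum_i A[i, u] * B[i, v], e.g. S[0, 1] is S_xy
    (Sxx, Sxy, Sxz), (Syx, Syy, Syz), (Szx, Szy, Szz) = S
    N = np.array([
        [Sxx + Syy + Szz, Syz - Szy,        Szx - Sxz,        Sxy - Syx],
        [Syz - Szy,       Sxx - Syy - Szz,  Sxy + Syx,        Szx + Sxz],
        [Szx - Sxz,       Sxy + Syx,       -Sxx + Syy - Szz,  Syz + Szy],
        [Sxy - Syx,       Szx + Sxz,        Syz + Szy,       -Sxx - Syy + Szz],
    ])
    eigenvalues, eigenvectors = np.linalg.eigh(N)  # eigenvalues in ascending order
    return eigenvectors[:, -1]                     # eigenvector of the largest one

def quaternion_to_matrix(q):
    """3x3 rotation matrix equivalent to the unit quaternion q."""
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def lrmsd_quaternion(a, b):
    """lRMSD between two (N, 3) conformations, plus the optimal quaternion."""
    A = a - a.mean(axis=0)
    B = b - b.mean(axis=0)
    q = horn_quaternion(A, B)
    diff = A @ quaternion_to_matrix(q).T - B
    return float(np.sqrt(np.sum(diff ** 2) / len(A))), q

# Toy check: b is a copy of a rotated by 120 degrees about (1, 1, 1), then translated.
rng = np.random.default_rng(2)
a = rng.normal(size=(10, 3))
q_true = np.array([0.5, 0.5, 0.5, 0.5])
b = a @ quaternion_to_matrix(q_true).T + np.array([1.0, 2.0, 3.0])
distance, q_found = lrmsd_quaternion(a, b)
print(distance)  # close to zero, up to floating-point error
```

Because q and -q describe the same rotation, the recovered quaternion may differ from the one used to generate the data by an overall sign.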
Intramolecular Distance and Related Measures
RMSD and lRMSD are not ideally suited for all applications. For example, consider the case of a given conformation A, and a set S of other conformations generated by some means. The goal is to estimate which conformations in S are closest in potential energy to A, making the assumption that they will be the conformations most structurally similar to A. The lRMSD measure will find the conformations in which the overall average atomic displacement is least. The problem is that if the quantity of interest is the potential energy of conformations, not all atoms can be treated equally. Those on the outside of the protein can often move a fair amount without dramatically affecting the energy. In contrast, the core of the molecule tends to be more compact, and therefore a slight change in the relative positions of a pair of atoms could lead to overlap of the atoms, and therefore to a completely infeasible structure with high potential energy. A class of distance measures and pseudo-measures based on intramolecular distances has been developed to address this shortcoming of RMSD-based measures.
Assume we wish to compare two conformations P and Q of a molecule with N atoms. Let pij be the distance between atom i and atom j in conformation P, and let qij be the same distance for conformation Q. Then the intramolecular distance is defined as
One of the main computational advantages of this class of approaches is that we do not have to compute the alignment between P and Q. On the other hand, for this metric we need to sum over a quadratic number of terms, whereas for RMSD the number of terms is linear in the number of atoms. Approximations can be made to speed up this computation, as shown in . Also, the intramolecular distance measure given above, which is sometimes referred to as the dRMSD, is subject to the problem that pairs of atoms most distant from each other are the ones that contribute the greatest amount to their measured difference.
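A minimal sketch of an intramolecular distance measure follows, assuming NumPy and N×3 coordinate arrays. The normalization used here (the root of the mean squared difference over the N(N-1)/2 unique atom pairs) is an assumption made for illustration, since normalization conventions vary; the function names are hypothetical. The quadratic cost is visible in the N×N distance matrices.

```python
import numpy as np

def pairwise_distances(P):
    """(N, N) matrix of distances between all pairs of atoms in one conformation."""
    P = np.asarray(P, dtype=float)
    diff = P[:, None, :] - P[None, :, :]   # shape (N, N, 3)
    return np.linalg.norm(diff, axis=-1)

def drmsd(P, Q):
    """Distance RMSD: compares intramolecular distances, so no alignment is needed."""
    dP = pairwise_distances(P)
    dQ = pairwise_distances(Q)
    upper = np.triu_indices(len(dP), k=1)  # each unordered pair i < j exactly once
    return float(np.sqrt(np.mean((dP[upper] - dQ[upper]) ** 2)))

# Three collinear atoms; in Q the last atom is one unit farther away.
P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
Q = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
print(drmsd(P, Q))  # pair differences are 0, 1, 1, giving sqrt(2/3)

# A rotated and translated copy of P has exactly the same internal distances.
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
P_moved = P @ Rz.T + np.array([4.0, -1.0, 2.0])
print(drmsd(P, P_moved))
```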
An interesting open problem is to come up with a physically meaningful molecular distance metric that allows for fast nearest-neighbor computations. Such a metric would be useful, for example, for clustering conformations. One proposed method is the contact distance. Contact distance requires constructing a contact map matrix for each conformation, indicating which pairs of atoms are separated by less than some threshold distance. The distance measure is then a measure of the difference between the contact maps.
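One way to make the contact distance concrete is sketched below, assuming NumPy and N×3 coordinate arrays. The text does not specify how the difference between contact maps is measured; counting the atom pairs whose contact status differs (a Hamming distance) is an assumption made here for illustration, as are the threshold value and the function names.

```python
import numpy as np

def contact_map(P, threshold):
    """Boolean (N, N) matrix: True where two atoms are closer than the threshold."""
    P = np.asarray(P, dtype=float)
    distances = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    return distances < threshold

def contact_distance(P, Q, threshold):
    """Number of atom pairs in contact in one conformation but not in the other."""
    cP = contact_map(P, threshold)
    cQ = contact_map(Q, threshold)
    upper = np.triu_indices(len(cP), k=1)  # each unordered pair i < j exactly once
    return int(np.count_nonzero(cP[upper] != cQ[upper]))

P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
Q = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
# With a threshold of 1.5, atoms 1 and 2 are in contact in P but not in Q.
print(contact_distance(P, Q, threshold=1.5))
```

Because contact maps are binary, they can be stored compactly and compared quickly, which is part of their appeal for nearest-neighbor searches.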
Other distance measures attempt to weight each pair in the dRMSD based on how close the atoms are, with closer pairs given more weight, in keeping with the intuition that small changes in the relative positions of nearby atoms are more likely to result in collisions. One such measure is the normalized Holm and Sander Score.
This score is technically a pseudo-measure rather than a measure because it does not necessarily obey the triangle inequality.
The definition of distance measures remains an open problem. For reference on ongoing work, see articles that compare several methods, such as .
Recommended Reading: The first two papers are the original descriptions of the Kabsch Algorithm, and use rotations represented as orthonormal matrices to find the correct rotational transformation. Many software packages use this alignment method. The third and fourth papers use quaternions. The alignment method presented in the previous section comes from the third paper: