Molecules are not rigid. On the contrary, they are highly flexible objects, capable of changing shape dramatically through the rotation of dihedral angles. Each distinct shape of a given molecule is called a conformation. We need a measure to express how much a molecule changes when going from one conformation to another or, equivalently, how different two conformations are from each other. One could conceivably measure the shape change by computing the volume of the intersection of the alpha shapes of two conformations (see Molecular Shapes and Surfaces for an explanation of alpha shapes), but this is computationally prohibitive. Simpler measures of distance between conformations have therefore been defined, based on variables such as the Cartesian coordinates of each atom or the bond and torsion angles within the molecule. When working with Cartesian coordinates, one can represent a molecular conformation as a vector whose components are the Cartesian coordinates of the molecule’s atoms. A conformation of a molecule with N atoms can therefore be represented as a 3N-dimensional vector of real numbers.
RMSD and lRMSD
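One of the most widely accepted measures of the difference between conformations of a molecule is the least root mean square deviation (lRMSD), which builds on the plain root mean square deviation (RMSD). To calculate the RMSD of a pair of structures x and y, each structure must be represented as a vector of 3N coordinates (for N atoms). The RMSD is the square root of the average of the squared distances between corresponding atoms of x and y, and it measures the average atomic displacement between the two conformations:
$$\mathrm{RMSD}(x, y) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \lVert x_i - y_i \rVert^2}$$
where $x_i$ and $y_i$ denote the three Cartesian coordinates of atom i in x and y, respectively.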
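As a minimal sketch, assuming each conformation is stored as a list of (x, y, z) tuples rather than as a flat 3N vector (the function name `rmsd` is our own), the RMSD formula can be computed directly:

```python
import math


def rmsd(x, y):
    """Root mean square deviation between two conformations.

    x and y are equal-length lists of (x, y, z) atom coordinates; atom i of x
    corresponds to atom i of y. No alignment is performed.
    """
    if len(x) != len(y) or not x:
        raise ValueError("conformations must have the same, nonzero number of atoms")
    total = 0.0
    for (x1, x2, x3), (y1, y2, y3) in zip(x, y):
        # squared Euclidean distance between corresponding atoms
        total += (x1 - y1) ** 2 + (x2 - y2) ** 2 + (x3 - y3) ** 2
    return math.sqrt(total / len(x))


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(rmsd(a, b))  # sqrt(1/2), about 0.7071
```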
However, when molecular conformations are generated by molecular dynamics or other sampling methods, the molecule often drifts away from the origin and rotates arbitrarily. The lRMSD compensates for these effects: it is the minimum RMSD over all possible relative positions and orientations of the two conformations under consideration,
$$\mathrm{lRMSD}(x, y) = \min_{R,\, t} \mathrm{RMSD}(Rx + t,\; y)$$
where R ranges over rotations and t over translations applied to every atom of x. Calculating the lRMSD therefore consists of first finding an optimal alignment of the two structures, which in general requires both a translation and a rotation, and then calculating their RMSD. In other words, before computing the RMSD, it is necessary to remove the translation of the centroids of both conformations and to perform an “optimal alignment” or “optimal rotation” of them, since both factors artificially inflate the RMSD between them.
Finding the optimal rotation that minimizes the RMSD between two point sets is a well-studied problem, and several algorithms exist. The Kabsch algorithm, which is implemented in several molecular modeling packages, solves a matrix equation for the three-dimensional rotation matrix corresponding to the optimal rotation. An alternative approach, discussed in detail after the matrix method, uses a compact representation of rotations called quaternions. Quaternions are currently the preferred representation for the global rotation in lRMSD calculations, since they require fewer numbers to be stored and are easy to re-normalize. In contrast, re-normalizing an orthonormal matrix is comparatively expensive and potentially numerically unstable. Both quaternions and their application to the global alignment of conformations are presented after the next section.
Optimal Alignment for lRMSD Using Rotation Matrices
This section presents a method for computing the optimal rotation between two data sets as an orthonormal rotation matrix. As stated earlier, this approach is slightly less numerically stable (since guaranteeing the orthonormality of a matrix is harder than guaranteeing the unit length of a quaternion), and it requires handling the special case in which the resulting matrix is not a proper rotation, as discussed below.
As stated earlier, the optimal alignment requires both a translation and a rotation. The translational part of the alignment is easy to calculate: it can be proven that the optimal alignment is obtained by translating one set so that its centroid coincides with the other set’s centroid (see section 2-C of  for proof). The centroid of a point set a with N points is simply the average position of all its points:
$$\bar{a} = \frac{1}{N} \sum_{i=1}^{N} a_i$$
We can then redefine each point in the two sets A and B as a deviation from the centroid of its set:
$$a_i' = a_i - \bar{a}, \qquad b_i' = b_i - \bar{b}$$
Given this notation relative to the centroid, we can explicitly set the centroids to be equal and proceed with the rotational part of the alignment.
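The following sketch, again assuming conformations stored as lists of (x, y, z) tuples and using illustrative helper names (`centroid`, `center`, `rmsd`), shows why centering matters: a pure translation of a conformation produces a large RMSD, which vanishes once both sets are expressed relative to their centroids.

```python
import math


def centroid(points):
    """Average position of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def center(points):
    """Express each point as its deviation from the centroid."""
    c = centroid(points)
    return [tuple(p[k] - c[k] for k in range(3)) for p in points]


def rmsd(x, y):
    """Plain RMSD between corresponding points (no alignment)."""
    total = sum((p[k] - q[k]) ** 2 for p, q in zip(x, y) for k in range(3))
    return math.sqrt(total / len(x))


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
b = [(x + 5.0, y - 3.0, z + 1.0) for x, y, z in a]  # same shape, translated

print(rmsd(a, b))                  # sqrt(35), about 5.916
print(rmsd(center(a), center(b)))  # 0.0 up to floating-point rounding
```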
One of the first references to the solution of this problem in matrix form is from Kabsch . The Kabsch method uses Lagrange multipliers to solve a minimization problem that yields the optimal rotation. Here, we present a slightly more intuitive method, based on matrix algebra and the properties of the trace, that achieves the same result. This formulation can be found in  and . Imagine we wish to align two conformations composed of N atoms each, whose Cartesian coordinates are given by the vectors x and y. The main idea behind this approach is to find a 3×3 orthonormal matrix U such that applying U to the atom positions of one of the data vectors, x, aligns it as well as possible with the other data vector, y, in the sense of minimizing the distance d(Ux, y). Here x and y are assumed to be centered, that is, both of their centroids coincide with the origin (centering both conformations is the first step). Mathematically, this problem can be stated as the minimization of the following quantity:
$$E = \frac{1}{N} \sum_{i=1}^{N} \lVert U x_i - y_i \rVert^2$$
When E is at its minimum, its square root is the least RMSD (lRMSD) between x and y. Being an orthonormal rotation matrix, U must satisfy the orthonormality property $UU^T = I$, where I is the identity matrix. The orthonormality constraint ensures that the rows and columns are mutually orthogonal and that each of them has unit length. Any orthonormal matrix represents a rigid transformation in space. The only problem with this approach as it stands is that, although every orthonormal matrix encodes a rigid transformation, if the rows/columns of the matrix do not form a right-handed system, the rotation is said to be improper: one of the three directions is “mirrored”. Fortunately, this case is easy to detect, because the determinant of an improper rotation matrix is -1; if $\det(U)$ is negative, the matrix must be corrected. Denoting $Ux_i$ as $x_i'$ and multiplying both sides by the constant factor N, the formula for the error becomes:
$$N E = \sum_{i=1}^{N} \lVert x_i' - y_i \rVert^2$$
An alternative way to represent the two point sets, rather than as a one-dimensional vector or as separate atom coordinates, is with two 3×N matrices (N atoms, 3 coordinates each). In this scheme, x is represented by the matrix X and y by the matrix Y, so that column i (1 ≤ i ≤ N) of these matrices holds point (atom) $x_i$ and $y_i$, respectively. Using this representation, we can write:
$$N E = \sum_{i=1}^{N} \lVert x_i' - y_i \rVert^2 = \mathrm{Tr}\left( (X' - Y)^T (X' - Y) \right)$$
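where $X' = UX$ and Tr(A) stands for the trace of matrix A, the sum of its diagonal elements. It is easy to see that the trace of the matrix on the right equals the sum on the left: the i-th diagonal element of $(X' - Y)^T (X' - Y)$ is exactly $\lVert x_i' - y_i \rVert^2$ (carrying out the multiplication for the first row/column should convince the reader). The right-hand side of the equation can be expanded into:
$$\mathrm{Tr}\left( (X' - Y)^T (X' - Y) \right) = \mathrm{Tr}(X'^T X') + \mathrm{Tr}(Y^T Y) - 2\,\mathrm{Tr}(Y^T X')$$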
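This follows from the properties of the trace operator, namely $\mathrm{Tr}(A+B) = \mathrm{Tr}(A) + \mathrm{Tr}(B)$, $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$, $\mathrm{Tr}(A^T) = \mathrm{Tr}(A)$, and $\mathrm{Tr}(kA) = k\,\mathrm{Tr}(A)$. Furthermore, the first two terms of the expansion are the sums of the squared lengths of the vectors $x_i'$ and $y_i$, so the error can be rewritten as:
$$N E = \sum_{i=1}^{N} \left( \lVert x_i \rVert^2 + \lVert y_i \rVert^2 \right) - 2\,\mathrm{Tr}(Y^T X')$$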
Note that the x components do not need to be primed (i.e., $x_i'$), since the rotation U about the origin does not change the length of $x_i$. The summation above does not depend on U, so minimizing E is equivalent to maximizing $\mathrm{Tr}(Y^T X')$. For this reason, the rest of the discussion focuses on finding a proper rotation matrix U that maximizes $\mathrm{Tr}(Y^T X')$. Recalling that $X' = UX$, the quantity to maximize is $\mathrm{Tr}(Y^T U X)$. By the cyclic property of the trace, this is equivalent to $\mathrm{Tr}(X Y^T U)$. Since $XY^T$ is a square 3×3 matrix, it can be decomposed through the singular value decomposition (SVD) into $XY^T = VSW^T$, where the orthonormal matrices V and W hold the left and right singular vectors, respectively, and S is a 3×3 diagonal matrix containing the singular values $s_1 \ge s_2 \ge s_3 \ge 0$. Again from the properties of the trace operator, we obtain:
$$\mathrm{Tr}(XY^T U) = \mathrm{Tr}(V S W^T U) = \mathrm{Tr}(S W^T U V)$$
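The identities used in this derivation are easy to check numerically. The sketch below assumes NumPy and uses random centered 3×N matrices and a random proper rotation (all values are illustrative). It confirms that the sum of squared distances, its trace form and its expansion agree, and that the cyclic property turns $\mathrm{Tr}(Y^T U X)$ into $\mathrm{Tr}(XY^T U)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_atoms = 6

# Centered 3xN coordinate matrices (one column per atom).
X = rng.normal(size=(3, n_atoms))
X -= X.mean(axis=1, keepdims=True)
Y = rng.normal(size=(3, n_atoms))
Y -= Y.mean(axis=1, keepdims=True)

# A random proper rotation U: orthonormal Q factor, with one column flipped if det(U) = -1.
U, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(U) < 0:
    U[:, 0] = -U[:, 0]
Xp = U @ X  # X' = UX

ne_sum = np.sum((Xp - Y) ** 2)              # N*E as a sum of squared distances
ne_trace = np.trace((Xp - Y).T @ (Xp - Y))  # N*E as a trace
# Expansion with the unprimed X: rotation does not change the lengths of the columns.
ne_expanded = np.sum(X ** 2) + np.sum(Y ** 2) - 2 * np.trace(Y.T @ Xp)

print(np.isclose(ne_trace, ne_sum), np.isclose(ne_expanded, ne_sum))  # True True
print(np.isclose(np.trace(Y.T @ U @ X), np.trace(X @ Y.T @ U)))        # True
```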
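If we introduce the 3×3 matrix T as the product $T = W^T U V$, we can rewrite the above expression as:
$$\mathrm{Tr}(S W^T U V) = \mathrm{Tr}(ST) = \sum_{i=1}^{3} s_i T_{ii} \le \sum_{i=1}^{3} s_i$$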
Since T is a product of orthonormal matrices, it is itself orthonormal, with $\det(T) = \pm 1$. Because its rows have unit length, no element of T exceeds one in absolute value, and since the singular values are non-negative, the last inequality follows. The maximum value of the left-hand side is therefore reached when the diagonal elements of T are all equal to 1; since T is orthonormal, all other elements must then be zero, which gives $T = I$. Moreover, since $T = W^T U V$, we have $W^T U V = I$, and because W and V are orthonormal, $WW^T = I$ and $VV^T = I$. Multiplying both sides of $W^T U V = I$ by W on the left and by $V^T$ on the right yields the solution for U:
$$U = W V^T$$
where V and W are the matrices of left and right singular vectors, respectively, of the covariance matrix $C = XY^T$. This formula ensures that U is orthonormal (the reader can verify this by computing $UU^T = WV^TVW^T = WW^T = I$).
The only remaining detail is to make sure that U is a proper rotation, as discussed before. It can indeed happen that $\det(U) = -1$ if its rows/columns do not form a right-handed system. When this happens, we need to compromise between two goals: maximizing $\mathrm{Tr}(Y^T X')$ and respecting the constraint $\det(U) = +1$. Therefore, we must settle for the largest value of $\mathrm{Tr}(Y^T X')$ attainable under that constraint, i.e., the second largest value over all orthonormal T. It is easy to see what this value is, since:
$$\mathrm{Tr}(ST) = s_1 T_{11} + s_2 T_{22} + s_3 T_{33}, \qquad s_1 \ge s_2 \ge s_3 \ge 0$$
then the second largest value occurs when $T_{11} = T_{22} = +1$ and $T_{33} = -1$, which gives $s_1 + s_2 - s_3$ (sacrificing the smallest singular value costs the least). T can then no longer be the identity matrix; instead, its lower-right element is set to -1. This gives a unified way to represent the solution: if $\det(C) > 0$, T is the identity; otherwise, its last diagonal element is -1. These facts can be expressed in a single formula for the optimal rotation U:
$$U = W \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & d \end{pmatrix} V^T$$
where $d = \operatorname{sign}(\det(C))$. In light of the preceding derivation, the result can be summarized as an algorithm for computing the optimal rotation that aligns two data sets x and y:
Build the 3×N matrices X and Y containing, for the sets x and y respectively, the coordinates of each of the N atoms after centering them by subtracting the centroids.
Compute the covariance matrix $C = XY^T$.
Compute the singular value decomposition (SVD) $C = VSW^T$.
Compute the optimal rotation U as $U = W\,\mathrm{diag}(1, 1, d)\,V^T$, with $d = \operatorname{sign}(\det(C))$.
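The four steps above translate almost line by line into NumPy. The function below is a sketch, not a reference implementation; its name and the (N, 3) input layout are our own choices. Note that `numpy.linalg.svd` returns the factors of $C = VSW^T$ as V, the singular values and $W^T$, in that order. The sign d is computed as $\det(WV^T)$, which agrees with $\operatorname{sign}(\det(C))$ whenever C is nonsingular and remains well defined when it is not.

```python
import numpy as np


def kabsch_lrmsd(x, y):
    """Optimal proper rotation U and lRMSD for two (N, 3) coordinate arrays.

    U is the 3x3 rotation that, applied to the centered points of x,
    best superimposes them onto the centered points of y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Step 1: center both sets and store them as 3xN matrices (one column per atom).
    X = (x - x.mean(axis=0)).T
    Y = (y - y.mean(axis=0)).T
    # Step 2: covariance matrix C = X Y^T (3x3).
    C = X @ Y.T
    # Step 3: SVD. NumPy returns V, s, W^T with C = V @ diag(s) @ W^T.
    V, s, Wt = np.linalg.svd(C)
    W = Wt.T
    # Step 4: d = +1 or -1; det(W V^T) has the sign of det(C) when C is nonsingular.
    d = 1.0 if np.linalg.det(W @ V.T) > 0 else -1.0
    U = W @ np.diag([1.0, 1.0, d]) @ V.T
    # lRMSD: RMSD between the rotated, centered x and the centered y.
    diff = U @ X - Y
    lrmsd = float(np.sqrt(np.sum(diff ** 2) / X.shape[1]))
    return U, lrmsd


rng = np.random.default_rng(0)
x = rng.normal(size=(10, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] = -Q[:, 0]  # make Q a proper rotation
y = x @ Q.T + [1.0, -2.0, 0.5]  # rotate and translate x
U, err = kabsch_lrmsd(x, y)
print(np.allclose(U, Q), err < 1e-9)  # True True
```

In the usage example the recovered rotation matches the one used to generate y, and the lRMSD is zero up to rounding, because y is an exact rigid motion of x. If y were a mirror image of x, the determinant correction would still return a proper rotation, at the cost of a nonzero lRMSD.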
Optimal Alignment for lRMSD Using Quaternions
Another way of finding the optimal rotation for computing the lRMSD between two conformations is to use quaternions. These provide a very compact way of representing rotations (only 4 numbers, compared with 9 for a 3×3 rotation matrix or 16 for a 4×4 homogeneous transformation matrix) and are extremely easy to normalize after operations have been performed on them. Next, a general introduction to quaternions is given; quaternions are then used to compute the optimal rotation between two point sets.
Introduction to Quaternions
Quaternions are an extension of complex numbers. Recall that complex numbers are numbers of the form a + bi, where a and b are real numbers and i is the canonical imaginary unit, equal to the square root of -1. Quaternions add two more imaginary units, j and k. These units are related by the following equalities:
$$i^2 = j^2 = k^2 = ijk = -1$$
These equalities give rise to some unusual properties, especially with respect to multiplication: for example, ij = k but ji = -k, so quaternion multiplication is not commutative.
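Given these definitions of i, j, and k, we can now define a quaternion as
$$q = q_r + q_x i + q_y j + q_z k$$
where $q_r$, $q_x$, $q_y$ and $q_z$ are real numbers; $q_r$ is the real (scalar) part and $q_x i + q_y j + q_z k$ is the imaginary (vector) part.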
Based on the definitions of i, j and k, we can also derive rules for the addition and multiplication of quaternions. Assume we have two quaternions, p and q, defined as follows:
$$p = p_r + p_x i + p_y j + p_z k, \qquad q = q_r + q_x i + q_y j + q_z k$$
Addition of p and q is fairly intuitive:
$$p + q = (p_r + q_r) + (p_x + q_x)\, i + (p_y + q_y)\, j + (p_z + q_z)\, k$$
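The dot product and magnitude of a quaternion also closely resemble the corresponding operations for vectors:
$$p \cdot q = p_r q_r + p_x q_x + p_y q_y + p_z q_z, \qquad \lVert q \rVert = \sqrt{q \cdot q} = \sqrt{q_r^2 + q_x^2 + q_y^2 + q_z^2}$$
A unit quaternion is a quaternion with magnitude 1 under this definition.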
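Multiplication, however, is less intuitive, because of the definitions of i, j, and k:
$$\begin{aligned} pq = {} & (p_r q_r - p_x q_x - p_y q_y - p_z q_z) \\ & + (p_r q_x + p_x q_r + p_y q_z - p_z q_y)\, i \\ & + (p_r q_y - p_x q_z + p_y q_r + p_z q_x)\, j \\ & + (p_r q_z + p_x q_y - p_y q_x + p_z q_r)\, k \end{aligned}$$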
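Quaternion multiplication also has two equivalent matrix forms, which will become relevant later in the derivation of the alignment method. Treating p and q as the 4-vectors $(p_r, p_x, p_y, p_z)^T$ and $(q_r, q_x, q_y, q_z)^T$:
$$pq = P\,q = \bar{Q}\,p, \qquad P = \begin{pmatrix} p_r & -p_x & -p_y & -p_z \\ p_x & p_r & -p_z & p_y \\ p_y & p_z & p_r & -p_x \\ p_z & -p_y & p_x & p_r \end{pmatrix}, \qquad \bar{Q} = \begin{pmatrix} q_r & -q_x & -q_y & -q_z \\ q_x & q_r & q_z & -q_y \\ q_y & -q_z & q_r & q_x \\ q_z & q_y & -q_x & q_r \end{pmatrix}$$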
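The following useful properties of quaternion multiplication can be derived easily using the matrix form, or proved by carrying out the products directly. Here $q^* = q_r - q_x i - q_y j - q_z k$ denotes the conjugate of q:
$$(pq) \cdot (pq) = (p \cdot p)(q \cdot q), \qquad (pq) \cdot (pr) = (p \cdot p)(q \cdot r), \qquad (pq) \cdot r = p \cdot (r\,q^*)$$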
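These rules can be checked with a few lines of plain Python. In this sketch a quaternion is a tuple (r, x, y, z) holding $(q_r, q_x, q_y, q_z)$; the helper names (`qmul`, `qconj`, `qdot`, `left_matrix`, `matvec`) are illustrative and not taken from any library.

```python
import random


def qmul(p, q):
    """Hamilton product pq of quaternions stored as (r, x, y, z) tuples."""
    pr, px, py, pz = p
    qr, qx, qy, qz = q
    return (pr * qr - px * qx - py * qy - pz * qz,
            pr * qx + px * qr + py * qz - pz * qy,
            pr * qy - px * qz + py * qr + pz * qx,
            pr * qz + px * qy - py * qx + pz * qr)


def qconj(q):
    """Conjugate q*: negate the imaginary part."""
    r, x, y, z = q
    return (r, -x, -y, -z)


def qdot(p, q):
    """Dot product, treating quaternions as 4-vectors."""
    return sum(a * b for a, b in zip(p, q))


def left_matrix(p):
    """Matrix P such that pq equals P applied to q (as a 4-vector)."""
    r, x, y, z = p
    return [[r, -x, -y, -z],
            [x, r, -z, y],
            [y, z, r, -x],
            [z, -y, x, r]]


def matvec(m, v):
    """Matrix-vector product for nested lists."""
    return tuple(sum(a * b for a, b in zip(row, v)) for row in m)


i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
print(qmul(i, j), qmul(j, i))  # (0, 0, 0, 1) (0, 0, 0, -1): ij = k but ji = -k

random.seed(0)
p, q, r = [tuple(random.uniform(-1, 1) for _ in range(4)) for _ in range(3)]
pq = qmul(p, q)
print(abs(qdot(pq, pq) - qdot(p, p) * qdot(q, q)) < 1e-12)   # True
print(abs(qdot(pq, r) - qdot(p, qmul(r, qconj(q)))) < 1e-12)  # True
```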
Quaternions and Three-Dimensional Rotations
A number of different methods exist for denoting rotations of rigid objects in three-dimensional space; these are introduced in a module on protein kinematics. A unit quaternion represents a rotation by some angle about an arbitrary axis. A rotation by the angle θ about an axis given by the unit vector v = [x, y, z] is represented by the unit quaternion:
$$q = \cos\frac{\theta}{2} + \sin\frac{\theta}{2}\,(x\,i + y\,j + z\,k)$$
Like rotation matrices, quaternions can be composed with each other via multiplication. The major advantage of the quaternion representation is that it is more robust to numerical error than orthonormal matrices. Numerical error arises because computers use a finite number of bits to represent real numbers, so most real numbers are actually stored as the nearest number the computer is capable of representing. Over a series of floating-point operations, the error caused by this inexact representation accumulates, quite rapidly in the case of repeated multiplications and divisions. When orthonormal transformation matrices are manipulated, this can produce matrices that are no longer orthonormal, and therefore no longer valid rigid transformations. Restoring orthonormality requires projecting the matrix onto a nearby orthonormal matrix, which is comparatively expensive (typically an SVD or a polar decomposition). Unit-length quaternions accumulate the same kind of numerical error, but finding the nearest unit-length quaternion to an arbitrary nonzero quaternion is trivial: divide it by its magnitude. Additionally, because quaternions correspond more directly to the axis-angle representation of three-dimensional rotations, it could be argued that they have a more intuitive interpretation than rotation matrices. Quaternions, with four parameters, are also more memory-efficient than 3×3 matrices. For all of these reasons, quaternions are currently the preferred representation for three-dimensional rotations in most modeling applications.
Vectors can be represented as purely imaginary quaternions, that is, quaternions whose scalar component is 0. The quaternion corresponding to the vector v = [x, y, z] is q = xi + yj + zk.
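We can rotate a vector p, represented as a purely imaginary quaternion, by a unit quaternion q as follows, where $q^* = q_r - q_x i - q_y j - q_z k$ is the conjugate of q:
$$p' = q\,p\,q^*$$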
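A quaternion rotation can be algebraically manipulated into a quaternion-derived rotation matrix. By expanding the quaternion products $q\,p\,q^*$ for the unit quaternion of a rotation by θ about the unit axis [x, y, z], they can be rewritten as the rotation matrix:
$$R = \begin{pmatrix} c + x^2(1-c) & xy(1-c) - zs & xz(1-c) + ys \\ yx(1-c) + zs & c + y^2(1-c) & yz(1-c) - xs \\ zx(1-c) - ys & zy(1-c) + xs & c + z^2(1-c) \end{pmatrix}$$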
where s and c are shorthand for sin θ and cos θ, respectively. The rotation of a vector p about an arbitrary axis is then obtained as
$$p' = R\,p$$
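Although care should be taken (because of the degeneracy as the quaternion approaches the identity quaternion 1, i.e., as the sine of the half-angle approaches zero), the axis and angle can be extracted via:
$$\theta = 2 \arccos q_r = 2 \arcsin \sqrt{q_x^2 + q_y^2 + q_z^2}, \qquad [x, y, z] = \frac{[q_x, q_y, q_z]}{\sin(\theta/2)}$$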
Note that the equality between the two expressions for θ holds only when $q_r$ is non-negative.
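Alternatively, in terms of the components of the unit quaternion $q = q_r + q_x i + q_y j + q_z k$, the rotation matrix can be expressed as
$$R = \begin{pmatrix} 1 - 2(q_y^2 + q_z^2) & 2(q_x q_y - q_z q_r) & 2(q_x q_z + q_y q_r) \\ 2(q_x q_y + q_z q_r) & 1 - 2(q_x^2 + q_z^2) & 2(q_y q_z - q_x q_r) \\ 2(q_x q_z - q_y q_r) & 2(q_y q_z + q_x q_r) & 1 - 2(q_x^2 + q_y^2) \end{pmatrix}$$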
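The following plain-Python sketch, using a hypothetical (r, x, y, z) tuple layout for quaternions, builds a unit quaternion from an axis and an angle, rotates a vector with $q\,p\,q^*$, converts the quaternion into the component-form rotation matrix, recovers the axis and angle, and re-normalizes a quaternion by dividing by its magnitude. The 1e-12 cutoff used to detect the degenerate near-identity case is an arbitrary choice.

```python
import math


def qmul(p, q):
    """Hamilton product of quaternions stored as (r, x, y, z) tuples."""
    pr, px, py, pz = p
    qr, qx, qy, qz = q
    return (pr * qr - px * qx - py * qy - pz * qz,
            pr * qx + px * qr + py * qz - pz * qy,
            pr * qy - px * qz + py * qr + pz * qx,
            pr * qz + px * qy - py * qx + pz * qr)


def qconj(q):
    r, x, y, z = q
    return (r, -x, -y, -z)


def normalize(q):
    """Nearest unit quaternion to a nonzero q: divide by its magnitude."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)


def axis_angle_to_quat(axis, theta):
    """Unit quaternion for a rotation by theta (radians) about a unit axis."""
    s = math.sin(theta / 2)
    return (math.cos(theta / 2), s * axis[0], s * axis[1], s * axis[2])


def rotate(q, v):
    """Rotate the 3-vector v by the unit quaternion q using q p q*."""
    p = (0.0, v[0], v[1], v[2])  # v as a purely imaginary quaternion
    return qmul(qmul(q, p), qconj(q))[1:]


def quat_to_matrix(q):
    """Rotation matrix (nested lists) of a unit quaternion (r, x, y, z)."""
    r, x, y, z = q
    return [[1 - 2 * (y * y + z * z), 2 * (x * y - z * r), 2 * (x * z + y * r)],
            [2 * (x * y + z * r), 1 - 2 * (x * x + z * z), 2 * (y * z - x * r)],
            [2 * (x * z - y * r), 2 * (y * z + x * r), 1 - 2 * (x * x + y * y)]]


def quat_to_axis_angle(q):
    """Recover (axis, theta) from a unit quaternion."""
    r, x, y, z = q
    theta = 2 * math.acos(max(-1.0, min(1.0, r)))  # clamp against rounding
    s = math.sin(theta / 2)
    if s < 1e-12:  # q is numerically +1 or -1: identity rotation, axis undefined
        return (1.0, 0.0, 0.0), 0.0
    return (x / s, y / s, z / s), theta


q = axis_angle_to_quat((0.0, 0.0, 1.0), math.pi / 2)  # 90 degrees about z
print(rotate(q, (1.0, 0.0, 0.0)))       # approximately (0, 1, 0)
print(normalize((2.0, 0.0, 0.0, 0.0)))  # (1.0, 0.0, 0.0, 0.0)
```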
As with other schemes to apply rotations, the centre of rotation must be translated to the origin before the rotation is applied and translated back to its original position afterwards.
Optimal Alignment with Quaternions
The method presented here is from Berthold K. P. Horn, “Closed-form solution of absolute orientation using unit quaternions,” Journal of the Optical Society of America A, 4:629-642.
The alignment problem may be stated as follows:
We have two sets of points (atoms) A and B for which we wish to find an optimal alignment, defined as the alignment for which the root mean square difference between each point in A and its corresponding point in B is minimized.
We know which point in A corresponds to which point in B. This is necessary for any RMSD-based method.
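As in the case of rotation matrices, the translational part of the alignment consists of making the centroids of the two data sets coincide; from here on, the points $a_i \in A$ and $b_i \in B$ are assumed to be measured relative to their centroids and are treated as purely imaginary quaternions. To find the optimal rotation using quaternions, recall that the dot product of two vectors of given lengths is maximized when the vectors point in the same direction. The same is true when the vectors are represented as quaternions. Since a rotation preserves lengths, minimizing $\sum_i \lVert q\,a_i\,q^* - b_i \rVert^2$ is equivalent to maximizing the sum of dot products, so the quantity we want to maximize over unit quaternions q is:
$$\sum_{i} (q\,a_i\,q^*) \cdot b_i$$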
Equivalently, using the last property from the section “Introduction to Quaternions”, we get:
$$\sum_{i} (q\,a_i\,q^*) \cdot b_i = \sum_{i} (q\,a_i) \cdot (b_i\,q)$$
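Now, recall that quaternion multiplication can be represented by matrices, and that the quaternions $a_i$ and $b_i$ have a zero real component:
$$q\,a_i = \bar{A}_i\,q, \quad \bar{A}_i = \begin{pmatrix} 0 & -a_{ix} & -a_{iy} & -a_{iz} \\ a_{ix} & 0 & a_{iz} & -a_{iy} \\ a_{iy} & -a_{iz} & 0 & a_{ix} \\ a_{iz} & a_{iy} & -a_{ix} & 0 \end{pmatrix}, \qquad b_i\,q = B_i\,q, \quad B_i = \begin{pmatrix} 0 & -b_{ix} & -b_{iy} & -b_{iz} \\ b_{ix} & 0 & -b_{iz} & b_{iy} \\ b_{iy} & b_{iz} & 0 & -b_{ix} \\ b_{iz} & -b_{iy} & b_{ix} & 0 \end{pmatrix}$$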
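Using these matrices, we can derive a new form for the objective function:
$$\sum_{i} (q\,a_i) \cdot (b_i\,q) = \sum_{i} (\bar{A}_i q) \cdot (B_i q) = \sum_{i} q^T \bar{A}_i^T B_i\, q = q^T \Big( \sum_{i} \bar{A}_i^T B_i \Big) q = q^T N q$$
where $N = \sum_{i} \bar{A}_i^T B_i$ is a symmetric 4×4 matrix.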
The unit quaternion that maximizes this quadratic form is the eigenvector of N that corresponds to its most positive eigenvalue. The eigenvalues can be found by solving the following equation, which is quartic in λ:
$$\det(N - \lambda I) = 0$$
This quartic equation can be solved by a number of standard approaches. Finally, given the maximum eigenvalue $\lambda_{\max}$, the quaternion corresponding to the optimal rotation is the unit eigenvector v that satisfies:
$$(N - \lambda_{\max} I)\, v = 0$$
A closed-form solution of this equation for v can be found by applying techniques from linear algebra. One possible algorithm, based on constructing a matrix of cofactors, is presented in appendix A5 of the source paper.
In summary, the alignment algorithm works as follows:
Recalculate atom coordinates as displacements from the centroid of each molecule. The optimal translation superimposes the centroids.
Construct the matrix N based on matrices A and B for each atom.
Find the maximum eigenvalue by solving the quartic eigenvalue equation.
Find the eigenvector corresponding to this eigenvalue. This vector is the quaternion corresponding to the optimal rotation.
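A compact NumPy sketch of these steps follows. Instead of solving the quartic equation and using the cofactor construction from the paper, it substitutes a general symmetric eigensolver (`numpy.linalg.eigh`, which returns eigenvalues in ascending order), so the last eigenvector column is the optimal quaternion. The function and helper names are our own; the per-atom matrices are the right-multiplication matrix of $a_i$ and the left-multiplication matrix of $b_i$, so that $N = \sum_i \bar{A}_i^T B_i$.

```python
import numpy as np


def left_matrix(v):
    """B with b q = B q, for the purely imaginary quaternion b = (0, v)."""
    x, y, z = v
    return np.array([[0.0, -x, -y, -z],
                     [x, 0.0, -z, y],
                     [y, z, 0.0, -x],
                     [z, -y, x, 0.0]])


def right_matrix(v):
    """A_bar with q a = A_bar q, for the purely imaginary quaternion a = (0, v)."""
    x, y, z = v
    return np.array([[0.0, -x, -y, -z],
                     [x, 0.0, z, -y],
                     [y, -z, 0.0, x],
                     [z, y, -x, 0.0]])


def quat_to_matrix(q):
    """Rotation matrix of a unit quaternion (r, x, y, z)."""
    r, x, y, z = q
    return np.array([[1 - 2 * (y * y + z * z), 2 * (x * y - z * r), 2 * (x * z + y * r)],
                     [2 * (x * y + z * r), 1 - 2 * (x * x + z * z), 2 * (y * z - x * r)],
                     [2 * (x * z - y * r), 2 * (y * z + x * r), 1 - 2 * (x * x + y * y)]])


def horn_align(a, b):
    """Optimal rotation taking point set a onto b ((n, 3) arrays, a[i] matched to b[i]).

    Returns the unit quaternion, the rotation matrix and the lRMSD.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Step 1: express coordinates relative to each centroid.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    # Step 2: N = sum_i A_bar_i^T B_i (a symmetric 4x4 matrix).
    N = sum(right_matrix(ai).T @ left_matrix(bi) for ai, bi in zip(a, b))
    # Steps 3-4: eigenvector of the largest eigenvalue (eigh sorts eigenvalues ascending).
    _, eigenvectors = np.linalg.eigh(N)
    q = eigenvectors[:, -1]  # unit length, as returned by eigh
    R = quat_to_matrix(q)
    lrmsd = float(np.sqrt(np.mean(np.sum((a @ R.T - b) ** 2, axis=1))))
    return q, R, lrmsd


rng = np.random.default_rng(0)
x = rng.normal(size=(10, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] = -Q[:, 0]  # make Q a proper rotation
y = x @ Q.T + [1.0, -2.0, 0.5]
q, R, err = horn_align(x, y)
print(np.allclose(R, Q), err < 1e-9)  # True True
```

Because the result is a unit quaternion, the corresponding rotation matrix is always proper, so no determinant correction is needed, unlike in the matrix method.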
This method appears computationally intensive, but has the major advantage over other approaches of being a closed-form, unique solution.
Intramolecular Distance and Related Measures
RMSD and lRMSD are not ideally suited for all applications. For example, consider the case of a given conformation A, and a set S of other conformations generated by some means. The goal is to estimate which conformations in S are closest in potential energy to A, making the assumption that they will be the conformations most structurally similar to A. The lRMSD measure will find the conformations in which the overall average atomic displacement is least. The problem is that if the quantity of interest is the potential energy of conformations, not all atoms can be treated equally. Those on the outside of the protein can often move a fair amount without dramatically affecting the energy. In contrast, the core of the molecule tends to be more compact, and therefore a slight change in the relative positions of a pair of atoms could lead to overlap of the atoms, and therefore a completely infeasible structure and high potential energy. A class of distance measures and pseudo-measures based on intramolecular distances have been developed to address this shortcoming of RMSD-based measures.
Assume we wish to compare two conformations P and Q of a molecule with N atoms. Let $p_{ij}$ be the distance between atom i and atom j in conformation P, and let $q_{ij}$ be the same distance in conformation Q. Then the intramolecular distance is defined as
$$d(P, Q) = \sqrt{\frac{2}{N(N-1)} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \left( p_{ij} - q_{ij} \right)^2}$$
One of the main computational advantages of this class of approaches is that we do not have to compute the alignment between P and Q. On the other hand, for this metric we need to sum over a quadratic number of terms, whereas for RMSD the number of terms is linear in the number of atoms. Approximations can be made to speed up this computation, as shown in . Also, the intramolecular distance measure given above, which is sometimes referred to as the dRMSD, is subject to the problem that pairs of atoms most distant from each other are the ones that contribute the greatest amount to their measured difference.
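The following plain-Python sketch computes the intramolecular distance as the root mean square of $p_{ij} - q_{ij}$ over all pairs i < j (helper names are illustrative). It needs no alignment step: a rotated and translated copy has the same intramolecular distances, whereas uniformly scaling the molecule does not. Enumerating all pairs makes the quadratic cost explicit.

```python
import itertools
import math


def pairwise_distances(conf):
    """Distances p_ij for all atom pairs i < j of a list of (x, y, z) points."""
    return [math.dist(conf[i], conf[j])
            for i, j in itertools.combinations(range(len(conf)), 2)]


def drmsd(P, Q):
    """Intramolecular distance: RMS difference of p_ij and q_ij over all pairs i < j."""
    if len(P) != len(Q) or len(P) < 2:
        raise ValueError("need two conformations with the same number (>= 2) of atoms")
    dp, dq = pairwise_distances(P), pairwise_distances(Q)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(dp, dq)) / len(dp))


P = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
# The same shape rotated by 90 degrees about z, (x, y, z) -> (-y, x, z), then translated.
Q = [(-y + 4.0, x - 1.0, z + 2.0) for x, y, z in P]
# The same shape uniformly scaled by a factor of 2.
S = [(2 * x, 2 * y, 2 * z) for x, y, z in P]

print(drmsd(P, Q))  # 0 up to rounding: no alignment needed
print(drmsd(P, S))  # nonzero: every distance has doubled
```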
An interesting open problem is to come up with a physically meaningful molecular distance metric that allows for fast nearest-neighbor computations, which would be useful, for example, for clustering conformations. One proposed method is the contact distance. Contact distance requires constructing, for each conformation, a contact map: a matrix indicating which pairs of atoms are closer than some threshold separation. The distance measure is then a measure of the difference between the two contact maps.
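The text leaves both the threshold and the way contact maps are compared open. The sketch below is one simple possibility, not a standard definition: it stores each contact map as the set of atom pairs closer than a chosen threshold and counts the pairs that are in contact in exactly one of the two conformations (a Hamming distance between the maps). The function names and the 1.5 threshold in the example are illustrative assumptions.

```python
import itertools
import math


def contact_map(conf, threshold):
    """Set of atom pairs (i, j), i < j, whose distance is below threshold."""
    return {(i, j)
            for i, j in itertools.combinations(range(len(conf)), 2)
            if math.dist(conf[i], conf[j]) < threshold}


def contact_distance(P, Q, threshold):
    """Number of atom pairs in contact in exactly one of the two conformations."""
    return len(contact_map(P, threshold) ^ contact_map(Q, threshold))


P = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
Q = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]  # atom 2 has moved away
print(sorted(contact_map(P, 1.5)))  # [(0, 1), (1, 2)]
print(contact_distance(P, Q, 1.5))  # 1
```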
Other distance measures attempt to weight each pair in the dRMSD based on how close the atoms are, with closer pairs given more weight, in keeping with the intuition that small changes in the relative positions of nearby atoms are more likely to result in collisions. One such measure is the normalized Holm and Sander Score.
This score is technically a pseudo-measure rather than a measure because it does not necessarily obey the triangle inequality.
The definition of distance measures remains an open problem. For reference on ongoing work, see articles that compare several methods, such as .
Recommended Reading: The first two papers are the original descriptions of the Kabsch algorithm, and use rotations represented as orthonormal matrices to find the correct rotational transformation. Many software packages use this alignment method. The third and fourth papers use quaternions; the quaternion alignment method presented earlier in this module comes from the third paper: