Loopy: functionality and algorithm

Loopy is a program which tries to find likely loops to connect fragments of a partial protein structure based on the expected structure and the density map.
The next section explains the algorithm used by Loopy in some detail, to help you decide which settings to change when the defaults fail to build a loop.

Building loops using structural and density information

Loopy builds the loops in three phases. First, starting from one of the anchor fragments of the loop, a tree of possible CA-traces is built. Next, unlikely branches of the tree (that is, CA-traces) are removed, for example branches that end far away from the other anchor fragment, or paths that cross themselves. For the reduced set of possible CA-traces, an attempt is made to determine the missing main-chain atoms. The main-chain traces for the loop are then reduced further. Finally, after the side chains have been built, the best loop(s) are selected based on the density fit of the loop model. The tree can be built towards the C-terminus of the protein, towards the N-terminus, or both. It is advised to let Loopy build the tree in both directions, to obtain the best possible loop for the gap.
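The first two phases can be illustrated with a small sketch. This is not Loopy's code: the candidate positions are simply the six axis directions instead of a scored shell of CAs, the density map is ignored, the last CA is required to coincide with the other anchor, and the names grow_traces, tol and min_sep as well as the CA-CA distance of 3.8 A are assumptions made for the example. It only shows how a tree of CA-traces can be grown from one anchor while branches that can no longer reach the other anchor, or that cross themselves, are cut off early.

```python
import math

CA_CA = 3.8  # assumed typical CA-CA distance in angstrom

# Toy candidate directions (assumption): the six axis directions.
# A real run would score a whole shell of candidate CAs against the map.
DIRECTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
              (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def grow_traces(start, target, n_bonds, tol=0.5, min_sep=3.0):
    """Enumerate CA-traces of n_bonds steps from start that end at target."""
    traces = []

    def extend(trace, bonds_left):
        end = trace[-1]
        if bonds_left == 0:
            # Keep only traces that close onto the other anchor.
            if math.dist(end, target) <= tol:
                traces.append(list(trace))
            return
        for d in DIRECTIONS:
            nxt = tuple(e + CA_CA * c for e, c in zip(end, d))
            # Prune: the remaining bonds can no longer bridge the gap.
            if math.dist(nxt, target) > (bonds_left - 1) * CA_CA + tol:
                continue
            # Prune: the trace crosses itself (too close to an earlier CA).
            if any(math.dist(nxt, p) < min_sep for p in trace):
                continue
            trace.append(nxt)
            extend(trace, bonds_left - 1)
            trace.pop()

    extend([start], n_bonds)
    return traces


if __name__ == "__main__":
    found = grow_traces((0.0, 0.0, 0.0), (7.6, 3.8, 0.0), n_bonds=3)
    print(len(found), "traces reach the other anchor")
```

Calling grow_traces once from each anchor corresponds to building the tree in both directions.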

Protocol

Job title
Title for the current experiment
Experimental data
Select whether to use a map or an mtz file. In the case of an mtz file, the program will use fft to compute the corresponding map.
Mode for loop building
Select whether to build a single, specific loop in the model, or to try to determine all loops in the model

Files

Input map
(Mode 'map' only) Input map to use
MTZ
(Mode 'mtz' only) Mtz file to use. F and PHI are used to compute the corresponding map using fft. We need to save this file, since we need to reread the map more than once.
Input pdb
Input pdb for your protein. Please remove any residues that you would like to rebuild from this file: this frontend of Loopy will not rebuild residues that are still present.
Sequence
(Mode 'Single loop' only) Select whether to type the sequence of the loop of interest, or to use a pir-file to determine the sequence. In the mode in which all loops are built, the pir-file is obligatory.
Name for first loop pdb
(Mode 'Single loop' only) The name of this file is used as a format to determine the names of the other loops to save
Name for first model of the original pdb plus the loops built
(Mode 'Single loop' only) The name of this file is used as a format to determine the names of the other original pdbs with the built loops inserted
Output file
(Mode 'All loops' only) The filename for the final model of the original pdb plus the loops built

Definition of the loop(s)

In this folder you define the loop(s) to build and the sequence of the protein or loop.
Multiplicity
(Mode 'pir-file' only) Set the number of molecules in the asymmetric unit corresponding to the specified pir-file
Methionine
(Mode 'pir-file' only) Set whether Methionine residues in the specified pir-file should be considered as Seleno-Methionines or not
Use all chains
By default, all the chains in the pdb are considered for loop building. Uncheck this box to select the chains in the pdb to consider.
Use chains
Select the chains in the pdb to consider, based on the chain ID. The format expected is (A|B|...)
N-term anchor
(Mode 'Single loop' only) Anchor residue of a fragment on the N terminus side of the protein. Note that if you want to rebuild a selection of existing residues, please remove them from the pdb file
C-term anchor
(Mode 'Single loop' only) Anchor residue of a fragment on the C terminus side of the protein. Note that if you want to rebuild a selection of existing residues, please remove them from the pdb file
Loop length
(Mode 'Single loop' only) Number of residues in the loop, including the two anchor points. Since the time needed by Loopy grows approximately exponentially with the loop length, values larger than 15 are discouraged.
Loop sequence
(Only when no pir-file is used) Sequence of amino acids (one letter code) of the residues in the loop including the two anchor points.
Number of loops
Select the number of loops you'd like the program to save. The number of loops left after pruning may well be smaller than this number; in that case fewer loops are saved than you asked for. If no loops are found at all, adjust the parameters, specifically those in the folder "Selecting best CAs".
Build towards C-terminus
If you didn't select to build both ways, you can indicate whether you want to build the tree towards the C terminus of the protein, or towards the N terminus

Setting for generating loops

In this folder you select how the loops are generated.
Maximum loop length
(Mode 'All loops' only) Select the maximum length of the loops in the protein structure that the program should try to build (including anchors!). Note that the time Loopy needs to build loops grows approximately exponentially with the length of the loop: a small change in this value might have a strong effect on the run time.
Extend loop
(Mode 'All loops' only) If loops detected in the pdb are shorter than this number, the anchors themselves are rebuilt as well. The default is 5, since smaller gaps in the model can indicate that the anchors were not properly determined.
Override the number of CA's
By default, the number of CA's selected from each generated CA-shell is set according to the loop length, based on a time/performance test. Sometimes it can be useful to override this value for a single loop.
Number of CA's
Set the number of CA's selected from each generated CA-shell. Note that the time Loopy takes to find the loop tree is approximately (CA-number) to the power (loop length).
Force the minimum number of CA's
Set the minimum number of CA's selected from each CA-shell. Setting this value can be especially useful when the density is low in some regions of the loop, so that the loop can bridge such an area.
Loop alternatives
(Mode 'Single loop' only) Set the number of alternative loops saved to file. This can be useful when multiple conformations of the protein region are expected.
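To get a feeling for the scaling mentioned under "Number of CA's", the short sketch below evaluates (CA-number) to the power (loop length) for a few settings. The function name tree_size is made up for this example, and the result is only the rough upper bound quoted above, not a measurement of Loopy's actual run time.

```python
def tree_size(n_ca, loop_length):
    """Rough upper bound on the number of CA-traces in the loop tree."""
    return n_ca ** loop_length


if __name__ == "__main__":
    for n_ca in (2, 3, 4):
        for loop_length in (5, 10, 15):
            print(f"CA's per shell: {n_ca}  loop length: {loop_length:2d}  "
                  f"traces: {tree_size(n_ca, loop_length):,}")
```

The table this prints shows why both a longer loop and a larger number of CA's per shell should be increased in small steps.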

Selecting best loops

In this folder you can set the thresholds used to prune the tree from incorrect loops and the weights used to select the best loops.
Maximum loop length (in A)
The default of -1 means that the maximum loop length is simply the CA-CA distance times the number of CA-CA bonds in the loop. However, if the loop is expected to curve a lot, it might be useful to set the maximum loop length appropriately, so that the longer loops are already removed from the set while the tree of loops is being built.
Deviation distance loop connection
The distance between the end CA of the loop and the connecting CA of the structure should be approximately equal to the CA-CA distance. Set the allowed error in this distance.
Threshold density correlation CAs
After pruning on the distance, the next step is to select the best trees based on the density fit of the CAs. This number sets the number of best loops kept based on the density correlation of the CAs only, removing the loops deemed unlikely based on the density fit of the CA-trace. Set the value to -1 to keep the full tree.
Structural threshold
Select a further subset by evaluating the structure formed by the end CA of the loop and the connecting tetrapeptide. Set the threshold for the minimum log likelihood of this structure.
Minimum for this stage
Set this value if you want to keep at least a certain number of loops after pruning on the structure, overruling the structural threshold if necessary.
Maximum for this stage
Set this value if you want to ensure that the number of loops does not exceed a certain amount after structural pruning, keeping only those with the highest structural likelihood.
Loops kept after building the main-chain atoms
The main-chain atoms of the loops are found by looking for the best plane through successive CAs, based on the density. After building the main-chain atoms for each CA-CA pair, the loops are pruned based on the scores of the planes.
Main-chain density fit
After building loops towards both the N and the C terminus, the loops are sorted by the density fit of the main-chain atoms (including Cb for non-GLY). This threshold sets the number of best loops kept before the side-chain atoms are determined.
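The pruning settings in this folder can be summarised in a small sketch. It is an illustration under assumptions, not Loopy's implementation: the CA-CA distance of 3.8 A, the choice of (residues - 1) as the number of CA-CA bonds, and the function names default_max_length, closes_gap and prune_on_structure are all made up for this example. It shows the default maximum loop length, the check on the connection distance, and how a threshold can be combined with a minimum and a maximum number of loops to keep.

```python
import math

CA_CA = 3.8  # assumed typical CA-CA distance in angstrom


def default_max_length(n_residues, ca_ca=CA_CA):
    """CA-CA distance times the number of CA-CA bonds (assumed n_residues - 1)."""
    return ca_ca * (n_residues - 1)


def closes_gap(end_ca, anchor_ca, deviation, ca_ca=CA_CA):
    """True if the end CA lies at about one CA-CA distance from the anchor CA."""
    return abs(math.dist(end_ca, anchor_ca) - ca_ca) <= deviation


def prune_on_structure(loops, threshold, minimum=0, maximum=None):
    """loops is a list of (name, log_likelihood) pairs; best loops come first."""
    ranked = sorted(loops, key=lambda loop: loop[1], reverse=True)
    kept = [loop for loop in ranked if loop[1] >= threshold]
    if len(kept) < minimum:
        # The minimum overrules the threshold.
        kept = ranked[:minimum]
    if maximum is not None:
        # The maximum keeps only the most likely loops.
        kept = kept[:maximum]
    return kept


if __name__ == "__main__":
    loops = [("a", -1.0), ("b", -3.0), ("c", -5.0), ("d", -2.0)]
    print(default_max_length(10))
    print(prune_on_structure(loops, threshold=-2.5, minimum=3))
```

With a threshold of -2.5 only loops "a" and "d" pass; asking for a minimum of 3 brings loop "b" back in, which is the overruling behaviour described under "Minimum for this stage".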

Selecting best CAs

During the building of the tree of possible paths, shells of CAs are generated (see top). In this folder you can set the thresholds etc. which determine how the best CAs are selected from all the CAs in one such shell. Note: generated CAs with a negative density correlation will be removed immediately.
Likelihood threshold
This is the threshold for the log likelihood that a CA represents the fifth CA of a pentapeptide, based on density correlation, CA-CA distance, and structure.
Weight distance
Weight for the distance likelihood
Weight density
Weight for the likelihood of the density correlation
Weight structure
Weight for the structural likelihood
Structure table to C
Filename for the probability table for the angles and dihedral angles of a pentapeptide in the direction of the C terminus
Structure table to N
Filename for the probability table for the angles and dihedral angles of a pentapeptide in the direction of the N terminus
Minimum distance CA
Measure for the minimal distance between CAs from the same shell. Several CA's with a reasonable likelihood might lie close together, without resulting in a significantly different CA-trace for the loop. For this reason, only the CA with the best likelihood in such a sub-group is kept.
Maximum number of CAs
Maximum number of CAs from each shell to keep. Note: The CAs kept will all be used as a new suggestion for the current residue in the loop, and thus as a new node in the tree. The number of possible loops generated will expand exponentially with this number.
Force minimum number of CAs
Force a minimum number of CAs in a shell to be kept, even if the likelihood is less than the threshold set. This makes the loop building a bit more flexible in low density areas, or for pentapeptide structures which occur less often.
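The interplay of the settings in this folder can be sketched as follows. The sketch rests on assumptions: the source does not state how the three likelihoods are combined, so a weighted sum of log likelihoods is used here, and the order of the steps as well as the names combined_score and select_cas are made up for the example.

```python
import math


def combined_score(ll_distance, ll_density, ll_structure,
                   w_distance=1.0, w_density=1.0, w_structure=1.0):
    """Weighted sum of the three log likelihoods (assumed combination)."""
    return (w_distance * ll_distance
            + w_density * ll_density
            + w_structure * ll_structure)


def select_cas(candidates, threshold, min_dist, max_n, force_min=0):
    """candidates is a list of (xyz, score) pairs from one shell."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    # Of CAs that lie closer together than min_dist, keep only the best one.
    spaced = []
    for xyz, score in ranked:
        if all(math.dist(xyz, other) >= min_dist for other, _ in spaced):
            spaced.append((xyz, score))
    # Apply the likelihood threshold.
    kept = [c for c in spaced if c[1] >= threshold]
    # The forced minimum overrules the threshold.
    if len(kept) < force_min:
        kept = spaced[:force_min]
    # Never keep more than the maximum number of CAs.
    return kept[:max_n]


if __name__ == "__main__":
    shell = [((0.0, 0.0, 0.0), -1.0), ((0.5, 0.0, 0.0), -1.2),
             ((3.0, 0.0, 0.0), -2.0), ((0.0, 3.0, 0.0), -6.0)]
    print(select_cas(shell, threshold=-3.0, min_dist=1.0, max_n=5))
    print(select_cas(shell, threshold=-3.0, min_dist=1.0, max_n=5, force_min=3))
```

Every CA returned becomes a new node in the tree, so max_n is the branching factor that drives the exponential growth mentioned above.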

Generating CAs

This folder describes how the shells of CAs are generated.
Select generation CA shell
By default, a shell with a uniform and regular distribution of CAs at exactly the CA-CA distance is generated. You can also choose a uniform and random distribution of the CAs. In that case the shell is generated with a given thickness.
Number of CAs
Number of CAs generated within a shell. In the case of a regular distribution this number is rounded downwards to the closest Fibonacci number.
CA-CA distance
Distance to use between successive CAs.
Shell thickness
(random shell only) Thickness of the generated shell of CAs.
SD CA-CA distance
(random shell only) We assume that the probability for the CA-CA distance is described by a Gaussian. With this value you can set the standard deviation of the Gaussian function.
Keep CAs with negative density halfway
Due to the structure of a peptide, we expect the density correlation halfway between successive CAs to be positive. A quick first selection of CAs from the shell is thus based (apart from the density correlation at the generated CA) on the density correlation at this midpoint. The default for this option is false.
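The two kinds of shell can be sketched as follows. How Loopy places the points is not described here, so the sketch swaps in a common stand-in for the regular shell: a golden-angle (Fibonacci) spiral on a sphere, with the requested number of CAs first rounded down to a Fibonacci number. For the random shell, uniformly distributed directions with a radius drawn uniformly within the shell thickness are assumed. The function names, the CA-CA distance of 3.8 A and the thickness of 0.4 A are assumptions made for the example.

```python
import math
import random


def floor_fibonacci(n):
    """Largest Fibonacci number that is not larger than n (n >= 1)."""
    a, b = 1, 1
    while b <= n:
        a, b = b, a + b
    return a


def regular_shell(center, n_requested, radius=3.8):
    """Evenly spread points on a sphere (golden-angle spiral, a stand-in)."""
    n = floor_fibonacci(n_requested)
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n          # evenly spaced heights
        r = math.sqrt(max(0.0, 1.0 - z * z))   # radius of the circle at z
        phi = i * golden_angle
        points.append((center[0] + radius * r * math.cos(phi),
                       center[1] + radius * r * math.sin(phi),
                       center[2] + radius * z))
    return points


def random_shell(center, n, radius=3.8, thickness=0.4, rng=random):
    """Random directions, radius within the shell thickness (assumed)."""
    points = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        rad = rng.uniform(radius - thickness / 2.0, radius + thickness / 2.0)
        points.append(tuple(c0 + rad * c / norm for c0, c in zip(center, v)))
    return points


if __name__ == "__main__":
    print(len(regular_shell((0.0, 0.0, 0.0), 100)), "CAs in the regular shell")
```

Every point of the regular shell lies at exactly the chosen radius from the previous CA, whereas the random shell spreads the distances over the given thickness.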

Density Handling

In this folder you can set the details of how the density in the map is handled.
Interpolation method
Choose the interpolation method used to determine the density fit of the loop model. The option 'weighted means' is implemented very efficiently; however, it gives values that depend on the actual mean of the map. The option 'correlation' is slower, but gives a value independent of the mean of the map. The value will be between 0 and 1, where 1 indicates a perfect fit of the model with the density.
Atom radius
Radius used to determine the density correlation
B factor
At the moment the values for the B factor in the pdb are ignored. The value set here is used for all atoms.
Remove atoms by factor
To avoid overlaps between the generated loop and the protein structure in the pdb, atoms in the pdb (apart from dummies, or residues in chains consisting only of main-chain atoms) are removed from the map. This is done by flipping the density in the map to negative values at the position of these atoms. This setting is the factor by which the density is changed.
Density threshold residues
Threshold for the density correlation of residues after loop building. This is used to check overlap between the loop and possible fragments of main chain atoms in the pdb.
Density threshold dummies
Threshold for the density correlation of dummies after loop building. This is used to check overlap between the loop and possible dummy atoms in the pdb.
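The difference between the two interpolation methods can be illustrated with a small sketch. It does not reproduce Loopy's formulas: a plain weighted mean and a Pearson correlation coefficient (which in general lies between -1 and 1) are used as stand-ins, and the sample values are invented. The point is only that adding a constant to the map changes a mean-based score, while a correlation stays the same.

```python
import math


def weighted_mean(map_values, weights):
    """Mean of the map values, weighted by the model density (stand-in)."""
    return sum(w * v for w, v in zip(weights, map_values)) / sum(weights)


def correlation(map_values, model_values):
    """Pearson correlation between map and model density (stand-in)."""
    n = len(map_values)
    mean_map = sum(map_values) / n
    mean_model = sum(model_values) / n
    cov = sum((a - mean_map) * (b - mean_model)
              for a, b in zip(map_values, model_values))
    var_map = sum((a - mean_map) ** 2 for a in map_values)
    var_model = sum((b - mean_model) ** 2 for b in model_values)
    return cov / math.sqrt(var_map * var_model)


if __name__ == "__main__":
    density = [0.1, 0.5, 0.9, 0.4]         # invented map values at grid points
    model = [0.0, 0.6, 1.0, 0.3]           # invented model density, same points
    shifted = [v + 2.0 for v in density]   # same map with a different mean
    print(weighted_mean(density, model), weighted_mean(shifted, model))
    print(correlation(density, model), correlation(shifted, model))
```

This is why thresholds chosen for one interpolation method should not be carried over unchanged to the other.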

Crystal Parameters

The spacegroup name and cell dimensions are extracted from the map/mtz file.

Real-space refinement

Because of the underlying algorithm, the loops built by Loopy are likely guesses for the loop in question. A further real-space refinement of the loop region fine-tunes the loops to the density.
Define in this folder the properties for real-space refinement of the loop regions.
Real-space refinement
Turn the real-space refinement of the loop regions by loopfit (V. Lamzin) on or off.
Extend region
Set this number to more than zero residues to extend the real-space refinement outside the loop region itself.
Loopfit executable
Set the loopfit executable to use. By default, the one in the warpbin directory is used.
Loopfit log
Set the location of the logfile for loopfit.

Log files of Loopy

Loopy writes its own logs to file. The extent of the messages depends on the levels you set in this folder.
Message level
Level of the messages to be written to file (a value from 0 to 9).
Abort level
If a message of this level is encountered, terminate the program. Standard values are 7 or 8
Message file
Name for the message file (plain text) of Loopy.
XML output file
Name for the XML message file (xml format) of Loopy.

Krista Joosten
Last modified: Tue Aug 15 13:34:54 CEST 2006