Loopy is a program that tries to find likely loops to connect
fragments of a partial protein structure, based on the expected
structure and the density map.
The next section explains the algorithm used by Loopy in some detail, to help
you decide which settings to change when the defaults fail to build
the loop.
Building loops using structural and density information
Loopy builds the loops in three phases. First, starting from one
of the anchor fragments of the loop, a tree of possible CA traces
is built. Next, unlikely branches of the tree (branches, or CA
traces, that end far away from the other anchor fragment, or
paths that cross themselves) are removed. For the reduced set of
possible CA traces, an attempt is made to determine the missing
main-chain atoms. The set of main-chain traces for the loop is
then reduced further, and finally, after the side chains have been
built, the best loop(s) are selected based on the density fit of
the loop model. The tree can be built towards the C terminus or
the N terminus of the protein, or in both directions. It is advised
to let Loopy build the tree in both directions, to obtain the best
possible loop for the gap.
- First, Loopy tries to find likely
candidates for only the CAs of the residues, creating several
possible CA traces for the missing loop structure. To find these likely
CAs, it takes an existing tetrapeptide (from the fragment
ends of the model) and generates a large number of
possible positions on a shell at CA-CA distance from its last CA, each
of which would extend it to a pentapeptide. Next, it uses a likelihood
table for the angles in the resulting pentapeptides, and the density at
the generated positions, to determine the set of best CAs. By iterating over
the number of missing residues, this procedure builds a tree
of possible CA-CA paths that would connect the
fragments. During this iteration a mild restraint on the CA path
is applied, clipping paths that at some point wander so far from
the connecting anchor fragment that the gap can certainly never
be bridged.
- The tree of possible CA paths is then pruned to remove unlikely
paths and keep the most likely ones. By alternating between pruning
the paths and gradually extending the CA trace to a full loop model,
the algorithm stays fast without discarding possibly correct CA
traces at too early a stage. This part of the algorithm consists of
the following steps:
- Since few restrictions were placed on the end position of
the loop, the first pruning step is based on the distance
between the loop and the connecting fragment. A loop is
kept only if the distance between the end CA of the loop and
the connecting CA of the fragment is approximately equal to
the CA-CA distance.
- The freedom in creating the tree of loop paths also means
that some paths will cross themselves. To remove these paths, a
minimal distance of 3.0 Å between all CAs in one path is
required. Paths that violate this restraint are removed.
- (obsolete) Depending on the direction in which the
loop was built, the N or the C atom of the connecting fragment
is known. We use this information to check the
CA_fragment-CA_loop_N angle or the CA_fragment-CA_loop_C angle,
respectively.
- Although the structural likelihood is used along the
direction of loop building, so far no use has been made of the
structural likelihood of the junction between the loop and the
connecting fragment. In this step only the loops that are most
likely according to this structure are kept.
- The tree can be pruned even further by
keeping only those loops with a high average density at
the suggested CA positions.
- Next, the other main-chain atoms of the CA path are
determined by searching for the best peptide plane (based on the
density fit) between successive CAs. We use the fact that the
atoms between two successive CAs lie in a plane, and that the
relative positions of the N, C and O atoms in that plane are known.
By rotating the plane around the CA-CA axis, the orientation with
the best density correlation for the main-chain atoms (and the
worst density correlation outside the plane) is determined. For
non-GLY residues, the density correlation at the CB is used as well.
- Finally, loops that do not comply with the
(residue-dependent) Ramachandran plot are removed (we used the
tables given by D. C. and J. S. Richardson).
- After all the loops have been built (if chosen, in both
directions), the side chains of the loops are
determined. Finally, the loops are sorted in descending order
of the density fit of the full loops. If the number
of loops exceeds the chosen number, or the user has chosen the
mode 'All loops', only the best are saved to file.
- By default, a real-space refinement by loopfit (V. Lamzin) is
performed on the loop regions of all saved PDB files.
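The two geometric pruning steps described above (the end-distance check against the connecting CA and the 3.0 Å self-crossing check) can be sketched in Python as below. This is an illustration, not Loopy's code: the helper names, the 3.8 Å CA-CA constant and the 0.5 Å default tolerance are assumptions chosen for the example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

CA_CA = 3.8  # assumed ideal CA-CA distance in Angstrom


def closes_gap(path: Sequence[Point], anchor_ca: Point, tolerance: float) -> bool:
    """True if the last CA of the path lies about one CA-CA distance from the anchor CA."""
    return abs(math.dist(path[-1], anchor_ca) - CA_CA) <= tolerance


def self_avoiding(path: Sequence[Point], min_sep: float = 3.0) -> bool:
    """True if no two CAs in the path are closer than min_sep (the path does not cross itself)."""
    for i in range(len(path)):
        for j in range(i + 1, len(path)):
            if math.dist(path[i], path[j]) < min_sep:
                return False
    return True


def prune(paths: List[List[Point]], anchor_ca: Point, tolerance: float = 0.5) -> List[List[Point]]:
    """Keep only the paths that pass both geometric checks."""
    return [p for p in paths if closes_gap(p, anchor_ca, tolerance) and self_avoiding(p)]


straight = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
folded = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(len(prune([straight, folded], anchor_ca=(11.4, 0.0, 0.0))))  # prints 1
```

In this sketch the `tolerance` argument plays the role of the "Deviation distance loop connection" setting described further below.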
- Job title
- Title for the current experiment
- Experimental data
- Select whether to use a map or an mtz file. In the case of
an mtz file, the program will use fft to compute the
corresponding map.
- Mode for loop building
- Select whether to build a single, specific loop in the
model, or to try to determine all loops in the model
- Input map
- (Mode 'map' only) Input map to use
- MTZ
- (Mode 'mtz' only) Mtz file to use. F and PHI are used to compute the
corresponding map using fft. The computed map is saved to file, since
Loopy needs to reread it more than once.
- Input pdb
- Input pdb for your protein. Please remove from this file any
residues you would like to rebuild: this frontend of Loopy will
not rebuild residues that are already present.
- Sequence
- (Mode 'Single loop' only) Select whether to type in the sequence
of the loop of interest or to use a pir-file to determine the
sequence. In the mode where all loops are built, a pir-file is
mandatory.
- Name for first loop pdb
- (Mode 'Single loop' only) The name of this file is used as a format to determine the
names of the other loops to save
- Name for first model of the original pdb plus the loops built
- (Mode 'Single loop' only) The name of this file is used as a format to determine the
names of the other original pdbs with the built loops inserted
- Output file
- (Mode 'All loops' only) The filename for the final model of the original pdb plus the loops built
In this folder you define the loop(s) to build and the sequence of the protein or loop.
- Multiplicity
- (Mode 'pir-file' only) Set the number of molecules in the asymmetric unit corresponding
to the specified pir-file
- Methionine
- (Mode 'pir-file' only) Set whether methionine residues in the specified pir-file should
be treated as selenomethionines or not
- Use all chains
- By default, all the chains in the pdb are considered for loop building. Uncheck this box
to select the chains in the pdb to consider
- Use chains
- Select the chains in the pdb to consider, based on the chain ID. The format expected
is (A|B|...)
- N-term anchor
- (Mode 'Single loop' only) Anchor residue of the fragment on the N-terminal side of the
loop. Note that if you want to rebuild a selection of existing
residues, you should first remove them from the pdb file
- C-term anchor
- (Mode 'Single loop' only) Anchor residue of the fragment on the C-terminal side of the
loop. Note that if you want to rebuild a selection of existing
residues, you should first remove them from the pdb file
- Loop length
- (Mode 'Single loop' only) Number of residues in the loop, including the two anchor points.
Since the time needed by Loopy grows approximately exponentially with the loop length, values larger
than 15 are discouraged
- Loop sequence
- (Only when no pir-file is used) Sequence of amino acids (one letter code) of the residues in the loop
including the two anchor points.
- Number of loops
- Select the number of loops you would like the program to
save. The number of loops left after pruning may well be
smaller than this number, in which case fewer loops are saved
than you asked for. If no loops are found at all, adjust the
parameters, in particular those in the folder "Selecting best CAs"
- Build towards C-terminus
- If you did not choose to build in both directions, indicate
whether the tree should be built towards the C terminus of
the protein or towards the N terminus
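In 'Single loop' mode the loop length, the two anchors and the loop sequence must agree with each other, since both the length and the sequence include the anchor points. The hypothetical helper below sketches such a consistency check; it assumes the anchors are given as consecutive residue numbers in one chain, which this page does not state explicitly, and it is not part of Loopy.

```python
from typing import List

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # standard one-letter codes


def check_single_loop(n_anchor: int, c_anchor: int, loop_length: int, sequence: str) -> List[str]:
    """Return the problems found in a 'Single loop' definition (empty list if consistent).

    Hypothetical helper: assumes consecutively numbered residues in one chain,
    with both the loop length and the sequence including the two anchors.
    """
    problems = []
    span = c_anchor - n_anchor + 1
    if span != loop_length:
        problems.append(f"anchors {n_anchor}-{c_anchor} span {span} residues, not {loop_length}")
    if len(sequence) != loop_length:
        problems.append(f"sequence has {len(sequence)} residues, expected {loop_length}")
    unknown = sorted(set(sequence.upper()) - AMINO_ACIDS)
    if unknown:
        problems.append(f"unknown one-letter codes: {''.join(unknown)}")
    if loop_length > 15:
        problems.append("loop longer than 15 residues: expect very long run times")
    return problems


print(check_single_loop(10, 17, 8, "GSDKNGAT"))  # prints []
print(check_single_loop(10, 17, 8, "GSDK"))      # one problem: sequence too short
```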
In this folder you select how the loops are generated.
- Maximum loop length
- (Mode 'All loops' only) Select the maximum length of the loops in the protein structure
that the program should try to build (including anchors!). Note that the time Loopy needs to build loops grows approximately
exponentially with the length of the loop: a small change in this value can have a strong
effect on the run time.
- Extend loop
- (Mode 'All loops' only) If detected loops in the pdb are
shorter than this number, the anchors themselves are rebuilt as
well. The default is 5, since smaller gaps in the model
can indicate that the anchors were not properly determined
- Override the number of CA's
- By default, the number of CA's selected from each generated CA shell is set
according to the loop length, based on a time/performance test. For a single loop it can
sometimes be useful to override this value.
- Number of CA's
- Set the number of CA's selected from each generated CA shell. Note that the time
Loopy takes to build the loop tree is approximately (number of CA's) to the power (loop length)
- Force the minimum number of CA's
- Set the minimum number of CA's selected from each CA shell. When the density is low in some
regions of the loop, it can be especially useful to set this value, so that the loop
can still bridge these regions
- Loop alternatives
- (Mode 'Single loop' only) Set the number of alternative loops saved to file. This can be
useful when multiple conformations of the protein region are expected
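The run-time warnings in this folder can be made concrete with a back-of-the-envelope estimate. The sketch below simply evaluates (number of CA's) to the power (loop length), as stated for 'Number of CA's'; since Loopy prunes and clips the tree while it grows, treat these numbers as a worst-case indication only.

```python
def tree_size(ca_per_shell: int, loop_length: int) -> int:
    """Worst-case number of CA paths: every residue multiplies the tree by ca_per_shell."""
    return ca_per_shell ** loop_length


for ca_per_shell in (3, 5):
    for loop_length in (8, 12, 15):
        print(f"{ca_per_shell} CAs per shell, loop length {loop_length}: "
              f"{tree_size(ca_per_shell, loop_length):,} paths")
```

With 3 CAs per shell, a 15-residue loop already gives 14,348,907 paths before pruning, which is why longer loops and larger CA numbers are discouraged.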
In this folder you can set the thresholds used to prune incorrect loops
from the tree and the weights used to select the best loops.
- Maximum loop length (in A)
- The default value, -1, means that the maximum loop length is simply the CA-CA distance
times the number of CA-CA bonds in the loop. However, if the loop is expected to curve a lot,
it can be useful to set the maximum loop length appropriately, so that longer loops are
already removed from the set while the tree of loops is being built.
- Deviation distance loop connection
- The distance between the end CA of the loop and the
connecting CA of the structure should be approximately equal
to CA-CA distance. Set the allowed error in the distance.
- Threshold density correlation CAs
- After pruning on the distance, the next step is to select
the best loops based on the density fit of the
CAs. This number sets how many of the best loops are kept, based on
the density correlation of the CAs only; loops deemed unlikely
from the density fit of the CA trace are removed. Set the value to -1 to keep the full tree.
- Structural threshold
- Select a further subset based on the structure formed by the end CA of the loop
and the connecting tetrapeptide. Set the threshold for the
minimal log likelihood of this structure
- Minimum for this stage
- Set this value to ensure that at least a
certain number of loops are kept after pruning on
structure, overruling the structural threshold if
necessary
- Maximum for this stage
- Set this value to ensure that the number of
loops does not exceed a certain amount after structural
pruning, keeping only those with the highest structural
likelihood
- Loops kept after building the main-chain atoms
- The main-chain atoms of the loops are found by looking for
the best plane through successive CAs, based on the density. After
building the main-chain atoms for each CA-CA pair, the loops are
pruned based on the scores of the planes.
- Main-chain density fit
- After building loops towards both the N and the C terminus, the loops are
sorted by the density fit of the main-chain atoms
(including CB for non-GLY residues). This threshold sets the number of
best loops kept before the side-chain atoms are determined.
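Several settings in this folder follow the same pattern: rank the loops by a score and keep either the best n (with -1 meaning keep everything) or all loops above a threshold, optionally bounded by a minimum and a maximum count. The sketch below shows this logic for the structural threshold and its 'Minimum/Maximum for this stage' companions; the function names and the (loop id, score) tuples are hypothetical, not Loopy's internals.

```python
from typing import List, Optional, Tuple

Scored = Tuple[str, float]  # hypothetical: (loop id, score such as a log likelihood)


def ranked(loops: List[Scored]) -> List[Scored]:
    """Sort loops from best (highest score) to worst."""
    return sorted(loops, key=lambda item: item[1], reverse=True)


def keep_best(loops: List[Scored], n: int) -> List[Scored]:
    """Keep the n best loops; n == -1 keeps the full set."""
    order = ranked(loops)
    return order if n == -1 else order[:n]


def keep_above(loops: List[Scored], threshold: float,
               minimum: Optional[int] = None, maximum: Optional[int] = None) -> List[Scored]:
    """Keep loops scoring at least `threshold`, but never fewer than `minimum`
    (overruling the threshold) and never more than `maximum` (best first)."""
    order = ranked(loops)
    kept = [item for item in order if item[1] >= threshold]
    if minimum is not None and len(kept) < minimum:
        kept = order[:minimum]
    if maximum is not None and len(kept) > maximum:
        kept = kept[:maximum]
    return kept


loops = [("a", -2.0), ("b", -5.0), ("c", -1.0), ("d", -9.0)]
print([name for name, _ in keep_above(loops, -3.0)])             # ['c', 'a']
print([name for name, _ in keep_above(loops, -3.0, minimum=3)])  # ['c', 'a', 'b']
print([name for name, _ in keep_best(loops, -1)])                # ['c', 'a', 'b', 'd']
```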
During the building of the tree of possible paths, shells of
CAs are generated (see above). In this folder
you can set the thresholds and other parameters that determine how the
best CAs are selected from all the CAs in such a shell. Note:
generated CAs with a negative density correlation are removed immediately.
- Likelihood threshold
- This is the threshold for the log likelihood of a CA to
represent the fifth CA of a pentapeptide, based on density
correlation, CA-CA distance, and structure.
- Weight distance
- Weight for the distance likelihood
- Weight density
- Weight for the likelihood of the density correlation
- Weight structure
- Weight for the structural likelihood
- Structure table to C
- Filename for the probability table for the angles and
dihedral angles of a pentapeptide in the direction of the C terminus
- Structure table to N
- Filename for the probability table for the angles and
dihedral angles of a pentapeptide in the direction of the N terminus
- Minimum distance CA
- Minimal distance between CAs from the same
shell. Several CA's with a reasonable likelihood may lie close together without
resulting in a significantly different CA trace for the loop. Of such a group of nearby CAs,
only the CA with the best likelihood is kept.
- Maximum number of CAs
- Maximum number of CAs from each shell to keep. Note:
The CAs kept will all be used as a new suggestion for the
current residue in the loop, and thus as a new node in the
tree. The number of possible loops generated will expand
exponentially with this number.
- Force minimum number of CAs
- Force a minimum number of CAs in a shell to be kept, even
if the likelihood is less than the threshold set. This makes
the loop building a bit more flexible in low density areas, or
for pentapeptide structures which occur less often.
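A possible reading of the CA selection settings in this folder is sketched below. Loopy's exact formula is not given on this page, so combining the three log likelihoods as a weighted sum, and suppressing nearby CAs greedily in order of decreasing score, are assumptions made for illustration; the `Candidate` fields and all function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Candidate:
    """A generated CA with its (hypothetical) per-term log likelihoods."""
    pos: Tuple[float, float, float]
    density_corr: float  # density correlation at the generated CA
    ll_distance: float
    ll_density: float
    ll_structure: float


def score(c: Candidate, weights: Tuple[float, float, float]) -> float:
    """Assumed combination: weighted sum of the three log likelihoods."""
    w_dist, w_dens, w_struct = weights
    return w_dist * c.ll_distance + w_dens * c.ll_density + w_struct * c.ll_structure


def select_cas(cands: Sequence[Candidate], threshold: float, min_dist: float, max_n: int,
               force_min: int = 0,
               weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> List[Candidate]:
    # CAs with a negative density correlation are removed immediately.
    pool = [c for c in cands if c.density_corr >= 0.0]
    order = sorted(pool, key=lambda c: score(c, weights), reverse=True)
    # Of CAs closer together than min_dist, keep only the best-scoring one.
    distinct: List[Candidate] = []
    for c in order:
        if all(math.dist(c.pos, k.pos) >= min_dist for k in distinct):
            distinct.append(c)
    # Apply the likelihood threshold; the forced minimum can overrule it.
    kept = [c for c in distinct if score(c, weights) >= threshold]
    if len(kept) < force_min:
        kept = distinct[:force_min]
    # Every kept CA becomes a new branch of the tree, so cap the count.
    return kept[:max_n]


shell = [
    Candidate((0.0, 0.0, 0.0), 0.5, 0.0, 0.0, 0.0),
    Candidate((0.5, 0.0, 0.0), 0.4, -1.0, 0.0, 0.0),   # too close to the first CA
    Candidate((3.0, 0.0, 0.0), 0.3, -2.0, 0.0, 0.0),
    Candidate((6.0, 0.0, 0.0), -0.1, 5.0, 0.0, 0.0),   # negative density correlation
    Candidate((9.0, 0.0, 0.0), 0.2, -6.0, 0.0, 0.0),   # below the threshold
]
print([c.pos[0] for c in select_cas(shell, threshold=-3.0, min_dist=1.0, max_n=10)])  # [0.0, 3.0]
```

Here `threshold`, `min_dist`, `max_n` and `force_min` stand in for 'Likelihood threshold', 'Minimum distance CA', 'Maximum number of CAs' and 'Force minimum number of CAs'.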
This folder describes how the shells of CAs are generated.
- Select generation CA shell
- By default, a shell with a uniform and regular distribution of
CAs at exactly CA-CA distance is generated. You can also choose a uniform, random
distribution of the CAs; in that case the shell is generated
with a given thickness.
- Number of CAs
- Number of CAs generated within a shell. In the case of a
regular distribution this number is rounded downwards to the
closest Fibonacci number.
- CA-CA distance
- Distance to use between successive CAs.
- Shell thickness
- (random shell only) Thickness of the generated
shell of CAs.
- SD CA-CA distance
- (random shell only) We assume that the
probability of the CA-CA distance is described by a
Gaussian. With this value you can set the standard deviation
of the Gaussian function.
- Keep CAs with negative density halfway
- Due to the structure of a peptide, we expect the density
correlation halfway between successive CAs to be positive. A
quick first selection of CAs from the shell is therefore based (apart
from the density correlation at the generated CA) on the density
correlation at the midpoint. Default for this option is false.
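The regular shell can be pictured as a Fibonacci (golden-angle spiral) lattice on a sphere whose radius is the CA-CA distance, centred on the previous CA, with the requested number of CAs rounded down to a Fibonacci number. The sketch below is one common construction of such a lattice, not necessarily the one Loopy uses; the random shell (uniform directions, radius drawn uniformly within the thickness) and the midpoint filter with a `density` callable are likewise illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]


def fibonacci_floor(n: int) -> int:
    """Largest Fibonacci number (1, 2, 3, 5, 8, ...) not exceeding n."""
    if n < 1:
        raise ValueError("need at least one CA")
    a, b = 1, 2
    while b <= n:
        a, b = b, a + b
    return a


def regular_shell(center: Point, radius: float, n: int) -> List[Point]:
    """Golden-angle spiral: an even, deterministic spread of points on a sphere."""
    m = fibonacci_floor(n)
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(m):
        z = 1.0 - 2.0 * (i + 0.5) / m  # evenly spaced heights in (-1, 1)
        r_xy = math.sqrt(1.0 - z * z)
        phi = i * golden_angle
        points.append((center[0] + radius * r_xy * math.cos(phi),
                       center[1] + radius * r_xy * math.sin(phi),
                       center[2] + radius * z))
    return points


def random_shell(center: Point, radius: float, thickness: float, n: int,
                 rng: random.Random) -> List[Point]:
    """Random directions; radius drawn uniformly within [radius - t/2, radius + t/2]."""
    points = []
    while len(points) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]  # isotropic direction
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            continue
        r = rng.uniform(radius - thickness / 2, radius + thickness / 2)
        points.append(tuple(c + r * x / norm for c, x in zip(center, v)))
    return points


def midpoint_ok(prev_ca: Point, cand: Point, density: Callable[[Point], float]) -> bool:
    """Default filter: drop candidates whose midpoint to the previous CA has negative density."""
    mid = tuple((a + b) / 2 for a, b in zip(prev_ca, cand))
    return density(mid) >= 0.0


shell = regular_shell((0.0, 0.0, 0.0), 3.8, 100)
print(len(shell))  # 89: 100 rounded down to a Fibonacci number
```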
In this folder you can set the details of how the density map is handled.
- Interpolation method
- Choose the interpolation method used to determine the
density fit of the loop model. The option 'weighted means' is implemented very efficiently,
but it gives values that depend on the actual mean of the map. The option
'correlation' is slower, but gives a value independent of the mean of the map. The value
will be between 0 and 1, where 1 indicates a perfect fit of the model with the density
- Atom radius
- Radius used to determine the density correlation
- B factor
- At the moment the B-factor values in the pdb are
ignored. The value set here is used for all atoms
- Remove atoms by factor
- To avoid overlaps between the generated loop and the
protein structure in the pdb, atoms in the pdb (apart from
dummies, or residues in chains consisting only of main-chain
atoms) are removed from the map. This is done by flipping the
density in the map to negative values at the positions of these
atoms. This factor sets how strongly the density is changed
- Density threshold residues
- Threshold for the density correlation of residues after
loop building. This is used to check overlap between the loop
and possible fragments of main chain atoms in the pdb.
- Density threshold dummies
- Threshold for the density correlation of dummies after
loop building. This is used to check overlap between the loop
and possible dummy atoms in the pdb.
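To see why 'weighted means' depends on the mean of the map while 'correlation' does not, the two measures can be compared on a toy set of map values sampled around a model. The Gaussian model density, the sample values and the function names below are hypothetical; Loopy's own sampling and weighting are not described on this page.

```python
import math
from typing import List, Sequence


def model_density(distances: Sequence[float], atom_radius: float) -> List[float]:
    """Hypothetical model density: a Gaussian of each grid point's distance to its nearest atom."""
    return [math.exp(-(d / atom_radius) ** 2) for d in distances]


def weighted_mean(map_values: Sequence[float], weights: Sequence[float]) -> float:
    """Fast, but shifts with the mean (offset) of the map."""
    return sum(m * w for m, w in zip(map_values, weights)) / sum(weights)


def correlation(map_values: Sequence[float], model_values: Sequence[float]) -> float:
    """Pearson correlation: unchanged when a constant is added to the map."""
    n = len(map_values)
    mx = sum(map_values) / n
    my = sum(model_values) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(map_values, model_values))
    sx = math.sqrt(sum((x - mx) ** 2 for x in map_values))
    sy = math.sqrt(sum((y - my) ** 2 for y in model_values))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


distances = [0.0, 0.5, 1.0, 1.5, 2.0]  # grid point to nearest atom, in Angstrom
model = model_density(distances, atom_radius=1.0)
density = [1.2, 0.9, 0.4, 0.1, 0.0]    # toy map values at the same grid points
shifted = [d + 0.5 for d in density]   # same map with a different mean

print(weighted_mean(density, model), weighted_mean(shifted, model))  # differ by 0.5
print(correlation(density, model), correlation(shifted, model))      # equal up to rounding
```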
The spacegroup name and cell dimensions are extracted from the
map/mtz file.
Because of the algorithm underlying Loopy, the loops built
are likely guesses for the loop in question. A further
real-space refinement of the loop region fine-tunes the loops to
the density.
Define in this folder the properties for real-space refinement of the loop regions
- Real-space refinement
- Turn the real-space refinement by loopfit (V. Lamzin) on
the loop regions off or on
- Extend region
- Set this to a number of residues greater than zero to extend the real-space refinement beyond the loop region itself
- Loopfit executable
- Set the loopfit executable to use. By default, the one in the warpbin directory is used.
- Loopfit log
- Set the location of the logfile for loopfit.
Loopy writes its own logs to file. The extent of the messages depends
on the levels you set in this folder.
- Message level
- Level of the messages to be written to file (value
from 0 to 9).
- Abort level
- If a message of this level is encountered, terminate the
program. Standard values are 7 or 8
- Message file
- Name for the message file (plain text) of Loopy.
- XML output file
- Name for the XML message file (xml format) of Loopy.
Krista Joosten
Last modified: Tue Aug 15 13:34:54 CEST 2006