CRANK2 (CCP4 Supported Program)

NAME

CRANK2 - automatic structure determination of proteins of unknown fold from X-ray data

SYNOPSIS

crank2 [Options] [Keyworded input]

DESCRIPTION

CRANK2 is the successor of the CRANK program for automatic X-ray structure solution of proteins of unknown fold. Currently, the following experiments are supported: SAD, SIRAS, MAD (with data from 2, 3 or 4 wavelengths), SAD+native and MAD+native. CRANK2 incorporates the most recent algorithmic advancements, especially for the SAD and SIRAS experiments, enabling it to also automatically build 'difficult' structures, such as those from low-resolution data sets or data sets with weak anomalous signal.

This manual describes how to run CRANK2 from the command line; please refer to the CCP4 webpages (http://www.ccp4.ac.uk) for information on how to run the program from the CCP4 graphical interface.

HOW TO RUN CRANK2

The input data and the requested pipeline must be specified to run crank2. This can be done in the usual CCP4 way: a keyworded input specifying the pipeline and the input data is typed in or redirected to the standard input of crank2. Alternatively, a text file containing the keyworded input can be supplied with the crank2 --keyin option, or a crank2 XML file can be supplied with the crank2 --xmlin option.

KEYWORDED INPUT: PIPELINE SPECIFICATION

CRANK2 can be used to define and run a custom pipeline of supported processes and programs to solve the structure.
A typical pipeline may consist of the following steps:

FA estimation
    Estimation of FA values from the measured data
Substructure detection
    The heavy atom substructure is detected using the FA values
Substructure phasing and refinement
    The heavy atom substructure is refined and initial phases are calculated
Hand determination
    Determination of the hand of the structure
Iterative density modification
    Improvement of the initial phases by iterative application of restraints in real and reciprocal space
Model building and refinement
    The model of the structure is built and iteratively refined

A pipeline is constructed using a keyworded input defining the requested processes. The --list-processes option can be used to list all the available process keywords (along with their descriptions and their supported subprocesses and programs). For example, the following keywords can be used to define the pipeline above for a structure solution from anomalous data:

    faest
    substrdet
    phas
    handdet
    dmfull
    comb_phdmmb

Each process can be further customized by specifying a program, a subprocess, a combination of programs and subprocesses and/or their parameters; please see the Advanced Keyworded Input section of this manual.

KEYWORDED INPUT: DATA SPECIFICATION

The input crystallographic information required by crank2 depends on the requested pipeline. It is supplied using crank2 data objects. A data object is usually associated with a data file and some (meta)information about it. The --list-data-objects option lists all the available data objects and provides basic information about them.

For example, for structure solution from merged SAD data, the Friedel pair amplitudes or intensities are needed. Furthermore, information about the expected substructure (atom type and anomalous scattering coefficients) and the sequence is typically known and should be provided.
The following keyworded input can be used to specify this information:

    fsigf typ=plus f=F+ sigf=SIGF+ file=/path/to/mtz/file.mtz
    fsigf typ=minus f=F- sigf=SIGF-
    model typ=substr atomtype=I fp=-0.6 fpp=7.0
    sequence file=/path/to/sequence/file.seq

Here, the 'fsigf', 'sequence' and 'model' keywords are the data objects, each followed by a specification of its attributes. There is no file associated with the model object at this stage since the substructure model is not yet known; the model keyword is only used here to specify the (expected) properties of the substructure model. All the 'file' attributes can be omitted if the corresponding input files are specified using the --hklin, --xyzin and --seqin options.

ADVANCED KEYWORDED INPUT

Subprocesses and Programs

The CRANK2 pipeline can be further customized by specifying programs, subprocesses and parameters for each process in the pipeline. Each process in the pipeline may be composed of its own subprocesses and/or programs, and the same holds for each subprocess. Furthermore, there is a main crank process consisting of the processes of the pipeline. Thus, each crank2 pipeline definition can be viewed as a process/program tree with the main crank process as the root of the tree. The --list-processes option reports which subprocesses and programs are supported by each process (the main crank process is omitted from the list since, by definition, it supports all other processes and no programs).

Syntactically, a keyword defining a subprocess or a program of a process P needs to be specified in the scope of the process P. The scope of the process P starts with the keyword defining the process P. Any keyword appearing after this process keyword that is not supported by the process P, as well as the end of the line, ends the scope permanently.
Furthermore, a keyword defining a subprocess or program of P switches the scope to that of the subprocess/program; after the scope of this subprocess/program ends, the scope switches back to that of the (parent) process P. Each new line in the keyworded input starts in the main crank scope. A line containing a single '-' or a single '\' character at its end is not considered ended: the line directly following it is considered its continuation (this allows long lines to be broken for better readability).

For example, the pipeline from the "Keyworded Input: Pipeline Specification" section could be further customized like this:

    faest shelxc
    substrdet shelxd
    phas refmac
    handdet
    dmfull dm parrot phcomb refmac
    comb_phdmmb mb buccaneer dmfull dm parrot ref refmac

corresponding to the following tree (processes are typed in lowercase and programs in uppercase):

    crank
        faest
            SHELXC
        substrdet
            SHELXD
        phas
            REFMAC
        handdet
        dmfull
            dm
                PARROT
            phcomb
                REFMAC
        comb_phdmmb
            mb
                BUCCANEER
            dmfull
                dm
                    PARROT
                ref
                    REFMAC

The last two steps in this example use a combination of subprocesses: the iterative density modification ('dmfull') will be performed using the program Parrot for crystal space density modification (subprocess 'dm') and the program Refmac for reciprocal space phase combination (subprocess 'phcomb').

Parrot also includes the functionality to perform the full iterative density modification in both spaces. The following input can be used to select this functionality instead (replacing the 'dmfull' line above):

    dmfull parrot

With this alternative, Parrot performs the entire iterative DM algorithm, including the phase combination. Thus, the 'dm' and 'phcomb' CRANK2 subprocesses are not defined here, as the 'dmfull' iterative algorithm is defined by Parrot rather than by CRANK2.
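The scope rule described in this section can be illustrated with a small Python sketch. This is a toy model and not the parser used by crank2: the SUPPORTS table and the parse_line function are hypothetical names introduced for this illustration, the table covers only relations stated in the example above (the authoritative list is reported by --list-processes), and parameters and data objects are ignored.

```python
# Toy model of the scope rule; NOT crank2's actual parser.
# SUPPORTS is a hypothetical table covering only relations named in the
# manual's example; the real relations come from `crank2 --list-processes`.
SUPPORTS = {
    "crank": {"faest", "substrdet", "phas", "handdet", "dmfull", "comb_phdmmb"},
    "faest": {"shelxc"},
    "substrdet": {"shelxd"},
    "phas": {"refmac"},
    "dmfull": {"dm", "phcomb", "parrot"},
    "dm": {"parrot"},
    "phcomb": {"refmac"},
    "comb_phdmmb": {"mb", "dmfull"},
    "mb": {"buccaneer"},
}


def parse_line(line):
    """Return the root-to-keyword path of every keyword on one input line."""
    stack = ["crank"]  # every new line starts in the main crank scope
    paths = []
    for keyword in line.split():
        # Close scopes until one is found that supports the keyword.
        while stack and keyword not in SUPPORTS.get(stack[-1], set()):
            stack.pop()
        if not stack:
            raise ValueError(f"keyword {keyword!r} is not supported in any open scope")
        stack.append(keyword)  # the keyword opens its own scope
        paths.append("/".join(stack))
    return paths


for path in parse_line("dmfull dm parrot phcomb refmac"):
    print(path)
# crank/dmfull
# crank/dmfull/dm
# crank/dmfull/dm/parrot
# crank/dmfull/phcomb
# crank/dmfull/phcomb/refmac
```

In this sketch, 'phcomb' is not supported by PARROT or by 'dm', so both scopes are closed and 'phcomb' is attached to 'dmfull', which matches the tree shown above.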
The last step in the example ('comb_phdmmb', the recent advanced algorithm for iterative combination of model building with phase and model improvement by imposing experimental and DM phase restraints) is likewise composed of subprocesses: a phased iterative density modification 'dmfull' and a model building 'mb'. The 'dmfull' subprocess is in turn composed of its own subprocesses, similarly to the previous step.

Process and program parameters

Each process can be further customized by specifying its parameters. The list of parameters supported by a process can be obtained using the --list-params option. A parameter keyword for a process P needs to be specified in the scope of the process P. The syntax is as follows:

    process_P_keyword parameter_keyword::parameter_value

In the same way as parameters are specified for a process, it is possible to specify program arguments and program keywords for any program in the pipeline. Program arguments are passed as command line arguments of the program (for example, the usual CCP4 hklin argument), while program keywords are passed to the standard input of the program (the usual CCP4 keywords). The program keywords and/or arguments must be specified in the scope of the program. The following syntax is used for program keywords and arguments, respectively:

    process_keyword program_keyword program_keyword_keyword:program_keyword_value
    process_keyword program_keyword program_argument_keyword;program_argument_value

Multiple argument or keyword values can be specified either by repeated use of the syntax above (preferred) or by separating the values by spaces and wrapping the entire expression in double quotes ("):

    "keyword:value1 value2 value3"

Any keywords/arguments supported by the program can be passed this way.
For example, either of the following lines can be used to specify the number of refinement cycles for Refmac, using either a process parameter or an equivalent Refmac keyword:

    ref cycles::10 refmac
    ref refmac NCYC:10

Data objects passing

Each of the defined processes and programs takes data objects as its input and produces data objects as its output. A data object input to a process/program can be specified directly in the scope of this process/program. Furthermore, CRANK2 employs the following four default rules for automatic passing of data objects between the processes and programs:

1. Data objects input to a process P are passed to the input of all subprocesses and programs of the process P, and to their subprocesses and programs etc. (i.e. to the entire tree branch below the process P).
2. Data objects output by a process/program P are passed to all process "parents" of P, as well as to their parents in the tree etc. (i.e. to the entire tree above the process P, ending at the crank root of the tree).
3. Data objects output by a step of the pipeline (i.e. by a first-level subprocess of the main crank process) are passed to the input of the next step of the pipeline (i.e. the next first-level subprocess of the main crank process).
4. If multiple data objects with the required properties are available at the input of a process/program, the most recently passed object is used.

These rules, in combination with the direct specification of input data objects, are sufficient in the majority of cases. Sometimes there is a need to pass some output data object(s) to a pipeline step P from a more distant previous step. This can be achieved with the "obj_from" data object argument. The syntax is as follows:

    process_P_keyword data_object_keyword obj_from=order_num[,additional_argument=value_of_the_argument,...]

where order_num is the order number of the step P in the pipeline, with the numbering starting from 0.
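The interplay of the default passing rules 1, 3 and 4 listed above can be illustrated with a small Python sketch. This is a toy model and not crank2 code: DataObject, step_input and pick are hypothetical names introduced for this illustration, and the sketch assumes that objects from the crank scope (rule 1) reach a step before the outputs of the previous step (rule 3), so that the latter count as the most recently passed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    kind: str   # data object keyword, e.g. "mapcoef" or "sequence"
    label: str  # free-text origin, used only in this illustration


def step_input(crank_scope_objects, previous_step_outputs):
    """Objects reaching a pipeline step, oldest first.

    Assumption of this sketch: objects from the crank scope (rule 1) arrive
    before the outputs of the previous step (rule 3).
    """
    return list(crank_scope_objects) + list(previous_step_outputs)


def pick(kind, available):
    """Rule 4: among objects of the required kind, use the most recently passed."""
    matches = [obj for obj in available if obj.kind == kind]
    return matches[-1] if matches else None


crank_scope = [
    DataObject("mapcoef", "phases given in the crank scope"),
    DataObject("sequence", "sequence given in the crank scope"),
]
previous_outputs = [DataObject("mapcoef", "phases output by the previous step")]

available = step_input(crank_scope, previous_outputs)
print(pick("mapcoef", available).label)   # phases output by the previous step
print(pick("sequence", available).label)  # sequence given in the crank scope
print(pick("model", available))           # None
```

In this sketch, the phases output by the previous step supersede the phases given in the crank scope, while the sequence, which no step has replaced, is still taken from the crank scope.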
It is also sometimes desirable to suppress the rule 3 propagation from a step to the next step; this can be achieved by setting the 'no_output_to_next_step' process parameter for the process whose output should not be propagated.

For example, let us assume that we have previously managed to build a partial model of the structure (saved as partial.pdb) in the previously obtained "best" map (the phases of which are stored in best.mtz). The following keyworded input can be used to attempt to improve the partial model and/or to remove bias from it, by first density modifying the previous "best" map using the available SAD information (only) and then building a new model using the combined algorithm for phased and density modified model building:

    target::SAD
    fsigf typ=plus f=F+ sigf=SIGF+ file=input.mtz
    fsigf typ=minus f=F- sigf=SIGF-
    model typ=substr atomtype=SE fp=-4 fpp=4 monomers_asym=1
    sequence file=input.pir
    sepsubstrprot model typ=partial+substr file=partial.pdb
    dmfull mapcoef typ=best ph=PHIB fom=FOMB file=best.mtz dm parrot phcomb refmac
    comb_phdmmb minbigcyc::20

Here, the first 5 input lines define the target (experiment) parameter for the main crank process (not compulsory) and the four input data objects of the main crank process (which are all automatically passed to the input of all the processes and programs due to rule 1). As the number of monomers in the asymmetric unit is set to one for the (substructure) model, this information is also input to each process in the crank run.

The following lines define the 3 steps of the requested crank pipeline. Step 0 takes the input partial.pdb model and separates the substructure (SE atoms) from the partial protein model. The substructure (and the protein model) is then passed to the next step 1 (rule 3), where it is used for the multivariate Refmac SAD phase combination in the density modification of the best.mtz map. Finally, the improved map from this step is passed to the next step 2 (rule 3) - the "combined" model building.
The "combined" model building also needs the substructure model in order to generate the SAD phase restraints. Since the substructure model is output by the Refmac phase combination of the dmfull step 1, it is also passed to the output of step 1 (rule 2), from which it is passed to the input of step 2 (rule 3). The default parameters and subprocesses/programs are used for the "combined" model building, except for an increased minimal number of model building cycles (20).

Note that if the partial.pdb model were specified not directly in the scope of step 0 but in the crank scope (together with the other 4 input objects specified there), it would be automatically passed (rule 1) to the input of step 2 as well, and a rebuilding of this model would be attempted rather than building from scratch. In contrast, the best.mtz phases currently specified in the scope of step 1 could have been specified in the crank scope without any change in behaviour: although these phases would be passed to steps 0 and 2 (rule 1), they are not used by step 0 (as the separation of a substructure from a protein only uses a model input) and they are superseded at the input of step 2 by the output phases from step 1 (rule 4).

OPTIONS

--help
    Print the crank2 help message and exit.
--version
    Print the crank2 version and exit.
--keyin KEYIN_FILE
    Read keyword input from the specified KEYIN_FILE file.
--xmlin XMLIN_FILE
    Read input from the specified XMLIN_FILE file.
--hklin
    Input MTZ file containing the data for the unknown, work structure. The required columns are F, sigF, and a set of HL coefficients from phasing improvement.
--seqin
    Input sequence file in pir or fasta format.
--xyzin
    Input PDB file containing an initial model.
--xyzout
    Output PDB file containing the output model.
--hklout
    Output MTZ file containing the output reflection data.

NOTE: None of these options is compulsory.
However, some of these options can be used to provide the compulsory specification of the requested pipeline and input data.

Problems:

AUTHORS

Pavol Skubak & Navraj S. Pannu
Leiden University