Allows the user to extract individual structures and/or SAS profiles from a larger input structure file and/or a folder containing SAS profiles.
The Extract Utilities module is accessible from the Tools section of the main menu.
The purpose of the module is to create new files with coordinates extracted from larger files and/or a new folder containing a subset of SAS profiles. There are six possible modes of use. In the following, the "/" separator indicates "either or both".
An example of each use case is shown below.
Typical usage is to extract structures from DCD and SAS files and save them to new, smaller DCD files and SAS folders.
Data are copied from the original files / folders to a new file / folder contained in a folder called "extract_utilities/" in the user supplied run name directory.
The default output file format is DCD. To save the extracted coordinates as a PDB file instead, enter a filename ending in .pdb. For large numbers of frames we recommend saving your data in the DCD format (the files are roughly seven times smaller).
The "text" file format merely lists the frame numbers (one number per line).
Frames are counted starting at 1. Some processing and visualization programs start counting at 0.
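Because frame numbers in this module start at 1, a list of frames taken from a program that counts from 0 needs to be shifted by one before it is used here. A minimal Python sketch of that conversion follows; the function names are hypothetical helpers and are not part of the module.

```python
def to_one_based(zero_based_frames):
    """Shift 0-based frame indices (e.g. from a viewer) to the 1-based
    frame numbers expected by this module."""
    if any(index < 0 for index in zero_based_frames):
        raise ValueError("0-based frame indices must be >= 0")
    return [index + 1 for index in zero_based_frames]


def to_zero_based(one_based_frames):
    """Shift 1-based frame numbers to 0-based indices for programs
    that start counting at 0."""
    if any(frame < 1 for frame in one_based_frames):
        raise ValueError("1-based frame numbers must be >= 1")
    return [frame - 1 for frame in one_based_frames]
```

For example, the frame shown as index 6 in a 0-based viewer corresponds to frame 7 in this module.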
The "weights" file format is the one generated by Chi-Square Filter and used by Density Plot in this and other modules. A file in the "weights" format can be used in place of a "text" file, but creating one is unnecessary if the only goal is to extract structures / SAS profiles; a "text" file is sufficient for that purpose.
The "weights" file format is used in this module to extract the specific structures that meet the cutoff used to generate the file in Chi-Square Filter. For example, one can use the "weights" file to find the subset of structures / SAS profiles with reduced chi-square values less than a certain value.
The sixth option ("all") is useful for converting a PDB file to DCD or a DCD file to PDB. Note that there is no practical reason to extract all SAS profiles.
To extract structures and/or SAS profiles one selects the appropriate checkbox(es).
If you are extracting structures and SAS profiles, they should be a matched set. In other words, the SAS profiles must have been calculated from the structures in your input trajectory file. The program does not check whether this is true.
SAS profiles are selected by their location on the server. They must exist prior to running this module as there is no utility to upload a directory of SAS profiles to the server. Thus, the SAS profiles must have been previously generated using one of the supported SAS modules in Calculate, i.e., SasCalc, Xtal2Sas, Cryson and Crysol. NOTE that retired SAS calculators may not be supported in the future.
SAS profiles calculated using SasCalc can be extracted simultaneously for multiple contrasts by selecting the top level sascalc directory. See Case 7 below for details.
This example extracts structures and SAS profiles using each of the use cases mentioned above, assuming that the SAS profiles were calculated using SasCalc. If a different SAS module was used to calculate the SAS profiles, the SAS type and SAS data path must be selected accordingly. Cases where only structures or only SAS profiles are extracted are not shown, but the input fields for the individual cases do not change under these conditions.
This example extracts a structure from frame 7 and its corresponding SAS profile. Note that the sascalc/neutron_D2Op_100 directory is selected to extract SAS profiles only for that particular contrast.
run name: user defined name of folder that will contain the results.
extract trajectory: selecting this checkbox enables the Trajectory Input fields.
trajectory filename: DCD or PDB file with coordinates that will be extracted from.
output file name (pdb or dcd): Name of output PDB or DCD file with the extracted coordinates.
extract SAS: selecting this checkbox enables the SAS Input fields.
SAS type: select the name of the SAS calculator that was used to generate the SAS profiles that you are going to extract from. Supported types are: sascalc, xtal2sas, cryson and crysol.
SAS data path: select the directory that contains the SAS profiles.
select option: pull-down menu with options for 1. single frame, 2. range, 3. text file, 4. weights file, 5. sampling frequency, or 6. all. The selected option displays the appropriate input field.
This example extracts frames 3 through 22 inclusive.
The sample input text file contains a list of six frames to extract. This file must have at least one line. The final frame number must not exceed the number of frames in the input PDB/DCD file.
6
7
8
9
10
88
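Before uploading a "text" file, it can be useful to check it locally against the two rules stated above. The following Python sketch is an illustration only: the function name is hypothetical, skipping blank lines is an assumption, and checking every entry (rather than only the final one) is a stricter test than the rule requires.

```python
def read_frame_list(path, number_of_frames):
    """Read a "text" file with one 1-based frame number per line and
    check it against the number of frames in the input PDB/DCD file."""
    frames = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # assumption: blank lines are ignored
            frames.append(int(line))
    if not frames:
        raise ValueError("text file must contain at least one frame number")
    for frame in frames:
        if frame < 1 or frame > number_of_frames:
            raise ValueError(
                "frame %d is outside the range 1..%d" % (frame, number_of_frames)
            )
    return frames
```

With the sample file above and a 200-frame trajectory, the check passes because the largest entry, 88, does not exceed 200.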
The sample input weights file contains a list of frames to extract. In the box below only the first few lines are shown. The number of lines in this file must match the number of frames in the input PDB/DCD file.
# file generated on Sun Apr 12 18:05:32 2015
# structure, X2, weight
1 10.394808 0.000000
2 8.041527 0.000000
3 8.792317 0.000000
4 8.217490 0.000000
5 6.896897 1.000000
6 7.555859 1.000000
7 8.220753 0.000000
. . .
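To preview which frames a "weights" file would select, the file can be parsed locally. The Python sketch below assumes that a non-zero value in the third column marks a structure that passed the cutoff; this is consistent with the sample above but is an assumption about the format, and the function name is hypothetical.

```python
def frames_from_weights(lines, number_of_frames=None):
    """Return the structure numbers whose weight is non-zero.

    Each data line is expected to hold: structure, X2, weight.
    Lines starting with "#" are treated as comments.
    """
    selected = []
    rows = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        structure, chi_square, weight = line.split()[:3]
        rows += 1
        if float(weight) > 0.0:  # assumption: non-zero weight means "extract"
            selected.append(int(structure))
    # Optional check: one data line per frame in the input PDB/DCD file.
    if number_of_frames is not None and rows != number_of_frames:
        raise ValueError(
            "weights file has %d data lines, expected %d" % (rows, number_of_frames)
        )
    return selected
```

Applied to the seven data lines shown above, this sketch selects structures 5 and 6, the two entries with a weight of 1.000000.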
The sample input directs the program to save every tenth structure and its associated SAS profile. Note that the first frame is always saved, so selecting a frequency of 10 frames will result in a list such as [1, 11, 21, etc.].
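The frame list produced by the sampling frequency option can be reproduced with a short Python sketch. The function name is hypothetical; the only behavior taken from the description is that counting starts at frame 1 and that the first frame is always included.

```python
def frames_by_sampling_frequency(number_of_frames, frequency):
    """Return the 1-based frame numbers saved when every `frequency`-th
    frame is kept, starting with frame 1."""
    if frequency < 1:
        raise ValueError("sampling frequency must be >= 1")
    return list(range(1, number_of_frames + 1, frequency))
```

For a 200-frame trajectory and a frequency of 10, this yields 20 frames beginning with 1, 11, 21 and ending with 191.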
This example converts an input PDB file to DCD. One can also use this option to convert an input DCD to PDB by reading in a DCD trajectory file and indicating an output file name with the suffix ".pdb". Note that longer trajectories should be maintained in DCD files.
This example extracts frame 7 from a DCD file and its corresponding SAS profiles from multiple directories (from SasCalc calculations at different contrasts). Note the difference in SAS data path from Case 1 above. The top level /sascalc directory is selected rather than the contrast-specific directory, /sascalc/neutron_D2Op_100.
Results are written to a new directory within the given "run name" directory. For example, the figure shows that the structures were saved to files in the chosen "run name" directory within the current project directory. SAS profile(s) are saved to a new directory named after the input SAS profile type, as shown below:
./run_0/extract_utilities/hiv1_gag_frame_7.pdb
./run_0/extract_utilities/sascalc/neutron_D2Op_100
./run_0/extract_utilities/hiv1_gag_frames_3_22.dcd
./run_0/extract_utilities/sascalc/neutron_D2Op_100
./run_0/extract_utilities/hiv1_gag_frames_text_file.dcd
./run_0/extract_utilities/sascalc/neutron_D2Op_100
./run_0/extract_utilities/hiv1_gag_frames_weights_file.dcd
./run_0/extract_utilities/sascalc/neutron_D2Op_100
./run_0/extract_utilities/hiv1_gag_frames_periodic.dcd
./run_0/extract_utilities/sascalc/neutron_D2Op_100
./run_0/extract_utilities/hiv1_gag.dcd
Note the difference between this output and that for Case 1 above. The SAS files corresponding to frame 7 are extracted from all directories below the top level /sascalc directory, i.e., /neutron_D2Op_100 and /neutron_D2Op_0.
./run_0/extract_utilities/hiv1_gag_frame_7.pdb
./run_0/extract_utilities/sascalc/neutron_D2Op_100
./run_0/extract_utilities/sascalc/neutron_D2Op_0
None
input files
hiv1_gag.pdb
hiv1_gag_200_frames.dcd
text_file.txt
weights_file.txt
output files
hiv1_gag_frame_7.pdb
hiv1_gag_frames_3_22.dcd
hiv1_gag_frames_text_file.dcd
hiv1_gag_frames_weights_file.dcd
hiv1_gag_periodic.dcd
hiv1_gag.dcd
NOTE: input and output SAS profiles are not available for download since they cannot be uploaded to test the module.
Only PDB and DCD file formats are supported. SAS profiles must exist on the server as there is no option to upload a folder containing such files.
Not published.