
SASSIE-web: Quick Start

Introduction

SASSIE-web is an online simulation and analysis tool for the modelling of biomolecular structures using small angle scattering data. It is based on the original standalone program SASSIE (Curtis et al. 2012) and retains all of its core features. This guide is designed to get you familiar with the basic features of the program as quickly as possible. The features covered will be:

Important Note: Before you start this tutorial you will need to register for an account and log in to SASSIE-web. Instructions on how to register can be found here.

The only data needed to work through this tutorial are:

You should download these files to your computer now. A good idea is to create a directory called SASSIE-web-tutorial and save all downloads from the tutorial in this location.

You will also need to familiarize yourself with a molecular viewing program that can display PDB and DCD files. We recommend VMD and provide a quick tutorial here.

SASSIE-web Interface

Once logged in to SASSIE-web the page should look something like the figure below.

SASSIE-web initial menu when logged in

During this tutorial, when instructed to select something from the Main Menu but no menu is visible on the left-hand side of the page, you must click on the Main Menu toggle to reveal it.

To choose a project name where your work will be stored and to access SASSIE modules that are still in Beta status, click on the Head icon. The User Configuration menu will appear.

SASSIE-web User Configuration Menu

Choose an existing project or create a new project name. In addition, select the Beta checkbox. (You can also choose to select the Retired checkbox to access retired SASSIE modules. In addition, you can update the background and foreground screen colors by selecting the Update colors checkbox.) Click the 'Submit' button to connect to the project and to access the Beta modules. The project name should now appear at the top of the web page. Here, we have chosen project name 'test1'.

SASSIE-web page with project name

Starting Structure

In this tutorial we will model the conformation of the HIV-1 Gag protein following the study of Datta et al. 2007. HIV Gag is a long polyprotein which is cleaved to form the functional proteins required by the virus. The viral proteins which form the domains in Gag are the matrix (MA), capsid (CA), p2 and nucleocapsid (NC). A structure stitched together from evidence based on crystal structures and models of the individual domains is shown below. A similar structure will be used as the starting structure for our simulations.

Regions of the Gag polyprotein

Datta et al. 2007 identified 5 flexible regions (labelled I-V), which are highlighted in the figure above. The table alongside the picture shows the residues which make up each region (we are going to need this information to select regions to be varied when we run Monte Carlo simulations).

Data Interpolation

Data interpolation is necessary to create a new data file that is spaced on a uniform grid from the experimental data file. More information on the module is available in the Data Interpolation documentation. Here we interpolate the SANS data from an HIV-1 Gag protein at a concentration of 1 mg/ml in a 100% D2O buffer.

Select the 'Tools' button at the top of the Main Menu. Click on the Main Menu Toggle if necessary.

Buttons in the Main Menu

This will reveal a list of buttons for each tool running across the top of the page (just below the top bar with the Session Management and Main Menu Toggle icons).

Select the 'Data Interpolation' button from this menu.

Menu of Tools options within SASSIE

You should now see a page like the one below. This page is used to enter all of the information needed to do the data interpolation.

Input for Data Interpolation

The figure shows the values for each field as required for data interpolation.

Edit the values on your screen to match the screenshot. An explanation of each field and how to edit it can be found below.

run name: user defined name of folder that will contain the results.

experimental data file: Name of input file with experimental data with at least three columns: q, I(q), and error in I(q). Here we use the sans_data.sub file.

output file name: Name of file that will contain the interpolated data. Here we choose the name sans_data.dat.

I(0): Experimentally determined value of scattering intensity at q = 0. Here we used the value of 0.04 that was derived from a Guinier fit to the data.

I(0) error: Experimentally determined value of the error of the scattering intensity at q = 0. Here we use the value of 0.001 that was obtained from the Guinier fit to the data.

new delta q: Desired spacing of q-values (1/Angstrom). This should be chosen so that your first interpolated data point falls within the q-range of the experimental data. For this tutorial, the value has been set to 0.02 since the first data point occurs at a value of ~0.013.

number of new q-values: Integer number of desired q-values. For this tutorial, the value has been set to 16 so that the maximum q value is 0.3.
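The relationship between 'new delta q', 'number of new q-values' and the maximum q value can be checked with a few lines of Python. The sketch below is an illustration only. It assumes that the interpolated grid starts at q = 0 (where the supplied I(0) is used), and it uses simple linear interpolation, which may not be the scheme that the Data Interpolation module applies internally. The function names and the synthetic Guinier-like data are hypothetical and are not part of SASSIE-web.

```python
import bisect
import math


def uniform_q_grid(delta_q, n_points):
    """Evenly spaced q-values, ASSUMING the grid starts at q = 0."""
    return [k * delta_q for k in range(n_points)]


def linear_interpolate(x_new, xs, ys):
    """Linearly interpolate ys(xs) at x_new; xs must be sorted ascending."""
    if not xs[0] <= x_new <= xs[-1]:
        raise ValueError("q = %g lies outside the available q-range" % x_new)
    hi = bisect.bisect_left(xs, x_new)
    if xs[hi] == x_new:
        return ys[hi]
    lo = hi - 1
    frac = (x_new - xs[lo]) / (xs[hi] - xs[lo])
    return ys[lo] + frac * (ys[hi] - ys[lo])


def interpolate_curve(q_exp, i_exp, i0, delta_q, n_points):
    """Interpolate I(q) onto the uniform grid, using I(0) as the q = 0 point."""
    q_src = [0.0] + list(q_exp)
    i_src = [i0] + list(i_exp)
    q_new = uniform_q_grid(delta_q, n_points)
    return q_new, [linear_interpolate(q, q_src, i_src) for q in q_new]


# Synthetic stand-in for an experimental curve (NOT the tutorial data):
# a Guinier-like decay with a hypothetical Rg of 30 Angstrom, sampled from
# q = 0.013 to q = 0.348 1/Angstrom.
q_exp = [0.013 + 0.005 * k for k in range(68)]
i_exp = [0.04 * math.exp(-(q * 30.0) ** 2 / 3.0) for q in q_exp]

q_new, i_new = interpolate_curve(q_exp, i_exp, i0=0.04, delta_q=0.02, n_points=16)
print("number of points:", len(q_new))
print("first q:", q_new[0], " last q:", round(q_new[-1], 6))
```

With a spacing of 0.02 and 16 points, a grid that starts at q = 0 ends at 15 x 0.02 = 0.3, which matches the values chosen for this tutorial. The first non-zero grid point (0.02) lies above the first experimental point (~0.013), so no extrapolation below the measured range is needed.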

Once you have understood the input fields and made sure that your values agree with the figure click on the 'Submit' button to start the interpolation.

As the run continues the progress bar beneath the submit button should update. Once complete the output should look similar to the figure below.

Output for Data Interpolation

The output will show a plot of the original and interpolated data, the name of the input file and the name of the interpolated data file as well as the directory in which it is located.

Note that roll-over help will indicate options to resize, zoom and reset the view of the plot.

What have we generated:

test1/run_0/data_interpolation

PDB Scan

PDB Scan is used to assess whether an input PDB is ready for simulation and, where possible, to provide files enabling CHARMM force field parameterization. Information on missing atoms and residues and those not covered as standard by the CHARMM 27 force field is reported. PDB files do not need to have header information. At this time, only PDB files of proteins are supported. More information on the module is available in the PDB Scan documentation. Here we examine the PDB file that describes the starting HIV-1 Gag protein structure.

Select the 'Build' button from the Main Menu of SASSIE-web and then click on the PDB Scan button.

You should now see a page like the one below. This page is used to enter all of the information needed to check the PDB file.

Input for the PDB Scan

pdb file input: The PDB file that we want to examine. Here we use the gag_start.pdb file.

Once you have entered the file name, click on the 'Submit' button to start the file scan.

As the run continues the progress bar beneath the submit button should update. Once complete the output should look similar to the figure below.

Output1 for the PDB Scan

The text output region provides a brief summary of the PDB Scan report.

What have we generated:

test1/run_0/pdbscan

Visualization

A JSmol visualization of the protein is produced and is shown below the text output region. Holding down the left mouse button and moving the cursor over the picture allows you to rotate the view; the scroll wheel facilitates zooming in and out. Right clicking on the image allows you to access all of the JSmol options.

Output2 for the PDB Scan

The full PDB scan report can be found below the image of the structure.

Output3 for the PDB Scan

Output4 for the PDB Scan

This PDB file is ready for simulation so we can proceed to create an ensemble of structures for comparison to the SANS data.

Structure Variation - Monte Carlo (Monomer)

The primary way to vary structures in SASSIE is via Monte Carlo simulations which rotate the backbone dihedral angles of flexible regions within proteins to sample a wide range of structures. More information can be found in the Monomer Monte Carlo documentation. Here we set up and run such a simulation before visualizing the range of structures produced.

Select the 'Simulate' button from the Main Menu of SASSIE-web and then click on the 'Monomer Monte Carlo' button.

You should now see a page like the one below. This page is used to enter all of the information needed to run a Monte Carlo simulation.

Input for the Monomer Monte Carlo simulation

The figure shows the values for each field as required for our simulation. An explanation of some of the fields can be found below.

reference pdb: The starting structure for the simulation. Here we use the gag_start.pdb file.

number of trial attempts: Number of times the simulation will try to vary the structure (some structures will be discarded by the Monte Carlo algorithm). For this tutorial set the value to 1000. For real studies tens of thousands of structures are needed.

return to previous structure: Number of discarded structures in a row that are considered before returning to a randomly selected structure that was previously accepted.

number of flexible regions to vary: single number

residue range for each flexible region: comma-separated list of the range of residues to vary for each flexible region

maximum angle(s): comma-separated list of the maximum angle sampled in a single Monte Carlo step for each flexible region

structure alignment region: a single range of residues for structural alignment of all the flexible segments. This makes it easy to make visual comparisons of each frame in the output trajectory.

overlap basis: Select either heavy atoms, all, backbone or enter atom name. The atom name option will spawn further inputs:

Once you have understood the input fields and made sure that your values agree with the figure click on the 'Submit' button to start the simulation.
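To see how 'number of trial attempts', 'return to previous structure' and 'maximum angle(s)' interact, it can help to look at a toy accept/reject loop. The Python sketch below is not the Monomer Monte Carlo algorithm: it varies a single angle instead of many backbone dihedrals, and its acceptance test (within_allowed_window) is a hypothetical stand-in for the checks that the real module performs. The parameter values other than the 1000 trial attempts are also made up for illustration.

```python
import random


def toy_monte_carlo(n_trials, return_after, max_angle, is_acceptable, seed=0):
    """Toy accept/reject loop over a single dihedral angle (degrees)."""
    rng = random.Random(seed)
    current = 0.0                      # starting structure
    accepted = [current]               # accepted structures, including the start
    failures_in_a_row = 0
    for _ in range(n_trials):
        # Trial move: change the angle by at most max_angle in either direction.
        trial = current + rng.uniform(-max_angle, max_angle)
        if is_acceptable(trial):
            current = trial
            accepted.append(current)
            failures_in_a_row = 0
        else:
            failures_in_a_row += 1
            if failures_in_a_row >= return_after:
                # Too many discarded trials in a row: continue from a randomly
                # selected structure that was previously accepted.
                current = rng.choice(accepted)
                failures_in_a_row = 0
    return accepted


def within_allowed_window(angle):
    """HYPOTHETICAL acceptance test, standing in for the module's real checks."""
    return abs(angle) <= 60.0


accepted = toy_monte_carlo(n_trials=1000, return_after=20, max_angle=30.0,
                           is_acceptable=within_allowed_window)
print("accepted %d of 1000 trial attempts" % (len(accepted) - 1))
```

The point to take away is that the number of accepted structures is generally smaller than the number of trial attempts, because some trial structures are discarded.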

As the run continues the progress bar beneath the submit button should update. A graph beneath this should show the variation of the radius of gyration over the steps of the Monte Carlo simulation. Once complete the output should look similar to the figure below.

Output from completed SASSIE Monte Carlo simulation
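The radius of gyration (Rg) plotted during the run is a single number that summarizes how extended a structure is. As a reminder of what it measures, the sketch below computes Rg from a list of atomic coordinates in plain Python. It is a minimal illustration under stated assumptions: this guide does not say whether SASSIE weights atoms by mass, so the hypothetical function radius_of_gyration supports both options, and the coordinates are toy values rather than Gag coordinates.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Root-mean-square distance of the atoms from their (weighted) center.

    coords: list of (x, y, z) tuples in Angstrom.
    masses: optional list of weights; equal weights are used if omitted.
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total_mass = sum(masses)
    center = [sum(m * c[axis] for m, c in zip(masses, coords)) / total_mass
              for axis in range(3)]
    weighted_sq = sum(
        m * sum((c[axis] - center[axis]) ** 2 for axis in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


# Toy example: two equal "atoms" 10 Angstrom apart.
toy_coords = [(-5.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
toy_rg = radius_of_gyration(toy_coords)
print("toy Rg:", toy_rg)
```

For the two-atom toy example each atom is 5 Angstrom from the center, so the Rg is 5 Angstrom. Extended conformations of a protein give larger values than compact ones, which is why Rg is a convenient coordinate for following the Monte Carlo run.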

What have we generated:

test1/run_0/monomer_monte_carlo

Visualization

You should now download the output trajectory using the file browser.

Note: the 'Configurations and statistics saved in' line in the output gives a relative path under the project directory.

A progress bar will appear monitoring the upload of your files to the server. Once complete a link will appear beneath the download button.

Once the download is complete uncompress the file in a location of your choice.

You should now load the PDB gag_start.pdb (you will find this in the run_0 directory you just downloaded) and DCD (run_0/monomer_monte_carlo/run_0.dcd) into VMD to observe the variation produced even in our very short Monte Carlo simulation. Remember that the DCD file contains coordinates alone; you need to load the PDB first so that the visualization software knows about the atoms they represent and how they are connected. You can also download and visualize the DCD file run_0.dcd generated from this particular Quick Start run for comparison.

Initial SAS Curve Calculation

Next we calculate a theoretical scattering curve for each of the trial structures we have generated. The SASSIE workflow operates by calculating the scattering intensities at evenly spaced Q values and matching these against interpolated experimental values.

The file sans_data.dat contains our previously interpolated experimental data. In order to create the correct data points in our theoretical curves we need three pieces of information:

A number of scattering calculators are available in SASSIE. Here we use SasCalc. More information can be found in the SasCalc documentation. The starting structure must be a complete structure without missing residues or atoms (including hydrogen atoms) in order to obtain accurate scattering profiles. Atom and residue naming must be compatible with those defined in the CHARMM force field.

Now you need to enter the information to run the scattering calculator. SasCalc can be used to calculate the scattering for SAXS and SANS and/or for several SANS contrasts at the same time.

The module is first run using the "converged number of golden vectors" option on just one structure. Choose this option from the SasCalc method menu in the Advanced Input section of the page.
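The idea behind a "converged number of golden vectors" can be illustrated with a small, self-contained Python sketch. It assumes that the golden vectors are quasi-uniform directions on a sphere generated from a golden-angle spiral, and that the orientation-averaged intensity is obtained by averaging the squared scattering amplitude over those directions. The exact construction, weighting and convergence test used by SasCalc may differ; the function names, the three-point toy "structure" and the convergence criterion below are assumptions made for illustration.

```python
import cmath
import math


def golden_vectors(n):
    """n quasi-uniform unit vectors on a sphere from a golden-angle spiral."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    vectors = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n          # evenly spaced in z
        r = math.sqrt(max(0.0, 1.0 - z * z))
        phi = k * golden_angle
        vectors.append((r * math.cos(phi), r * math.sin(phi), z))
    return vectors


def averaged_intensity(coords, q, n_vectors):
    """Average |sum_j exp(i q u.r_j)|^2 over n_vectors directions u."""
    total = 0.0
    for ux, uy, uz in golden_vectors(n_vectors):
        amplitude = sum(cmath.exp(1j * q * (x * ux + y * uy + z * uz))
                        for x, y, z in coords)
        total += abs(amplitude) ** 2
    return total / n_vectors


def debye_intensity(coords, q):
    """Exact orientational average for identical point scatterers (Debye)."""
    total = 0.0
    for a in coords:
        for b in coords:
            qr = q * math.dist(a, b)
            total += 1.0 if qr == 0.0 else math.sin(qr) / qr
    return total


def converged_vector_count(coords, q_values, tolerance, max_vectors=200):
    """Smallest n whose curve differs from the n - 1 curve by < tolerance."""
    previous = [averaged_intensity(coords, q, 1) for q in q_values]
    for n in range(2, max_vectors + 1):
        current = [averaged_intensity(coords, q, n) for q in q_values]
        change = max(abs(c - p) / max(abs(p), 1e-12)
                     for c, p in zip(current, previous))
        if change < tolerance:
            return n
        previous = current
    raise RuntimeError("no convergence within max_vectors")


# Toy "structure": three identical point scatterers (coordinates in Angstrom).
toy_coords = [(0.0, 0.0, 0.0), (0.0, 0.0, 10.0), (7.0, 0.0, 0.0)]
q_values = [0.02 * k for k in range(16)]        # 0 to 0.3 1/Angstrom
n_converged = converged_vector_count(toy_coords, q_values, tolerance=0.01)
print("vectors needed for the toy structure:", n_converged)
```

The number of vectors needed depends on the size and shape of the structure and on the q-range, which is why the converged option is run once on a representative structure before fixing the number for the whole trajectory.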

Other than the values listed above you can keep the default values for this tutorial (see figure below).

Input values for theoretical curve calculation using SasCalc

The single structure that we used to start the simulation is used as both the reference pdb and the trajectory file filename (PDB in this case) so it is already uploaded to the SASSIE-web server. Thus, you can either upload it again from your local computer or locate it on the server and read it from there.

To read the file from the server:

Once you have understood the input fields and made sure that your values agree with the figure click on the 'Submit' button to start the calculation.

A scattering curve will be calculated for the starting structure (the progress bar should reach 100% and a message stating the run finished appear in the window beneath when the job has completed). Note that the files are written to a sub-directory of sascalc/ that is named according to the D2O percentage in the solvent. This is useful when calculating the scattering curves for more than one contrast.

Output values for theoretical curve calculation using SasCalc

What have we generated:

test1/run_0/sascalc/neutron_D2Op_100

Second SAS Curve Calculation

The run_0_00001.log file from the initial SAS Curve calculation indicates that 35 golden vectors were required for convergence to the desired tolerance (0.01 in this case).

Output values first SAS curve calculation

We now use this information to calculate the scattering curves for all of the generated structures using the "fixed number of golden vectors" option from the SasCalc method menu as shown below.

Input values for second theoretical curve calculation using SasCalc

The reference pdb is the starting structure and the trajectory file filename (DCD) comes from the result of the Monte Carlo simulation, so both are already on the SASSIE-web server. Thus, you can either upload them again from your local computer or locate them on the server and read them from there.

When all input fields are complete:

Output values for second theoretical curve calculation using SasCalc

A scattering curve will be calculated for all of the structures generated by the Monte Carlo simulation (the progress bar should reach 100% and a message stating the run finished appear in the window beneath when the job has completed). Note that the files written during the initial SAS curve calculation will be overwritten since we chose the same run name (run_0) in both cases. If you wish to save the files from the initial calculation, use a different run name.

What have we generated:

test1/run_0/sascalc/neutron_D2Op_100

Initial SAS Curve Comparison

Now we compare our theoretical curves to the experimental data to see which of our structures are plausible models of the real protein using Chi-Square Filter. More information can be found in the Chi-Square Filter documentation.

We now need to select the path containing the theoretical scattering curves and the file containing the experimental data. In addition we need to input the value of I(0) to enable comparison of the two curves (see the picture below).

Input for Chi-Square Filter

To set the path to the scattering curves generated in the previous step:

Selecting the sascalc folder in the file browser

interpolated data file

I(0)

We eventually may want to create 'weight files' that record which frames meet criteria that make them successful models of our data. This means those with low chi square values. However, we don't know the range of chi square values we have at this stage. So, we set the 'number of weight files' to 0 at this time.

Sas type

number of weight files

Note: There are list boxes that allow the selection of the format of the input theoretical curves and the metric used to compare the curves. Here we wish to use the defaults of 'SasCalc' and 'reduced chi-square'.

Click 'Submit'.

Once the run is complete you should see outputs similar to those below.

Output from comparing theoretical and experimental scattering curves using the Chi-Square Filter module

In the text output you will see that the minimum chi-square (X2) value is given.

The top plot shows the variation of chi-square (y-axis) with the radius of gyration (x-axis). Chi-square is a measure of the quality of fit of the theoretical curve to the experimental one; the lower the value, the better the fit.

The bottom plot shows a direct comparison of the best, worst and average theoretical curves with experiment (goal).
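The comparison can be pictured as follows: each theoretical curve is scaled so that its q = 0 value equals the experimental I(0), and a reduced chi-square is then computed against the interpolated data. The Python sketch below is one plausible way to write this down. It is not the Chi-Square Filter implementation: the normalization by N - 1 is an assumed convention, the function names are hypothetical, and the three-point curves are made-up numbers chosen so that the arithmetic is easy to follow.

```python
def scale_to_i0(i_calc, i0):
    """Scale a theoretical curve so that its first point (q = 0) equals I(0)."""
    factor = i0 / i_calc[0]
    return [factor * value for value in i_calc]


def reduced_chi_square(i_exp, err_exp, i_calc):
    """Sum of squared, error-weighted residuals divided by (N - 1).

    The (N - 1) normalization is an ASSUMPTION made for this sketch.
    """
    residuals = [((e - c) / s) ** 2 for e, s, c in zip(i_exp, err_exp, i_calc)]
    return sum(residuals) / (len(i_exp) - 1)


# Made-up "experimental" data on a three-point grid (NOT the tutorial data).
i0 = 0.04
i_exp = [0.04, 0.03, 0.02]
err_exp = [0.001, 0.001, 0.001]

# Made-up theoretical curves for two frames, on an arbitrary intensity scale.
theoretical = {
    "frame 1": [1.0, 0.75, 0.5],
    "frame 2": [1.0, 0.80, 0.6],
}

x2_by_frame = {
    name: reduced_chi_square(i_exp, err_exp, scale_to_i0(curve, i0))
    for name, curve in theoretical.items()
}
best = min(x2_by_frame, key=x2_by_frame.get)
worst = max(x2_by_frame, key=x2_by_frame.get)
print("best:", best, " worst:", worst)
```

In this toy example frame 1 matches the data exactly after scaling, while frame 2 deviates by 2 and 4 error bars at the second and third points, giving (4 + 16) / 2 = 10 under the assumed normalization.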

What have we generated:

test1/run_0/chi_square_filter/neutron_D2Op_100

/spectra

Second SAS Curve Comparison

Now that we know the range of chi square values that we have, we can compare the theoretical curves to the data a second time and create a weight file that flags all structures with chi square values below a certain number. Now, we set the 'number of weight files' to 1.

Input for Chi-Square Filter

run name:

number of weight files

Weight files contain information on which frames in our simulation meet specific criteria provided in the expression box.

enter expression


x2 < 3.0

This selects all frames with a chi square less than 3.0. Adjust this value if necessary to suit the results from your simulation.
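Conceptually, the expression assigns each frame a weight of 1 if it satisfies the condition and 0 otherwise. The Python sketch below mimics that behaviour for a handful of made-up chi-square values. It only understands simple expressions of the form 'x2 < 3.0'; the parser, the function names and the printed three-column layout are hypothetical and do not describe the actual weight file format written by the Chi-Square Filter module.

```python
import operator

# Comparison symbols understood by this sketch.
COMPARISONS = {"<": operator.lt, "<=": operator.le,
               ">": operator.gt, ">=": operator.ge}


def parse_expression(expression):
    """Split an expression such as 'x2 < 3.0' into (name, comparison, value)."""
    name, symbol, value = expression.split()
    return name, COMPARISONS[symbol], float(value)


def frame_weights(x2_values, expression):
    """Return 1 for frames that satisfy the expression and 0 for the rest."""
    name, compare, threshold = parse_expression(expression)
    if name != "x2":
        raise ValueError("this sketch only understands the variable x2")
    return [1 if compare(x2, threshold) else 0 for x2 in x2_values]


x2_values = [1.2, 3.0, 5.7, 2.9]        # made-up chi-square values
weights = frame_weights(x2_values, "x2 < 3.0")
for frame, (x2, weight) in enumerate(zip(x2_values, weights), start=1):
    print(frame, x2, weight)
```

Note that a frame with a chi-square of exactly 3.0 is not selected by 'x2 < 3.0', because the comparison is strict.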

weight file name

low Rg cutoff

Once the run is complete you should see outputs like those below.

Output from comparing theoretical and experimental scattering curves using the Chi-Square Filter module

These results are essentially the same as those from our first comparison above except that we have now generated a weight file.

What have we generated:

test1/run_1/chi_square_filter/neutron_D2Op_100

/spectra

Trajectory Filtering

Now we can filter out the best fit structures and visualize them using the Extract Utilities. More information can be found in the Extract Utilities documentation.

Input for the Extract Utilities module

In this module we can select structures from the DCD we created from the Monte Carlo simulation using the weight files generated in the Chi-Square Filter module.

In this case, we chose to select the weight file from the server.

When the process is finished your output should look like the one below.

Output for the Extract Utilities module

In the event that none of your frames pass the filter then you can download these preprepared files and try the filtering process:

What have we generated:

test1/run_1/extract_utilities

Visualization

Download the 'best_gag.dcd' file as you did the unfiltered DCD and then visualize the structure again in VMD (you will need to load a suitable PDB first as before). You should see that the filtered structures are all noticeably more compact than the starting structure and the majority of those in the unfiltered DCD.

Another way to visualize the structures sampled in the 'run_0.dcd' and 'best_gag.dcd' files for comparison is to use the Density Plot module. The density plot below shows the envelope sampled by all of the accepted structures as well as that sampled by only the best fit structures. The black region is the envelope represented by residues 283-353, which is approximately the alignment region that we defined in the Monomer Monte Carlo module. The blue and yellow regions represent the envelope sampled by residues 1-282 for all accepted structures (blue) and for the best fit structures (yellow). The red and green regions represent the envelope sampled by residues 354-431 for all accepted structures (red) and the best fit structures (green). This representation makes it easier to see that the envelope represented by the best fit structures is significantly smaller than that represented by all of the accepted structures. Remember that our sample has only 692 accepted structures and 27 best fit structures. For a real study, several thousand accepted structures would be needed to determine if this observation holds true.

Density Plot

More information on how to use the Density Plot module can be found in the Density Plot documentation.

Minimization of Best Structures

Now we can minimize the best fit gag structures using the Energy Minimization module. More information can be found in the Energy Minimization documentation.

Input for the Energy Minimization module

When the process is finished your output should look like the one below.

Output for the Energy Minimization module

What have we generated:

test1/run_1/energy_minimization

Visualization

Download the 'min_best_gag.dcd' and 'min_best_gag.dcd.pdb' files and then visualize the structures in VMD. You can load 'best_gag.dcd' (along with a suitable PDB file) again as well for comparison. Go through the two structures frame by frame. You should notice very little difference in the structures.

Final SAS Curve Comparison

If desired, you can compare the minimized structures to the SANS data by calculating their theoretical SANS curves and comparing them to the SANS data again to see how different the best fit chi square values are after the minimization.

First, calculate the theoretical SANS curves using SasCalc.

Input values for final theoretical curve calculation using SasCalc

run name:

Set the run name to run_2. Once the inputs have been entered, click 'Submit'.

Once the run is complete, you should see outputs like those below.

Output values for final theoretical curve calculation using SasCalc

What have we generated:

test1/run_2/sascalc/neutron_D2Op_100

Then, compare the theoretical SANS curves to the SANS data using Chi-Square Filter.

Input for Chi-Square Filter

run name:

Once the run is complete, you should see outputs like those below.

Output from comparing theoretical and experimental scattering curves using the Chi-Square Filter module

What have we generated:

test1/run_2/chi_square_filter

/spectra

References

  1. Conformation of the HIV-1 Gag Protein in Solution. S. A. K. Datta, J. E. Curtis, W. Ratcliff, P. K. Clark, R. M. Crist, J. Lebowitz, S. Krueger, A. Rein. J. Mol. Biol. 365, 812-824 (2007).

  2. SASSIE: A program to study intrinsically disordered biological molecules and macromolecular ensembles using experimental scattering restraints. J. E. Curtis, S. Raghunandan, H. Nanda, S. Krueger. Comp. Phys. Comm. 183, 382-389 (2012).
