Return to Main Documents Page

SASSIE

This document supports the continued development and use of SASSIE, a program suite used to create atomistic models of molecular systems and to compare scattering profiles calculated from those models directly to experimental data.

So, what does it do? The core ability of SASSIE is to generate and manipulate large numbers of structures and to calculate the SANS, SAXS, and neutron reflectivity profiles from atomistic structures.

SASSIE is modular by design. Users can use any or all modules to fit their own workflow. Structures can be generated using our simulation modules or imported from other molecular modeling and simulation packages. The modules that calculate scattering from structures and analyze ensembles of scattering data can be applied regardless of the source of the structures. Likewise, structures generated within SASSIE can be exported for more extensive simulation and/or analysis using other software.

There are two ways to use the software.

In addition to this documentation there are various tutorials available:

Brief Overview

In our experience, people who carry out SAS and reflectivity experiments do not have the software available to generate, handle, manipulate, calculate, and analyze structures. This is, of course, part of the day-to-day experience for people who perform molecular dynamics (MD) simulations. Current atomistic MD simulations cannot generate the types of time-averaged structures encountered in many systems that are studied by x-rays and neutrons. The molecules are large, and configuration space is not sampled quickly enough for the methods to be applicable to all systems. Thus, there is a technology gap to be filled.

This software was written to generate configurations of the HIV-1 Gag protein. The Gag protein is an intrinsically disordered protein that is too large to model using MD methods with limited computational resources. So, we need to generate structures that span the configuration space available to the molecule(s) of interest in a way that is physically reasonable. The quickest, and perhaps simplest, way to do this is to allow the dihedral angles of the flexible regions to vary. We have implemented this by using a Metropolis sampling protocol to sample the energetically allowed dihedral angles at a user-supplied temperature. This is a standard method used in the physical sciences.
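The Metropolis idea described above can be illustrated with a minimal sketch. It is not SASSIE's implementation: the torsion potential `toy_dihedral_energy`, the step size, and all function names are hypothetical, chosen only to show how a trial change in one dihedral angle is accepted or rejected at a given temperature.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def metropolis_accept(delta_e, temperature, rng=random):
    """Metropolis criterion for an energy change delta_e (kcal/mol)."""
    if delta_e <= 0.0:
        return True  # downhill moves are always accepted
    # Uphill moves are accepted with probability exp(-dE / kT).
    return rng.random() < math.exp(-delta_e / (K_B * temperature))


def toy_dihedral_energy(phi_deg):
    """Hypothetical three-fold torsion term (kcal/mol); not a real force field."""
    return 1.0 * (1.0 + math.cos(math.radians(3.0 * phi_deg)))


def sample_dihedral(n_steps, temperature, max_step_deg=30.0, seed=0):
    """Sample one dihedral angle (degrees) with Metropolis Monte Carlo."""
    rng = random.Random(seed)
    phi = 60.0  # start in a minimum of the toy potential
    energy = toy_dihedral_energy(phi)
    trajectory = []
    for _ in range(n_steps):
        # Propose a random rotation and wrap the angle into [-180, 180).
        trial = phi + rng.uniform(-max_step_deg, max_step_deg)
        trial = (trial + 180.0) % 360.0 - 180.0
        trial_energy = toy_dihedral_energy(trial)
        if metropolis_accept(trial_energy - energy, temperature, rng):
            phi, energy = trial, trial_energy
        trajectory.append(phi)  # a rejected move repeats the current angle
    return trajectory


angles = sample_dihedral(n_steps=1000, temperature=300.0)
print(len(angles))
```

In a real application the energy would come from a molecular force field and each accepted move would rotate every atom downstream of the dihedral, but the acceptance test has the same form.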

We implement energetic sampling to include electrostatic non-bonded interactions in the structure generation process.

Yes, there are other programs available to study intrinsically disordered proteins. We feel that this framework is more suited to studying a wide range of heterogeneous problems. In fact, we can easily handle non-biological problems in this framework. Of course, the scattering cross-sections and other details need to be worked out for non-protein-based systems, but the framework is not inherently limited. Thus, we believe that this software will fit more of our users' modeling needs when analyzing neutron data.

So what do you do with the software?

What you do with the SASSIE software is largely up to you. The software is open-source and written in Python, so the entire program is available for the user to modify, expand, or delete. There is a general order in which to proceed, which can be followed in the GUI from top to bottom.

There are two distinct ways to use the software.

One method is to use the structure building modules in Build and structure generation modules in Simulate to build and/or generate structures. Then proceed to calculate SAS profiles using one of the supported SAS calculators in Calculate. Then use Chi_Square_Filter to compare the results to experiment, and subsequently visualize the results in a 3D Density_Plot. A flow chart describing this process is below.

[Figure: flow chart of the workflow described above]

With that in mind, there are a few other modules included in this release to energy-minimize structures and to generate SAS profiles from experimental EM density data. Programs to model reflectivity data are available, and we are actively working on algorithms to study biological systems on membranes and similar interfaces.

Again, the idea is to avoid a black-box approach to studying these systems. Unfortunately, a SAS or reflectivity data set provides few constraints, and thus the problem of going from data to model is grossly underdetermined. Mathematically, this means there is an infinite number of solutions, so care must be taken when evaluating the data.

A starting structure

One of the biggest issues when starting off is that users need a properly formatted Protein Data Bank (PDB) file. There are many legacy issues with this file format, and various atom naming issues can bring a project to a halt. We encourage you to check www.pdb.org for file formatting. There are web resources and software available to clean up files. We recommend that you include all atoms that are in your system in your PDB file: do not leave out loops, include the hydrogens, fix the disulfide bonds, and so on. See the Notes on Starting Structures and Force Fields page for some advice.

Once you have a structure you may want to orient it in a specific way. For example, to study proteins on membranes, we simply discard any structure that has an atom with a z-coordinate less than zero. Thus, your starting structure should be oriented to meet this criterion.
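As an illustration of the z-coordinate criterion above, here is a minimal sketch. The function names and the in-memory layout (each structure as a list of `(x, y, z)` tuples) are assumptions made for the example, not SASSIE's internal representation.

```python
def is_above_plane(coords, z_min=0.0):
    """Return True if every atom has a z-coordinate of at least z_min."""
    return all(z >= z_min for _, _, z in coords)


def filter_frames(frames, z_min=0.0):
    """Keep only the structures in which no atom lies below z_min."""
    return [frame for frame in frames if is_above_plane(frame, z_min)]


# Two toy two-atom structures: the second has an atom below the z = 0 plane.
frames = [
    [(0.0, 0.0, 1.0), (1.0, 2.0, 5.0)],
    [(0.0, 0.0, 1.0), (1.0, 2.0, -0.5)],
]
kept = filter_frames(frames)
print(len(kept))  # 1
```

Running the same check on a starting structure before generating an ensemble is a quick way to confirm that it is oriented correctly.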

File Formats

Generally, the software is written to read a binary data file (DCD) that contains coordinates. DCD files are useless without a PDB (or PSF) file, because the PDB/PSF file assigns atom names to the individual entries in the binary DCD file. DCD files can contain a single configuration or, more usually, multiple configurations (hundreds to millions). SASSIE saves files to your hard disk (configuration/DCD files, run log files, data from plots, temporary input/output files, etc.).

Some data

There are many different file formats used by laboratories to store scattering and reflectivity data. We require that the data be in a simple text file, generally with three fields (e.g. q, intensity, error), where the data are sampled evenly in the independent variable (q). The Data_Interpolation module will take care of this for NCNR SANS data. Alternatively, we recommend that you use Mathematica, MATLAB, Igor, or any other standard method to create this file for each of your data sets.
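For readers preparing such a file by hand, the following is a rough sketch of resampling unevenly spaced data onto an even q grid by linear interpolation. It is not the Data_Interpolation module; the function names are hypothetical, and interpolating the errors linearly is a simplifying assumption rather than a rigorous error propagation.

```python
from bisect import bisect_right


def linear_interpolate(x, xs, ys):
    """Linearly interpolate ys(xs) at x; xs must be sorted in ascending order."""
    x = min(max(x, xs[0]), xs[-1])  # clamp to guard against round-off
    i = bisect_right(xs, x) - 1
    if i >= len(xs) - 1:
        return ys[-1]
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def resample_evenly(q, intensity, error, dq):
    """Return (q, intensity, error) rows on an even grid with spacing dq."""
    n_points = int((q[-1] - q[0]) / dq + 1e-9) + 1
    grid = [q[0] + k * dq for k in range(n_points)]
    return [
        (x, linear_interpolate(x, q, intensity), linear_interpolate(x, q, error))
        for x in grid
    ]


rows = resample_evenly(
    q=[0.0, 0.25, 0.75, 1.0],
    intensity=[10.0, 8.0, 4.0, 2.0],
    error=[1.0, 1.0, 1.0, 1.0],
    dq=0.25,
)
for q_value, i_value, e_value in rows:
    # Three whitespace-separated fields per line: q, intensity, error.
    print(f"{q_value:.4f} {i_value:.6e} {e_value:.6e}")
```

The printed lines have the three-field layout described above and could be redirected to a text file.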

The experimental data file must have the same number of points, at exactly the same q-values, as the SAS profile calculated by whichever supported SAS calculator you use (SasCalc, Xtal2Sas, Cryson, or Crysol). Unfortunately, this is not automated at this point. Thus, you should check this before comparing the experimental data to the calculated profiles using the Chi_Square_Filter module.
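A manual check of this kind might look like the sketch below. The function names and tolerances are hypothetical, and the reduced chi-square shown (sum of squared, error-weighted residuals divided by N - 1) is one common definition, not necessarily the one Chi_Square_Filter uses.

```python
import math


def q_grids_match(q_exp, q_calc, rel_tol=1e-6):
    """True if both grids have the same length and matching q-values."""
    if len(q_exp) != len(q_calc):
        return False
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=1e-12)
        for a, b in zip(q_exp, q_calc)
    )


def reduced_chi_square(i_exp, err_exp, i_calc):
    """Sum of squared, error-weighted residuals divided by (N - 1)."""
    n = len(i_exp)
    if n < 2 or len(err_exp) != n or len(i_calc) != n:
        raise ValueError("need at least two points and equal-length inputs")
    total = sum(
        ((ie - ic) / err) ** 2 for ie, err, ic in zip(i_exp, err_exp, i_calc)
    )
    return total / (n - 1)


q_exp = [0.01, 0.02, 0.03]
q_calc = [0.01, 0.02, 0.03]
if q_grids_match(q_exp, q_calc):
    chi2 = reduced_chi_square(
        i_exp=[10.0, 8.0, 6.0],
        err_exp=[1.0, 1.0, 2.0],
        i_calc=[9.0, 8.0, 8.0],
    )
    print(chi2)  # 1.0
```

Checking the grids first avoids silently comparing intensities that belong to different q-values.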

Now, you have a starting structure with ALL the atoms that were in the test-tube (ahem, minus bulk solvent), you have a text file of your data, so you are ready to get started.
