Many structures that the SAS community uses are incomplete: coordinates are missing for internal loops, amino acids are missing from the ends, and so on. In our experience, your model needs to include every atom of the sample that you measured.
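As a rough first check for missing internal residues, you can look for jumps in the residue numbering of your PDB file. The sketch below is only an illustration: the function name `find_residue_gaps` is hypothetical, and it assumes standard fixed-column ATOM records (chain ID in column 22, residue number in columns 23-26). It cannot detect residues missing from the ends, so you still need to compare against the sequence of the construct you measured.

```python
def find_residue_gaps(pdb_lines):
    """Return a list of (chain, last_residue_seen, next_residue_seen) tuples,
    one for each jump of more than one in the residue numbering of a chain."""
    gaps = []
    previous = {}  # chain ID -> last residue number seen in that chain
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, REMARK, TER, END, etc.
        chain = line[21]           # column 22: chain identifier
        resnum = int(line[22:26])  # columns 23-26: residue sequence number
        last = previous.get(chain)
        if last is not None and resnum - last > 1:
            gaps.append((chain, last, resnum))
        previous[chain] = resnum
    return gaps
```

For example, `find_residue_gaps(open("sample.pdb"))` returns an empty list when the numbering within each chain is continuous.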
Several examples of how to build PSF files using open-source programs are covered in our “MD School”. Specifically, look at lecture 2 and lab 2.
The purpose of this page is to give some guidance on how you can add missing residues and create protein structure files (PSF) that are necessary to access the energetic and MD portions of SASSIE. This may require you to gain access to third-party MD code (CHARMM, AMBER). This code is generally available at academic institutions.
If your PDB is complete (i.e. all the atoms are included in the file and the naming and numbering are correct), then it is very easy to create this file. You should download and install NAMD from http://www.ks.uiuc.edu/Research/namd/. The installation includes a program called “psfgen”, which is the simplest open-source way to generate this file. In fact, psfgen can also be used to correct atom naming and numbering issues, add hydrogens, etc.
You run psfgen at the command line (in a terminal). Make sure that your “path” is set up to find psfgen in your filesystem, or you can use the full path explicitly. You run psfgen by typing, for example,
psfgen “filename of a build script text file”
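If you are not sure whether your path is set up correctly, a quick check along the following lines may help. This is a minimal sketch using only the Python standard library; the helper names `locate_program` and `describe_program` are hypothetical and not part of psfgen or SASSIE.

```python
import shutil


def locate_program(name="psfgen"):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


def describe_program(name="psfgen"):
    """Return a short human-readable message about where the program is."""
    path = locate_program(name)
    if path is None:
        return (name + " was not found on your PATH; add its directory "
                "to PATH or call it by its full path")
    return name + " found at " + path
```

Calling `print(describe_program())` in a Python session reports either the location of psfgen or a reminder to fix your path.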
Below I am providing an example PSF file generation script. Each line is annotated with a comment (the text following the # symbols). Several lines, such as the alias and patch commands, are included to show some of the additional capabilities that are available using psfgen. You may or may not need such additional lines in your file; it depends on your system.
You need a CHARMM topology file to run this script. You can download this from Alex MacKerell’s lab: http://mackerell.umaryland.edu/CHARMM_ff_params.html. In the example below, this file is in a directory called “~/toppar/”.
# psfgen scripts are Tcl: a comment that follows a command on the same line must begin with ";#"
topology ~/toppar/top_all27_prot_na.inp
alias residue HIS HSE ;# this allows you to change all HIS to HSE (probably needed)
alias atom ILE CD1 CD ;# this allows you to change CD1 to CD in all ILE residues
alias atom SER HG HG1 ;# this allows you to change HG to HG1 in all SER residues
alias atom CYS HG HG1 ;# this allows you to change HG to HG1 in all CYS residues
segment PROT { ;# the segment name (here PROT) can be any 3 or 4 letter code you want
  first NTER ;# use this if you want the first amino acid residue to be the N-terminus
  pdb sample.pdb ;# this line tells psfgen where to find the protein sequence information
  last CTER ;# use this if you want the last amino acid residue to be the C-terminus
}
patch DISU PROT:5 PROT:28 ;# this creates a disulfide bond between residues 5 & 28 in PROT
coordpdb sample.pdb PROT ;# load the coordinates for the segment PROT
guesscoord ;# guess coordinates for atoms missing from the PDB file, such as hydrogens
writepsf sample.psf ;# this command saves the PSF file
writepdb new_sample.pdb ;# this command saves a new PDB file (for example purposes)
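After the script has run, it can be worth confirming that the two output files agree with each other. The sketch below compares the atom count declared in the PSF file (the number on the line containing `!NATOM`) with the number of ATOM/HETATM records in the new PDB file. The function names are hypothetical, and the check is only a sanity test; it does not validate atom names or geometry.

```python
def count_psf_atoms(psf_lines):
    """Return the atom count declared on the '!NATOM' line of a PSF file."""
    for line in psf_lines:
        if "!NATOM" in line:
            return int(line.split()[0])
    raise ValueError("no !NATOM record found; is this a PSF file?")


def count_pdb_atoms(pdb_lines):
    """Count the ATOM and HETATM records in a PDB file."""
    return sum(1 for line in pdb_lines if line.startswith(("ATOM", "HETATM")))


def outputs_agree(psf_lines, pdb_lines):
    """True if the PSF and PDB files describe the same number of atoms."""
    return count_psf_atoms(psf_lines) == count_pdb_atoms(pdb_lines)
```

With the file names used in the example script, `outputs_agree(open("sample.psf"), open("new_sample.pdb"))` should return True if both files were written from the same structure.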
In this example we only included a single segment. You can have many segments in your system. The different segments are not covalently bonded to each other. For example, you generally have a protein segment and a water segment (for bulk water). For most cases in SASSIE, you will only have a single segment (i.e. your single protein).
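If you do need several segments, a common approach (an assumption about your workflow, not something psfgen requires) is to give each segment its own PDB file, so that each one can be read by its own segment block. The sketch below groups the coordinate records of a multi-chain PDB file by chain ID; the function name `split_by_chain` is hypothetical, and it assumes standard fixed-column records with the chain ID in column 22.

```python
def split_by_chain(pdb_lines):
    """Group ATOM/HETATM records by chain ID.

    Returns a dict mapping each chain ID to its list of lines, with chains
    in the order they first appear in the file."""
    chains = {}
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            chain_id = line[21]  # column 22: chain identifier (may be blank)
            chains.setdefault(chain_id, []).append(line)
    return chains
```

Each list of lines can then be written to its own file (followed by an END line) and used as the PDB file for one segment in the build script.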
You will probably encounter minor errors with your own system. Unfortunately, some experience with this process is required, but a standard Google search can solve most issues that may arise. In addition, the authors of psfgen have written a decent manual that is available through the NAMD website. Good luck!