3D Structure Prediction: Template Detection and Alignment - Worked Example.
Sequence Search
Search for a suitable template for the following sequence:
>tafazzin
MPLHVKWPFPAVPPLTWTLASSVVMGLVGTYSCFWTKYMNHLTVHNREVLYELIEKRGPA
TPLITVSNHQSCMDDPHLWGILKLRHIWNLKLMRWTPAAADICFTKELHSHFFSLGKCVP
VCRGAEFFQAENEGKGVLDTGRHMPGAGKRREKGDGVYQKGMDFILEKLNHGDWVHIFPE
GKVNMSSEFLRFKWGIGRLIAECHLNPIILPLWHVGMNDVLPNSPPYFPRFGQKITVLIG
KPFSALPVLERLRAENKSAVEMRKALTDFIQEEFQHLKTQAEQLHNHLQPGR
First we want to know if we can find a homologous template with known structure. Go to the BLAST page and
search with BLAST (against the PDB) and then PSI-BLAST (with the nr database). Does this sequence have a
homologous template structure?
If for some reason BLAST does not work, the results for BLAST and PSI-BLAST have been collected for you.
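Before pasting the sequence into a search form, it can help to confirm that the FASTA record was copied intact. The following is a minimal sketch in plain Python; the helper name read_fasta and the VALID residue set are illustrative choices, not part of any server's interface.

```python
# Sketch: read one FASTA record and run basic sanity checks before a search.
FASTA = """>tafazzin
MPLHVKWPFPAVPPLTWTLASSVVMGLVGTYSCFWTKYMNHLTVHNREVLYELIEKRGPA
TPLITVSNHQSCMDDPHLWGILKLRHIWNLKLMRWTPAAADICFTKELHSHFFSLGKCVP
VCRGAEFFQAENEGKGVLDTGRHMPGAGKRREKGDGVYQKGMDFILEKLNHGDWVHIFPE
GKVNMSSEFLRFKWGIGRLIAECHLNPIILPLWHVGMNDVLPNSPPYFPRFGQKITVLIG
KPFSALPVLERLRAENKSAVEMRKALTDFIQEEFQHLKTQAEQLHNHLQPGR
"""

# The 20 standard amino acid one-letter codes (assumption: no ambiguity codes).
VALID = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(text):
    """Return (header, sequence) for the first record in a FASTA string."""
    header, chunks = None, []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # ignore any further records
            header = line[1:]
        else:
            chunks.append(line.upper())
    return header, "".join(chunks)


header, seq = read_fasta(FASTA)
unexpected = sorted(set(seq) - VALID)  # characters that are not standard residues
print(header, len(seq), unexpected)
```

If the list of unexpected characters is not empty, or the length differs from what you expect, re-copy the sequence before submitting it.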
Additional Clues
Before you send your sequence to fold recognition servers, check whether there is any additional information that
can help predict the structure. How many domains are there? Are there any disordered regions? Are there any transmembrane regions or signal peptides?
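As a rough first look at the transmembrane question, a sliding-window hydropathy profile can flag strongly hydrophobic stretches. The sketch below uses the Kyte-Doolittle scale; the window of 19 residues and the cutoff of 1.6 are the values usually quoted for that scale and are treated here as adjustable assumptions. It is a crude screen, not a substitute for a dedicated transmembrane or signal peptide predictor.

```python
# Sketch: sliding-window Kyte-Doolittle hydropathy to flag hydrophobic stretches.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(seq, window=19):
    """Mean hydropathy of every full window along seq (KeyError on non-standard residues)."""
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_segments(seq, window=19, cutoff=1.6):
    """Return merged 1-based (start, end) ranges covered by windows at or above cutoff."""
    segments = []
    for i, mean in enumerate(hydropathy_windows(seq, window)):
        if mean < cutoff:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1] + 1:
            # Overlapping or adjacent window: extend the previous segment.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


# Toy input (not a real protein): a hydrophobic core flanked by charged residues.
toy = "K" * 10 + "L" * 19 + "K" * 10
print(candidate_tm_segments(toy))
```

Because whole windows are reported, the flagged ranges are wider than the hydrophobic core itself; treat them as regions to inspect rather than as predicted helix boundaries.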
Once you are happy with the sequence you want to submit, send it to the fold recognition servers. While you wait, investigate whether other sources hold any important information. For example, do the predicted
secondary structure and accessibility agree with our predicted model? Does the sequence have any important functional features that we would expect to see conserved?
Try checking the secondary structure predictions
- This is the secondary structure prediction from [PSIPRED]
Do any of these predictions refine your view of the possible structure?
Is there any useful information to be found in the SwissProt annotation?
Fold Recognition
If BLAST and PSI-BLAST find no homologous structures, we will have to use more
powerful methods, namely fold recognition, to look for structural templates. Even if a template is found with PSI-BLAST, we will have to do
some work with the alignments to ensure that we are modelling the correct residues.
Fold recognition and hybrid sequence methods are structure prediction techniques that can be used when the twilight zone is reached
(commonly taken to mean roughly 20 to 35% sequence identity between two proteins in a pairwise alignment; below about 20%, sequence comparison alone rarely detects a relationship). In this case the direct homology
modelling approach is often unreliable.
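To judge where a target-template pair sits relative to the twilight zone, you need the percent identity of their pairwise alignment. The sketch below is one way to compute it; the function name is illustrative, and the choice to count only columns where both sequences have a residue is an assumption, since other conventions divide by the full alignment length or by the shorter sequence.

```python
# Sketch: percent identity of two already-aligned sequences (gaps as "-").
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identical residues as a percentage of columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment, not taken from any real search result.
print(round(percent_identity("ACDE-FG", "ACDQAFG"), 1))
```

The denominator matters at low identity, so note which convention a server uses before comparing its reported value with a cutoff.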
In general, fold recognition methods attempt to detect similarities between 3D features in proteins that show no sequence
similarity. Fold recognition methods use a variety of techniques to go deeper into the twilight zone
than pairwise sequence search methods.
The key criterion that differentiates fold recognition methods from sequence-only methods is that they
use additional structural information from 3D structure databases, which lets them predict more accurately how well the target sequence fits
each fold.
Strategies vary between fold recognition programs. They might use, for instance, secondary
structure coincidence, predicted residue-residue contacts, accessibility, or solvation energy to evaluate the alignments.
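Secondary structure coincidence, the first of these terms, can be illustrated with a small sketch: walk along a target-template alignment and count how often the predicted state of the target residue matches the observed state of the aligned template residue. The function name, the three-state H/E/C strings and the plain fraction used as a score are assumptions for illustration; real servers combine such a term with others and weight it in their own way.

```python
# Sketch: secondary structure agreement along a target-template alignment.
def ss_coincidence(aln_target, aln_template, ss_target, ss_template, gap="-"):
    """Fraction of residue-residue columns whose secondary structure states agree.

    aln_* are gapped alignment rows; ss_* are ungapped per-residue state strings.
    """
    if len(aln_target) != len(aln_template):
        raise ValueError("alignment rows must have equal length")
    if len(ss_target) != len(aln_target.replace(gap, "")):
        raise ValueError("ss_target length does not match target residues")
    if len(ss_template) != len(aln_template.replace(gap, "")):
        raise ValueError("ss_template length does not match template residues")

    i = j = 0  # positions in the ungapped target and template
    agree = paired = 0
    for a, b in zip(aln_target, aln_template):
        if a != gap and b != gap:
            paired += 1
            if ss_target[i] == ss_template[j]:
                agree += 1
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return agree / paired if paired else 0.0


# Toy data only: H = helix, E = strand, C = coil.
print(ss_coincidence("ACD-EF", "AC-GEF", "HHHCC", "HHECE"))
```

A low fraction for a high-scoring hit is a reason to inspect that alignment more closely before modelling on it.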
"Threading" methods align the target sequence with folds to evaluate the structural fit of the sequence with the folds. Hybrid
methods are generally based on sequence profiles and may include information such as secondary structure prediction to improve their
alignments.
Try sending the sequence to some of the following servers: 3DPSSM, PSIPRED, SPX and FFAS.
HHPred
HHPred is a more recent and more flexible server. It lets you run a number of queries and gives you much more control over the whole process: you can choose how the method makes the sequence searches, choose which databases to search, and easily compare secondary structure programs. Once you have the results, you can adjust the parameters and run the server again. Once you have understood what you are looking for, this is an exceptional tool.
Try sending the sequence to this server too. You can also use it to make a secondary structure prediction.
Collected Templates
Now that we have a lot of data we need to try to understand what we are looking at. Are there any results that look interesting?
Do any of the templates make sense? Does anything stand out?
Try finding out whether the structures that the servers are predicting come from related structural families.
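One simple way to organise this comparison is to tally how many servers place the same structural family among their top hits. The sketch below assumes you have already looked up a family or fold label for each template yourself, for example in a structural classification database; the function name, server names, template names and family labels are all placeholders, not real results.

```python
# Sketch: count how many servers report each structural family in their top hits.
from collections import Counter


def family_support(hits_by_server, top_n=5):
    """Return (family, number_of_servers) pairs, most widely supported first.

    hits_by_server maps a server name to a ranked list of (template, family).
    """
    support = Counter()
    for hits in hits_by_server.values():
        # Use a set so a family counts once per server, however many hits it has.
        families = {family for _template, family in hits[:top_n]}
        support.update(families)
    return support.most_common()


# Placeholder data for illustration only.
hits = {
    "server_1": [("template_a", "family_X"), ("template_b", "family_Y")],
    "server_2": [("template_c", "family_X")],
    "server_3": [("template_a", "family_X"), ("template_d", "family_Z")],
}
for family, n_servers in family_support(hits):
    print(family, n_servers)
```

A family that recurs across independent servers deserves a closer look even when no single score is convincing, while a family reported by only one server needs more supporting evidence.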
Try sending the sequence to one (or both) of the following metaservers. Both work in more or less the same way and, above all, make it much easier to compare outputs from different servers.