[Class slides (.pdf)]

Practical Protein Bioinformatics
Book web site with updated links to tools and resources
for protein structure prediction and analysis.


Exercises

Resources

This sequence corresponds to an inactive mutant of influenza neuraminidase that you found during your research.

>Influenza_Neuraminidase_mutant
MNPNQKIITIGSVSLTIATVCFLMQIAILVTTVTLHFKQHECDSPASNQVMPCEPIIIER
NITEIVYLNNTTIEKEICPKVVEYRNWSKPQCQITGFAPFSKDNSIRLSAGGDIWVTREP
YVSCDPVKCYQFALGQGTTLDNKHSNDTVHLRIPHRTLLMNELGVPFHLGTRQVCIAWSS
SSCHDGKAWLHVCITGDDKNATASFIYDGRLVDSIGSWSQNILRTQESECVCINGTCTVV
MTDGSASGRADTRILFIEEGKIVHISPLAGSAQHVEECSCYPRYPGVRCICRDNWKGSNR
PVVDINMEDYSIDSSYVCSGLVGDTPRNDDRSSNSNCRNPNNERGTQGVKGWAFDNGNDL
WMGRTISKDLRSGYETFKVIGGWSTPNSKSQINRQVIVDSDNRSGYSGIFSVEGKSCINR
CFYVELIRGRKQETRVWWTSNSIVVFCGTSGTYGTGSWPDGANINFMPI

1. Is the 3D structure of this protein known? If so, try to summarize the most important structural information about it by browsing the different online resources. With that information, try to explain the loss of function of this mutant.

2. Generate a publication-quality image that shows the structural explanation for this loss of function.

You might want to have a look at Creating an eye-catching figure with PyMOL.
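For question 1, a quick way to locate the mutation is to compare the mutant against the wild-type neuraminidase sequence (for example, the sequence of the PDB entry or UniProt record you retrieve). Below is a minimal Python sketch, assuming the two sequences have the same length and need no gaps (as for simple substitution mutants). `parse_fasta` and `find_substitutions` are hypothetical helper names, and the toy sequences are made up. Positions are counted from the first residue of the string you pass in, so they may not match the residue numbering used in PDB files or in the literature.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Return {header: sequence} for a FASTA-formatted string."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def find_substitutions(wild_type: str, mutant: str) -> list[str]:
    """List substitutions as 'WtPosMut' strings (1-based), e.g. 'I7L'.

    Assumes the two sequences are already aligned without gaps.
    """
    if len(wild_type) != len(mutant):
        raise ValueError("sequences differ in length; align them first")
    return [
        f"{wt}{pos}{mt}"
        for pos, (wt, mt) in enumerate(zip(wild_type, mutant), start=1)
        if wt != mt
    ]


# Toy example with made-up sequences (not real neuraminidase data).
toy = """>wt
MNPNQKIITI
>mut
MNPNQKLITA
"""
seqs = parse_fasta(toy)
print(find_substitutions(seqs["wt"], seqs["mut"]))  # ['I7L', 'I10A']
```

Once you know which residues changed, map them onto the 3D structure and check whether they sit in or near the active site, at a subunit interface, or in the hydrophobic core.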

Sequence searches

Primary protein structure resources

PDB-derived data

Visualization software


Compare the structures of unbound Rac (PDB 1mh1) and Rac bound to an effector, a toxin that modulates its function (PDB 1he1). What are the main structural differences? Can these differences explain the toxin's effect?
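One way to quantify the differences between 1mh1 and 1he1 is to superpose equivalent C-alpha atoms and look at the per-residue deviations. Regions that still deviate strongly after the global fit are candidates for toxin-induced conformational changes. Below is a minimal NumPy sketch of the Kabsch superposition. It assumes you have already extracted matched (N, 3) coordinate arrays for the same residues in both structures (for instance with Biopython's `Bio.PDB` parser); the function names are illustrative. The structural alignment programs listed under Resources do this more robustly, including working out the residue correspondence for you.

```python
import numpy as np


def superpose(P: np.ndarray, Q: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Kabsch fit of P onto Q; returns (P rotated, Q), both centred."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equivalent atoms")
    # Remove translation by centring both sets on their centroids.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(P_c.T @ Q_c)
    # Flip the last axis if needed so R is a proper rotation (no mirror).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return P_c @ R.T, Q_c


def rmsd_and_deviations(P, Q):
    """Global RMSD plus per-atom distances after optimal superposition."""
    P_fit, Q_c = superpose(P, Q)
    dev = np.linalg.norm(P_fit - Q_c, axis=1)
    return float(np.sqrt(np.mean(dev ** 2))), dev


# Sanity check: a rotated and translated copy should superpose exactly.
rng = np.random.default_rng(0)
Q = rng.normal(size=(20, 3))
theta = np.pi / 5
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
P = Q @ Rz.T + np.array([5.0, -2.0, 1.0])
rmsd, dev = rmsd_and_deviations(P, Q)
print(f"RMSD = {rmsd:.3f}")  # RMSD = 0.000
```

With real structures, sorting `dev` (for example `np.argsort(dev)[::-1]`) lists the residues that move most between the two states. Note that a single rigid fit can be dominated by a moving region; this is why the flexible alignment tools listed under Resources exist.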

Take a look at how Rac (PDB 1mh1) is classified in SCOP and CATH.

Software for structural alignment

Pair-wise and database searches
Multiple structure alignment
Flexible alignments

Protein structure classification


Predict the secondary structure of these two proteins.

>tr|C6HZ97|C6HZ97_9BACT Putative aminomethyltransferase 
MMTTIDTRPSHIRAGLHIAPRTRVLVSVSGDDRASFLQGLLCQDVAGQKTGTLRYGFFLS
PKARILFDSWIGVLPDRILLSPSLFSKEDEEAFLAHLKKYLFFRTKATLSSETGAFISAS
LVGPEALALATPLFDPEAEEEGVRRLSEGGFAFLRPGIGAFDADTGGWIDLWLPAEKAGD
RLKGLEERVLSRGGQRLDDTGIEVYRVERGIPAVPFELNESHFPAEAGLDTLAVSYNKGC
YVGQEPVTRLKFQGQLSRKLVGIRLDGPFVSEVTLPRHLLASNDNTEAGTLTSLVSSVVC
GGPVGLAYVKRGHWDSGEPLIDGEGNRFEVSELPLLPRE

>Gliotactin C-term domain
IMWRNAKRQSDRFYDEDVFINGEGLEPEQDTRGVDNAHMVTNHHALRSRD
NIYEYRDSPSTKTLASKAHTDTTSLRSPSSLAMTQKSSSQASLKSGISLK
ETNGHLVKQSERAATPRSQQNGSIAKVASPPVEEKRLLQPLSSTPVTQLQ
AEPAKRVPTAASVSGSSRSTTPVPSARSTTTHTTTATLSSQPAAQPRRTH
LVEG

Has either of them been crystallized? If so, compare its experimentally determined secondary structure with the predicted one.

Some pre-compiled results: C6HZ97_jpred.html C6HZ97_psipred.pdf C6HZ97_sspro.txt Gliotactin_iupred.png Gliotactin_jpred.html
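To compare the observed secondary structure of a crystallized protein (for example, a DSSP assignment mapped onto the query residues) with a prediction such as the JPred or PSIPRED output above, a common per-residue score is Q3: the fraction of residues assigned the same of three states. A minimal Python sketch follows. The 8-to-3 state reduction used here (H, G, I to helix; E, B to strand; everything else to coil) is one common convention, not the only one. Residues missing from the crystal structure must be removed from both strings before comparing.

```python
# Common 8-to-3 reduction (assumed convention; others exist).
THREE_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(ss: str) -> str:
    """Map DSSP (or predictor) codes to H/E/C; any other code becomes coil."""
    return "".join(THREE_STATE.get(c, "C") for c in ss.upper())


def q3(observed: str, predicted: str) -> float:
    """Fraction of residues whose 3-state assignment agrees."""
    obs, pred = to_three_state(observed), to_three_state(predicted)
    if len(obs) != len(pred) or not obs:
        raise ValueError("need two non-empty strings of equal length")
    return sum(o == p for o, p in zip(obs, pred)) / len(obs)


# Toy strings: '-' (JPred-style coil) and 'T' / ' ' (DSSP) all map to C.
print(round(q3("HHHGT EEE", "HHH---EEE"), 3))  # 0.889
```

Q3 alone hides where the errors are; it is worth also checking whether the predicted elements start and end close to the observed ones.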


Try to infer topological models for these two sequences by retrieving as many structural features as possible: secondary structure, transmembrane segments, coiled coils, unstructured/disordered regions, etc., and possibly also domains of known structure.
>Q96QS1|TSN32_HUMAN Tetraspanin-32 - Homo sapiens (Human).
MGPWSRVRVAKCQMLVTCFFILLLGLSVATMVTLTYFGAHFAVIRRASLEKNPYQAVHQW
AFSAGLSLVGLLTLGAVLSAAATVREAQGLMAGGFLCFSLAFCAQVQVVFWRLHSPTQVE
DAMLDTYDLVYEQAMKGTSHVRRQELAAIQDVFLCCGKKSPFSRLGSTEADLCQGEEAAR
EDCLQGIRSFLRTHQQVASSLTSIGLALTVSALLFSSFLWFAIRCGCSLDRKGKYTLTPR
ACGRQPQEPSLLRCSQGGPTHCLHSEAVAIGPRGCSGSLRWLQESDAAPLPLSCHLAAHR
ALQGRSRGGLSGCPERGLSD

>tr|D3LKI7|D3LKI7_MICLU LysM domain protein 
MDTMTLFTTSATRSRRATASIVAGMTLAGAAAVGFSAPAQAATVDTWDRLAECESNGTWD
INTGNGFYGGVQFTLSSWQAVGGEGYPHQASKAEQIKRAEILQDLQGWGAWPLCSQKLGL
TQADAEAGDVDATEAAPVAVERTATVQRQSAADEAAAEQAAAEQAAAEQAAADQAAAERW
AAKQAAAEQAAADKAAAQRAAAAEKAAAQKAAAAEQAAAAEEAVVAEAETIVVKSGDSLW
KLANEYEVEGGWTALYEANKGIVSDAAVIYVGQELVLPQA

Some pre-compiled results: Q96QS1_jpred.html Q96QS1_tmhmm.gif D3LKI7_tmhmm.gif D3LKI7_coils.gif
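Before (or alongside) running TMHMM, a Kyte-Doolittle hydropathy scan gives a quick feel for where transmembrane helices might sit. The Python sketch below averages Kyte-Doolittle values over a sliding window of 19 residues and reports regions whose average exceeds 1.6, the window size and cut-off commonly quoted from Kyte and Doolittle for membrane-spanning segments. This is a crude heuristic, not what TMHMM does (TMHMM uses a hidden Markov model). Hydrophobic signal peptides also produce peaks, and because the reported spans are unions of whole windows they tend to extend beyond the hydrophobic stretch itself.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each window; entry i covers residues i+1..i+window."""
    values = [KD[aa] for aa in seq.upper()]  # KeyError on non-standard letters
    if len(values) < window:
        return []
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window by one
        profile.append(total / window)
    return profile


def candidate_tm_segments(seq: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge all windows with mean hydropathy > threshold into 1-based spans."""
    spans: list[tuple[int, int]] = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score <= threshold:
            continue
        start, end = i + 1, i + window
        if spans and start <= spans[-1][1] + 1:
            spans[-1] = (spans[-1][0], end)  # overlaps previous span: extend
        else:
            spans.append((start, end))
    return spans


# Toy sequence: a 25-residue Leu stretch flanked by charged residues.
toy = "D" * 10 + "L" * 25 + "D" * 10
print(candidate_tm_segments(toy))  # [(6, 40)]
```

Running `candidate_tm_segments` on the Tetraspanin-32 and D3LKI7 sequences and comparing the spans with the TMHMM plots above is a useful cross-check.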

More sequences here, if you want to try at home.

1D prediction

Secondary structure
Transmembrane segments
Transmembrane helices
Transmembrane barrels
Coiled-coils
Disorder
Other 1D prediction tools
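Compositional bias is one of the simplest 1D features to compute yourself. Low-complexity regions, such as the Ala/Glu/Gln-rich repeats in D3LKI7, show up as windows with low Shannon entropy of residue composition. The Python sketch below flags 12-residue windows with an entropy of at most 2.2 bits. These numbers are borrowed from the defaults commonly quoted for the SEG program, but SEG itself uses a more elaborate two-threshold extension procedure, so treat this only as an approximation. Low complexity often, but not always, coincides with disorder, so use a dedicated disorder predictor for that question.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits) of the residue composition of a segment."""
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in Counter(segment).values())


def low_complexity_spans(seq: str, window: int = 12,
                         max_entropy: float = 2.2) -> list[tuple[int, int]]:
    """Merge windows with entropy <= max_entropy into 1-based spans."""
    seq = seq.upper()
    spans: list[tuple[int, int]] = []
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) > max_entropy:
            continue
        start, end = i + 1, i + window
        if spans and start <= spans[-1][1] + 1:
            spans[-1] = (spans[-1][0], end)  # overlaps previous span: extend
        else:
            spans.append((start, end))
    return spans


# One line of the D3LKI7 sequence shown above (positions relative to it).
fragment = "TQADAEAGDVDATEAAPVAVERTATVQRQSAADEAAAEQAAAEQAAAEQAAADQAAAERW"
print(low_complexity_spans(fragment))
```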

Try to model the 3D structure of this sequence:

>rpe_yeast
MVKPIIAPSILASDFANLGCECHKVINAGADWLHIDVMDGHFVPNITLGQPIVTSLRRS
VPRPGDASNTEKKPTAFFDCHMMVENPEKWVDDFAKCGADQFTFHYEATQDPLHLVKLI
KSKGIKAACAIKPGTSVDVLFELAPHLDMALVMTVEPGFGGQKFMEDMMPKVETLRAKF
PHLNIQVDGGLGKETIPKAAKAGANVIVAGTSVFTAADPHDVISFMKEEVSKELRSRDL
LD
Is it possible to build a model by homology? If so, which template? Evaluate the model and visualize it using PyMOL.

Some pre-compiled results: rpe_sm.html
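For rpe_yeast, a first indication of whether homology modelling is feasible is the sequence identity between the query and the best template found by the homology detection tools. The Python sketch below computes identity over the aligned, ungapped columns of a pairwise alignment and maps it to a rough first guess. The 30% and 20% cut-offs are assumed rules of thumb, not hard limits. Around and below ~30% identity, the reliability of an alignment also depends strongly on its length and coverage, and different tools normalize identity differently (by aligned columns, by query length, or by the shorter sequence).

```python
def percent_identity(aln_query: str, aln_template: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aln_query) != len(aln_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aln_query.upper(), aln_template.upper())
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def suggested_strategy(identity: float) -> str:
    """Rough first guess only; the thresholds are assumed rules of thumb."""
    if identity >= 30.0:
        return "homology modelling is likely feasible"
    if identity >= 20.0:
        return "borderline: confirm the template with remote homology detection"
    return "try fold recognition / threading"


# Toy pairwise alignment (made up): 3 identical out of 4 ungapped columns.
pid = percent_identity("AC-DE", "ACFDK")
print(pid, suggested_strategy(pid))  # 75.0 homology modelling is likely feasible
```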


Try to model the 3D structure of this other protein:
>tafazzin
MPLHVKWPFPAVPPLTWTLASSVVMGLVGTYSCFWTKYMNHLTVHNREVLYELIEKRGPA
TPLITVSNHQSCMDDPHLWGILKLRHIWNLKLMRWTPAAADICFTKELHSHFFSLGKCVP
VCRGAEFFQAENEGKGVLDTGRHMPGAGKRREKGDGVYQKGMDFILEKLNHGDWVHIFPE
GKVNMSSEFLRFKWGIGRLIAECHLNPIILPLWHVGMNDVLPNSPPYFPRFGQKITVLIG
KPFSALPVLERLRAENKSAVEMRKALTDFIQEEFQHLKTQAEQLHNHLQPGR
Is it possible to do it by homology? If not, try remote homology detection first and then threading/fragment-based methods.

NOTES:

  • Always try the approaches in this order (increasing difficulty, decreasing accuracy): homology modelling (close homology) > remote homology detection > fold recognition/threading.
  • If you have to resort to threading, it is advisable to predict structural features (e.g. 1D features: secondary structure, transmembrane segments, disordered regions, ...; see above) and use them to filter the alternative threading models.
  • Moreover, any information you have about the protein can help in the modelling. Take a look at its UniProt page: Tafazzin (TAZ_HUMAN).
  • Since threading programs produce many alternative models and you usually run different programs, it is also advisable to take a look at the structural classification of the proposed templates to assess whether these correspond to the same fold.
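The last notes can be combined into a simple bookkeeping step: discard threading models whose template secondary structure disagrees too much with the prediction for the query, then check how many of the best remaining hits share the same fold classification (e.g. the SCOP fold or CATH topology of each template). A minimal Python sketch follows. The tuple layout, the 0.5 agreement cut-off, and the template and fold labels are all hypothetical. Scores are assumed to be "higher is better", so convert E-values or similar scores before using it.

```python
from collections import Counter
from typing import Optional

# Each hit: (template_id, score, fold_label, ss_agreement)
#   score        - program score, assumed "higher is better"
#   fold_label   - e.g. the SCOP fold / CATH topology of the template
#   ss_agreement - fraction of aligned residues whose template secondary
#                  structure matches the one predicted for the query
Hit = tuple[str, float, str, float]


def fold_consensus(hits: list[Hit], top_n: int = 10,
                   min_ss: float = 0.5) -> tuple[Optional[str], float]:
    """Most common fold among the top-N SS-compatible hits and its share."""
    kept = [h for h in hits if h[3] >= min_ss]  # drop SS-incompatible models
    top = sorted(kept, key=lambda h: h[1], reverse=True)[:top_n]
    if not top:
        return None, 0.0
    fold, count = Counter(h[2] for h in top).most_common(1)[0]
    return fold, count / len(top)


# Hypothetical hits (made-up IDs, scores and labels, for illustration only).
hits = [
    ("tmplA", 95.0, "fold_1", 0.80),
    ("tmplB", 80.0, "fold_2", 0.30),
    ("tmplC", 78.0, "fold_1", 0.70),
    ("tmplD", 60.0, "fold_3", 0.90),
]
print(fold_consensus(hits, top_n=3))
```

In this toy case tmplB is filtered out by its poor secondary structure agreement, and fold_1 is shared by two of the three retained top hits. A consensus across several programs is more convincing than the top hit of any single one.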

Some pre-compiled results: results from various programs

Homology detection

Homology modeling

Validation software

Visualization software

Remote homology detection

Fold recognition, fragment-based threading