Practical Protein Bioinformatics
Book web site with updated links to tools and resources
for protein structure prediction and analysis.


Exercises

Resources (open in new tabs)

Let's have a look at the available experimental structural information for E. coli chaperone DnaK:

>DnaK E coli
MGKIIGIDLGTTNSCVAIMDGTTPRVLENAEGDRTTPSIIAYTQDGETLVGQPAKRQAVTNPQNTL
FAIKRLIGRRFQDEEVQRDVSIMPFKIIAADNGDAWVEVKGQKMAPPQISAEVLKKMKKTAEDYLG
EPVTEAVITVPAYFNDAQRQATKDAGRIAGLEVKRIINEPTAAALAYGLDKGTGNRTIAVYDLGGG
TFDISIIEIDEVDGEKTFEVLATNGDTHLGGEDFDSRLINYLVEEFKKDQGIDLRNDPLAMQRLKE
AAEKAKIELSSAQQTDVNLPYITADATGPKHMNIKVTRAKLESLVEDLVNRSIEPLKVALQDAGLS
VSDIDDVILVGGQTRMPMVQKKVAEFFGKEPRKDVNPDEAVAIGAAVQGGVLTGDVKDVLLLDVTP
LSLGIETMGGVMTTLIAKNTTIPTKHSQVFSTAEDNQSAVTIHVLQGERKRAADNKSLGQFNLDGI
NPAPRGMPQIEVTFDIDADGILHVSAKDKNSGKEQKITIKASSGLNEDEIQKMVRDAEANAEADRK
FEELVQTRNQGDHLLHSTRKQVEEAGDKLPADDKTAIESALTALETALKGEDKAAIEAKMQELAQV
SQKLMEIAQQQHAQQQTAGADASANNAKDDDVVDAEFEEVKDKK

Download a PDB file with a structure for this protein and visualize it with PyMOL.
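
If you prefer to fetch the file from a script instead of the browser, the following is a minimal sketch. It assumes the RCSB download address pattern https://files.rcsb.org/download/<ID>.pdb; the function names are illustrative only. The ID 5nro is used because a later exercise on this page refers to that DnaK entry.

```python
import urllib.request

# Assumed URL pattern of the RCSB file server.
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_url(pdb_id):
    """Build the download URL for a four-character PDB identifier."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError("expected a four-character PDB ID, got %r" % pdb_id)
    return RCSB_DOWNLOAD.format(pdb_id=pdb_id)


def download_pdb(pdb_id, out_path=None):
    """Fetch the entry over the network and save it; returns the output path."""
    out_path = out_path or pdb_id.strip().lower() + ".pdb"
    with urllib.request.urlopen(pdb_url(pdb_id)) as response:
        data = response.read()
    with open(out_path, "wb") as handle:
        handle.write(data)
    return out_path
```

Calling download_pdb("5nro") should write 5nro.pdb to the current directory, which can then be opened in PyMOL. Inside PyMOL itself, the command "fetch 5nro" does the download and the loading in one step.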

Sequence searches

Primary protein structure resources

PDB-derived data

Visualization software


Compare the structures of unbound Rac (PDB 1mh1) and Rac interacting with an effector, a toxin that modulates its function (PDB 1he1). What are the main structural differences? Can these differences explain the toxin's effect?

Compare the structures of DnaK (PDB 5nro_A) and Hexokinase (PDB 3hm8_A), two distant homologs. Try rigid and flexible structural alignment.
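
Identifiers such as 5nro_A refer to a single chain (here chain A) of a PDB entry, and alignment tools usually work best on one chain at a time. The sketch below shows one way to pull the C-alpha trace of a chain out of a PDB-format file before comparing structures. It relies on the fixed-column layout of ATOM records; the function name is illustrative, and keeping only the first model and the first alternate location are simplifying assumptions.

```python
def ca_coordinates(pdb_lines, chain_id):
    """Return [(residue_number, (x, y, z)), ...] for C-alpha atoms of one chain."""
    coords = []
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # assumption: only the first model of a multi-model file is used
        if not line.startswith("ATOM"):
            continue  # skips HETATM records, so calcium ions named CA are ignored
        atom_name = line[12:16].strip()  # columns 13-16
        alt_loc = line[16]               # column 17
        if atom_name != "CA" or line[21] != chain_id:
            continue
        if alt_loc not in (" ", "A"):
            continue  # assumption: keep only the first alternate location
        res_num = int(line[22:26])       # columns 23-26
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        coords.append((res_num, xyz))
    return coords
```

For example, with open("5nro.pdb") as handle: trace = ca_coordinates(handle, "A") gives the chain A trace, and len(trace) is a quick check of how many residues were actually resolved in the structure.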

Take a look at how Rac (PDB 1mh1) and DnaK (5nro_A) are classified in CATH.

Map residue-level evolutionary conservation onto the DnaK structure (5nro_A) with the ConSurf server.
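
To get a feeling for what a per-residue conservation score is, here is a deliberately simple stand-in: Shannon entropy per column of a multiple sequence alignment. This is not the method ConSurf uses (it estimates evolutionary rates on a phylogenetic tree); it is only a sketch of the general idea, and the function names are illustrative.

```python
import math
from collections import Counter


def column_entropy(column, gap_chars="-."):
    """Shannon entropy (bits) of one alignment column; None if it is all gaps."""
    residues = [c.upper() for c in column if c not in gap_chars]
    if not residues:
        return None
    total = len(residues)
    counts = Counter(residues)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def conservation_profile(aligned_seqs):
    """Entropy per column; low values mean conserved positions."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(col) for col in zip(*aligned_seqs)]
```

A fully conserved column scores 0 bits and a column with all 20 amino acids equally represented scores log2(20), about 4.32 bits. Gaps are ignored here, which is one of several possible conventions.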

Software for structural alignment

Pair-wise and database searches
Multiple structure alignment
Flexible alignments

Protein structure classification

Map evolutionary conservation on 3D structures


Predict the secondary structure of these two proteins.
>tr|C6HZ97|C6HZ97_9BACT Putative aminomethyltransferase 
MMTTIDTRPSHIRAGLHIAPRTRVLVSVSGDDRASFLQGLLCQDVAGQKTGTLRYGFFLS
PKARILFDSWIGVLPDRILLSPSLFSKEDEEAFLAHLKKYLFFRTKATLSSETGAFISAS
LVGPEALALATPLFDPEAEEEGVRRLSEGGFAFLRPGIGAFDADTGGWIDLWLPAEKAGD
RLKGLEERVLSRGGQRLDDTGIEVYRVERGIPAVPFELNESHFPAEAGLDTLAVSYNKGC
YVGQEPVTRLKFQGQLSRKLVGIRLDGPFVSEVTLPRHLLASNDNTEAGTLTSLVSSVVC
GGPVGLAYVKRGHWDSGEPLIDGEGNRFEVSELPLLPRE

>Gliotactin C-term domain
IMWRNAKRQSDRFYDEDVFINGEGLEPEQDTRGVDNAHMVTNHHALRSRD
NIYEYRDSPSTKTLASKAHTDTTSLRSPSSLAMTQKSSSQASLKSGISLK
ETNGHLVKQSERAATPRSQQNGSIAKVASPPVEEKRLLQPLSSTPVTQLQ
AEPAKRVPTAASVSGSSRSTTPVPSARSTTTHTTTATLSSQPAAQPRRTH
LVEG

Has either of them been crystallized? If so, compare its experimentally determined secondary structure with the predicted one.

Try to infer topological models for these two sequences by retrieving as many structural features as possible: secondary structure, transmembrane segments, coiled coils, unstructured/disordered regions and, where available, domains of known structure.
>Q96QS1|TSN32_HUMAN Tetraspanin-32 - Homo sapiens (Human).
MGPWSRVRVAKCQMLVTCFFILLLGLSVATMVTLTYFGAHFAVIRRASLEKNPYQAVHQW
AFSAGLSLVGLLTLGAVLSAAATVREAQGLMAGGFLCFSLAFCAQVQVVFWRLHSPTQVE
DAMLDTYDLVYEQAMKGTSHVRRQELAAIQDVFLCCGKKSPFSRLGSTEADLCQGEEAAR
EDCLQGIRSFLRTHQQVASSLTSIGLALTVSALLFSSFLWFAIRCGCSLDRKGKYTLTPR
ACGRQPQEPSLLRCSQGGPTHCLHSEAVAIGPRGCSGSLRWLQESDAAPLPLSCHLAAHR
ALQGRSRGGLSGCPERGLSD

>tr|D3LKI7|D3LKI7_MICLU LysM domain protein 
MDTMTLFTTSATRSRRATASIVAGMTLAGAAAVGFSAPAQAATVDTWDRLAECESNGTWD
INTGNGFYGGVQFTLSSWQAVGGEGYPHQASKAEQIKRAEILQDLQGWGAWPLCSQKLGL
TQADAEAGDVDATEAAPVAVERTATVQRQSAADEAAAEQAAAEQAAAEQAAADQAAAERW
AAKQAAAEQAAADKAAAQRAAAAEKAAAQKAAAAEQAAAAEEAVVAEAETIVVKSGDSLW
KLANEYEVEGGWTALYEANKGIVSDAAVIYVGQELVLPQA

>Rpf (resuscitation-promoting factor)
MTLFTTSATRSRRATASIVAGMTLAGAAAVGFSAPAQAATVDTWDRLAECESNGTWDINT
GNGFYGGVQFTLSSWQAVGGEGYPHQASKAEQIKRAEILQDLQGWGAWPLCSQKLGLTQA
DADAGDVDATEAAPVAVERTATVQRQSAADEAAAEQAAAAEQAVVAEAETIVVKSGDSLW
TLANEYEVEGGWTALYEANKGAVSDAAVIYVGQELVLPQA

Some pre-compiled results: Q96QS1_jpred.html Q96QS1_tmhmm.gif D3LKI7_tmhmm.gif D3LKI7_coils.gif

More sequences here, if you want to try at home.
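
For the transmembrane part of the topology exercise, it helps to see the classic idea behind the early predictors. The sketch below computes a Kyte-Doolittle hydropathy profile with a sliding window and flags stretches whose window average is high. The window of 19 residues and the cut-off of 1.6 are the commonly quoted values for this scale; treating unknown residues as 0.0 is an assumption made here. This is a crude heuristic for getting a first impression, not a replacement for dedicated servers such as TMHMM.

```python
# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Average hydropathy of every full window along the sequence."""
    # Assumption: residues missing from the scale (X, U, ...) count as 0.0.
    values = [KD.get(aa, 0.0) for aa in seq.upper()]
    if len(values) < window:
        return []
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """1-based (start, end) ranges covered by windows scoring >= threshold."""
    segments = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score < threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1] + 1:
            # Overlapping or adjacent windows are merged into one segment.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments
```

Paste one of the sequences above as a single string (without the header line) and call candidate_tm_segments on it. Because whole windows are merged, the reported ranges are wider than the actual helices; compare them with the pre-compiled TMHMM plots rather than reading them as precise boundaries.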

1D prediction

Secondary structure
Transmembrane segments
Transmembrane helices
Transmembrane barrels
Coiled-coils
Disorder
Other 1D prediction tools

Try to model the 3D structure of this sequence:

>rpe_yeast
MVKPIIAPSILASDFANLGCECHKVINAGADWLHIDVMDGHFVPNITLGQPIVTSLRRS
VPRPGDASNTEKKPTAFFDCHMMVENPEKWVDDFAKCGADQFTFHYEATQDPLHLVKLI
KSKGIKAACAIKPGTSVDVLFELAPHLDMALVMTVEPGFGGQKFMEDMMPKVETLRAKF
PHLNIQVDGGLGKETIPKAAKAGANVIVAGTSVFTAADPHDVISFMKEEVSKELRSRDL
LD
Is it possible to build a model by homology? If so, with which template? Evaluate the model and visualize it using PyMOL.

Some pre-compiled results: rpe_sm.html
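
When choosing among candidate templates, sequence identity to the target is usually the first number to look at. The sketch below computes it from a pairwise alignment (two gapped strings of equal length, as copied from a search result). Dividing by the number of columns without gaps is one convention among several, and the often-quoted figure of roughly 30% identity as a comfortable lower limit for homology modeling is a rule of thumb, not a hard threshold.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over alignment columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # columns with a gap on either side are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared
```

For example, percent_identity("MKV-LA", "MRVQLA") compares five columns, four of which match, and returns 80.0. Alignment coverage matters too: a high identity over a short fragment is a much weaker basis for a model than a moderate identity over the whole sequence.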


Try to model the 3D structure of this other protein:
>Light-mediated development protein Arabidopsis thaliana
MFTSGNVTARVFERQIRTPPPGASVNRARHFYENLVPSYTLYDVESPDHCFRKFTEDGLF
LISFSRNHQELIVYRPSWLTYSTTDDSTTTLPPLPRRASKFDSFFTQLYSVNLASSNELI
CKDFFLYHQTRRFGLFATSTAQIHDSSSPSNDAVPGVPSIDKITFVLLRLDDGVVLDERV
FLHDFVNLAHNMGVFLYDDLLAILSLRYQRIHLLQIRDSGHLVDARAIGYFCREDDELFL
NSSSQAMMSQDKSKQQSLSGSKEDDTGENGLRHSLSQPSGSNSFLSGVKQRLLSFIFREI
WNEESDNVMRVQSLKKKFYFHFQDYVDLIIWKVQFLDRQHLLIKFGSVDGGVTRSADHHP
AFFAVYNMETTDIVAFYQNSAEDLYQLFEQFSDHFTVSSSTPFMNFVTSHSNNVYALEQL
KYTKNKSNSFSQFVKKMLLSLPFSCQSQSPSPYFDQSLFRFDEKLISAADRHRQSSDNPI
KFISRRQPQTLKFKIKPGPECGTADGRSKKICSFLFHPHLPLAISIQQTLFMPPSVVNIH
FRR

Is it possible to do it by homology?

If not, look for it in the AlphaFold Protein Structure Database (pre-generated models).

Other AlphaFold models worth a look: Calmodulin, NAC domain-containing protein 94
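
When you look at these models, remember that AlphaFold PDB files store the per-residue confidence score (pLDDT, 0 to 100) in the B-factor column. The sketch below reads it from the C-alpha atoms and summarizes how much of the model falls into each of the confidence bands used by the database (above 90, 70 to 90, 50 to 70, below 50). The function names are illustrative, and the file name in the usage note is a placeholder for whatever you downloaded.

```python
from collections import Counter

BANDS = ("very high", "confident", "low", "very low")


def plddt_per_residue(pdb_lines):
    """Map residue number -> pLDDT, read from the B-factor of C-alpha atoms."""
    scores = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])  # columns 61-66
    return scores


def confidence_band(plddt):
    """Name of the confidence band for a single pLDDT value."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def band_fractions(scores):
    """Fraction of residues in each confidence band."""
    if not scores:
        return {band: 0.0 for band in BANDS}
    counts = Counter(confidence_band(v) for v in scores.values())
    return {band: counts[band] / len(scores) for band in BANDS}
```

For example, with open("model.pdb") as handle: print(band_fractions(plddt_per_residue(handle))). A large "very low" fraction often points to disordered regions, which is worth comparing with the disorder predictions from the 1D exercises.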

Homology detection

Homology modeling

Validation software

Visualization software

AlphaFold