Practical Protein Bioinformatics
Florencio Pazos & Mónica Chagoyen
Springer (2015)
This book describes more than 60 web-accessible computational tools for protein analysis. It explains in detail how to use these tools and interpret their results. The approach is entirely practical: the theoretical basis of a tool is mentioned only when needed to make better use of it. The book covers a wide range of tools for different aspects of proteins, from their sequences to their three-dimensional structures and the biological networks in which they take part. The selection reflects the experience of the authors, who lead a protein bioinformatics facility in a large research centre, with the additional constraint that every tool must be accessible through a standard web browser, without requiring local installation of specific software, command-line tools, etc.
The web tools covered include those for retrieving protein information; searching for similar proteins; generating pairwise and multiple alignments of protein sequences; working with protein domains and motifs; studying the phylogeny of a protein family; retrieving, manipulating and visualizing protein three-dimensional structures; predicting protein structural features as well as whole three-dimensional structures; extracting biological information from protein structures; summarizing large protein sets; and studying protein interaction and metabolic networks.
This monograph will be valuable for researchers in experimental labs who have no specific knowledge of bioinformatics or computing.
This is the web site associated with the book. It keeps an updated list of the tools discussed in the book, reflecting changes in their web addresses, etc. If a tool becomes unavailable, this site will say so and suggest alternatives.
More info on the book: [Springer] [Amazon]
Sequences
ReadSeq - Conversion between various sequence file formats
UniProt - Protein sequences and associated functional information
ProtParam - Calculation of various physico-chemical parameters of proteins
ProtScale - Graphical representation of a property profile along the protein sequence
EMBOSS Needle - Global alignment of two protein sequences
EMBOSS Water - Local alignment of two protein sequences
NCBI BLAST - Search for similar protein sequences in databases
Clustal Omega - Multiple alignment of protein sequences
MAFFT - Multiple alignment of protein sequences
Expresso - Multiple sequence alignment guided by structural information
Myhits reformat MSA - Conversion between multiple sequence alignment file formats
Jalview - Visualization and editing of multiple sequence alignments
Alignment-Annotator - On-line visualization and manipulation of MSAs
EMBOSS cons - Generation of consensus sequences from MSAs
Weblogo - Generation of sequence logos from multiple sequence alignments
PSI-BLAST - Iterative BLASTp profile-based search
HMMER hmmsearch - Search for sequences matching an MSA
HHPred - Search for profiles matching an MSA
InterPro - Integration of different family/domain databases
Pfam - Database of protein families and domains
PROSITE - Database of protein families, domains and recurrent short sequence motifs
SUPERFAMILY - HMM profiles for SCOP superfamilies
ClustalW2 phylogeny - Generation of phylogenetic trees from multiple sequence alignments
Phylodendron - Graphical representation of phylogenetic trees
iTOL - Graphical interactive representation of phylogenetic trees and datasets associated with their elements
Phylemon2 - On-line tools for phylogenetic analysis
Phylogeny.fr - On-line tools for phylogenetic analysis
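The list above includes tools for global (EMBOSS Needle) and local (EMBOSS Water) pairwise alignment. As a rough illustration of how the two differ, here is a minimal Python sketch of the underlying dynamic programming (Needleman-Wunsch for global, Smith-Waterman for local). It is not the EMBOSS implementation: the function name `alignment_score` and the toy scoring scheme (match +1, mismatch -1, linear gap -2) are assumptions made for this example, whereas real tools typically use amino-acid substitution matrices and affine gap penalties.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Return the best alignment score for sequences a and b.

    local=False: global alignment (Needleman-Wunsch), end to end.
    local=True:  local alignment (Smith-Waterman), best matching region.
    The scoring values are illustrative assumptions, not tool defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment charges for gaps at the sequence ends.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # Local alignment may restart anywhere, so scores never drop below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    # Global: score of aligning both full sequences. Local: best cell anywhere.
    return best if local else score[-1][-1]


if __name__ == "__main__":
    print(alignment_score("HEAGAWGHEE", "PAWHEAE"))              # global
    print(alignment_score("HEAGAWGHEE", "PAWHEAE", local=True))  # local
```

When a short sequence is contained in a longer one, the local score ignores the unmatched flanks, while the global score is pulled down by the gaps needed to span the full length. This is why a local method suits a search for a shared region or domain, and a global method suits two sequences expected to be similar along their whole length.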
Structures
PDB - Main primary database on protein 3D structures
PDBsum - Derived data on protein 3D structures
SCOP - Hierarchical classification of protein domains of known 3D structure
CATH - Hierarchical classification and annotation of protein domains
Jmol - Java applet for visualizing and manipulating protein structures
RCSB-PDB protein comparison tool - Structural alignment of two proteins (or domains) deposited in PDB
Dali_lite - Structural alignment of two protein structures
Dali database - Precompiled structural homologs for a protein deposited in PDB
Dali server - Retrieve structural homologs for a generic 3D structure
PDB eFold - Structural searches against the whole PDB. Also pairwise and multiple structural alignment.
JPred - Prediction of secondary structure and solvent accessibility
PSIPRED - Prediction of secondary structure and solvent accessibility
TMHMM - Prediction of transmembrane helices and transmembrane topology
Phobius - Simultaneous prediction of signal peptide and transmembrane helices
BOCTOPUS - Prediction of transmembrane strands
COILS - Prediction of coiled-coils
LOGICOIL - Prediction of coiled-coil regions and their oligomeric state
DISOPRED - Prediction of disordered regions
IUPred - Prediction of disordered regions
ANCHOR - Prediction of disordered regions involved in protein interactions
SignalP - Prediction of signal peptides
Swiss-Model - Prediction of protein structure (homology-based approach)
I-TASSER - Prediction of protein structure (fragment-based approach)
ModFOLD - Estimation of the global and local quality of protein structural models
ConSurf - Analysis and mapping of residue conservation in protein structures
LPC/CSU server - Analysis of protein-ligand interactions
IsoCleft Finder - Detection of clefts in structures and search for structures with similar clefts
CASTp - Identification of surface pockets and internal cavities
MOLEonline - Identification of tunnels, pores and cavities in structures
Systems
Gene Ontology - Structured representation of molecular biology knowledge, and annotations of gene products
DAVID gene - Functional (enrichment) analysis of a set of proteins
STRING - Protein interactions and functional relationships inferred from diverse evidence
KEGG - Integration of chemical, genomic, systemic and functional biological information
iPath - Interactive visualization and manipulation of metabolic maps
Reactome - Pathway mapping and analysis of a set of proteins. Focused on signaling pathways
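Several of the tools above perform functional enrichment analysis of a protein set. The general idea can be sketched with a plain one-sided hypergeometric test, shown below in Python. This is an illustration only: the function name `enrichment_p_value` and its parameters are hypothetical, and the listed tools may use a different or modified statistic as well as multiple-testing correction, which this sketch omits.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric p-value for over-representation.

    population: number of proteins in the background set
    annotated:  background proteins carrying the annotation of interest
    sample:     number of proteins in the user's set
    hits:       proteins in the user's set carrying the annotation

    Returns the probability of seeing `hits` or more annotated proteins
    when `sample` proteins are drawn at random from the background.
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Made-up numbers for illustration: 40 of 200 proteins in the set carry
    # an annotation that 300 of 20000 background proteins carry.
    print(enrichment_p_value(20000, 300, 200, 40))
```

A small p-value suggests the annotation is more frequent in the set than expected by chance. Because many annotations are tested at once, a real analysis would also adjust these p-values for multiple testing.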