The aim of these exercises is to become acquainted with Entrez and with
several NCBI databases.
A. Retrieving nucleotide and protein sequences, and accessing information
related to them.
Obtaining information about your favorite genes (in this case, bofC and
csbX, from Bacillus subtilis)
Open Netscape or a similar browser and make a connection to the NCBI server,
at the address: http://www.ncbi.nlm.nih.gov/
Open Entrez, the general interface to NCBI's databases.
Search the Nucleotide database with the expression Bacillus
subtilis bofC.
You will see a list of entries that contain, in any field and in any order,
those three terms.
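The same search can also be expressed as a URL. The following is a minimal sketch, assuming that NCBI's E-utilities web service (ESearch) is used instead of the browser interface; the function name build_esearch_url is made up for this illustration, and "nuccore" is the E-utilities name of the Nucleotide database. The sketch only builds the address and does not contact the server.

```python
from urllib.parse import urlencode

# Public E-utilities endpoint for searches (ESearch).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=20):
    """Return an ESearch URL that searches database `db` for `term`.

    build_esearch_url is a hypothetical helper, not part of any NCBI library.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


# "nuccore" is the E-utilities name of the Nucleotide database.
url = build_esearch_url("nuccore", "Bacillus subtilis bofC")
print(url)
```

Opening the printed address in a browser should return the matching identifiers as XML rather than as a list of linked entries.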
Because it may give you a faster answer, select the entry that contains
fewer genes (X93081).
A complete entry in GenBank format will be displayed. From the Header
section you will learn that the source of the sequence contained in entry
X93081 is Bacillus subtilis. From the Features section, you will learn
that the sequence encodes four genes, two of which are named bofC and csbX.
Select Send to text to generate a plain text version of the entry, which
you can save.
Return to the HTML document that contains the entry in GenBank format.
Select Display FASTA, to obtain the nucleotide sequence in FASTA
format.
Select Display Graph, to obtain a linear map and the sequence
of entry X93081.
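The GenBank and FASTA displays can also be requested as plain text. The sketch below assumes the E-utilities EFetch service, which is not part of the exercise itself; build_efetch_url and parse_fasta are hypothetical helper names, and the FASTA record used to try the parser is made up to show the layout of the format (it is not the sequence of X93081).

```python
from urllib.parse import urlencode

# Public E-utilities endpoint for retrieving records (EFetch).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, rettype):
    """Return an EFetch URL for a Nucleotide entry as plain text.

    rettype "gb" requests the GenBank flat file, "fasta" the FASTA record.
    """
    params = {"db": "nuccore", "id": accession,
              "rettype": rettype, "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def parse_fasta(text):
    """Split a single FASTA record into (header, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    # The header is the first line without ">"; the rest is sequence.
    return lines[0][1:], "".join(lines[1:])


genbank_url = build_efetch_url("X93081", "gb")
fasta_url = build_efetch_url("X93081", "fasta")

# Made-up record, used only to show how the format is laid out.
example = ">example_id hypothetical description\nATGCATGC\nGGCC\n"
header, sequence = parse_fasta(example)
print(header, len(sequence))
```

A FASTA record is a header line that starts with ">" followed by one or more lines of sequence, which is why the parser simply joins everything after the first line.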
Return to the HTML document that contains the entry in GenBank format.
In the Features section, find the part where the second CDS is described.
In addition to showing the amino acid translation of the CDS, this document
lets you jump to the Protein database.
To do so, select the protein accession number (CAA63620.1).
You will access the GenBank formatted entry of protein CAA63620.1.
By clicking on the link BLink you will gain access to information
about proteins with a sequence similar to that of CAA63620.1: first, you
will learn that BofC has been assigned to a certain family or group of
proteins in the COG database; second, you will get a list of proteins that
have been identified as similar to BofC by a standard BLAST search.
By clicking on the link Conserved Domains, you will get information about
the presence in BofC of conserved domains. In this case, the protein includes
a conserved domain identified in the Pfam database, which is characteristic
of sugar transporters.
By clicking on the graphical representation of the domain, you will
get a multiple sequence alignment of the family members.
By clicking on "Search for similar domain architectures", you will get a graphical representation
of proteins that include related domains.
B. Accessing the Taxonomy database.
Browsing the taxonomy database to retrieve molecular information about
your favorite group of organisms (in this case, Viruses).
Open Netscape or a similar browser and make a connection to the NCBI server,
at the address: http://www.ncbi.nlm.nih.gov/
Open Entrez, the general interface to NCBI's databases.
Select the Taxonomy database and then click on Tax Browser, or on the
word "tree" that appears in the explanatory text in the upper part of
the page.
Select Viruses.
Select Protein and Structure (for example) and click on Display.
You will get information about how many protein sequence and protein structure
entries exist in the NCBI databases, for each of the viral species included
in the list. Both the entries with protein sequences and the entries with
structural information can be accessed from here.
Return to the main page of Taxonomy.
Select Arabidopsis thaliana.
You will see a document that contains links to:
Taxonomic information about Arabidopsis.
A table with links to information about each of the Arabidopsis thaliana
chromosomes.
By clicking on the chromosome identifiers you will access pages with chromosomal
maps, and links to files for all the genes and proteins from Arabidopsis.
A table that compiles links to entries in other NCBI databases that are
related to Arabidopsis.
Links to databases at other institutions, that also contain information
about Arabidopsis.
C. Accessing the genetic diseases database (OMIM) and navigating
through LocusLink.
Obtaining general and molecular information about your favorite genetic
disease (in this case, alopecia).
Open Netscape or a similar browser and make a connection to the NCBI server,
at the address: http://www.ncbi.nlm.nih.gov/
Open Entrez, the general interface to NCBI's databases.
Select the OMIM database.
Enter the term alopecia and click on GO.
You will find that there are more than 162 entries (displayed in groups
of 20) that contain the term alopecia in their description.
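Results shown in groups of 20 correspond to paged queries. As a rough sketch, assuming the E-utilities ESearch service and its retstart and retmax parameters, the addresses of the successive pages can be generated as follows; page_urls is a hypothetical helper, and 162 is used only as an example total, since the actual number of entries may differ.

```python
from urllib.parse import urlencode

# Public E-utilities endpoint for searches (ESearch).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def page_urls(db, term, total, page_size=20):
    """Return one ESearch URL per page of `page_size` results.

    page_urls is a hypothetical helper; `total` is the number of hits.
    """
    urls = []
    for start in range(0, total, page_size):
        # retstart is the zero-based index of the first result on the page.
        params = {"db": db, "term": term,
                  "retstart": start, "retmax": page_size}
        urls.append(ESEARCH_URL + "?" + urlencode(params))
    return urls


# 162 is an example total, not a value returned by the server.
urls = page_urls("omim", "alopecia", total=162)
print(len(urls))
```

With 162 hits and 20 results per page, the last page starts at result 160 and holds only the final two entries.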
Return to the main page of OMIM.
Select Search Morbid Map, on the right side of the page. You will
get a catalog of genetic diseases.
Search alopecia universalis.
By clicking on the numeric identifier in the column labeled Disorder,
you will access a document with information about the disease, supported
with bibliographic information.
In the column labeled as Location you will see information about
the chromosomal location of mutations that have been associated with the
disease: chromosome number 8, short arm (identified with the letter "p";
the long arm is represented with the letter "q"). The number "21.2" indicates
the location in terms of the band, as detected by the Giemsa
method.
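The notation described above (chromosome, arm, band) can be split into its parts mechanically. The following sketch is an illustration only: the function name parse_location and the regular expression are assumptions made for this example, and they cover simple locations such as 8p21.2, not ranges or more complex cytogenetic descriptions.

```python
import re

# Chromosome (number, X or Y), arm (p or q), then an optional band
# with an optional sub-band after a dot, as in "8p21.2".
LOCATION = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d+(?:\.\d+)?)?$")

ARM_NAMES = {"p": "short", "q": "long"}


def parse_location(location):
    """Split a simple cytogenetic location into chromosome, arm and band."""
    match = LOCATION.match(location)
    if match is None:
        raise ValueError(f"unrecognized location: {location!r}")
    chromosome, arm, band = match.groups()
    return {"chromosome": chromosome, "arm": ARM_NAMES[arm], "band": band}


print(parse_location("8p21.2"))
# → {'chromosome': '8', 'arm': 'short', 'band': '21.2'}
```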
By clicking on the numeric identifier in the column labeled OMIM,
you will access a document with information about the locus or gene
that has been identified as affected by the mutations that cause the disease,
and links to other sources of information.
By clicking on Entrez Gene, on the left side, you will access the
main page of this locus, which contains:
A diagram (the red line and rectangles in the upper part) that shows the
structure of the gene in terms of exons and introns. From this diagram
you can access information about the evidence that supports the proposed
structure.
A collection of links to other databases with information about the locus.
A summary with information about several aspects of the gene: tissue-specific
expression, annotations from GeneRIF, classification according to the Gene
Ontology, related entries in nucleotide and protein sequence databases,
homologous genes, etc.
Among the databases that can be accessed from the Entrez Gene entry, you
will find:
PubMed
OMIM: genetic diseases
MAP, chromosomal maps, from which the nucleotide sequence can be
retrieved.
RefSeq
GenBank
Protein, entries in the protein sequence database.
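The links from a gene entry to other databases can be sketched as URLs as well. This example assumes the E-utilities ELink service, which the exercise itself does not use; build_elink_urls is a hypothetical helper, and the gene identifier below is a placeholder, not the identifier of the gene studied in this exercise.

```python
from urllib.parse import urlencode

# Public E-utilities endpoint for cross-database links (ELink).
ELINK_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

# E-utilities names of some of the databases listed above.
LINKED_DATABASES = ["pubmed", "omim", "nuccore", "protein"]


def build_elink_urls(gene_id, databases=LINKED_DATABASES):
    """Return a dict mapping each target database to an ELink URL.

    build_elink_urls is a hypothetical helper; gene_id is an Entrez Gene ID.
    """
    urls = {}
    for db in databases:
        params = {"dbfrom": "gene", "db": db, "id": gene_id}
        urls[db] = ELINK_URL + "?" + urlencode(params)
    return urls


# Placeholder identifier, for illustration only.
links = build_elink_urls("12345")
for db, url in links.items():
    print(db, url)
```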