
Practical lesson 3

The NCBI WWW server and Entrez, part 2

By Manuel J. Gómez, CNB, CSIC; updated by Juan Carlos Sanchez, CNB, CSIC

The aim of these exercises is to learn how to make field-delimited text searches and to use the Limits and History forms of Entrez.
A list of the available Limits, Search Fields, Search Field Descriptors and Display Formats can be found by following this link.



A. Simple Searching
  1. Look for all "photosystem" related sequences in the Nucleotide database (use wildcard "*").
  2. How many spinach sequences exist in the Nucleotide databank?
  3. And how many in the Protein, Structure or Genome databanks?
  4. Display the FASTA view of the Protein entry AAD02267.
  5. Display the graphics view of the Nucleotide entry corresponding to the Protein entry AAD02267.
  6. How many non-EST Nucleotide sequences were published by someone named "Jones"?
  7. How many potato polypeptides are included in the Structure database?
  8. Search for all plant proteins with a molecular weight in the range from 50,000 to 50,050 daltons (use the field range format "050000:050050[MOLWT]").
  9. Search for all plant proteins with a sequence length from 300 to 310 amino acids.
  10. How many spinach proteins are less than 50 amino acids in length?
  11. How big is the BIGGEST protein in the Protein database?
  12. How many articles have been written by someone called Ras?
  13. What kind of protein or peptide is known as Ramos?
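
The field range format used in exercise 8 can also be generated programmatically. The following is a minimal Python sketch, not part of the original exercises. The helper names `range_term` and `esearch_count_url` are hypothetical, and the sketch assumes the NCBI E-utilities `esearch.fcgi` service with its `db`, `term` and `rettype=count` parameters. It only builds the query term and the URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumed base URL of the NCBI E-utilities search service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def range_term(low, high, field, width=6):
    """Return a field range term such as 050000:050050[MOLWT].

    Values are zero-padded to `width` digits, as in exercise 8.
    The padding width is an assumption taken from that example.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    return f"{low:0{width}d}:{high:0{width}d}[{field}]"


def esearch_count_url(db, term):
    """Build (but do not fetch) an esearch URL that asks only for the hit count."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "rettype": "count"})


if __name__ == "__main__":
    term = range_term(50000, 50050, "MOLWT")
    print(term)
    print(esearch_count_url("protein", term))
```

The same term can be pasted directly into the Entrez search box, which is how the exercises are meant to be solved; the URL form is only useful if you later want to script your searches.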


B. Refining your search
  1. How many non-EST Nucleotide sequences were published by someone named "Smith" and have a sequence length of 3000 to 4000 nucleotides? How many of them were not published in 1999? Perform three independent searches and use "History" to combine them [((#1 AND #2) NOT #3)].
  2. What are the differences among [((#1 AND #2) NOT #3)] / [(#1 AND (#2 NOT #3))] / [((#1 AND #2) OR #3)] / [((#1 NOT #2) AND #3)]?
  3. Search for all plant rRNA sequences in the Nucleotide database (use "Limits" to restrict to rRNA in the "Molecule" pull-down menu).
  4. Search for all Arabidopsis mitochondrial genes (using "Limits").
  5. Search for all Arabidopsis chloroplast genes in the Nucleotide database updated in the last year.
  6. Retrieve all genomic plant genes in the Nucleotide database with a publication date from 1990 to 1995.
  7. Using "Index", obtain the number of tomato sequences in the Nucleotide database. How many non-tomato entries contain the word "tomato"?
  8. Search for all the protein sequences from chloroplasts of spinach, tomato and potato.
  9. Search for all the genomic sequences of protein kinases from Arabidopsis.
  10. Using "History", get all glucanase sequences from spinach, tomato and potato, excluding ESTs and patents.
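
The History combinations in exercises 1 and 2 behave like set operations on lists of record identifiers: AND keeps the records common to both searches, OR keeps the records found by either, and NOT removes the records of the second search from the first. As a minimal Python sketch with made-up identifiers (the three sets below are hypothetical and merely stand in for searches #1, #2 and #3), you can check your reasoning on toy data before running the real queries:

```python
# Hypothetical result sets standing in for History entries #1, #2 and #3.
s1 = {"a", "b", "c", "d"}   # e.g. records with author Smith
s2 = {"b", "c", "d", "e"}   # e.g. records 3000 to 4000 nucleotides long
s3 = {"c", "f"}             # e.g. records published in 1999

# Python set operators: & is AND, | is OR, - is NOT.
combinations = {
    "(#1 AND #2) NOT #3": (s1 & s2) - s3,
    "#1 AND (#2 NOT #3)": s1 & (s2 - s3),
    "(#1 AND #2) OR #3": (s1 & s2) | s3,
    "(#1 NOT #2) AND #3": (s1 - s2) & s3,
}

for expression, result in combinations.items():
    print(f"{expression:20} -> {sorted(result)}")
```

Changing the contents of the three sets and rerunning the script shows which expressions always agree and which ones depend on how the searches overlap.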


C. Link Out
  1. Obtain the domain distribution of the protein sequence of "phosphoinositide specific phospholipase C" from Arabidopsis thaliana.

July 2007
Manuel J. Gómez
Updated by Juan Carlos Sanchez
Grupo de Diseño de Proteínas
Centro Nacional de Biotecnología, CSIC