nature genetics
    A user's guide to the human genome

volume 32 supplement pp 18 - 20

Question 2
How can sequence-tagged sites within a DNA sequence be identified?

The NCBI's electronic PCR (e-PCR) tool (ref. 12), which is part of the UniSTS resource, can be used to find STS markers within a DNA fragment of interest. UniSTS contains all the available data on STS markers, including primer sequences, product size, mapping information and alternative names. Links to other NCBI resources such as Entrez, LocusLink and the MapViewer are also provided. e-PCR looks for potential STSs in a DNA sequence by searching for pairs of subsequences that match the PCR primers used to generate known STSs and that lie in the correct orientation and at the expected distance from each other.
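The primer search described above can be illustrated with a minimal Python sketch. This is not the NCBI's implementation: the function names (reverse_complement, find_all, epcr_hits), the margin parameter and the toy sequences are assumptions made for illustration, and the sketch only handles exact primer matches on the forward strand of the query.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(text: str, pattern: str):
    """Yield every 0-based start position of pattern in text (overlaps allowed)."""
    start = text.find(pattern)
    while start != -1:
        yield start
        start = text.find(pattern, start + 1)


def epcr_hits(template: str, forward: str, reverse: str,
              expected_size: int, margin: int = 50):
    """Find candidate STS locations in template.

    A hit requires the forward primer, followed downstream by the reverse
    complement of the reverse primer, with the implied product size within
    `margin` bases of `expected_size`. Hypothetical helper: exact matches
    only, forward strand only.
    Returns a list of (start_nt, end_nt, size), 1-based and inclusive.
    """
    template = template.upper()
    forward = forward.upper()
    reverse_site = reverse_complement(reverse.upper())
    hits = []
    for f_start in find_all(template, forward):
        for r_start in find_all(template, reverse_site):
            if r_start < f_start + len(forward):
                continue  # reverse primer site must lie downstream of the forward primer
            end = r_start + len(reverse_site)  # exclusive 0-based end of the product
            size = end - f_start
            if abs(size - expected_size) <= margin:
                hits.append((f_start + 1, end, size))
    return hits


# Toy example: a 22-nt "product" embedded in flanking sequence.
toy_template = "TTTT" + "ACGTAC" + "G" * 10 + reverse_complement("TTGACC") + "AAAA"
toy_hits = epcr_hits(toy_template, "ACGTAC", "TTGACC", expected_size=22, margin=2)
```

A production tool would also need to search the opposite strand and tolerate a small number of primer mismatches; those steps are omitted here to keep the orientation and distance checks easy to follow.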

The e-PCR home page can be reached from the NCBI home page by following the Electronic PCR link in the right-hand column. On the e-PCR home page, paste the sequence of interest or enter an accession number into the large text box at the top of the page. The accession number of the sequence for this example is AF288398. This sequence contains only one STS, stSG47693, which is located between nucleotides (nt) 2102 and 2232 of the sequence under study (Fig. 2.1).
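When working with the reported STS position in a script, the nucleotide coordinates need to be converted to the indexing used by the programming language. The short Python sketch below assumes that the reported coordinates are 1-based and inclusive at both ends; the helper names sts_slice and sts_length are hypothetical.

```python
def sts_slice(start_nt: int, end_nt: int) -> slice:
    """Convert 1-based inclusive nucleotide coordinates to a Python slice."""
    if start_nt < 1 or end_nt < start_nt:
        raise ValueError("expected 1 <= start_nt <= end_nt")
    return slice(start_nt - 1, end_nt)


def sts_length(start_nt: int, end_nt: int) -> int:
    """Length of a region given 1-based inclusive coordinates."""
    return end_nt - start_nt + 1


# Coordinates reported for stSG47693 within AF288398.
region = sts_slice(2102, 2232)
length = sts_length(2102, 2232)  # 131 under the inclusive assumption
```

Applying region to a sequence string, as in sequence[region], would then return the stretch spanned by the marker.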

Click on the marker name to bring up details of the STS from UniSTS (Fig. 2.2). The primer information and PCR product size are listed at the top of the page, along with alternative names for the marker. Often STSs are known by different names on different maps. Cross-references to LocusLink, UniGene and the Genebridge 4 map to which this STS was mapped are shown next. The mapping information section contains links to the NCBI's MapViewer. At the bottom of the page, the Electronic PCR results show other sequences, including contigs, mRNAs and ESTs that may contain this STS marker.

To see the genomic context of the STS marker in all maps to which it has been mapped, click on the link labeled MapViewer at the top of the Mapping Information section. This map view (Fig. 2.3) shows two maps. Note that, in this view, the STS stSG47693 is called RH92759 (highlighted in pink). Gene Map '99–Genebridge 4 (GM99_GB4, left) has 46,000 STS markers mapped onto the GB4 RH panel by the International Radiation Hybrid Consortium. The STS map (right) shows the NCBI's placement of STSs onto the genome sequence assembly using e-PCR. Gray lines connect markers that appear in both maps, whereas the red line denotes where the STS RH92759 appears on both maps. In the region shown, there are a total of 211 STSs on the STS map, but only 20 are labeled in this view. To the right of the STS map, the green and yellow circles show the maps on which the STS markers have been placed. One can zoom in or out of this view by clicking on the lines of the zoom tool in the left sidebar.

  1. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860-921 (2001).
  2. Collins, F.S. & McKusick, V.A. Implications of the Human Genome Project for medical science. J. Am. Med. Assoc. 285, 540-544 (2001).
  3. Watson, J.D. & Crick, F.H.C. Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid. Nature 171, 737-738 (1953).
  4. Green, E.D. Strategies for the systematic sequencing of complex genomes. Nature Rev. Genet. 2, 573-583 (2001).
  5. Ouellette, B.F.F. & Boguski, M.S. Database divisions and homology search files: a guide for the perplexed. Genome Res. 7, 952-955 (1997).
  6. Bairoch, A. & Apweiler, R. The SWISS-PROT Protein Sequence Database and its supplement TREMBL in 2000. Nucleic Acids Res. 28, 45-48 (2000).
  7. Hubbard, T. et al. The Ensembl Genome Database Project. Nucleic Acids Res. 30, 38-41 (2002).
  8. Kent, W.J. BLAT--the BLAST-like Alignment Tool. Genome Res. 12, 656-664 (2002).
  9. Stein, L. Genome annotation: from sequence to biology. Nature Rev. Genet. 2, 493-503 (2001).
  10. Pruitt, K.D. & Maglott, D.R. RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res. 29, 137-140 (2001).
  11. Burge, C.B. & Karlin, S. Finding the genes in genomic DNA. Curr. Opin. Struct. Biol. 8, 346-354 (1998).
  12. Schuler, G.D. Electronic PCR: bridging the gap between genome mapping and genome sequencing. Trends Biotechnol. 16, 456-459 (1998).
  13. Sherry, S.T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308-311 (2001).
  14. Hamosh, A. et al. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 30, 52-55 (2002).
  15. Baxevanis, A.D. & Ouellette, B.F.F. (eds.) Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins (John Wiley & Sons, New York, 2001).
  16. Solovyev, V.V., Salamov, A.A. & Lawrence, C.B. Identification of human gene structure using linear discriminant functions and dynamic programming. Proc. Int. Conf. Intell. Syst. Mol. Biol. 3, 367-375 (1995).
  17. Yeh, R.F., Lim, L.P. & Burge, C.B. Computational inference of homologous gene structures in the human genome. Genome Res. 11, 803-816 (2001).
  18. Marchler-Bauer, A. et al. CDD: a database of conserved domain alignments with links to domain three-dimensional structure. Nucleic Acids Res. 30, 281-283 (2002).
  19. Apweiler, R. et al. InterPro--an integrated documentation resource for protein families, domains and functional sites. Bioinformatics 16, 1145-1150 (2000).
  20. Rebhan, M., Chalifa-Caspi, V., Prilusky, J. & Lancet, D. GeneCards: a novel functional genomics compendium with automated data mining and query reformulation support. Bioinformatics 14, 656-664 (1998).
  21. Blake, J.A., Richardson, J.E., Bult, C.J., Kadin, J.A. & Eppig, J.T. The Mouse Genome Database (MGD): the model organism database for the laboratory mouse. Nucleic Acids Res. 30, 113-115 (2002).
  22. Hudson, T.J. et al. A radiation hybrid map of mouse genes. Nature Genet. 29, 201-205 (2001).
  23. Bateman, A. et al. The Pfam protein families database. Nucleic Acids Res. 30, 276-280 (2002).
  24. Letunic, I. et al. Recent improvements to the SMART domain-based sequence annotation resource. Nucleic Acids Res. 30, 242-244 (2002).
  25. Altschul, S.F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389-3402 (1997).
  26. Durbin, R., Eddy, S., Krogh, A. & Mitchison, G. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (Cambridge Univ. Press, Cambridge, 1998).
  27. Peri, S., Ibarrola, N., Blagoev, B., Mann, M. & Pandey, A. Common pitfalls in bioinformatics-based analyses: look before you leap. Trends Genet. 17, 541-545 (2001) [erratum Trends Genet. 18, 218 (2002)].
  28. Ponting, C. Issues in predicting protein function from sequence. Brief. Bioinform. 2, 19-29 (2001).
  29. Aparicio, S.A.J.R. How to count ... human genes. Nature Genet. 25, 129-130 (2000).
  30. Beadle, G.W. & Tatum, E.L. Genetic control of biochemical reactions in Neurospora. Proc. Natl Acad. Sci. USA 27, 499-506 (1941).
  31. Jeffery, C.J., Bahnson, B.J., Chien, W., Ringe, D. & Petsko, G.A. Crystal structure of rabbit phosphoglucose isomerase, a glycolytic enzyme that moonlights as neuroleukin, autocrine motility factor, and differentiation mediator. Biochemistry 39, 955-964 (2000).
  32. Wistow, G. & Piatigorsky, J. Recruitment of enzymes as lens structural proteins. Science 236, 1554-1556 (1987).
  33. Jeffery, C.J. Moonlighting proteins. Trends Biochem. Sci. 24, 8-11 (1999).
  34. Chothia, C. Proteins. One thousand families for the molecular biologist. Nature 357, 543-544 (1992).
  35. Hegyi, H. & Gerstein, M. The relationship between protein structure and function: a comprehensive survey with application to the yeast genome. J. Mol. Biol. 288, 147-164 (1999).
  36. Jansen, R. & Gerstein, M. Analysis of the yeast transcriptome with structural and functional categories: characterizing highly expressed proteins. Nucleic Acids Res. 28, 1481-1488 (2000).
  37. Brenner, S.E. Errors in genome annotation. Trends Genet. 15, 132-133 (1999).
  38. Smith, R.F. Perspectives: sequence data base searching in the era of large-scale genomic sequencing. Genome Res. 6, 653-660 (1996).

Copyright 2002 Nature Publishing