Nature Genetics
A user's guide to the human genome

Volume 32 supplement, pp. 49-52

Question 8
How can one find all the members of a human gene family?

The HUGO Gene Nomenclature Committee has been working to assign each human gene a unique symbol and a longer, more descriptive name. As a result, members of many gene families that were cloned in different laboratories and once known by a variety of names now share a common root symbol. A text search in any of the genome browsers will often return links to all named members of a gene family that have been mapped to the genome. Ensembl and UCSC currently return lists of genes, whereas the NCBI presents both a list and a graphical overview.

Go to the NCBI home page and click the Human map viewer link on the right side to open the Map Viewer search page. Enter the term 'ADAM*[sym]' in the text query box. The asterisk (wild card) matches any string of characters following 'ADAM', and the field tag [sym] restricts the search to entries whose gene symbol matches the term. Other advanced search options are available through the Advanced Search box or the online documentation. The search returns 41 hits, which include members of the ADAM family as well as members of related families whose symbols also begin with 'ADAM', such as ADAMTS and ADAMDEC. To restrict the results to ADAM genes, exclude the unwanted symbols with the Boolean operator NOT, using the query ADAM*[sym] NOT ADAMTS*[sym] NOT ADAMDEC1*[sym]. The graphic at the top of the returned page marks the location of each gene with a red tick (Fig. 8.1). It is immediately clear that the 19 mapped ADAM genes are distributed among 11 chromosomes, and that some, such as those near the tips of the q arms of chromosomes 10 and 14, lie close together. The list at the bottom of the page links to all 19 genes.
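
The wild-card and NOT logic of the Map Viewer query can be mimicked locally, which makes it easier to see why ADAM* also catches the ADAMTS and ADAMDEC symbols. The Python sketch below is a hypothetical illustration, not an NCBI tool: it filters a small hand-picked list of (symbol, chromosome) records with shell-style patterns from the standard fnmatch module and groups the surviving symbols by chromosome, roughly as the Map Viewer graphic does. The records and chromosome assignments are illustrative only and should be checked against a current annotation; fnmatch also interprets '?' and '[...]', which Entrez does not.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Illustrative (symbol, chromosome) records: a toy subset, not the Map Viewer result.
TOY_GENES = [
    ("ADAM2", "8"),
    ("ADAM8", "10"),
    ("ADAM12", "10"),
    ("ADAM20", "14"),
    ("ADAM21", "14"),
    ("ADAMTS1", "21"),
    ("ADAMDEC1", "8"),
]


def matches_query(symbol, include, exclude):
    """True if symbol matches `include` and none of the `exclude` patterns.

    Loosely mimics 'ADAM*[sym] NOT ADAMTS*[sym] ...' with case-sensitive globs.
    """
    if not fnmatchcase(symbol, include):
        return False
    return not any(fnmatchcase(symbol, pattern) for pattern in exclude)


def genes_by_chromosome(genes, include, exclude):
    """Group the symbols that satisfy the query by chromosome, in input order."""
    by_chrom = defaultdict(list)
    for symbol, chrom in genes:
        if matches_query(symbol, include, exclude):
            by_chrom[chrom].append(symbol)
    return dict(by_chrom)


if __name__ == "__main__":
    everything = genes_by_chromosome(TOY_GENES, "ADAM*", [])
    adam_only = genes_by_chromosome(TOY_GENES, "ADAM*", ["ADAMTS*", "ADAMDEC1*"])
    print(sum(map(len, everything.values())))  # 7: ADAMTS1 and ADAMDEC1 included
    print(adam_only)  # {'8': ['ADAM2'], '10': ['ADAM8', 'ADAM12'], '14': ['ADAM20', 'ADAM21']}
```

Applying the exclusions after the wild-card match mirrors the order of the Boolean query: first cast a wide net on the shared root, then remove the related families that share it.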

Another way to find homologous genes in the genome is a basic local alignment search tool (BLAST) search at the NCBI or Ensembl. BLAT searches at UCSC are less sensitive than BLAST searches and may miss some homologous genes. In this example, the Ensembl BLAST interface is used to find all genomic sequences homologous to the ADAM2 protein. From the Ensembl Human home page, click the BLAST link. Paste the ADAM2 protein sequence (RefSeq accession NP_001455.2), obtained from the NCBI's Entrez database as described in Question 5, into the query box. Set the database to 'Homo sapiens, genomic sequence' to search the Ensembl genome assembly, and choose TBLASTN as the executable (Fig. 8.2); TBLASTN compares a protein query against a nucleotide database translated in all six reading frames. Leave the remaining settings at their defaults and click Search. The returned page contains a retrieval ID (Fig. 8.3) that links to the results page (Fig. 8.4) once the search has finished.
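
TBLASTN can search genomic DNA with a protein query because it first translates the database in all six reading frames, three on each strand. The sketch below illustrates only that translation step, using the standard genetic code (NCBI translation table 1); it is not Ensembl's or the NCBI's code, real TBLASTN adds seeding, substitution-matrix scoring and gapped extension on top of it, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# third base varying fastest, which is the order itertools.product yields.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna):
    """Reverse complement of a DNA string (upper-cased)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna, frame=0):
    """Translate one strand from offset `frame` (0, 1 or 2).

    Stops become '*'; codons with ambiguous bases (e.g. N) become 'X'.
    Trailing bases that do not form a full codon are ignored.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Return the translations of frames +1..+3 (forward) and -1..-3 (reverse)."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames


if __name__ == "__main__":
    print(six_frame_translation("ATGGCCTAA")["+1"])  # MA*
```

Because every frame is searched, a match is found whichever strand and reading frame the homologous gene occupies, and an in-frame stop in the genomic sequence shows up as '*' in the translated subject.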

The top of the results page gives a graphical overview of the hit locations. A hit may span the entire protein or only a single domain. Hits are colored by BLAST score: red for the most similar, green for intermediate and blue for the least similar. Some hits, such as the pairs on the q arms of chromosomes 10 and 14, lie at positions similar to those of the ADAM genes mapped by the NCBI (Fig. 8.1), but others, such as those on chromosomes 12 and Y, appear only in the BLAST search. These unique hits may be genuine ADAM family members that have not yet been named and would therefore be missed by a text-based search. Alternatively, they may be unnamed pseudogenes or nonsignificant BLAST hits. Conversely, one gene on chromosome 1 is found by the text-based search at the NCBI but not by the BLAST search at Ensembl: its similarity to ADAM2 is too low for it to be reported under the default Ensembl parameters.
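
Comparing the two searches amounts to two checks per hit: is it statistically significant, and does it overlap a gene already found by the text search? The sketch below assumes that hits and named genes are available as simple chromosome intervals with E-values (hypothetical inputs, not an Ensembl export format). It approximates an E-value from a bit score S' with the standard relation E = m n 2^(-S'), where m and n are the query and database lengths, ignoring the effective-length corrections BLAST applies; the 1e-3 cutoff is an arbitrary example, not the Ensembl default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A genomic interval, 1-based and inclusive at both ends."""
    chrom: str
    start: int
    end: int
    label: str = ""


def overlaps(a, b):
    """True if two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def e_value_from_bits(bit_score, query_len, db_len):
    """Approximate E = m * n * 2**(-S') without effective-length correction."""
    return query_len * db_len * 2.0 ** (-bit_score)


def triage_hits(hits, named_genes, max_evalue=1e-3):
    """Sort (Interval, E-value) hits into the categories discussed in the text.

    max_evalue is a hypothetical cutoff chosen for illustration.
    """
    result = {"matches_named_gene": [], "unique_to_blast": [],
              "not_significant": [], "named_gene_without_hit": []}
    significant = []
    for hit, evalue in hits:
        if evalue > max_evalue:
            result["not_significant"].append(hit.label)
            continue
        significant.append(hit)
        if any(overlaps(hit, gene) for gene in named_genes):
            result["matches_named_gene"].append(hit.label)
        else:
            result["unique_to_blast"].append(hit.label)
    # Named genes with no significant hit, like the chromosome 1 gene in the text.
    for gene in named_genes:
        if not any(overlaps(gene, hit) for hit in significant):
            result["named_gene_without_hit"].append(gene.label)
    return result


if __name__ == "__main__":
    # Entirely made-up coordinates and labels.
    named = [Interval("10", 1_000, 5_000, "named_gene_1"),
             Interval("1", 200, 900, "named_gene_2")]
    hits = [(Interval("10", 1_500, 2_500, "hit_1"), 1e-40),
            (Interval("12", 100, 400, "hit_2"), 1e-20),
            (Interval("5", 10, 50, "hit_3"), 0.5)]
    print(triage_hits(hits, named))
    print(e_value_from_bits(10, 1, 1024))  # 1.0
```

Filtering on significance before checking overlap keeps weak, chance-level matches from being counted as candidate family members; hits in "unique_to_blast" still need inspection, since they may be unnamed genes or pseudogenes.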

Clicking the arrow next to a hit in Figure 8.4 opens a pop-up menu that summarizes the BLAST report for that hit and links to the BLAST alignment and to ContigView (Figs 8.5 and 8.6, respectively, for the hit on chromosome 12). The hit on chromosome 12 contains an in-frame stop codon and is probably an intronless pseudogene. The bottom of the results page (Fig. 8.4) summarizes the BLAST hits; clicking a hit there opens its BLAST alignment (Fig. 8.5). A link in the middle of the results page provides the entire BLAST report in standard format, and clicking a hit in that report opens ContigView for the surrounding region (similar to Fig. 8.6).
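
The reasoning about the chromosome 12 hit (an in-frame stop plus a single uninterrupted match) can be written down as a rough screen. In a TBLASTN alignment, stops in the translated genomic sequence appear as '*'. A multi-exon gene usually yields several high-scoring segment pairs (HSPs) separated by introns, whereas a processed, intronless pseudogene often gives one contiguous HSP covering most of the protein. The sketch below encodes that idea with hypothetical function names and an arbitrary 80% coverage threshold; it is not a substitute for inspecting the alignment and ContigView.

```python
def stop_positions(subject_aln):
    """0-based alignment columns holding '*' (a stop) in the translated subject line."""
    return [i for i, residue in enumerate(subject_aln) if residue == "*"]


def query_coverage(hsp_query_ranges, query_len):
    """Fraction of query residues covered by the union of HSP ranges (1-based, inclusive)."""
    covered = set()
    for qstart, qend in hsp_query_ranges:
        low, high = min(qstart, qend), max(qstart, qend)
        covered.update(range(low, high + 1))
    return len(covered) / query_len


def looks_like_processed_pseudogene(subject_aln, hsp_query_ranges, query_len,
                                    min_coverage=0.8):
    """Heuristic: a single HSP covering most of the query that contains a stop.

    A genuine single-exon gene can also give one HSP, so a stop is required too.
    """
    return (len(hsp_query_ranges) == 1
            and query_coverage(hsp_query_ranges, query_len) >= min_coverage
            and bool(stop_positions(subject_aln)))


if __name__ == "__main__":
    # Toy input: one HSP spanning query residues 5-95 of a 100-residue protein.
    print(looks_like_processed_pseudogene("MKV*LLA-GE", [(5, 95)], 100))  # True
```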

Copyright 2002 Nature Publishing