Cite as: Cold Spring Harb. Protoc.; 2008; doi:10.1101/pdb.prot5023


Browsing HapMap Data Using the Genome Browser

Albert Vernon Smith

This protocol was adapted from "Using the HapMap Web Site," Chapter 6, in Genetic Variation (eds. Weiner et al.). Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, USA, 2007.


INTRODUCTION

The primary goal of the International Haplotype Map Project has been to develop a haplotype map of the human genome that describes the common patterns of genetic variation, in order to accelerate the search for the genetic causes of human disease. Within the project, ~3.9 million distinct single-nucleotide polymorphisms (SNPs) have been genotyped in 270 individuals from four worldwide populations. The project data are available for unrestricted public use at the HapMap website. This site, which is the primary portal to genotype data produced by the project, offers bulk downloads of the data set, as well as interactive data browsing and analysis tools that are not available elsewhere. Research into the genetic contributions to a human disease commonly focuses on candidate genes identified from linkage and/or association studies, as well as from pathways suspected to be involved in a particular disease process. In studying candidate genes, a researcher will want to know whether there are any common SNPs in the immediate vicinity, what those SNPs’ alleles are, and the relative frequencies of the alleles in the population. The researcher will also be particularly interested in coding SNPs, whose alleles change the amino acid sequence of the gene product and therefore might represent functional variations. This protocol provides details on how to use the genome browser to navigate to and explore HapMap data for a gene or region of interest.


RELATED INFORMATION

HapMap data from the International Haplotype Map Project (International HapMap Consortium 2005) are available at the project website: http://www.hapmap.org (Thorisson et al. 2005). The HapMap website provides researchers with a number of tools that allow them to analyze the data as well as to download data for local analyses. The following web resources are also useful:

http://www.ensembl.org (Hubbard et al. 2007)

The following protocols describe additional tools and functions that have been developed for viewing, retrieving, and analyzing HapMap data:

Generating HapMap Data Text Reports Using the Genome Browser (Smith 2008a)

Manipulating HapMap Data Using HaploView (Smith 2008b)

Retrieving HapMap Data Using HapMart (Smith 2008c)

Retrieving HapMap Data via Bulk Download (Smith 2008d)


MATERIALS

Equipment

Computer (Internet-connected)


METHOD

Finding and Browsing to a Region of Interest

The genome browser at the HapMap website provides access to small- to medium-sized regions of the genome for this type of interactive exploration. This basic protocol describes how to start using the genome browser.

1. Using any modern web browser, go to www.hapmap.org.

2. Click the "Browse Project Data" link under the "Project Data" section of the hapmap.org homepage.
This will take you to a genome browser based on the GBrowse package (Fig. 1).
Figure 1. The initial page shown when starting to use the HapMap genome browser for the first time. Depending on your computer language settings, this page can appear in one of several languages, although this section assumes English. The page can also be reached directly at http://www.hapmap.org/cgi-perl/gbrowse/.
3. Locate the "Landmark or Region" search box, and enter a search term.
Any of the following types of search terms will work:
  • A chromosome name (e.g., "Chr19")

  • A chromosomal position in the format Chromosome:start..stop (e.g., "Chr10:25000..300000")

  • The name of a SNP using its dbSNP "rs" name (e.g., "rs6870660")

  • A gene using its NCBI RefSeq accession number (e.g., "NM_153254")

  • A gene using its common name (e.g., "BRCA2")

  • A chromosomal band (e.g., "5q31")
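
The search-term formats listed above can be told apart by simple pattern matching. The following Python sketch is purely illustrative: the function names `classify_landmark` and `parse_region` and the regular expressions are assumptions derived from the example terms in this protocol, not part of the genome browser itself, which may well accept other forms.

```python
import re

# Illustrative patterns inferred from the example search terms in this
# protocol; they are assumptions, not the browser's actual parsing rules.
_REGION = re.compile(r"^chr(\d{1,2}|X|Y):(\d+)\.\.(\d+)$", re.IGNORECASE)
_PATTERNS = [
    ("region", _REGION),
    ("chromosome", re.compile(r"^chr(\d{1,2}|X|Y)$", re.IGNORECASE)),
    ("snp", re.compile(r"^rs\d+$", re.IGNORECASE)),
    # RefSeq mRNA accession; the separator is accepted as "_" or a space.
    ("refseq", re.compile(r"^NM[_ ]\d+$", re.IGNORECASE)),
    ("band", re.compile(r"^(\d{1,2}|X|Y)[pq]\d+(\.\d+)?$", re.IGNORECASE)),
]


def classify_landmark(term):
    """Guess which kind of landmark a search term is.

    Anything that matches none of the patterns is treated as a gene name.
    """
    term = term.strip()
    for kind, pattern in _PATTERNS:
        if pattern.match(term):
            return kind
    return "gene_name"


def parse_region(term):
    """Split 'Chromosome:start..stop' into (chromosome, start, stop)."""
    match = _REGION.match(term.strip())
    if match is None:
        raise ValueError(f"not a Chromosome:start..stop region: {term!r}")
    chrom = match.group(1).upper()
    start, stop = int(match.group(2)), int(match.group(3))
    if start > stop:
        raise ValueError("start must not exceed stop")
    return chrom, start, stop


if __name__ == "__main__":
    for example in ["Chr19", "Chr10:25000..300000", "rs6870660", "BRCA2", "5q31"]:
        print(example, "->", classify_landmark(example))
```

Checking a term locally in this way can catch a malformed region string (for example, a single dot between start and stop) before it is pasted into the search box.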



4. After entering one of these landmarks, press the "Search" button (or hit "Enter").
This will return a page showing the region surrounding the requested feature (Fig. 2). If multiple features match, then the page will show a graphical summary, including genomic location, of all possible features and prompt you to choose one.
Figure 2. The HapMap genome browser displaying a requested feature.
  • At the top of the returned page is an "Overview" section that shows the cytogenetic map of the selected chromosome. A red box indicates the section of the chromosome in view.

  • Below this is a "Region" overview, displaying 2 Mb surrounding the region of interest. Again, a red box indicates the section of the chromosome in view.

  • Beneath this is a "Detail" section that has horizontal tracks showing various types of data. By default, only a small number of genomic tracks are displayed for the region. The two most useful are the "Genotyped SNPs" track, which provides information on the position, alleles, and allele frequencies of each SNP characterized by the HapMap project, and the "Entrez genes" track, which shows the positions and structures of human protein-coding genes.

  • Additional information tracks are available, which can particularly help with the understanding and design of association studies. These include analyses derived from HapMap data as well as data from outside sources (Table 1). Particularly noteworthy are several tracks related to structural variation in the genome, as well as links to the Reactome database (http://www.reactome.org; Vastrik et al. 2007), a curated resource of core pathways and reactions in human biology.


By default, the genome browser goes to the most recent release of HapMap data. Previous releases are available via this interface, and the different releases can be selected under the "Data Source" menu.

5. Use the controls at the top of the page to scroll left or right, or to change the magnification of the region. Click anywhere on the "Overview," "Region," or the scale at the top of the "Details" section to center the view on that position.
The genotyped SNP track changes its appearance in a manner appropriate to the scale of the image:
  • At low magnifications, genotyped SNPs appear as equilateral triangles. The colors of the triangles can be customized by selecting the "Highlight SNP Properties" item in the "Reports and Analysis" menu.

  • At higher magnifications, the genotyped SNPs change to display the alleles associated with the SNP. The allele shown in blue is the allele present in the reference genomic sequence at that location, and the red allele is the other allele present in the SNP.

  • When zoomed in still further, the genotyped SNPs track changes to show pie charts representing the allele frequency for each genotyped population. The blue wedge of the pie chart indicates the frequency of the allele that appears in the reference genome sequence. The red wedge is the frequency of the alternative allele. The pie chart display provides the researcher with the ability to easily distinguish SNPs that are highly polymorphic in all four of the HapMap populations and, therefore, more likely to be polymorphic in other populations as well. Alternatively, the researcher can identify SNPs that are more polymorphic in a single population and are therefore suitable as markers in population-specific genetic screens.
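
The wedges of each pie chart correspond to a simple calculation on allele counts. The sketch below, in Python, shows how reference-allele and minor-allele frequencies could be derived and how SNPs polymorphic in several populations might be flagged. The allele counts are made up for illustration, the 0.05 cutoff is an arbitrary assumption, and the labels CEU, YRI, CHB, and JPT are the codes commonly used for the four HapMap populations; they do not appear elsewhere in this protocol.

```python
def allele_frequency(ref_count, alt_count):
    """Frequency of the reference allele (the blue wedge of the pie chart)."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no observed alleles")
    return ref_count / total


def minor_allele_frequency(ref_count, alt_count):
    """Frequency of whichever allele is rarer in the sample."""
    freq = allele_frequency(ref_count, alt_count)
    return min(freq, 1.0 - freq)


def polymorphic_populations(counts_by_population, min_maf=0.05):
    """List populations in which the SNP reaches the chosen MAF cutoff.

    min_maf=0.05 is an illustrative threshold, not a HapMap setting.
    """
    return [
        population
        for population, (ref, alt) in counts_by_population.items()
        if minor_allele_frequency(ref, alt) >= min_maf
    ]


# Hypothetical (reference, alternative) allele counts for one SNP.
example_counts = {
    "CEU": (90, 30),
    "YRI": (60, 60),
    "CHB": (88, 2),
    "JPT": (80, 8),
}

if __name__ == "__main__":
    for population, (ref, alt) in example_counts.items():
        print(population, round(allele_frequency(ref, alt), 3))
    print(polymorphic_populations(example_counts))
```

A SNP returned for all four populations would correspond to the "highly polymorphic in all four" case described above, whereas a SNP returned for only one would be a candidate population-specific marker.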



6. Click on the glyph for an individual SNP to see a text-based page with detailed genotype and allele counts, and assay information.
This provides the researcher with the information needed to generate an assay for the SNP, including the left and right flanking sequences needed to create PCR primers.
i. Click on the hypertext link to dbSNP (http://www.ncbi.nlm.nih.gov/SNP; Wheeler et al. 2007) for more information about how the SNP was first discovered and any other population genetic information that may exist for it outside the HapMap project.

ii. Click on the link to Ensembl (http://www.ensembl.org; Hubbard et al. 2007) to reach a site where the structural impact of the SNP on coding sequence, splice sites, and other features of nearby genes can be examined.

Viewing the Extent of Linkage Disequilibrium (LD)

When a researcher designs a study to detect the association between a common allelic variation of a gene and a disease of interest, knowledge of the extent of LD in the region is essential for reducing the number of SNPs that need to be genotyped across the region. If there is high LD in the region, then only a few SNPs need to be genotyped because their linkage to other SNPs in the region will serve as proxies for the genotypes of noncharacterized SNPs. In contrast, a region of low LD will need to be sampled more heavily because the allelic state of a genotyped SNP will be a poor predictor of the state of nongenotyped SNPs. The determination of patterns of LD in the populations characterized by the HapMap project has been one of the major goals of this project. The International HapMap Project has precalculated patterns of LD among the genotyped SNPs. The data can be downloaded in bulk from the HapMap website or browsed interactively using the HapMap genome browser. The latter method allows researchers to see patterns of LD in context with the distribution of genes of interest.

7. To view available LD data precalculated from HapMap genotypes, browse to a region of interest (see Steps 1-4).

8. Select the "Annotate LD plot" plug-in from the "Reports and Analysis" menu.

9. Click the "Configure" button to bring up a configuration page that will allow you to adjust the display properties to your liking.
Key parameters on this page are the HapMap populations to display, which measure of LD to use (choice of D', r², or log of the odds [LOD]), whether the triangle plot should be oriented with the vertex pointing upward or downward, the color scheme, and whether the box size in the plot should be proportional to the genomic distance between markers or of uniform size (see Fig. 3).
Figure 3. The configuration page of the HapMap genome browser allows the user to customize numerous style features of the data display.
The traditional D' and r² metrics reflect the degree of pairwise LD between two SNPs, but differ in their sensitivity and specificity across different size scales. See Mueller (2004) for a discussion of the practical application of these measurements. The LOD metric used in the HapMap website display is described in Daly et al. (2001).
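
To make the difference between the two pairwise metrics concrete, the following Python sketch computes D' and r² for two biallelic SNPs from counts of the four possible phased haplotypes, using the standard textbook definitions. The function name `pairwise_ld` and the example counts are hypothetical; this is not the code used to precalculate the HapMap LD values, and the LOD metric is not covered.

```python
def pairwise_ld(n_AB, n_Ab, n_aB, n_ab):
    """Return (D', r2) for two biallelic SNPs with alleles A/a and B/b.

    Arguments are counts of phased haplotypes carrying each allele pair.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total

    # D: deviation of the observed haplotype frequency from independence.
    d = p_AB - p_A * p_B

    variance_term = p_A * (1 - p_A) * p_B * (1 - p_B)
    if variance_term == 0:
        raise ValueError("both SNPs must be polymorphic")

    # D' rescales |D| by the largest value it could take given the
    # allele frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max

    # r2 is the squared correlation between the two SNPs' alleles.
    r2 = d * d / variance_term
    return d_prime, r2


if __name__ == "__main__":
    # Hypothetical counts: only three of four haplotypes are observed.
    print(pairwise_ld(60, 0, 20, 20))
```

With the hypothetical counts in the usage line, one of the four haplotypes is absent, so D' is at its maximum while r² is well below 1. This is the kind of situation in which the two measures disagree, and it is why the choice of metric on the configuration page matters.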

10. Click on the "Configure" button to return to the main display, which will now show one triangle plot for each population selected (see Fig. 4).
Figure 4. The HapMap genome browser displaying a triangle plot of LD values for multiple populations. A typical region of LD demonstrating "patches" of high LD separated by relatively well-defined boundaries of low LD is shown. The triangle plot is constructed by connecting every pair of SNPs along lines at 45° to the horizontal track line. The color of the diamond at the position where two SNPs intersect indicates the amount of LD; more intense colors indicate higher LD. A gray diamond indicates that data are missing.
In regions with many genotyped SNPs, the LD plug-in adds significantly to the time it takes for the web page to load. You can turn off the LD display at any time by deselecting the appropriate checkbox in the "Tracks" section of the browser. The LD plug-in settings are stored in a browser cookie, so there is no need to visit the configuration page each time the plug-in is turned on.
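
The geometry of the triangle plot described in the Figure 4 legend can be sketched in a few lines of Python. Two lines drawn at 45° from SNPs at positions x1 and x2 meet midway between them, at a height equal to half their separation. The function name `diamond_centers` is hypothetical, and the coordinates are in arbitrary units; the browser's own drawing code may differ.

```python
from itertools import combinations


def diamond_centers(positions):
    """Map each SNP pair (i, j) to the (x, y) center of its diamond.

    positions: x coordinates of the SNPs along the horizontal track.
    Lines at 45 degrees from two SNPs intersect at the midpoint of the
    pair, at a height of half the distance between them.
    """
    centers = {}
    for (i, x_i), (j, x_j) in combinations(enumerate(positions), 2):
        centers[(i, j)] = ((x_i + x_j) / 2, abs(x_j - x_i) / 2)
    return centers


if __name__ == "__main__":
    # Boxes proportional to genomic distance: use real coordinates.
    print(diamond_centers([0, 10, 40]))
    # Boxes of uniform size: use the SNP index instead of the position.
    print(diamond_centers(range(3)))
```

Because every pair of SNPs contributes one diamond, a region with n genotyped SNPs requires n(n-1)/2 diamonds per population, which is consistent with the longer page load times noted above for SNP-dense regions.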

Picking and Viewing tag-SNPs

Tag-SNPs are a reduced set of SNPs that capture much of the LD in a region; they can be used in association studies to reduce the number of SNPs needed to detect LD-based association between a trait of interest and a region of the genome. For small regions, it is possible to select tag-SNPs by hand using the graphical and numeric displays of LD generated above, but for best results, it is recommended that the researcher use an algorithm that chooses tag-SNPs by formally maximizing the number of linked SNPs captured by the tag set. There is no single set of tag-SNPs that will satisfy the diverse requirements of every association study design. Researchers may wish to select SNPs that work well with a particular genotyping system (e.g., those that have been included on a particular "SNP chip") and may be willing to accept different tradeoffs between the cost of genotyping a study population and the strength of the association they can detect. For this reason, the HapMap website does not offer a static set of preselected tag-SNPs, but instead offers researchers a tool for interactively selecting tag-SNPs based on user-provided criteria. The tag-SNP lists are generated from algorithms in the Tagger program (http://www.broad.mit.edu/mpg/tagger/; de Bakker et al. 2005).

11. Navigate to a region of interest (see Steps 1-4).

12. Under the "Reports and Analysis" menu, select the "Annotate tag SNP Picker" option.

13. Press "Configure" to select the desired options for tag-SNP selection (see Fig. 5).
Figure 5. The HapMap genome browser graphically displaying tag-SNPs, as well as phased haplotypes.
Options include:
  • Selecting a population and an algorithm

  • Uploading a list of SNP IDs to be included in the set of tag-SNPs

  • Uploading a list of SNP IDs to be excluded from the set of tag-SNPs

  • Uploading a list of design scores (priorities) for each SNP

  • Selecting cutoffs for minimum acceptable LD value and allele frequency for SNPs to be included in the set



14. Click the "Configure" button to run the analysis and return to the main display.
Results are shown on a new feature track (see Fig. 5).
As with the LD display above (Step 10), settings are stored in a browser cookie, and the plug-in track can be turned off when it is not needed.
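
The general idea of choosing tag-SNPs against a minimum LD cutoff can be illustrated with a simple greedy procedure. The Python sketch below is a simplified stand-in, not the Tagger implementation: the function name `greedy_tag_snps`, the SNP labels, the r² values, and the 0.8 cutoff are all hypothetical, and options such as forced inclusion, exclusion, and design scores are not modeled.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Pick tag-SNPs greedily until every SNP is captured.

    snps: iterable of SNP identifiers.
    r2: dict mapping frozenset({snp_a, snp_b}) to their pairwise r2.
    threshold: minimum r2 at which a tag is considered to capture a SNP
               (0.8 is an illustrative choice).
    """
    candidates = sorted(snps)

    def captures(tag, other):
        # A SNP always captures itself; otherwise compare r2 to the cutoff.
        return tag == other or r2.get(frozenset((tag, other)), 0.0) >= threshold

    untagged = set(candidates)
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs;
        # ties go to the first candidate in sorted order.
        best = max(candidates, key=lambda s: sum(captures(s, u) for u in untagged))
        tags.append(best)
        untagged -= {u for u in untagged if captures(best, u)}
    return tags


if __name__ == "__main__":
    # Hypothetical pairwise r2 values for five SNPs.
    example_r2 = {
        frozenset(("snpA", "snpB")): 0.90,
        frozenset(("snpA", "snpC")): 0.85,
        frozenset(("snpB", "snpC")): 0.95,
        frozenset(("snpD", "snpE")): 0.50,
    }
    print(greedy_tag_snps(["snpA", "snpB", "snpC", "snpD", "snpE"], example_r2))
```

In this toy example, the three SNPs in strong mutual LD are represented by a single tag, whereas the two weakly linked SNPs must each be genotyped. Lowering the cutoff reduces the size of the tag set at the cost of weaker proxies, which is the tradeoff the configuration options expose.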

Viewing Phased Haplotypes

A researcher may wish to correlate the tag-SNP set selected by the tag-SNP picker algorithm with the underlying haplotype structure of the region. One way to do this is to turn both the pairwise LD and tag-SNP tracks on simultaneously (Steps 7-10 and 11-14, respectively). An alternative, however, is to activate a track that displays the phased haplotypes themselves. The phased haplotype data described in this section were generated by the International HapMap Project Consortium using the program PHASE version 2.1 (Stephens and Donnelly 2003). During phasing, each allele in a genotype is assigned to one or the other parental chromosome, using a maximum likelihood algorithm that uses trio (lineage) information in the HapMap population groups, or, if trio information is not available, by fitting the data to a model that minimizes the number of implied historical crossovers in the population. The phased haplotypes are displayed as a graphic in which each chromosome of the individuals sampled by the project is represented as a line one pixel high, and each SNP allele is arbitrarily colored blue or yellow. A region of high LD will appear as a region in which there are long runs of SNPs that share alleles across multiple chromosomes, indicating that there is little recombination among them. A region of low LD will appear as an area where the runs are shorter and more fragmentary.
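
The two-color display and the notion of "long runs of SNPs that share alleles" can be mimicked on a small scale. In the Python sketch below, each chromosome is a string with one allele per SNP; the haplotypes are invented, the function names are hypothetical, and the coloring rule (alleles matching the first chromosome are "B" for blue, others "Y" for yellow) is only one arbitrary choice that assumes biallelic SNPs.

```python
def two_color_rows(haplotypes):
    """Recode each haplotype as a string of 'B' (blue) and 'Y' (yellow).

    The allele carried by the first haplotype at each SNP is arbitrarily
    colored blue; the other allele is colored yellow.
    """
    if not haplotypes:
        return []
    first = haplotypes[0]
    return [
        "".join("B" if allele == ref else "Y" for allele, ref in zip(row, first))
        for row in haplotypes
    ]


def longest_shared_run(hap_a, hap_b):
    """Length of the longest stretch of consecutive SNPs with equal alleles."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same SNPs")
    best = current = 0
    for allele_a, allele_b in zip(hap_a, hap_b):
        current = current + 1 if allele_a == allele_b else 0
        best = max(best, current)
    return best


if __name__ == "__main__":
    # Invented phased haplotypes over five SNPs.
    haplotypes = ["AAGTC", "AAGAC", "CCTAC"]
    for row in two_color_rows(haplotypes):
        print(row)
    print(longest_shared_run(haplotypes[0], haplotypes[1]))
```

Long shared runs between many pairs of chromosomes would appear in the graphic as broad bands of a single color, which is the visual signature of high LD described above.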

15. Navigate to a region of interest (see Steps 1-4).

16. Select "Annotate Phased Haplotype Display" from the "Reports and Analysis" menu.

17. Press "Configure" to set options for the haplotype display.
The options give you the ability to select the population for which to display haplotype information.

18. After selecting the desired population(s), click the "Configure" button to return to the main display. A new feature track will appear for each population selected. Each track shows the haplotypes for that population using the two-color scheme described above (see Fig. 5).
The order of chromosomes is determined by a fast hierarchical clustering methodology, which places chromosomes that share similar haplotypes together.
The advantage of this display over the pairwise LD "triangle display" is that it is more compact and therefore better suited for the display of large regions. This makes it easy to correlate the position of long common haplotypes with SNPs chosen by the tag-SNP picker. The disadvantage of this display is that it conceals much of the fine structure of LD in the region; in particular, strong LD among SNPs that are not adjacent to one another.
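
The effect of ordering chromosomes so that similar haplotypes sit next to one another can be approximated with a much simpler method than hierarchical clustering. The Python sketch below uses a greedy nearest-neighbor ordering by Hamming distance; it is a swapped-in simplification for illustration only, not the clustering method used by the browser, and the function names and haplotypes are hypothetical.

```python
def hamming(hap_a, hap_b):
    """Number of SNPs at which two haplotypes carry different alleles."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same SNPs")
    return sum(a != b for a, b in zip(hap_a, hap_b))


def order_by_similarity(haplotypes):
    """Order haplotypes so that each one follows its nearest remaining neighbor.

    Starts from the first haplotype supplied and repeatedly appends the
    remaining haplotype closest (by Hamming distance) to the last one placed.
    """
    remaining = list(haplotypes)
    if not remaining:
        return []
    ordered = [remaining.pop(0)]
    while remaining:
        last = ordered[-1]
        nearest = min(range(len(remaining)), key=lambda k: hamming(last, remaining[k]))
        ordered.append(remaining.pop(nearest))
    return ordered


if __name__ == "__main__":
    # Invented haplotypes; similar ones are interleaved in the input.
    print(order_by_similarity(["AAGT", "CCTA", "AAGA", "CCTT"]))
```

After ordering, chromosomes that share a haplotype end up in adjacent rows, so common haplotypes form contiguous blocks rather than being scattered through the track.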

19. To retrieve the detailed phased genotypes, click on the track of the desired population.

This will take you to a page that provides the haplotype information in tabular form. Each row of the table is an individual chromosome, and each column is an individual SNP. The background of each table entry is set to a color corresponding to that seen in the graphical track.


DISCUSSION

A number of public online resources have been developed as portals to high-volume genome-wide data sets. The UCSC Genome Browser (http://genome.ucsc.edu; Kent et al. 2002) and the Ensembl project (http://www.ensembl.org; Hubbard et al. 2007) have developed multispecies genome browsers that display genomic annotations graphically and offer retrieval of the underlying data. dbSNP (http://www.ncbi.nlm.nih.gov/SNP; Wheeler et al. 2007) is a repository for information on SNPs, but does not yet contain extensive information on the relationships among those SNPs.

The HapMap website, located at http://www.hapmap.org, has a distinct focus. It aims to be a resource for the display, retrieval, and analysis of high-throughput, high-quality, genome-wide human genetic data, with an emphasis on the support of tools for facilitating disease association studies. Although the resource is still in development, it currently provides the basic tools for visualizing patterns of common polymorphism among the populations surveyed by the HapMap project, selecting tag-SNP sets based on a variety of criteria, and generating customized extracts of the data set. In the future, the HapMap website will evolve to provide more services to those designing and interpreting genetic association studies.


REFERENCES

Conrad, D.F., Andrews, T.D., Carter, N.P., Hurles, M.E., and Pritchard, J.K. 2006. A high-resolution survey of deletion polymorphism in the human genome. Nat. Genet. 38: 75–81.

Daly, M.J., Rioux, J.D., Schaffner, S.F., Hudson, T.J., and Lander, E.S. 2001. High-resolution haplotype structure in the human genome. Nat. Genet. 29: 229–232.

de Bakker, P.I.W., Yelensky, R., Pe’er, I., Gabriel, S.B., Daly, M.J., and Altshuler, D. 2005. Efficiency and power in genetic association studies. Nat. Genet. 37: 1217–1223.

Hinds, D.A., Kloek, A.P., Jen, M., Chen, X., and Frazer, K.A. 2006. Common deletions and SNPs are in linkage disequilibrium in the human genome. Nat. Genet. 38: 82–85.

Hubbard, T.J.P., Aken, B.L., Beal, K., Ballester, B., Caccamo, M., Chen, Y., Clarke, L., Coates, G., Cunningham, F., Cutts, T., et al. 2007. Ensembl 2007. Nucleic Acids Res. 35: D610–D617. doi: 10.1093/nar/gkl996.

Iafrate, A.J., Feuk, L., Rivera, M.N., Listewnik, M.L., Donahoe, P.K., Qi, Y., Scherer, S.W., and Lee, C. 2004. Detection of large-scale variation in the human genome. Nat. Genet. 36: 949–951.

International HapMap Consortium. 2005. A haplotype map of the human genome. Nature 437: 1299–1320.

Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and Haussler, D. 2002. The Human Genome Browser at UCSC. Genome Res. 12: 996–1006.

McCarroll, S.A., Hadnott, T.N., Perry, G.H., Sabeti, P.C., Zody, M.C., Barrett, J.C., Dallaire, S., Gabriel, S.B., Lee, C., Daly, M.J., et al. 2006. Common deletion polymorphisms in the human genome. Nat. Genet. 38: 86–92.

Mueller, J.C. 2004. Linkage disequilibrium for different scales and applications. Brief Bioinform. 5: 355–364.

Redon, R., Ishikawa, S., Fitch, K.R., Feuk, L., Perry, G.H., Andrews, T.D., Fiegler, H., Shapero, M.H., Carson, A.R., Chen, W., et al. 2006. Global variation in copy number in the human genome. Nature 444: 444–454.

Sebat, J., Lakshmi, B., Troge, J., Alexander, J., Young, J., Lundin, P., Maner, S., Massa, H., Walker, M., Chi, M., et al. 2004. Large-scale copy number polymorphism in the human genome. Science 305: 525–528.

Sharp, A.J., Locke, D.P., McGrath, S.D., Cheng, Z., Bailey, J.A., Vallente, R.U., Pertz, L.M., Clark, R.A., Schwartz, S., Segraves, R., et al. 2005. Segmental duplications and copy-number variation in the human genome. Am. J. Hum. Genet. 77: 78–88.

Smith, A.V. 2008a. Generating HapMap data text reports using the genome browser. CSH Protocols (this issue) doi: 10.1101/pdb.prot5024.

Smith, A.V. 2008b. Manipulating HapMap data using HaploView. CSH Protocols (this issue) doi: 10.1101/pdb.prot5025.

Smith, A.V. 2008c. Retrieving HapMap data using HapMart. CSH Protocols (this issue) doi: 10.1101/pdb.prot5026.

Smith, A.V. 2008d. Retrieving HapMap data via bulk download. CSH Protocols (this issue) doi: 10.1101/pdb.prot5027.

Smith, A.V., Thomas, D.J., Munro, H.M., and Abecasis, G.R. 2005. Sequence features in regions of weak and strong linkage disequilibrium. Genome Res. 15: 1519–1534.

Stephens, M. and Donnelly, P. 2003. A comparison of Bayesian methods for haplotype reconstruction from population genotype data. Am. J. Hum. Genet. 73: 1162–1169.

Thorisson, G.A., Smith, A.V., Krishnan, L., and Stein, L.D. 2005. The International HapMap Project Web site. Genome Res. 15: 1592–1593.

Tuzun, E., Sharp, A.J., Bailey, J.A., Kaul, R., Morrison, V.A., Pertz, L.M., Haugen, E., Hayden, H., Albertson, D., Pinkel, D., et al. 2005. Fine-scale structural variation of the human genome. Nat. Genet. 37: 727–732.

Vastrik, I., D’Eustachio, P., Schmidt, E., Joshi-Tope, G., Gopinath, G., Croft, D., de Bono, B., Gillespie, M., Jassal, B., Lewis, S., et al. 2007. Reactome: A knowledge base of biologic pathways and processes. Genome Biol. 8: R39.

Wheeler, D.L., Barrett, T., Benson, D.A., Bryant, S.H., Canese, K., Chetvernin, V., Church, D.M., DiCuccio, M., Edgar, R., Federhen, S., et al. 2007. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 35: D5–D12. doi: 10.1093/nar/gkl1031.

