Please cite as: CSH Protocols; 2007; doi:10.1101/pdb.prot4786

Protocol

SNP Mining from Maize 454 EST Sequences

W. Brad Barbazuk1,4, Scott Emrich2, and Patrick S. Schnable3

1Donald Danforth Plant Science Center, St. Louis, MO 63132, USA
2Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA
3Department of Agronomy; Department of Genetics, Development, and Cell Biology; and Center for Plant Genomics; Iowa State University, Ames, IA 50011, USA

4Corresponding author (bbarbazuk{at}danforthcenter.org)


INTRODUCTION

In this protocol, 454 expressed sequence tags (ESTs) are generated by sequencing shoot apical meristem (SAM) cDNA from maize inbred lines on the 454 Life Sciences GS-20 sequencing system. The computational tool PolyBayes (Marth et al. 1999) is then used to identify single-nucleotide polymorphisms (SNPs). PolyBayes has been used successfully to identify SNPs in many different systems, including maize, and is particularly recommended for identifying SNPs in 454 sequences.


RELATED INFORMATION

For related protocols on tissue preparation, RNA extraction, and amplification, refer to Maize Tissue Preparation and Extraction of RNA from Target Cells for Genotyping and T7-Based RNA Amplification for Genotyping from Maize Shoot Apical Meristem. The use of PolyBayes to identify SNPs in maize is described in Useche et al. (2001).


MATERIALS

Reagents

Maize shoot apical meristem (SAM) cDNA from inbred lines B73 and Mo17

Prepare the cDNA as described in Maize Tissue Preparation and Extraction of RNA from Target Cells for Genotyping and T7-Based RNA Amplification for Genotyping from Maize Shoot Apical Meristem.

Equipment

454 Life Sciences Genome Sequencer 20 (GS 20)

454 Life Sciences operates a sequencing service center that will generate sequences from cDNA and genomic DNA samples. Contact the company in advance about its requirements for cDNA quantity and quality.

BLAST (Altschul et al. 1990)

Cross_match (P. Green, unpubl.)

PolyBayes (http://genome.wustl.edu/tools/software/polybayes.cgi)


METHOD

1. Generate 454 ESTs by sequencing SAM cDNA from the maize inbred lines B73 and Mo17 on the 454 Life Sciences GS-20 sequencing system.

2. Assign 454 ESTs to maize genomic anchor sequences using BLAST. For each 454 EST, identify the highest-scoring alignment against the collection of genomic sequences, accepting only alignments with an E-value of 1e-8 or lower.
Assembled ESTs can be used as anchors in place of genomic DNA. The main requirement is that the anchor sequences be of high quality, because they drive the multiple sequence alignment (MSA). Although "best hit" criteria are used during EST-to-anchor assignment, poor alignments or alignments between paralogs will be caught either during formation of MSAs by cross_match (see below) or by the internal paralog filter implemented within PolyBayes. The genome of B73 maize is currently being sequenced, and this will provide an excellent collection of anchor sequences.
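
The best-hit assignment in step 2 can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes BLAST was run with the standard 12-column tabular output (query ID in column 1, subject ID in column 2, E-value in column 11, bit score in column 12), and the function name and example IDs are hypothetical.

```python
import csv

E_VALUE_CUTOFF = 1e-8  # E-value threshold from step 2


def best_hits(tabular_lines, max_evalue=E_VALUE_CUTOFF):
    """Keep the highest-scoring anchor hit for each EST.

    tabular_lines: iterable of tab-delimited BLAST tabular lines
    (assumed 12-column layout). Returns {est_id: (anchor_id, bitscore, evalue)}.
    """
    best = {}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        est_id, anchor_id = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # alignment not significant enough
        current = best.get(est_id)
        if current is None or bitscore > current[1]:
            best[est_id] = (anchor_id, bitscore, evalue)
    return best


# Hypothetical example records: est1 hits two anchors, est2 fails the cutoff.
example = [
    "est1\tanchorA\t98.0\t100\t2\t0\t1\t100\t501\t600\t1e-40\t180",
    "est1\tanchorB\t99.0\t100\t1\t0\t1\t100\t11\t110\t1e-50\t200",
    "est2\tanchorA\t90.0\t50\t5\t0\t1\t50\t1\t50\t1e-3\t40",
]
for est, (anchor, score, evalue) in best_hits(example).items():
    print(est, anchor, score, evalue)
```

Grouping the resulting EST IDs by anchor then yields the per-anchor sequence sets needed for step 3.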

3. Run cross_match on each anchor sequence and its associated 454 ESTs to create an anchored MSA. The following cross_match parameters are recommended:
-discrep_lists
-tags
-masklevel 5
-gap_init -1
-gap_ext -1
Low gap initiation (-gap_init) and gap extension (-gap_ext) penalties are used to increase alignment tolerance between the short 454 ESTs and genomic anchors. Substitute more stringent gap penalties for gap_init and gap_ext if the anchored MSAs are unspliced (i.e., ESTs aligned to an EST anchor, or genomic sequence aligned to a genomic sequence anchor).
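
As a sketch of how step 3 might be scripted, the snippet below assembles one cross_match command line per anchor using the parameters listed above. The file names are hypothetical, and the argument order (EST file first, anchor file second, options last) is an assumption that should be checked against the cross_match documentation for the installed version.

```python
import shlex

# Parameters recommended in step 3 for spliced (EST-to-genomic) alignments.
CROSS_MATCH_OPTIONS = [
    "-discrep_lists",
    "-tags",
    "-masklevel", "5",
    "-gap_init", "-1",
    "-gap_ext", "-1",
]


def cross_match_command(est_fasta, anchor_fasta, executable="cross_match"):
    """Build the argument list for one anchor and its assigned ESTs.

    Assumed argument order: query (ESTs), then subject (anchor), then options.
    """
    return [executable, est_fasta, anchor_fasta, *CROSS_MATCH_OPTIONS]


# Hypothetical file names for a single anchor.
cmd = cross_match_command("anchor_0001.ests.fa", "anchor_0001.fa")
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run`, with standard output redirected to a per-anchor alignment file.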

4. Run PolyBayes on the MSA. Recommended PolyBayes parameters for maize are:
-maskAmbiguousMatches
-nofilterParalogs
-priorParalog 0.03
-thresholdNative 0.75
-screenSnps
-considerAnchor
-noconsiderTemplateConsensus
-prescreenSnps
-priorPoly 0.01
-thresholdSnp 0.5
It is necessary to include sequence quality files for the anchor sequence and the sequences aligned to it (member sequences). If these are unavailable or unreliable, set default quality values with:
-anchorBaseQualityDefault
-memberBaseQualityDefault
Because cross_match aligns each sequence individually to the anchor during MSA construction, and PolyBayes assesses base quality on an individual basis, the use of a stringent default rather than the base quality information provided by 454 Life Sciences is expected to increase the accuracy of polymorphism detection.
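
To keep the step 4 settings consistent across many anchors, the parameter set can be held in one place, as in the sketch below. It only assembles the option list given above; the options that name the PolyBayes input and output files are not given in this protocol and are deliberately left out. The assumption that the two default-quality options each take a numeric value, and the example values 40 and 20, are illustrative placeholders rather than recommendations.

```python
# PolyBayes parameters recommended for maize in step 4.
POLYBAYES_MAIZE_OPTIONS = [
    "-maskAmbiguousMatches",
    "-nofilterParalogs",
    "-priorParalog", "0.03",
    "-thresholdNative", "0.75",
    "-screenSnps",
    "-considerAnchor",
    "-noconsiderTemplateConsensus",
    "-prescreenSnps",
    "-priorPoly", "0.01",
    "-thresholdSnp", "0.5",
]


def polybayes_options(anchor_default_quality=None, member_default_quality=None):
    """Return the maize option list, optionally with default base qualities.

    Assumption: each default-quality option is followed by a numeric value.
    """
    options = list(POLYBAYES_MAIZE_OPTIONS)
    if anchor_default_quality is not None:
        options += ["-anchorBaseQualityDefault", str(anchor_default_quality)]
    if member_default_quality is not None:
        options += ["-memberBaseQualityDefault", str(member_default_quality)]
    return options


# Placeholder quality values, chosen only to show the call.
print(" ".join(polybayes_options(anchor_default_quality=40,
                                 member_default_quality=20)))
```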

5. Perform post-processing by reading the PolyBayes output files and deciding on appropriate rules to distinguish putative SNPs from false positives.
See Discussion.
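
One way such a post-processing rule could be expressed is sketched below. It is a hypothetical filter, not the authors' rule set: it relies only on the expectation stated in the Discussion that each inbred line should be monoallelic at a given position, and it assumes the base calls for each line at a candidate site have already been extracted from the PolyBayes output. The function name and the `min_reads_per_line` parameter are illustrative.

```python
def is_candidate_snp(b73_bases, mo17_bases, min_reads_per_line=2):
    """Hypothetical filter for one candidate site.

    b73_bases / mo17_bases: lists of base calls (e.g. ["A", "A"]) from the
    ESTs of each inbred line covering the site. The site is kept only if
    each line has enough reads, each line shows a single allele, and the
    two lines differ from each other.
    """
    if len(b73_bases) < min_reads_per_line or len(mo17_bases) < min_reads_per_line:
        return False  # too little evidence in at least one line
    b73_alleles = set(b73_bases)
    mo17_alleles = set(mo17_bases)
    if len(b73_alleles) != 1 or len(mo17_alleles) != 1:
        return False  # a line is not monoallelic (possible paralog or error)
    return b73_alleles != mo17_alleles  # polymorphic between the two lines


# Hypothetical sites.
print(is_candidate_snp(["A", "A", "A"], ["G", "G"]))
print(is_candidate_snp(["A", "G", "A"], ["G", "G"]))
```

Sites that fail the within-line agreement test are the ones most likely to reflect sequencing errors or alignments between nearly identical paralogs.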


DISCUSSION

In maize SNP-mining experiments conducted by the authors, both Mo17 and B73 454 ESTs were available, and the B73 maize MAGI assemblies were used as alignment anchors. Because Mo17 and B73 are inbreds, they should be monoallelic at every base position, with relatively rare exceptions caused by nearly identical paralogs (NIPs). Hence, putative SNPs were filtered using rules designed to substantially decrease the rate of false positives. These rules were:


REFERENCES

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215: 403–410.

Marth, G.T., Korf, I., Yandell, M.D., Yeh, R.T., Gu, Z., Zakeri, H., Stitziel, N.O., Hillier, L., Kwok, P.Y., and Gish, W.R. 1999. A general approach to single-nucleotide polymorphism discovery. Nat. Genet. 23: 452–456.

Useche, F.J., Gao, G., Harafey, M., and Rafalski, A. 2001. High-throughput identification, database storage and analysis of SNPs in EST sequences. Genome Inform. 12: 194–203.


Related Protocols

Maize Tissue Preparation and Extraction of RNA from Target Cells for Genotyping
Kazuhiro Ohtsu and Patrick S. Schnable
CSH Protocols 2007: 4784.

T7-Based RNA Amplification for Genotyping from Maize Shoot Apical Meristem
Kazuhiro Ohtsu and Patrick S. Schnable
CSH Protocols 2007: 4785.