Bioinformatic Methods

The following methods and software were used to produce the data and analyses presented on this site.

Unigene V1

The original unigenes were produced by assembling the 454 sequences with the Roche 454 software package Newbler.

Unigene V2

The version 2 unigenes were produced by assembling the 454 sequences with the software SeqManPro. This version also included hybrid assemblies of 454 data with longer Sanger-style reads.

Unigene V3

The version 3 unigenes were produced by assembling 454 and 454/Sanger-style sequences with CAP3 at a high stringency (-p 90, the overlap percent identity cutoff) and without end trimming.

  • Huang X and Madan A. 1999. CAP3: A DNA sequence assembly program. Genome Res. 9(9):868-877.
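
For readers who want to reproduce a comparable assembly, the CAP3 call might be built as in the Python sketch below. It is an illustration under stated assumptions, not the command actually used: the function names and the input file name are hypothetical, and passing -k 0 (CAP3's end clipping flag) is our reading of "without end trimming". The cap3 executable is assumed to be on the PATH.

```python
import subprocess


def build_cap3_command(fasta_path, percent_identity=90, end_clipping=False):
    """Build a CAP3 argument list (hypothetical helper, not the site's script)."""
    # -p sets the overlap percent identity cutoff (90 in the text above).
    cmd = ["cap3", fasta_path, "-p", str(percent_identity)]
    if not end_clipping:
        # Assumption: "-k 0" switches off end clipping of reads.
        cmd += ["-k", "0"]
    return cmd


def run_cap3(fasta_path):
    """Run CAP3 and return its standard output (needs cap3 installed)."""
    result = subprocess.run(
        build_cap3_command(fasta_path),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling build_cap3_command("reads.fasta") gives the argument list for "cap3 reads.fasta -p 90 -k 0", where reads.fasta is a placeholder file name.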

SSRs

A multi-step procedure has been developed at CUGI to facilitate SSR mining. The following steps are performed on each contig of an assembly:

  1. The script searches for repetitive patterns in the contig consensus sequence that match one of the following criteria:
    • 2 base pair motif (dinucleotide) repeated at least 5 times
    • 3 base pair motif (trinucleotide) repeated at least 4 times
    • 4 base pair motif (tetranucleotide) repeated at least 3 times
    • 5 base pair motif (pentanucleotide) repeated at least 3 times
  2. Primer3 is run with default parameters. The output is entered in the Excel spreadsheet and includes the forward primer, the reverse primer, the melting temperature of each primer, and the product size between the primers.
    • Koressaar T, Remm M. (2007) Enhancements and modifications of primer design program Primer3. Bioinformatics. May 15;23(10):1289-91. Epub 2007 Mar 22.
  3. Underlying evidence of polymorphism is reported in the "Alignment" column. The software looks for one of four characteristics:
    • a gap of 2 bp or more in the consensus sequence within the SSR region
    • multiple 1 bp gaps in the consensus sequence within the SSR region
    • a gap at either end of the consensus, with another repeat of the motif at the corresponding region of an underlying sequence
    • a gap of 2 bp or more in an underlying sequence
    Additionally, these sequences have enough flanking sequence for primers that amplify the SSR region. This analysis helps filter the list of potential SSRs down to a manageable number, but manual examination and selection are still beneficial.
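
The motif search in step 1 could be sketched in Python roughly as follows. This is an illustration only, not the CUGI script: the function name find_ssrs, the use of regular expressions, and the decision to skip motifs made of a single repeated base (homopolymer runs) are all assumptions. The repeat thresholds are the ones listed in step 1.

```python
import re

# Minimum repeat counts per motif length, from the criteria in step 1.
MIN_REPEATS = {2: 5, 3: 4, 4: 3, 5: 3}


def find_ssrs(sequence):
    """Return sorted (start, end, motif, repeats) tuples; 0-based, end exclusive."""
    seq = sequence.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # Capture one motif, then require at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        pos = 0
        while True:
            match = pattern.search(seq, pos)
            if match is None:
                break
            motif = match.group(1)
            if len(set(motif)) == 1:
                # Assumption: a run of one base (e.g. "AA" x 5) is not an SSR.
                # Step forward one base so a later repeat is not missed.
                pos = match.start() + 1
                continue
            repeats = len(match.group(0)) // motif_len
            hits.append((match.start(), match.end(), motif, repeats))
            pos = match.end()
    return sorted(hits)
```

A repeat such as (AT)6 also satisfies the tetranucleotide rule as (ATAT)3, so this sketch can report the same region under more than one motif length. A production pipeline would presumably merge or rank such overlapping hits.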

SNPs

PolyBayes version 3.0 was run utilizing the assembly and quality values associated with the sequences.

  • POLYBAYES 3.0 Copyright (C) 1998, 1999, 2000, 2001 Gabor T. Marth, Washington University, St. Louis, Missouri USA. All Rights Reserved.

Specific parameters were:

  • polybayes.pl -inputFormat ace -readPhdFiles -filterParalogs -screenSnps -prescreenSnps -noconsiderAnchor

Primer3 was run with default parameters. The output is entered in the Excel spreadsheet and includes the forward primer, the reverse primer, the melting temperature for the forward primer, the melting temperature for the reverse primer, and the product size between the primers.

  • Koressaar T, Remm M. (2007) Enhancements and modifications of primer design program Primer3. Bioinformatics. May 15;23(10):1289-91. Epub 2007 Mar 22.

Indels are excluded from the analysis of SNPs. Indels are the most common type of 454 sequencing error, so the indels reported would likely have a very low rate of true polymorphism. Other 454/SNP studies have also chosen to exclude indels and use only substitutions (for example: Barbazuk et al, 2007, The Plant Journal; Novaes et al, 2008, BMC Genomics).