| PLNT4610/PLNT7690
Bioinformatics Lecture 7, part 2 of 4 |
All 15 unrooted tree topologies for 5 species, redrawn from Felsenstein [http://www.cs.washington.edu/education/courses/590bi/98wi/ppt15/sld011.htm].
Therefore, unless only a small number of sequences are to be included in a tree, methods that avoid considering obviously suboptimal trees must be used to reduce the total number of trees considered. There are two main categories of phylogeny methods: distance methods and character methods. In distance methods, the first step is to calculate a matrix of all pairwise differences between a set of sequences. Next, a tree is constructed that minimizes the total distance when all branch lengths are added together. Distance methods do not attempt to reconstruct the sequences at the internal nodes of the tree, and therefore are not strictly modeled on evolution.
Character methods attempt to reconstruct ancestral nodes of trees in order to fit the tree to an evolutionary model. They therefore use more of the information in the data, at the expense of longer execution time. Character methods include parsimony and maximum likelihood methods.
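To make the growth in the number of candidate trees concrete, the following minimal sketch counts unrooted, strictly bifurcating topologies using the standard product 3 × 5 × ... × (2n − 5). The function name `count_unrooted_trees` is an assumption made for this illustration, not something taken from a particular package.

```python
def count_unrooted_trees(n_taxa: int) -> int:
    """Count distinct unrooted, strictly bifurcating topologies for n_taxa labelled tips."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Each newly added taxon can be attached to any existing branch, which
    # multiplies the count by 3, then 5, then 7, ... up to (2n - 5).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, count_unrooted_trees(n))
```

For 5 species this gives the 15 topologies shown in the figure; for 10 species the count already exceeds two million, which is why exhaustive evaluation quickly becomes impractical.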
DNA scoring methods:
Protein scoring methods - Because of the many nuances in working with proteins, there are many scoring schemes. Most are based on existing PAM or BLOSUM matrices. One common method is to use Dayhoff's PAM 001 matrix to score distances. (One PAM unit is defined as the amount of sequence divergence corresponding to a 1% amino acid replacement rate.) Alternatively, Kimura's protein distance metric simply uses the observed fraction of amino acids that differ between two sequences to approximate a PAM distance:
$$D = -\ln\left(1 - p - 0.2\,p^{2}\right)$$

where $p$ is the fraction of amino acids that differ between two sequences.
Using the appropriate scoring methods, all pairwise distances between sequences are calculated. For more details on protein scoring matrices, see the documentation for the PHYLIP protdist program.
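Kimura's formula above is simple enough to sketch directly. In the example below, the function name `kimura_protein_distance` and the decision to skip columns containing a gap character `-` are assumptions made for illustration; they are not taken from the protdist program.

```python
import math


def kimura_protein_distance(seq_a: str, seq_b: str) -> float:
    """Approximate PAM distance D = -ln(1 - p - 0.2 p^2) for two aligned proteins."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption of this sketch: columns with a gap in either sequence are ignored.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    # p is the fraction of compared positions at which the amino acids differ.
    p = sum(a != b for a, b in pairs) / len(pairs)
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0.0:
        raise ValueError("sequences are too divergent for this approximation")
    return -math.log(arg)
```

Identical sequences give a distance of zero, and the distance grows faster than p itself because the logarithm corrects for multiple replacements at the same site.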
For example, the PHYLIP documentation gives a set of 5 short aligned DNA sequences:

    Alpha     AACGTGGCCACAT
    Beta      ..G..C......C
    Gamma     C.GT.C......A
    Delta     G.GA.TT..G.C.
    Epsilon   G.GA.CT..G.CC

The corresponding distance matrix using the Kimura 2-parameter model is
| | Alpha | Beta | Gamma | Delta | Epsilon |
|---|---|---|---|---|---|
| Alpha | | 0.2997 | 0.7820 | 1.1716 | 1.4617 |
| Beta | | | 0.3219 | 0.8997 | 0.5653 |
| Gamma | | | | 1.4481 | 1.0726 |
| Delta | | | | | 0.1679 |
| Epsilon | | | | | |
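The first step of any distance method, computing all pairwise differences, can be sketched for the five example sequences. Note that this sketch computes the uncorrected proportion of differing sites, not the Kimura 2-parameter distance, so its values are not expected to match the matrix above. The helper names `expand_dots`, `p_distance` and `distance_matrix` are assumptions made for this illustration.

```python
from itertools import combinations

# Alignment as printed in the text; "." means "same base as the first sequence".
alignment = {
    "Alpha":   "AACGTGGCCACAT",
    "Beta":    "..G..C......C",
    "Gamma":   "C.GT.C......A",
    "Delta":   "G.GA.TT..G.C.",
    "Epsilon": "G.GA.CT..G.CC",
}


def expand_dots(aln):
    """Replace each "." with the base found at that column in the first sequence."""
    reference = next(iter(aln.values()))
    return {
        name: "".join(ref if base == "." else base for base, ref in zip(seq, reference))
        for name, seq in aln.items()
    }


def p_distance(seq_a, seq_b):
    """Uncorrected proportion of aligned sites at which two sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def distance_matrix(aln):
    """Return {(name1, name2): distance} for every unordered pair of sequences."""
    seqs = expand_dots(aln)
    return {(m, n): p_distance(seqs[m], seqs[n]) for m, n in combinations(seqs, 2)}


if __name__ == "__main__":
    for (m, n), dist in distance_matrix(alignment).items():
        print(f"{m:8s}{n:8s}{dist:.4f}")
```

Delta and Epsilon differ at only 2 of 13 sites, which agrees with their being the closest pair in the corrected matrix as well.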
| | B | C |
|---|---|---|
| A | 24 | 28 |
| B | | 32 |
Simultaneous linear equations can be used to calculate the branch lengths:

$$\begin{aligned}
\text{A to B:}\quad x + y &= 24 \\
\text{A to C:}\quad x + z &= 28 \\
\text{B to C:}\quad y + z &= 32
\end{aligned}$$

Thus, with 3 equations and 3 unknowns, we can calculate that x = 10, y = 14 and z = 18.
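The three equations can be solved in closed form: adding the two equations that contain a given unknown and subtracting the third leaves twice that unknown. The sketch below assumes x, y and z are the lengths of the branches leading to A, B and C respectively; the function name `three_taxon_branch_lengths` is hypothetical.

```python
def three_taxon_branch_lengths(d_ab, d_ac, d_bc):
    """Solve x + y = d_ab, x + z = d_ac, y + z = d_bc for (x, y, z)."""
    x = (d_ab + d_ac - d_bc) / 2  # branch leading to A
    y = (d_ab + d_bc - d_ac) / 2  # branch leading to B
    z = (d_ac + d_bc - d_ab) / 2  # branch leading to C
    return x, y, z


if __name__ == "__main__":
    # Distances from the table in the text: A-B = 24, A-C = 28, B-C = 32.
    print(three_taxon_branch_lengths(24, 28, 32))
```

With the distances from the table, this returns the branch lengths 10, 14 and 18 given above.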
Addition of branches is iterative. Branches are added until all sequences are included in the tree.
Advantages
Fitch and Margoliash showed that different sets of internal branch lengths could be obtained by considering alternate trees which moved one or more branches to different parts of the tree. Consider a distance matrix for four sequences with pairwise distances $D_{ij}$:
The Neighbor-Joining tree for these sequences is

If we recalculate the pairwise distances $d_{ij}$ from the tree, they are different from the original distances, as shown at right. The least squares method of Fitch and Margoliash tries different tree topologies, swapping branches among closely related sequences and recalculating the distances. For each tree considered, a different matrix of tree-derived distances ($d_{ij}$) will be generated. The best tree is defined as that tree which minimizes:
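A least-squares criterion of this kind can be sketched in code. The sketch assumes the weighted form commonly attributed to Fitch and Margoliash, $\sum_{i<j} (D_{ij} - d_{ij})^2 / D_{ij}^2$, in which each squared deviation is scaled by the squared observed distance. That weighting, the function name `fitch_margoliash_ss`, the dictionary layout, and the numbers in the usage example are all assumptions made for illustration.

```python
def fitch_margoliash_ss(observed, tree_derived):
    """Weighted sum of squared differences between observed and tree-derived distances.

    Both arguments map a pair of sequence names, e.g. ("A", "B"), to a distance.
    """
    total = 0.0
    for pair, big_d in observed.items():
        small_d = tree_derived[pair]
        # Assumed weighting: divide each squared deviation by the squared observed distance.
        total += (big_d - small_d) ** 2 / big_d ** 2
    return total


if __name__ == "__main__":
    # Hypothetical distances, not taken from the lecture.
    observed = {("A", "B"): 10.0, ("A", "C"): 20.0}
    from_tree = {("A", "B"): 9.0, ("A", "C"): 22.0}
    print(fitch_margoliash_ss(observed, from_tree))
```

A tree whose path lengths reproduce the observed distances exactly scores zero; among candidate topologies, the one with the smallest score would be preferred.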
What about UPGMA? It has been exhaustively demonstrated in the literature, both on theoretical grounds and from phylogenies constructed on strains of known pedigree, that UPGMA is the least robust method. It is based on the assumption that rates of evolution are constant along all branches, i.e., that evolution follows an evolutionary clock. This assumption is almost never valid. Given the many far more sophisticated phylogeny inference methods available, there is little justification for ever using UPGMA. One point to make is that a comparison of Neighbor-Joining results with UPGMA results could provide a test for the hypothesis that evolution in a given tree is clock-like.

Distance Methods [http://helix.biology.mcmaster.ca/721/outline2/node49.html]

Nei M, Roychoudhury AK (1993) Evolutionary Relationships of Human Populations on a Global Scale. Mol. Biol. Evol. 10:927-943.

Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.