The Fungi Phylogenetic Project: Pairwise Distance Calculation

The distance between each pair of sequences is the input to our dimensional scaling and clustering algorithms. Until now we have derived this distance from percent identity, as described here. However, since the sequences are already aligned with respect to each other, our collaborators were interested in computing pairwise distances directly from the multiple sequence alignment (MSA) itself, using the strategy below. As it turns out, this is identical to the method we have been using all along, known as PIDNG with Strict DNA Alphabet in here.

The following content is copied from the email sent by Geoffrey House.

Pairwise gap deletion strategy for AMF clustering project

May 21, 2013

Yellow highlight – Same base; base position is counted in length

Red highlight – Different base; base position is counted in length

Green highlight – Skip this position (there is a gap in one or both of the sequences); base position not counted in length

Pink highlight – Skip this position (there is an ambiguous nucleotide in one or both of the sequences); base position not counted in length

Sequence 1: -ATG---WCT

Sequence 2: -TTGC--W-A

Sequence 3: GATGY---GT

Pairwise comparison 1 between sequences 1 and 2 (the length of the comparison is 4; 2 base positions are different):

-ATG---WCT

-TTGC--W-A

Pairwise comparison 2 between sequences 1 and 3 (the length of the comparison is 5; 1 base position is different):

-ATG---WCT

GATGY---GT

Pairwise comparison 3 between sequences 2 and 3 (the length of the comparison is 4; 2 base positions are different):

-TTGC--W-A

GATGY---GT
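The rules in the email can be sketched in a few lines of Python. This is only an illustration, not the code used in the project: the names compare_pair and percent_identity are made up for this example, and it assumes that the strict DNA alphabet means exactly A, C, G and T, so that any other character (a gap or an ambiguity code such as W or Y) causes the position to be skipped. It also assumes that percent identity is the number of matching positions divided by the comparison length.

```python
from itertools import combinations

# Assumption: "strict DNA alphabet" means only the four unambiguous bases.
STRICT_DNA = set("ACGT")


def compare_pair(seq_a, seq_b):
    """Return (length, differences) for two rows of the same alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    length = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip the position if either sequence has a gap or an ambiguous base.
        if a not in STRICT_DNA or b not in STRICT_DNA:
            continue
        length += 1
        if a != b:
            differences += 1
    return length, differences


def percent_identity(seq_a, seq_b):
    """Fraction of counted positions that match (assumed definition)."""
    length, differences = compare_pair(seq_a, seq_b)
    if length == 0:
        return None  # no comparable positions; handling is left open here
    return (length - differences) / length


# The three sequences from the email above.
msa = {
    "Sequence 1": "-ATG---WCT",
    "Sequence 2": "-TTGC--W-A",
    "Sequence 3": "GATGY---GT",
}

for (name_a, seq_a), (name_b, seq_b) in combinations(msa.items(), 2):
    length, differences = compare_pair(seq_a, seq_b)
    print(name_a, "vs", name_b, "length:", length, "different:", differences)
```

Run on the three sequences from the email, this sketch gives the same lengths and difference counts as the three pairwise comparisons listed above.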

Thursday, June 6, 2013
