Thursday, June 6, 2013

Pairwise Distance Calculation

The distance between each pair of sequences is the input to our algorithms for dimensional scaling and clustering. So far we have usually computed this distance as percent identity, as described here. However, since the sequences are already aligned with respect to each other, our collaborators were interested in computing pairwise distances directly from the multiple sequence alignment (MSA), using the strategy below. As it turns out, this is identical to the method we have been using previously, known as PIDNG with Strict DNA Alphabet, described here.

The following content is copied from the email sent by Geoffrey House.


Pairwise gap deletion strategy for AMF clustering project
May 21, 2013

Yellow highlight – Same base; base position is counted in length

Red highlight – Different base; base position is counted in length


Pink highlight – Skip this position (there is an ambiguous nucleotide in one or both of the sequences); base position is not counted in length

Sequence 1:      -ATG---WCT
Sequence 2:      -TTGC--W-A
Sequence 3:      GATGY---GT


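Pairwise comparison 1 between sequences 1 and 2 (the length of the comparison is 4; 2 base positions are different):

-ATG---WCT
-TTGC--W-A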


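Pairwise comparison 2 between sequences 1 and 3 (the length of the comparison is 5; 1 base position is different):

-ATG---WCT
GATGY---GT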

Pairwise comparison 3 between sequences 2 and 3 (the length of the comparison is 4; 2 base positions are different):

-TTGC--W-A
GATGY---GT
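
The counting rules above can be sketched in a few lines of Python. This is only an illustration, not the code used in the project: the function names `compare_pair` and `distance` are made up here, the "strict" alphabet is assumed to be the four unambiguous bases A, C, G and T, and the distance is assumed to be the fraction of counted positions that differ (that is, one minus percent identity). A column is skipped whenever either sequence has a gap or an ambiguous nucleotide in it.

```python
from itertools import combinations

# Assumption: "strict DNA alphabet" means only the four unambiguous bases.
STRICT_DNA = set("ACGT")


def compare_pair(a, b):
    """Return (length, differences) for two aligned sequences.

    A column is counted only if both sequences have an unambiguous
    base there; gaps and ambiguity codes cause the column to be skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    length = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in STRICT_DNA or y not in STRICT_DNA:
            continue  # gap or ambiguous nucleotide: not counted in length
        length += 1
        if x != y:
            differences += 1
    return length, differences


def distance(a, b):
    """Assumed distance: fraction of counted positions that differ."""
    length, differences = compare_pair(a, b)
    if length == 0:
        return None  # no comparable positions; handling is not specified above
    return differences / length


# The three example sequences from the email.
msa = {
    1: "-ATG---WCT",
    2: "-TTGC--W-A",
    3: "GATGY---GT",
}

for i, j in combinations(sorted(msa), 2):
    length, differences = compare_pair(msa[i], msa[j])
    print(f"sequences {i} and {j}: length={length}, "
          f"different={differences}, distance={distance(msa[i], msa[j]):.2f}")
```

For sequences 2 and 3 this sketch reports a length of 4 with 2 differing positions, which agrees with the count given above. Under the same rules it gives a length of 4 with 2 differences for sequences 1 and 2, and a length of 5 with 1 difference for sequences 1 and 3.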
