The comparison is based on the sequences found here. The corresponding multiple sequence alignment (MSA) can be found here.
 Distances
 We computed the percent identity excluding gaps (PIDNG) distance for each pair of aligned sequences in the MSA file. PIDNG is the number of identical character pairs (aligned positions where both sequences carry the same character) divided by the total number of pairs. A pair counts toward the computation only if both of its characters belong to the given alphabet, so any pair containing a gap is left out. To turn this value into a distance between the two sequences, we subtract it from 1.0.
 PIDNG Distance = 1.0 – PIDNG
 where PIDNG = # Identical Pairs / # Pairs, and only pairs in which both characters belong to the given alphabet are counted.
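 The PIDNG definition above can be sketched in a few lines of Python. This is only an illustration, not the code used to produce the results: the function name `pidng_distance`, the upper-casing of the input, and raising an error when no pair qualifies are all assumptions made for the example.

```python
# Hypothetical helper illustrating the PIDNG distance described above.
STRICT_DNA = set("ACGT")


def pidng_distance(aligned_a, aligned_b, alphabet=STRICT_DNA):
    """Return 1.0 - PIDNG for two rows of the same alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    pairs = 0       # pairs whose two characters are both in the alphabet
    identical = 0   # counted pairs whose two characters are equal

    # Assumption: comparison is case-insensitive.
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x in alphabet and y in alphabet:
            pairs += 1
            if x == y:
                identical += 1

    if pairs == 0:
        # Assumption: the post does not say what happens in this case.
        raise ValueError("no pair has both characters in the alphabet")

    return 1.0 - identical / pairs
```

 For example, `pidng_distance("ACGT", "ACGA")` counts four pairs, three of them identical, and returns 0.25. Gap columns such as those in `"AC-GT"` versus `"ACTG-"` are skipped because `-` is not in the alphabet.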
 We computed the percent identity (PID) distance for each pair of sequences in the original sequence (FASTA) file after aligning them locally with the Smith-Waterman algorithm. For a given alignment of two sequences, PID is calculated like PIDNG, except that pairs in which one character is a gap also count toward the total number of pairs in the alignment. As with the PIDNG distance, we take 1.0 – PID as the PID distance.
 PID Distance = 1.0 – PID
 where PID = # Identical Pairs / # Pairs
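 A comparable sketch for the PID distance follows. It assumes the two strings are the gapped rows of one local alignment, that `-` is the gap character, and that the default alphabet is Full DNA without N, as used for the locally aligned sequences in this post. The name `pid_distance` and the choice to skip pairs containing any other character are assumptions of the example.

```python
# Hypothetical helper illustrating the PID distance described above.
GAP = "-"                                   # assumed gap character
FULL_DNA_NO_N = set("ACMGRSVTWYHKDB")       # Full DNA alphabet minus N


def pid_distance(aligned_a, aligned_b, alphabet=FULL_DNA_NO_N):
    """Return 1.0 - PID for the two gapped rows of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    allowed = alphabet | {GAP}
    pairs = 0       # every pair made of alphabet characters and/or gaps
    identical = 0   # counted pairs with two equal non-gap characters

    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        # Assumption: pairs holding a character outside the alphabet
        # (for example N) are left out entirely.
        if x not in allowed or y not in allowed:
            continue
        pairs += 1
        if x == y and x != GAP:
            identical += 1

    if pairs == 0:
        raise ValueError("no countable pair in the alignment")

    return 1.0 - identical / pairs
```

 With `"AC-GT"` aligned against `"ACTGA"`, all five pairs are counted and three are identical, so PID is 3/5 and the distance is 0.4. The same alignment scored with PIDNG would ignore the gap column and use four pairs instead.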
 Alphabet
 Strict DNA
 Considers only A, C, T, G characters
 Full DNA
 Considers A, C, M, G, R, S, V, T, W, Y, H, K, D, B, N characters
 We computed the PIDNG distance for the MSA file using both alphabets.
 We computed the PID distance for the locally aligned sequences using the Full DNA alphabet, but excluding the character N.
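 For reference, the characters of the Full DNA alphabet are the standard IUPAC nucleotide codes, each standing for a set of possible bases. The snippet below writes the two alphabets out as Python sets; the variable names are made up for this illustration.

```python
# IUPAC nucleotide codes and the bases each one can stand for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT",
    "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "N": "ACGT",
}

# Strict DNA: only the unambiguous codes A, C, G, T.
STRICT_DNA = {code for code, bases in IUPAC_DNA.items() if len(bases) == 1}

# Full DNA: every code listed above.
FULL_DNA = set(IUPAC_DNA)

# Variant used for the PID distance: Full DNA without N.
FULL_DNA_NO_N = FULL_DNA - {"N"}
```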
 Smith-Waterman Parameters
 Scoring Matrix
 EDNAFULL
 Penalties
 Gap open = –16 and gap extension = –4
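 To show where these parameters enter the algorithm, here is a minimal, score-only sketch of Smith-Waterman with affine gap penalties (the Gotoh recurrences). It is not the aligner used for the results. Instead of the full EDNAFULL matrix it uses only the values EDNAFULL assigns to unambiguous bases (5 for a match, –4 for a mismatch), so ambiguity codes are not scored as EDNAFULL would score them. It also assumes that a gap of length k costs gap_open + (k – 1) × gap_extend; some tools charge the extension for the first gap position as well.

```python
def smith_waterman_score(a, b, match=5, mismatch=-4,
                         gap_open=16, gap_extend=4):
    """Best local alignment score of a and b with affine gaps.

    gap_open and gap_extend are given as positive magnitudes and are
    subtracted from the score.
    """
    neg_inf = float("-inf")
    m = len(b)

    # h_prev[j]: best score of a local alignment ending at (i-1, j)
    # f_prev[j]: same, but ending with a gap that consumes a[i-2]
    h_prev = [0] * (m + 1)
    f_prev = [neg_inf] * (m + 1)
    best = 0

    for i in range(1, len(a) + 1):
        h_cur = [0] * (m + 1)
        f_cur = [neg_inf] * (m + 1)
        e = neg_inf  # best score ending with a gap that consumes b[j-1]

        for j in range(1, m + 1):
            # Open a new gap or extend the existing one.
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)

            # Simplified substitution score (not the full EDNAFULL matrix).
            s = match if a[i - 1] == b[j - 1] else mismatch

            # Local alignment: never let the score drop below zero.
            h_cur[j] = max(0, h_prev[j - 1] + s, e, f_cur[j])
            best = max(best, h_cur[j])

        h_prev, f_prev = h_cur, f_cur

    return best
```

 With these defaults, a gap is opened only when it recovers more than 16 points of matches that an ungapped alignment would lose, which makes gaps comparatively rare in the local alignments.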
 Heatmaps
Strict DNA Alphabet for MSA PIDNG Calculation

Full DNA Alphabet for MSA PIDNG Calculation
 