Wednesday, October 24, 2012

Pairwise Distances from Multiple Sequence Alignment (MSA) Vs Pairwise Local Alignment Distances

Here we compare pairwise distances calculated from a multiple sequence alignment (MSA) file with pairwise distances calculated from local pairwise alignments of the same sequences.
The comparison is based on the sequences found here. The corresponding MSA can be found here.
  • Distances
    • We computed the percent identity excluding gaps (PIDNG) distance for each pair of aligned sequences in the MSA file. PIDNG is the number of identical character pairs divided by the total number of character pairs, where a pair is counted only if both of its characters belong to the given alphabet. To convert this value into a distance between two sequences, we subtract it from 1.0.
      • PIDNG Distance = 1.0 – PIDNG
      • where PIDNG = # Identical Pairs / # Pairs, and both characters in each pair belong to the given alphabet.
    • We computed the percent identity (PID) distance for each pair of sequences in the original sequence (FASTA) file after aligning them locally using the Smith-Waterman algorithm. For a given alignment of two sequences, PID is calculated in the same way as PIDNG, except that pairs in which one character is a gap also count toward the total number of pairs in the alignment. As with the PIDNG distance, we take 1.0 – PID to get the PID distance.
      • PID Distance = 1.0 – PID
      • where PID = # Identical Pairs / # Pairs
  • Alphabet
    • Strict DNA
      • Considers only A, C, T, G characters
    • Full DNA
      • Considers A, C, M, G, R, S, V, T, W, Y, H, K, D, B, N characters
    • We computed the PIDNG distance for the MSA file using both alphabets.
    • We computed the PID distance for the locally aligned sequences using Full DNA, excluding the character N.
  • Smith-Waterman Parameters
    • Scoring Matrix
      • EDNAFULL
    • Penalties
      • Gap open = –16 and gap extension = –4
  • Heatmaps
Strict DNA Alphabet for MSA PIDNG Calculation
Full DNA Alphabet for MSA PIDNG Calculation
whole-plot-MSA-PIDNG-Vs-Pairwise-SWG-PIDDensitySat[0.85]-large whole-plot-MSA-PIDNG-Vs-Pairwise-SWG-PIDDensitySat[0.85]-large
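
The PIDNG and PID distance definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the heatmaps: the function names, the `-` gap symbol, and the choice to return 1.0 when no pair qualifies are assumptions made here. Identity is taken as literal character equality, and a PID pair is assumed to count when both characters are in the alphabet or exactly one of them is a gap.

```python
STRICT_DNA = frozenset("ACTG")
FULL_DNA = frozenset("ACMGRSVTWYHKDBN")
GAP = "-"  # assumed gap symbol; the post does not name one


def _check_lengths(row_a, row_b):
    # Both rows must come from the same alignment, so they share a length.
    if len(row_a) != len(row_b):
        raise ValueError("aligned sequences must have equal length")


def pidng_distance(row_a, row_b, alphabet=STRICT_DNA):
    """1.0 - PIDNG: only pairs with both characters in `alphabet` count."""
    _check_lengths(row_a, row_b)
    pairs = [(x, y) for x, y in zip(row_a.upper(), row_b.upper())
             if x in alphabet and y in alphabet]
    if not pairs:
        return 1.0  # assumption: no comparable pairs means maximal distance
    identical = sum(1 for x, y in pairs if x == y)
    return 1.0 - identical / len(pairs)


def pid_distance(row_a, row_b, alphabet=FULL_DNA - {"N"}):
    """1.0 - PID: like PIDNG, but pairs with exactly one gap also count."""
    _check_lengths(row_a, row_b)
    identical = total = 0
    for x, y in zip(row_a.upper(), row_b.upper()):
        one_gap = (x == GAP) != (y == GAP)
        both_in_alphabet = x in alphabet and y in alphabet
        if both_in_alphabet or one_gap:
            total += 1
            if both_in_alphabet and x == y:
                identical += 1
    if total == 0:
        return 1.0  # assumption: same convention as pidng_distance
    return 1.0 - identical / total
```

For example, the aligned rows `AC-GT` and `ACTGA` have three identical pairs. PIDNG ignores the gapped column (3 of 4 pairs), whereas PID counts it toward the total (3 of 5 pairs), so the PID distance is the larger of the two.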

Monday, October 8, 2012

Heatmaps of Different Distances

  • Types of distances
    • PID – #-of-matches / alignment-length-including-gaps-in-the-middle
    • AvgLocal – Score(A,B) / Avg(Score(A’,A’), Score(B’,B’)), where A’ and B’ denote the aligned regions of the original sequences A and B
    • MaxGlobal – Score(A,B) / Max(Score(A,A), Score(B,B))
    • MaxLocal – Score(A,B) / Max(Score(A’,A’), Score(B’,B’))
    • MinGlobal – Score(A,B) / Min(Score(A,A), Score(B,B))
    • MinLocal – Score(A,B) / Min(Score(A’,A’), Score(B’,B’))
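
The five score-based normalizations above differ only in which self-alignment scores go into the denominator. A minimal sketch, assuming the alignment score and the four self-scores have already been computed elsewhere (the function and parameter names are hypothetical, and the values are returned as the ratios written in the list, with no further conversion applied):

```python
from statistics import mean


def normalized_scores(score_ab, self_a, self_b, self_a_local, self_b_local):
    """Return the five score ratios from the list above.

    score_ab      -- Score(A,B), the alignment score of A against B
    self_a/self_b -- Score(A,A) and Score(B,B) for the full sequences
    self_*_local  -- Score(A',A') and Score(B',B') for the aligned regions
    """
    global_scores = (self_a, self_b)
    local_scores = (self_a_local, self_b_local)
    return {
        "AvgLocal": score_ab / mean(local_scores),
        "MaxGlobal": score_ab / max(global_scores),
        "MaxLocal": score_ab / max(local_scores),
        "MinGlobal": score_ab / min(global_scores),
        "MinLocal": score_ab / min(local_scores),
    }


# Made-up scores, purely for illustration.
ratios = normalized_scores(score_ab=40, self_a=100, self_b=60,
                           self_a_local=50, self_b_local=45)
for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
```

With positive scores, the Max variants can never exceed the corresponding Min variants, and AvgLocal always lies between MaxLocal and MinLocal.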

  • Heatmaps vs PID
    • AvgLocal Vs PID
    • MaxGlobal Vs PID
whole-plot-SWG-AvgLocal-Vs-SWG-PIDDensitySat[0.85]-large whole-plot-SWG-MaxGlobal-Vs-SWG-PIDDensitySat[0.85]-large
    • MaxLocal Vs PID
    • MinGlobal Vs PID
whole-plot-SWG-MaxLocal-Vs-SWG-PIDDensitySat[0.85]-large whole-plot-SWG-MinGlobal-Vs-SWG-PIDDensitySat[0.85]-large
    • MinLocal Vs PID

  • Heatmaps vs AvgLocal
    • MaxGlobal Vs AvgLocal
    • MaxLocal Vs AvgLocal
whole-plot-SWG-MaxGlobal-Vs-SWG-AvgLocalDensitySat[0.85]-large whole-plot-SWG-MaxLocal-Vs-SWG-AvgLocalDensitySat[0.85]-large
    • MinGlobal Vs AvgLocal
    • MinLocal Vs AvgLocal
whole-plot-SWG-MinGlobal-Vs-SWG-AvgLocalDensitySat[0.85]-large whole-plot-SWG-MinLocal-Vs-SWG-AvgLocalDensitySat[0.85]-large

  • Heatmaps vs MaxGlobal
    • MaxLocal Vs MaxGlobal
    • MinGlobal Vs MaxGlobal
whole-plot-SWG-MaxLocal-Vs-SWG-MaxGlobalDensitySat[0.85]-large whole-plot-SWG-MinGlobal-Vs-SWG-MaxGlobalDensitySat[0.85]-large
    • MinLocal Vs MaxGlobal

  • Heatmaps vs MaxLocal
    • MinGlobal Vs MaxLocal
    • MinLocal Vs MaxLocal
whole-plot-SWG-MinGlobal-Vs-SWG-MaxLocalDensitySat[0.85]-large whole-plot-SWG-MinLocal-Vs-SWG-MaxLocalDensitySat[0.85]-large

  • Heatmaps vs MinGlobal
    • MinLocal Vs MinGlobal
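
The Score(A,B) terms in the distance definitions are alignment scores, and the SWG prefix in the plot names suggests Smith-Waterman with affine gap penalties (Gotoh's recurrences). The following is a minimal score-only sketch of that computation, not the implementation behind these heatmaps. It assumes a flat match/mismatch score of 5/–4 (the values EDNAFULL assigns to pairs of unambiguous A, C, G, T characters) in place of the full matrix, and it assumes that a gap of length k costs open + (k – 1) × extension. The default penalties mirror those quoted in the October 24 post; this post does not state which parameters or gap convention were used.

```python
def sw_affine_score(a, b, match=5, mismatch=-4, gap_open=16, gap_extend=4):
    """Best local alignment score of a and b with affine gap penalties.

    Assumed convention: a gap of length k costs
    gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    m = len(b)
    h_prev = [0] * (m + 1)        # H row i-1: best score ending at (i-1, j)
    f_prev = [neg_inf] * (m + 1)  # F row i-1: alignments ending in a vertical gap
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * (m + 1)
        f_cur = [neg_inf] * (m + 1)
        e = neg_inf               # E: alignments ending in a horizontal gap
        for j in range(1, m + 1):
            # Open a new gap from H, or extend the gap already open.
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 option restarts the alignment, which makes it local.
            h_cur[j] = max(0, h_prev[j - 1] + s, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


seq_a = "A" * 10 + "C" * 10
seq_b = "A" * 10 + "GGG" + "C" * 10
print(sw_affine_score(seq_a, seq_b))   # Score(A,B)
print(sw_affine_score(seq_a, seq_a))   # Score(A,A), a self-alignment score
```

Under these flat scores, the self-alignment score of a strict-DNA sequence is simply 5 times its length, which is the kind of value that goes into the Max, Min, and Avg denominators.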