
Sunday, September 8, 2013

Pairwise Distances from MSA Vs Pairwise Local Alignment Distances for Revised 599Nts

The following heatmap presents the correlation between distances computed from a multiple sequence alignment (MSA) and distances computed from pairwise local alignment for the revised 599Nts dataset (the MSA is available here and the sequence file is here).

Details of the MSA-based distance computation are at http://salsafungiphy.blogspot.com/2013/06/pairwise-distance-calculation.html

Details of the pairwise local alignment based distance computation are at http://salsafungiphy.blogspot.com/2012/10/pairwise-distances-from-multiple.html
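A heatmap like this can be read as a two-dimensional density of sequence pairs: one point per pair, with the MSA-based distance on one axis and the local alignment distance on the other. The sketch below bins such points into a grid of counts. The function name `density_grid`, the bin count, the [0, 1] range, the axis orientation and the toy input values are illustrative assumptions, not the settings or data used for the plots in this post.

```python
def density_grid(xs, ys, bins=10, lo=0.0, hi=1.0):
    """Count (x, y) distance pairs in a bins x bins grid over [lo, hi] x [lo, hi].

    grid[row][col]: col indexes the x distance (e.g. MSA-based) and row the
    y distance (e.g. local-alignment-based). Values equal to hi land in the
    last bin; points outside the range are skipped.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    width = (hi - lo) / bins
    grid = [[0] * bins for _ in range(bins)]
    for x, y in zip(xs, ys):
        if not (lo <= x <= hi and lo <= y <= hi):
            continue  # outside the plotted range
        col = min(int((x - lo) / width), bins - 1)
        row = min(int((y - lo) / width), bins - 1)
        grid[row][col] += 1
    return grid


# Hypothetical toy input: three sequence pairs (MSA distance, local alignment distance).
msa_d = [0.05, 0.15, 0.95]
swg_d = [0.05, 0.20, 0.90]
for row in reversed(density_grid(msa_d, swg_d, bins=4)):
    print(row)  # printed top row first, so larger y values appear on top
```

Points that agree perfectly fall on the diagonal of the grid; spread away from the diagonal shows where the two distance measures disagree. Colouring and saturation of the counts are left to the plotting tool.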



Monday, June 17, 2013

Pairwise Distances from Multiple Sequence Alignment (MSA) Vs Pairwise Local Alignment Distances

The following heatmaps present the correlation between distances computed from a multiple sequence alignment (MSA) and distances computed from pairwise local alignment. Distance computation from the MSA is explained in the previous post at http://salsafungiphy.blogspot.com/2013/06/pairwise-distance-calculation.html. Distances from pairwise local alignment follow the usual recipe: Smith-Waterman alignment with the EDNAFULL scoring matrix and -16/-4 gap penalties (gap open/gap extension), followed by a percent identity based distance computation. A similar study was done with the previous data (see http://salsafungiphy.blogspot.com/2012/10/pairwise-distances-from-multiple.html).
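For illustration only, here is a minimal sketch of the local alignment recipe above: Smith-Waterman with affine gaps (Gotoh's formulation), followed by a PID distance in which columns with a gap count towards the total. Assumptions: only the A/C/G/T block of EDNAFULL is used (+5 match, -4 mismatch; IUPAC ambiguity codes are not modelled), a gap of length k costs 16 + 4(k - 1), and the function names are hypothetical. This is not the code that produced the distances in these posts.

```python
NEG = -10**9  # stands in for minus infinity in the gap matrices


def score(x, y):
    # Assumption: only the A/C/G/T block of EDNAFULL (+5 match, -4 mismatch).
    return 5 if x == y else -4


def smith_waterman(a, b, gap_open=16, gap_extend=4):
    """Affine-gap local alignment (Gotoh); returns the two aligned strings.

    Assumption: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]    # best local score ending at (i, j)
    E = [[NEG] * (m + 1) for _ in range(n + 1)]  # alignment ends with a gap in a
    F = [[NEG] * (m + 1) for _ in range(n + 1)]  # alignment ends with a gap in b
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            diag = H[i - 1][j - 1] + score(a[i - 1], b[j - 1])
            H[i][j] = max(0, diag, E[i][j], F[i][j])
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j

    # Trace back from the best cell, tracking which matrix the path is in.
    out_a, out_b = [], []
    i, j, state = bi, bj, "H"
    while True:
        if state == "H":
            if H[i][j] == 0:
                break  # the local alignment starts here
            if H[i][j] == H[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
            elif H[i][j] == E[i][j]:
                state = "E"
            else:
                state = "F"
        elif state == "E":
            out_a.append("-")
            out_b.append(b[j - 1])
            if E[i][j] == H[i][j - 1] - gap_open:
                state = "H"  # this column opened the gap
            j -= 1
        else:
            out_a.append(a[i - 1])
            out_b.append("-")
            if F[i][j] == H[i - 1][j] - gap_open:
                state = "H"  # this column opened the gap
            i -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def pid_distance(aln_a, aln_b):
    """1.0 - PID, where columns containing a gap still count towards the total."""
    if not aln_a:
        return 1.0  # assumption: no local alignment means maximal distance
    identical = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 1.0 - identical / len(aln_a)
```

The pure-Python O(nm) loops are only meant to show the recurrences; for real datasets an established aligner with the full EDNAFULL matrix would normally be preferred.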


  • 599Nts data set




  • 999Nts data set

Monday, November 5, 2012

Tree Distance Heatmaps

This is an attempt to find a “goodness” measure for phylogenetic trees generated by programs such as Ninja (http://nimbletwist.com/software/ninja/index.html) or RAxML (http://www.phylo.org/news/RAxML). For each pair of sequences we calculate a distance based on the structure of the tree, which we refer to as the “Tree Distance”. We then compare it against the original pairwise distances computed for the sequences from either local alignment or multiple sequence alignment. The definition of tree distance is not unique; we currently use the “Edge Sum” definition given below. Edge Count is another possible definition, though we have not tested it yet.
  • Edge Sum
    • Given two sequences A and B, we find the shortest path from A to B in the tree and sum the values (branch lengths) on the edges along the path.
  • Edge Count
    • Given two sequences A and B, we find the shortest path from A to B in the tree and count the number of edges in it.
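Both tree distance definitions can be sketched as a single path search, assuming the tree has already been parsed (for example from Newick output) into a hypothetical list of `(node, node, branch_length)` edges; Newick parsing itself is not shown. Because a tree has exactly one path between any two nodes, a breadth-first search from A reaches B along that path: the accumulated branch length is the Edge Sum and the number of steps is the Edge Count. The function name and the toy tree are illustrative.

```python
from collections import deque


def tree_distances(edges, a, b):
    """Return (edge_sum, edge_count) for the path between nodes a and b.

    edges: iterable of (node, node, branch_length) tuples describing a tree.
    """
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    # Breadth-first search from a. In a tree every node is reached only via
    # its unique path, so the totals stored for b when it is dequeued are final.
    seen = {a: (0.0, 0)}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        total, count = seen[node]
        if node == b:
            return total, count
        for nbr, w in adj.get(node, []):
            if nbr not in seen:
                seen[nbr] = (total + w, count + 1)
                queue.append(nbr)
    raise ValueError(f"no path between {a!r} and {b!r}")


# Hypothetical tree ((A:0.1,B:0.2):0.05,C:0.3); with internal nodes named n1 and root.
edges = [("A", "n1", 0.1), ("B", "n1", 0.2), ("n1", "root", 0.05), ("C", "root", 0.3)]
print(tree_distances(edges, "A", "C"))
```

Running one search per pair is simple but repeats work; for thousands of sequences, one traversal per source sequence that records totals for every leaf it reaches would be the natural optimisation.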
We have performed this analysis for two datasets.

Heatmaps for Fungi 200

3D DA-SMACOF of SWG PID Vs SWG PID
 
   
Edge Sum Ninja from SWG PID Vs 3D DA-SMACOF of SWG PID
Edge Sum Ninja from SWG PID Vs SWG PID

Heatmaps for Fungi 2133

3D DA-SMACOF of SWG PID Vs SWG PID
 
   
Edge Sum Ninja from 3D DA-SMACOF of SWG PID Vs 3D DA-SMACOF of SWG PID
Edge Sum Ninja from 3D DA-SMACOF of SWG PID Vs SWG PID
   
Edge Sum RAxML (20 iterations) from ClustalO MSA Vs SWG PID
Edge Sum RAxML (20 iterations) from ClustalO MSA (distances from 0 to 0.15) Vs SWG PID
   
Edge Sum RAxML (20 iterations) from ClustalO MSA Vs Strict PIDNG
Edge Sum RAxML (20 iterations) from ClustalO MSA (distances from 0 to 0.15) Vs Strict PIDNG

Wednesday, October 24, 2012

Pairwise Distances from Multiple Sequence Alignment (MSA) Vs Pairwise Local Alignment Distances

Here we compare pairwise distances calculated from a multiple sequence alignment (MSA) file against pairwise distances calculated from local pairwise alignment of the same sequences.
The comparison is based on the sequences found here; the corresponding MSA can be found here.
  • Distances
    • We computed the percent identity excluding gaps (PIDNG) distance for each pair of aligned sequences in the MSA file. PIDNG is the number of identical aligned pairs (columns) divided by the total number of aligned pairs, where a pair is considered only if both of its characters belong to the given alphabet. To turn this identity into a distance between the two sequences, we subtract it from 1.0.
      • PIDNG Distance = 1.0 – PIDNG
      • where PIDNG = # Identical Pairs / # Pairs, and both characters in each counted pair belong to the given alphabet.
    • We computed the percent identity (PID) distance for each pair of sequences in the original sequence (FASTA) file after aligning them locally with the Smith-Waterman algorithm. For a given alignment of two sequences, PID is calculated like PIDNG, except that pairs in which one character is a gap are also counted towards the total number of pairs in the alignment. As with the PIDNG distance, we take 1.0 – PID as the PID distance.
      • PID Distance = 1.0 – PID
      • where PID = # Identical Pairs / # Pairs
  • Alphabet
    • Strict DNA
      • Considers only A, C, T, G characters
    • Full DNA
      • Considers A, C, M, G, R, S, V, T, W, Y, H, K, D, B, N characters
    • We computed the PIDNG distance for the MSA file using both alphabets.
    • We computed the PID distance for the locally aligned sequences using the Full DNA alphabet, but excluding the character N.
  • Smith-Waterman Parameters
    • Scoring Matrix
      • EDNAFULL
    • Penalties
      • Gap open = –16 and gap extension = –4
  • Heatmaps
Strict DNA Alphabet for MSA PIDNG Calculation
Full DNA Alphabet for MSA PIDNG Calculation
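The PIDNG and PID definitions and the two alphabets described in this post can be sketched as follows. The constant and function names are hypothetical, case-insensitive comparison is an assumption, and so is the handling of N in the PID calculation: here pairs involving N are skipped, while a valid base aligned against a gap still counts as a non-identical pair.

```python
STRICT_DNA = frozenset("ACGT")
FULL_DNA = frozenset("ACMGRSVTWYHKDBN")
FULL_DNA_NO_N = FULL_DNA - {"N"}


def pidng_distance(row_a, row_b, alphabet=STRICT_DNA):
    """1.0 - PIDNG for two aligned MSA rows.

    A column counts only if both characters are in `alphabet`, so gaps
    (and, with STRICT_DNA, ambiguity codes) are skipped.
    """
    pairs = identical = 0
    for x, y in zip(row_a.upper(), row_b.upper()):  # assumption: case-insensitive
        if x in alphabet and y in alphabet:
            pairs += 1
            if x == y:
                identical += 1
    if pairs == 0:
        raise ValueError("no comparable columns")
    return 1.0 - identical / pairs


def pid_distance_full_dna(aln_a, aln_b, alphabet=FULL_DNA_NO_N):
    """1.0 - PID for a pairwise local alignment.

    Like PIDNG, but a base aligned against a gap is also counted (as a
    non-identical pair). Assumption: pairs involving N are skipped.
    """
    pairs = identical = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        x_ok, y_ok = x in alphabet, y in alphabet
        if x_ok and y_ok:
            pairs += 1
            if x == y:
                identical += 1
        elif (x_ok and y == "-") or (y_ok and x == "-"):
            pairs += 1  # base against gap: counted, never identical
    if pairs == 0:
        raise ValueError("no comparable columns")
    return 1.0 - identical / pairs
```

On the same aligned pair, switching from STRICT_DNA to FULL_DNA can only add columns (ambiguity codes such as N), which is what the two PIDNG heatmaps compare.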