Wednesday, June 4, 2014

GSF D1/D2 Data: Statistics


  • Original Sequence Counts
    • D1 - 265655
    • Reversed D2 - 265655
  • Paired Unique Sequence Counts
    • D1 - 143223
    • Reversed D2 - 143223
  • Average Lengths
    • Paired Unique D1 - 254
    • Paired Unique Reversed D2 - 247



  • Paired Unique D1 Sequence Length Histogram


  • Paired Unique Reversed D2 Sequence Length Histogram

Sunday, April 27, 2014

GSF R1/R2 Data: R2 Statistics


The following figures present statistics for a selected 100k-sequence sample of the R2 dataset, along with the so-called "Master" dataset of 1020 sequences.

Summary of Content
  • Histograms
    • SWG Alignment length of R2 with itself (top left)
    • SWG Alignment length of Reversed R2 with Master (top right)
    • SWG PID of R2 with itself (bottom left)
    • SWG PID of Reversed R2 with Master (bottom right)
  • Heatmaps for SWG PID of R2 with itself against alignment length, with and without an alignment length cut
  • Heatmap for SWG PID of Reversed R2 with Master against alignment length.
  • Heatmap for SWG alignment length of Reversed R2 with Master against Non-reversed R2 with Master
  • Heatmap for SWG score of Reversed R2 with Master against Non-reversed R2 with Master
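
The heatmaps listed above are essentially 2D histograms of per-pair values. A minimal sketch of that binning, with an optional alignment length cut, could look like the following. The function name, the bin widths (10 for length, 5 for PID) and the sample values are illustrative assumptions, not the settings used for the figures.

```python
from collections import Counter

def heatmap_counts(pairs, length_bin=10, pid_bin=5, min_length=None):
    """Bin (alignment_length, pid) pairs into a 2D grid of counts.

    Keys are (length_bin_start, pid_bin_start). If min_length is given,
    only pairs with alignment_length > min_length are kept (the "length cut").
    Bin widths are hypothetical defaults chosen for illustration.
    """
    counts = Counter()
    for length, pid in pairs:
        if min_length is not None and length <= min_length:
            continue  # drop short alignments when a cut is requested
        key = (int(length // length_bin) * length_bin,
               int(pid // pid_bin) * pid_bin)
        counts[key] += 1
    return counts

# Made-up values, only to show the call shape.
sample = [(95, 88.0), (120, 97.5), (124, 99.0), (250, 100.0)]
no_cut = heatmap_counts(sample)
with_cut = heatmap_counts(sample, min_length=100)
```

Calling it twice, once without and once with min_length=100, gives the two panels of a "length cut > 100 on right" figure.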



  • Histograms

  • R2 SWG PID Vs R2 SWG Align Length -- length cut > 100 on right

  • Reversed R2 100K + Master 1020 SWG PID Vs. Alignment Length

  • 100K+1020 R2 Reversed Alignment Length Vs Alignment Length

  • 100K+1020 R2 Reversed Score Vs Score

GSF R1/R2 Data: Heatmaps for (R1/R2) + "Master" Data - 2



  • R1 100K + Master 1020 SWG PID Vs. Alignment Length


  • Reversed R2 100K + Master 1020 SWG PID Vs. Alignment Length

Saturday, April 19, 2014

GSF R1/R2 Data: Heatmaps for (R1/R2) + "Master" Data

  • 100K+1020 R1 Reversed Alignment Length Vs Alignment Length

  • 100K+1020 R1 Reversed Score Vs Score

  • 100K+1020 R2 Reversed Alignment Length Vs Alignment Length

  • 100K+1020 R2 Reversed Score Vs Score

Tuesday, April 15, 2014

GSF R1/R2 Data: Heatmaps for (R1/R2) Data

Heatmaps for corresponding 100k random samples of GSF data.

  • R2 SWG PID Vs R1 SWG PID



  • R1 SWG PID Vs R1 SWG Align Length -- length cut > 100 on right




  • R2 SWG PID Vs R2 SWG Align Length -- length cut > 100 on right

Friday, October 11, 2013

Revised 599nts and 999nts: RAxML vs Clustering using Spherical Phylogenetic Tree


Dataset

1. revised 599nts
This work follows the previous post at http://salsafungiphy.blogspot.com/2013/09/pairwise-distances-from-msa-vs-pairwise.html
This dataset has 831 sequences in total.
2. 999nts
This work follows the previous post at http://salsafungiphy.blogspot.com/2013/06/pairwise-distances-from-multiple.html
This dataset has 1306 sequences in total.

Alignment

The clustering was based on distances from:
1) SWG, using the EDNAFULL scoring matrix, with gap open = -16 and gap extension = -4
2) Multiple sequence alignment (MSA), using PID.
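
As a rough illustration of option 1, here is a minimal Smith-Waterman-Gotoh (local alignment with affine gaps) sketch that returns the score, the alignment length and the PID. It is not the code used for these results. It assumes plain A/C/G/T input, for which EDNAFULL scores a match as 5 and a mismatch as -4 (ambiguity codes are ignored), and it assumes that a gap of length k costs gap_open + (k - 1) * gap_extend. The function name swg_align and the test sequences are hypothetical.

```python
def swg_align(a, b, match=5, mismatch=-4, gap_open=-16, gap_extend=-4):
    """Return (score, alignment_length, percent_identity) of the best local alignment."""
    NEG = float("-inf")
    n, m = len(a), len(b)
    # M: alignment ends with a[i-1] aligned to b[j-1]
    # X: ends with a[i-1] against a gap; Y: ends with b[j-1] against a gap
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    best, end = 0, None
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets a local alignment start at this cell.
            M[i][j] = s + max(0, M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend)
            if M[i][j] > best:
                best, end = M[i][j], (i, j)
    if end is None:
        return 0, 0, 0.0  # no positive-scoring local alignment
    # Trace back from the best cell, counting columns and identical columns.
    i, j = end
    state, length, identical = "M", 0, 0
    while True:
        length += 1
        if state == "M":
            same = a[i - 1] == b[j - 1]
            if same:
                identical += 1
            prev = M[i][j] - (match if same else mismatch)
            i, j = i - 1, j - 1
            if prev == 0:
                break  # start of the local alignment
            if prev == M[i][j]:
                state = "M"
            elif prev == X[i][j]:
                state = "X"
            else:
                state = "Y"
        elif state == "X":
            opened = X[i][j] == M[i - 1][j] + gap_open
            i -= 1
            state = "M" if opened else "X"
        else:
            opened = Y[i][j] == M[i][j - 1] + gap_open
            j -= 1
            state = "M" if opened else "Y"
    return best, length, 100.0 * identical / length

# Made-up sequences: b is a with one extra base inserted in the middle.
left, right = "ACGTCAGTCA", "TGCATGACTG"
score, aln_len, pid = swg_align(left + right, left + "G" + right)
```

Here PID is taken as identical columns divided by alignment length (gap columns included), which is one common convention; other definitions divide by the shorter sequence length or exclude gaps. The sketch keeps three full matrices, so it is only suitable for short sequences.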

RAxML was run on the multiple sequence alignment:
1) revised 599nts, Newick file
2) 999nts, Newick file

Spherical Tree

Information on the spherical tree can be found in the previous posts http://salsafungiphy.blogspot.com/2012/11/phylogenetic-tree-generation-for.html and http://salsafungiphy.blogspot.com/2012/11/phylogenetic-tree-mega-table.html

Dimension Reduction

Manxcat SMACOF:
The pviz file for SWG clustering result with revised 599nts is here, for 999nts is here.
For MSA clustering result with revised 599nts is here, for 999nts is here.
WDA-SMACOF:
alpha set to 0.95
The pviz file for SWG clustering result with revised 599nts is here, for 999nts is here.
For MSA clustering result with revised 599nts is here, for 999nts is here.

Sum of branch lengths (edge sum)

(Note that the difference in edge sum between SWG and MSA should be due to the difference in the original distances behind each clustering plot.)
         Manxcat SMACOF             WDA-SMACOF
         revised 599nts   999nts    revised 599nts   999nts
SWG      19.37            19.89     16.01            12.74
MSA      16.62            16.36     15.20            13.65
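
For reference, a minimal sketch of how an edge sum like the ones above could be computed. It assumes that the edge sum is the total Euclidean length of the tree edges in the 3D embedding, which the post does not spell out. The function name, node names and coordinates are made up for illustration.

```python
import math

def edge_sum(coords, edges):
    """Sum of Euclidean edge lengths of a tree embedded in 3D.

    coords: dict mapping node name -> (x, y, z)
    edges:  iterable of (parent, child) node-name pairs
    """
    return sum(math.dist(coords[u], coords[v]) for u, v in edges)

# Toy tree: one internal node with two leaves (hypothetical coordinates).
coords = {"root": (0.0, 0.0, 0.0), "a": (3.0, 4.0, 0.0), "b": (0.0, 0.0, 2.0)}
edges = [("root", "a"), ("root", "b")]
total = edge_sum(coords, edges)
```

Because the sum depends on the scale of the embedded distances, edge sums are only directly comparable between trees built from the same distance measure.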
1) revised 599nts spherical tree

MSA Result

SWG Result


2) 999nts spherical tree

MSA Result
SWG Result