Friday, April 20, 2012

PhyTree visualized on 123+74+420+100k (pid distance)


Description

Environment: Tempest; 32 nodes (768 cores)
Aligner: SmithWaterman
ScoringMatrix: EDNAFULL
GapOpen: -16
GapExt: -4
DistanceType: (1-Normalized Score)
Method: 100k+123 fixed, 74 and 420 varied
WithReverse: 74 reversed, 420 not reversed

Dataset

1) 100k selected from 440k; (id:617~100616)
2) 123 consensus sequences; (id: 420~542)
3) 74 sequences from GenBank; (id: 543~616)
4) 420 center sequences; (id: 0~419)
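
The id ranges above partition the 100617 points into four groups. A minimal sketch of looking up the group of a point id, for example when assigning shapes or colors in the plot file. The function name and group labels are made up for illustration; only the id boundaries come from the list above.

```python
from bisect import bisect_right

# First id of each group, ascending, taken from the Dataset list.
GROUP_STARTS = [0, 420, 543, 617]
# Labels are illustrative names, not identifiers used by any tool.
GROUP_NAMES = ["420 centers", "123 consensus", "74 GenBank", "100k sample"]
TOTAL_POINTS = 420 + 123 + 74 + 100000  # 100617


def group_of(seq_id):
    """Return the dataset group that a point id belongs to."""
    if not 0 <= seq_id < TOTAL_POINTS:
        raise ValueError(f"id {seq_id} is outside 0~{TOTAL_POINTS - 1}")
    # bisect_right finds how many group starts are <= seq_id.
    return GROUP_NAMES[bisect_right(GROUP_STARTS, seq_id) - 1]


print(group_of(419), "|", group_of(420), "|", group_of(616), "|", group_of(617))
```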

Tree Configuration
New sequences from GenBank: Hexagon
Consensus sequences in Haixu 123: Triangle
Center sequences in 420: Rectangle
Root: Sphere
Color Scheme:  Phylogenetic tree generated here

Final Result

1. Plot file in PVIZ format, all points shown

Screen Shot


Thursday, April 12, 2012

PhyTree visualized on 197 (pid and score distance)

Input

1) PID distance
Aligner: SmithWaterman
ScoringMatrix: EDNAFULL
GapOpen: -16
GapExt: -4
DistanceType: (1-PercentIdentity)
WithReverse: Yes

2) Score distance
Aligner: SmithWaterman
ScoringMatrix: EDNAFULL
GapOpen: -16
GapExt: -4
DistanceType: (1-Normalized Score)
WithReverse: Yes
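
A rough sketch of how the two distance types could be computed from alignment output. It reads the normalized score as 2AB/(AA+BB), where AB is the alignment score of A against B and AA, BB are the self-alignment scores. Percent identity is assumed here to be identical positions divided by alignment length; the exact definition used by the aligner is not stated in these notes, and both function names are hypothetical.

```python
def pid_distance(identical, aligned_length):
    """1 - PercentIdentity, with identity taken over the aligned region (assumption)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 1.0 - identical / aligned_length


def score_distance(score_ab, score_aa, score_bb):
    """1 - normalized score, with normalized score = 2*AB / (AA + BB)."""
    if score_aa + score_bb <= 0:
        raise ValueError("self-alignment scores must sum to a positive value")
    return 1.0 - 2.0 * score_ab / (score_aa + score_bb)


# Identical sequences give distance 0 under both measures.
print(pid_distance(100, 100), score_distance(500, 500, 500))
```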

Visualization Algorithm 

DA-SMACOF

Result in Plot

1) Plot from PID distance


2) Plot from Score distance


Detailed Analysis on Length Effect

Input

8 sequences selected from 100k+197+420 FASTA file

Description

Aligner: SmithWaterman
OpenGapPenalty: -16
GapExtensionPenalty: -4
WithReverseComplement: No
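
For reference, a small sketch of a Smith-Waterman local alignment score with affine gaps (Gotoh recurrences) using the penalties above. It is not the aligner used for these runs. Two assumptions: substitution scores are reduced to +5 for a match and -4 for a mismatch, which is what EDNAFULL gives for unambiguous A/C/G/T, and a gap of length k is charged open + (k-1)*extension. The real aligner may charge gaps differently.

```python
def swg_score(a, b, match=5, mismatch=-4, gap_open=-16, gap_ext=-4):
    """Best local alignment score of a against b with affine gap penalties."""
    neg = float("-inf")
    cols = len(b) + 1
    h_prev = [0] * cols    # H values of the previous row
    f_prev = [neg] * cols  # best score ending in a gap that skips characters of a
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * cols
        f_cur = [neg] * cols
        e = neg            # best score ending in a gap that skips characters of b
        for j in range(1, cols):
            e = max(h_cur[j - 1] + gap_open, e + gap_ext)
            f_cur[j] = max(h_prev[j] + gap_open, f_prev[j] + gap_ext)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores never drop below zero.
            h_cur[j] = max(0, h_prev[j - 1] + sub, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


# Self alignment of an unambiguous sequence scores 5 per base.
print(swg_score("ACGTACGT", "ACGTACGT"))
```

Under these assumptions the self-aligned score grows linearly with sequence length, which is the quantity the normalized score divides by.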

Configuration


Sequence IDs:
73394
48231
20081
53373
81334
20764
25672
76209

Result

1) Self-alignment result


2) Alignment result against sequence 73394


3) Distance result relative to sequence 73394 (id)

Wednesday, April 11, 2012

Length Checking On PID and Score Matrices

Input

1) PID distance
Aligner: SmithWaterman
ScoringMatrix: EDNAFULL
GapOpen: -16
GapExt: -4
DistanceType: (1-PercentIdentity)
WithReverse: No

2) Score distance
Aligner: SmithWaterman
ScoringMatrix: EDNAFULL
GapOpen: -16
GapExt: -4
DistanceType: (1-Normalized Score)
WithReverse: No

3) Score distance
Aligner: SmithWaterman
ScoringMatrix: EDNAFULL
GapOpen: -16
GapExt: -4
DistanceType: (1-Normalized Score)
WithReverse: Yes
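
A back-of-the-envelope illustration of why sequence length can show up in the score distance but not in the PID distance. Assume a short sequence is an exact substring of a longer one and both contain only unambiguous bases, so every aligned position scores +5 (EDNAFULL match). The local alignment then covers the short sequence completely: PID over the aligned region is 100%, while the normalized score 2AB/(AA+BB) depends only on the two lengths. The function name is hypothetical.

```python
MATCH = 5  # EDNAFULL score for identical unambiguous bases


def contained_score_distance(short_len, long_len):
    """Score distance when the short sequence is an exact substring of the long one."""
    if not 0 < short_len <= long_len:
        raise ValueError("need 0 < short_len <= long_len")
    score_ab = MATCH * short_len  # the whole short sequence aligns
    score_aa = MATCH * short_len  # self alignment of the short sequence
    score_bb = MATCH * long_len   # self alignment of the long sequence
    return 1.0 - 2.0 * score_ab / (score_aa + score_bb)


for long_len in (200, 400, 800):
    print(long_len, round(contained_score_distance(200, long_len), 3))
# 200 0.0
# 400 0.333
# 800 0.6
```

In this idealized case the distance simplifies to (long_len - short_len) / (long_len + short_len), so two perfectly matching sequences can still be far apart if their lengths differ.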

Visualization Algorithm 

DA-SMACOF

Configuration


Original Sequence Lengths
Cluster 4: 200~249
Cluster 5: 250~299
Cluster 6: 300~349
Cluster 7: 350~399
Cluster 8: 400~449
Cluster 9: 450~499
Cluster 10: 500~549
Cluster 11: 550~599
Cluster 12: 600~649
Cluster 13: 650~699
Cluster 14: 700~749
Cluster 16: 800~849
Cluster 20: 1050~1099
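
The cluster numbers listed for 200~849 are consistent with 50-base bins, i.e. cluster = length // 50. A sketch under that assumption; the function names are made up. Note that under this rule 1050~1099 would be cluster 21, so the last entry in the list does not follow the same pattern.

```python
BIN_WIDTH = 50  # inferred from the ranges listed above


def length_cluster(length):
    """Cluster number for a sequence length, assuming 50-base bins."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return length // BIN_WIDTH


def cluster_range(cluster):
    """Inclusive (low, high) length range covered by a cluster number."""
    low = cluster * BIN_WIDTH
    return low, low + BIN_WIDTH - 1


print(length_cluster(237), cluster_range(4))
```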

Result in Plot

Tuesday, April 10, 2012

PhyTree Visualized on 100k+197+420 (score distance)


Description

Environment: Polar Grid (Quarry); 100 nodes (800 cores)
Runtime: Twister
Algorithm: DA-SMACOF

Aligner: SmithWaterman
ScoringMatrix: EDNAFULL
GapOpen: -16
GapExt: -4
DistanceType: (1-Normalized Score)
With Reverse Complement: Yes
Method: All Varied

Total Time: 4006.004 Seconds.

Dataset

1) 100k selected from 440k; (id:617~100616)
2) 123 consensus sequences; (id: 420~542)
3) 74 sequences from GenBank; (id: 543~616)
4) 420 sequences as new centers; (id: 0~419)


MDS Configuration

[1. Num map tasks ]: 800
[2. Input Folder]: /N/dc/scratch/yangruan/matrix/haixu/800/
[3. Input File Prefix]: 100617_800p_
[4. IDs File ]: /N/dc/scratch/yangruan/matrix/haixu/100k+197+420_800p.idx
[5. Label Data File ]: NoLabel
[6. Output File ]: /N/dc/scratch/yangruan/fasta/haixu/440k/100k_197_420/100617_score_dasmacof.txt
[7. Threshold value ]: 1.0E-6
[8. The Weighted Flag ]: 0
[9. The Target Dimension ]: 3
[10. Cooling parameter (alpha) ]: 0.95
[11. Input Data Size]: 100617
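
To show what the threshold, weighted flag and target dimension control, here is a toy, single-machine sketch of plain unweighted SMACOF (Guttman transform iterations). It is not DA-SMACOF and not the Twister implementation used for this run: there is no annealing and no parallelism. It assumes the threshold is applied to the decrease in raw stress between iterations, which is a guess at what the setting means. All function names are illustrative.

```python
import math
import random


def pairwise(X):
    """Euclidean distance matrix of the current configuration."""
    n = len(X)
    return [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]


def stress(delta, X):
    """Raw stress: squared error between target and mapped distances."""
    d = pairwise(X)
    n = len(X)
    return sum((d[i][j] - delta[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def guttman_step(delta, X):
    """One unweighted SMACOF update, X_new = (1/n) * B(X) * X."""
    n, dim = len(X), len(X[0])
    d = pairwise(X)
    new_X = []
    for i in range(n):
        row = [0.0] * dim
        for j in range(n):
            if i == j or d[i][j] == 0.0:
                continue
            ratio = delta[i][j] / d[i][j]
            for k in range(dim):
                row[k] += ratio * (X[i][k] - X[j][k])
        new_X.append([v / n for v in row])
    return new_X


def smacof(delta, dim=3, threshold=1e-6, max_iter=1000, seed=0):
    """Iterate until the stress decrease falls below the threshold."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in delta]
    history = [stress(delta, X)]
    for _ in range(max_iter):
        X = guttman_step(delta, X)
        history.append(stress(delta, X))
        if history[-2] - history[-1] < threshold:
            break
    return X, history


# Four points that are all at distance 1 from each other (a tetrahedron).
delta = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
X, history = smacof(delta, dim=3, threshold=1e-6)
print(len(history) - 1, "iterations, stress", history[0], "->", history[-1])
```

The full matrix here is tiny; at 100617 points the same update has to be split across map tasks, which is what the 800-partition input files are for.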

Tree Configuration
New sequences from GenBank: Hexagon
OK sequences in Haixu Consensus: Triangle
Not OK sequences in Haixu Consensus: Rectangle
Root: Sphere
Color Scheme:  Phylogenetic tree generated here

Final Result

1. Plot File in PVIZ format

Screen Shot


Monday, April 9, 2012

Fungi 100K+197 Normalized Score with Sammon and DA-SMACOF Init

Description

DataSet: Haixu 100K+197
Size: 100197
Unique: Yes
Aligner: SmithWaterman
ScoringMatrix: EDNAFULL
GapOpen: -16
GapExt: -4
DistanceType: Normalized SWG Score (2AB/(AA+BB))
Transformation: None
Mapping: Sammon
DistanceCut: None
Initialization: DA-SMACOF
Fixed: None
Varied: [0-100196]
DensitySat: 0.85
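
For comparison of the two mapping objectives mentioned in these notes, a small sketch of raw STRESS and Sammon's stress in their textbook forms. Manxcat's own chi-square normalization and weighting options are not described here, so this should not be read as its exact objective. Target distances are assumed to be strictly positive for the Sammon form.

```python
import math
from itertools import combinations


def raw_stress(delta, X):
    """Sum of squared differences between mapped and target distances."""
    return sum((math.dist(X[i], X[j]) - delta[i][j]) ** 2
               for i, j in combinations(range(len(X)), 2))


def sammon_stress(delta, X):
    """Sammon's stress: each squared error is divided by its target distance."""
    pairs = list(combinations(range(len(X)), 2))
    total = sum(delta[i][j] for i, j in pairs)
    error = sum((math.dist(X[i], X[j]) - delta[i][j]) ** 2 / delta[i][j]
                for i, j in pairs)
    return error / total


# Three points on a line at 0, 1, 3; the targets want them at 0, 1, 2.
delta = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
X = [[0.0], [1.0], [3.0]]
print(raw_stress(delta, X), sammon_stress(delta, X))  # 2.0 0.375
```

Dividing by the target distance makes errors on short distances count more, so Sammon tends to preserve local structure better than plain STRESS.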

Links

Images

Full Sample with Selected Clusters

Configuration

I/O
 CoordinateWriteFrequency:      0
 DistanceMatrixFile:            F:\Salsa\saliya\millions\haixu_100K+197\100k+197_swg_distance_normscore_c#.bin


ManxcatCore
 AddonforQcomputation:          2
 CalcFixedCrossFixed:           True
 CGResidualLimit:               1E-05
 ChisqChangePerPoint:           0.001
 Chisqnorm:                     2
 ChisqPrintConstant:            1000
 ConversionInformation:         
 ConversionOption:              
 DataPoints:                    100197
 Derivtest:                     False
 DiskDistanceOption:            2
 DistanceCut:                   -1
 DistanceFormula:               1
 DistanceProcessingOption:      0
 DistanceWeigthsCuts:           
 Eigenvaluechange:              0.001
 Eigenvectorchange:             0.001
 ExtraOption1:                  0
 Extraprecision:                0.05
 FixedPointCriterion:           
 FletcherRho:                   0.25
 FletcherSigma:                 0.75
 FullSecondDerivativeOption:    0
 FunctionErrorCalcMultiplier:   10
 HistogramBinCount:             100
 InitializationLoops:           1
 InitializationOption:          1
 InitialSteepestDescents:       0
 LinkCut:                       5
 LocalVectorDimension:          3
 Maxit:                         120
 MinimumDistance:               -0.001
 MPIIOStrategy:                 0
 Nbadgo:                        6
 Omega:                         1.25
 OmegaOption:                   0
 PowerIterationLimit:           200
 ProcessingOption:              0
 QgoodReductionFactor:          0.5
 QHighInitialFactor:            0.01
 QLimiscalecalculationInterval: 1
 RotationOption:                0
 Selectedfixedpoints:           
 Selectedvariedpoints:          
 StoredDistanceOption:          2
 TimeCutmillisec:               -1
 TransformMethod:               0
 TransformParameter:            0.125
 UndefindDistanceValue:         -1
 VariedPointCriterion:          all
 WeightingOption:               0
 Write2Das3D:                   True


Density
 Alpha:                         2
 Pcutf:                         0.85
 SelectedClusters:              
 XmaxBound:                     1
 Xres:                          50
 YmaxBound:                     1
 Yres:                          50

Tuesday, April 3, 2012

PhyTree visualized on 100k+197 (pid distance)

Input

Three MDS results from Manxcat runs on Tempest:
1.) Manxcat STRESS all varied DA-SMACOF initial
2.) Manxcat STRESS 197 varied 100K fixed DA-SMACOF initial ChisqPrintConstant=1000
3.) Manxcat Sammon 197 varied 100K fixed DA-SMACOF initial ChisqPrintConstant=1000

Description

Aligner: SmithWaterman
ScoringMatrix: EDNAFULL
GapOpen: -16
GapExt: -4
DistanceType: (1-PercentIdentity)

Dataset

1) 100k selected from 440k; (id:197~100196)
2) 123 consensus sequences; (id: 0~122)
3) 74 sequences from GenBank; (id: 123~196)

Result

1.) Manxcat STRESS all varied DA-SMACOF initial
2.) Manxcat STRESS 197 varied 100K fixed DA-SMACOF initial ChisqPrintConstant=1000
3.) Manxcat Sammon 197 varied 100K fixed DA-SMACOF initial ChisqPrintConstant=1000

Screen shot





PhyTree Visualized on 100k+197 (score distance)


Description

Environment: Polar Grid (Quarry); 100 nodes (800 cores)
Runtime: Twister
Algorithm: DA-SMACOF

Aligner: SmithWaterman
ScoringMatrix: EDNAFULL
GapOpen: -16
GapExt: -4
DistanceType: (1-Normalized Score)
Method: All Varied

Total Time: 4207.237 Seconds.

Dataset

1) 100k selected from 440k; (id:197~100196)
2) 123 consensus sequences; (id: 0~122)
3) 74 sequences from GenBank; (id: 123~196)


MDS Configuration

[1. Num map tasks ]: 800
[2. Input Folder]: /N/dc/scratch/yangruan/fasta/haixu/440k/100k_197/800_mul/
[3. Input File Prefix]: end_distance_
[4. IDs File ]: /N/dc/scratch/yangruan/fasta/haixu/440k/100k_197/100k+197_800p.idx
[5. Label Data File ]: NoLabel
[6. Output File ]: /N/dc/scratch/yangruan/fasta/haixu/440k/100k_197/100k+197_dasmacof.txt
[7. Threshold value ]: 1.0E-6
[8. The Weighted Flag ]: 0
[9. The Target Dimension ]: 3
[10. Cooling parameter (alpha) ]: 0.95
[11. Input Data Size]: 100197
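
A sketch of what the cooling parameter does, assuming the usual exponential schedule for deterministic annealing, T_next = alpha * T. The start and stop temperatures below are placeholders picked for the example; how DA-SMACOF chooses them for this data is not recorded in these notes.

```python
def cooling_schedule(t_start, t_min, alpha=0.95):
    """Temperatures visited when cooling from t_start down to t_min."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if t_start <= 0.0 or t_min <= 0.0:
        raise ValueError("temperatures must be positive")
    temps = []
    t = t_start
    while t >= t_min:
        temps.append(t)
        t *= alpha  # each annealing step shrinks the temperature by alpha
    return temps


# Placeholder temperatures: halving T takes 14 steps at alpha = 0.95.
print(len(cooling_schedule(1.0, 0.5, alpha=0.95)))
```

An alpha closer to 1 cools more slowly, which means more SMACOF rounds and a longer total run time.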

Tree Configuration
New sequences from GenBank: Hexagon
OK sequences in Haixu Consensus: Triangle
Not OK sequences in Haixu Consensus: Rectangle
Root: Sphere
Color Scheme:  Phylogenetic tree generated here

Final Result

1. Plot File in PVIZ format

Screen Shot