
Latest Version .jar File View on GitHub
Get ACES

ACES is a machine learning toolbox for clustering analysis and visualization of biological data and other types of data. Given biological data or a distance/probability matrix, ACES automatically extracts the features of each identity and clusters them using several widely used clustering algorithms. To facilitate hierarchical and k-means clustering, the candidate cluster centroids are first estimated by a novel algorithm based on distances and standard deviation. To visualize the original data or the distance matrix, Principal Component Analysis (PCA) reduces the dimensionality and extracts three significant components, which are plotted in 3D space. ACES also provides an interface for clustering analysis and visualization together with the attributes or sample information of each identity, which makes it clear which attributes contribute to the clustering results.

Run ACES

Step 1: Get the latest version of ACES, together with some data for testing, from here. To run ACES just double-click on the latest ACES.jar file.

Step 2: Check the file formats: Original Data, Distance Matrix, Data Attributes.

Step 3: Start your ACES journey.


How to use?

Load Original Data File:

Check the recommended format

Open a file

Distance Matrix:

Check the recommended format

Open a file or files

Clustering Analysis:

Get the number of Clusters

Show Hierarchical clustering results

Show K-means clustering results

Show DBSCAN clustering results

Attributes:

Check the recommended format

Open a SampleInfo file

Show all the Attributes in the SampleInfo file

Show the discriminative power of each Attribute

Select an Attribute to plot

Add Clusters Info to the SampleInfo file

Save the SampleInfo

Visualization:

Plot samples with clustering results

Plot samples with the selected attribute

Heat map of the samples

Heat map of the samples with clustering results

Heat map of the samples with the selected attribute



Original Sample Data Format

There are two ways to format your sample data file: File -> Formats -> Raw data

Format 1: The label IDs are in one of the columns. Each sample's data vector occupies a row.
Format 2: The label IDs are in one of the rows. Each sample's data vector occupies a column.
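
The two layouts differ only by a transpose, which the following minimal Python sketch illustrates. It is not ACES code: the function name `parse_samples` and its parameters are hypothetical, the input is assumed to be already split into cells, and the "Sample1, Sample2, ..." fallback simply mirrors the default naming ACES uses when no labels are available.

```python
def parse_samples(rows, samples_in_rows=True, label_index=None):
    """Return (labels, vectors) from a table of string cells.

    rows            -- list of lists of strings (already split on the delimiter)
    samples_in_rows -- True for Format 1, False for Format 2
    label_index     -- position of the label column (Format 1) or label row
                       (Format 2); None if the file has no labels
    """
    if not samples_in_rows:
        # Format 2 stores one sample per column, so transpose first.
        rows = [list(column) for column in zip(*rows)]

    if label_index is None:
        # No label information: fall back to Sample1, Sample2, ...
        labels = [f"Sample{i + 1}" for i in range(len(rows))]
        vectors = [[float(cell) for cell in row] for row in rows]
    else:
        labels = [row[label_index] for row in rows]
        vectors = [
            [float(cell) for j, cell in enumerate(row) if j != label_index]
            for row in rows
        ]
    return labels, vectors
```

Both `parse_samples([["a", "1", "2"], ["b", "3", "4"]], True, 0)` and `parse_samples([["a", "b"], ["1", "3"], ["2", "4"]], False, 0)` describe the same two samples.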






Open a Sample Data file

File -> Open -> Raw data

The file named "Original_Data_file" in the test samples folder can be used for testing.

The user is asked to set the parameters required to extract the sample data. If the file contains no label information, or the user does not set the label location, the label IDs are automatically set to "Sample1, Sample2, Sample3...".

Once the sample data file is loaded, ACES automatically calculates its distance matrix and obtains the hierarchical clustering results.


ACES provides three distance measures for building the distance matrix:

1. Manhattan distance

2. Euclidean distance

3. Pearson's correlation coefficients (ACES automatically converts them to the range [0,1])
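
For readers who want to see what these three measures compute, here is a minimal Python sketch using only the standard library. It is an illustration, not the ACES implementation. In particular, the mapping of Pearson's r to [0,1] via (1 - r) / 2 is an assumption: the documentation only states that the coefficients are converted to that range, not which formula is used.

```python
import math


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_distance(a, b):
    """Pearson's r rescaled to [0, 1]; 0 means perfectly correlated."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("Pearson correlation is undefined for a constant vector")
    r = cov / math.sqrt(var_a * var_b)
    # ASSUMPTION: (1 - r) / 2 is one common way to map r in [-1, 1] to [0, 1];
    # ACES may use a different conversion.
    return (1.0 - r) / 2.0


def distance_matrix(vectors, metric):
    """Pairwise distances between all sample vectors."""
    return [[metric(u, v) for v in vectors] for u in vectors]
```

For example, `distance_matrix([[0, 0], [3, 4]], euclidean)` gives a 2 x 2 symmetric matrix with zeros on the diagonal.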






Distance Matrix

There are four ways to format your distance matrix file: File -> Formats -> Distance matrix

Each format may also contain the name of the distance matrix.

Format 1: Labels are placed both horizontally and vertically
Format 2: Labels are placed horizontally
Format 3: Labels are placed vertically
Format 4: No labels; labels are automatically set to "Sample1, Sample2, Sample3..."
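
The four layouts can be handled by one small routine, sketched below in Python. This is an illustration under stated assumptions, not the ACES parser: the function name and flags are hypothetical, "horizontal" labels are taken to mean a header row and "vertical" labels a first column, Format 1 is assumed to have a corner cell above the label column, and the optional matrix name is not handled.

```python
def parse_distance_matrix(rows, labels_horizontal=False, labels_vertical=False):
    """Return (labels, matrix) from a table of string cells.

    Format 1: labels_horizontal=True,  labels_vertical=True
    Format 2: labels_horizontal=True,  labels_vertical=False
    Format 3: labels_horizontal=False, labels_vertical=True
    Format 4: both False (labels are generated)
    """
    rows = [list(row) for row in rows]
    labels = None

    if labels_horizontal:
        header = rows.pop(0)
        # ASSUMPTION: with labels on both sides, the header has a corner cell.
        labels = header[1:] if labels_vertical else header

    if labels_vertical:
        column_labels = [row[0] for row in rows]
        rows = [row[1:] for row in rows]
        if labels is None:
            labels = column_labels

    matrix = [[float(cell) for cell in row] for row in rows]

    if labels is None:
        # Format 4: generate Sample1, Sample2, ...
        labels = [f"Sample{i + 1}" for i in range(len(matrix))]

    size = len(matrix)
    if len(labels) != size or any(len(row) != size for row in matrix):
        raise ValueError("distance matrix must be square with one label per sample")
    return labels, matrix
```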






Open a Distance Matrix file

File -> Open -> Distance matrix

For a single distance matrix, the file named "single_distance_matrix_AML" in the test samples folder can be used for testing.





If the file contains several distance matrices, select one of them to analyse.

The file named "multi_distance_matrix_brain" in the test samples folder can be used for testing.



To choose another distance matrix to compare or analyse: Edit -> Select other distance matrix

The current distance matrix is shown in the menu.




Get the number of Clusters

View -> Clustering -> Numbers





Show Hierarchical clustering results

View -> Clustering -> Hierarchical

The number of clusters is estimated automatically, so the user does not need to set any parameters.
The clustering results will be shown on the screen.
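
As background, hierarchical clustering on a distance matrix can be sketched in a few lines of Python. The version below is a minimal illustration, not the ACES implementation: it assumes average linkage (the linkage ACES uses is not stated here) and takes the number of clusters `k` as an input, whereas ACES estimates it automatically.

```python
def agglomerative(dist, k):
    """Average-linkage agglomerative clustering on a distance matrix.

    dist -- square, symmetric list-of-lists distance matrix
    k    -- number of clusters to stop at (supplied by the caller here)
    Returns one 1-based cluster id per sample.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # ASSUMPTION: average linkage, i.e. mean pairwise distance.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Find the closest pair of clusters and merge them.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    labels = [0] * len(dist)
    for cluster_id, members in enumerate(clusters, start=1):
        for member in members:
            labels[member] = cluster_id
    return labels
```

This naive loop recomputes every linkage at each merge, which is fine for small examples but slow for large matrices.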



Show K-means clustering results

View -> Clustering -> Kmeans

The number of clusters and the centroids are estimated automatically, so the user does not need to set those parameters.
The clustering results will be shown on the screen.
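
To show how pre-estimated centroids feed into k-means, here is a minimal Python sketch of the standard Lloyd iteration. It is not ACES code: the initial centroids are simply passed in by the caller, because the ACES estimation algorithm (based on distances and standard deviation) is not described in this documentation.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means starting from caller-supplied initial centroids.

    Returns (labels, centroids) with 1-based cluster labels.
    """
    centroids = [list(c) for c in centroids]
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [a + 1 for a in assignment], centroids
```

Starting from good initial centroids matters because k-means only converges to a local optimum; a poor start can give a poor partition.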



Show DBSCAN clustering results

View -> Clustering -> DBSCAN

The scan radius and the minimum number of samples are estimated automatically and shown in the dialog below. However, it is best to supply your own parameters: DBSCAN clusters samples by density, and these values are hard to estimate automatically from the data alone.



The clustering results will be shown on the screen. A label of "0" marks an outlier or noise sample, as defined by the DBSCAN algorithm.
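
The roles of the two parameters, and of the "0" label, are easier to see in code. The following is a minimal Python sketch of textbook DBSCAN on a distance matrix, not the ACES implementation. The parameter names `eps` (scan radius) and `min_samples` are chosen for illustration, and a point is assumed to count as its own neighbour.

```python
def dbscan(dist, eps, min_samples):
    """DBSCAN on a square distance matrix.

    Returns one label per sample: 1, 2, ... for clusters, 0 for noise.
    """
    n = len(dist)
    unvisited = -1
    labels = [unvisited] * n

    def neighbours(i):
        # ASSUMPTION: the point itself counts towards min_samples.
        return [j for j in range(n) if dist[i][j] <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] != unvisited:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = 0  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == 0:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] != unvisited:
                continue
            labels[j] = cluster
            nearby = neighbours(j)
            if len(nearby) >= min_samples:
                queue.extend(nearby)  # j is a core point: keep expanding
    return labels
```

A larger scan radius or a smaller minimum number of samples merges more points into clusters, which is why the number of clusters changes with these settings.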



Plot samples with clustering results (3D and 2D)

The samples can be visualized in 3D space or on a 2D plane after PCA dimensionality reduction. Each sample is coloured by its clustering result.

As shown in the figure below, all the samples fall clearly into two groups after PCA: the blue samples and the pink samples form two separate groups.

The number of clusters/groups is calculated automatically by ACES; however, the clustering results may vary if the user selects a different clustering algorithm. The number of clusters can be changed by setting different parameters in DBSCAN clustering.

The 3D plot can be rotated to find a suitable view.

The label IDs are shown on the right. Both the points in the plot on the left and the IDs on the right can be selected by clicking. As shown in the 3D plot, the top point/sample is selected (its colour changes from pink to black) and its ID is immediately highlighted in black in the legend.
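
The PCA step behind these plots can be sketched as follows. This is a minimal, standard-library Python illustration, not the ACES implementation: it applies PCA to raw sample vectors, finds the leading components by power iteration with deflation (one of several possible methods), and uses hypothetical function names. The sign of each component is arbitrary.

```python
import math
import random


def _top_eigenvector(matrix, rng, iterations):
    """Power iteration; returns (eigenvalue, unit vector) or None if no variance is left."""
    d = len(matrix)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iterations):
        w = [sum(matrix[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            return None
        v = [x / norm for x in w]
    mv = [sum(matrix[a][b] * v[b] for b in range(d)) for a in range(d)]
    return sum(x * y for x, y in zip(v, mv)), v


def pca(data, n_components=3, iterations=500, seed=0):
    """Return (components, scores): principal axes and projected samples."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [
        [sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    rng = random.Random(seed)
    components = []
    for _ in range(min(n_components, d)):
        found = _top_eigenvector(cov, rng, iterations)
        if found is None:
            break  # the data has fewer dimensions of variance than requested
        eigenvalue, v = found
        components.append(v)
        # Deflate: remove the variance explained by this component.
        cov = [
            [cov[a][b] - eigenvalue * v[a] * v[b] for b in range(d)]
            for a in range(d)
        ]
    scores = [[sum(x * c for x, c in zip(row, comp)) for comp in components] for row in centred]
    return components, scores
```

Each row of `scores` holds the coordinates of one sample in the reduced space; the first three columns would be the x, y and z positions in a 3D plot, and the first two the positions in a 2D plot.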

Visualization -> 3D Plot -> Samples


Visualization -> 2D Plot -> Samples




Heat map of the samples

The distance matrix, which shows the relationship between every pair of samples, can be visualized as a heat map. Visualization -> Heat Map -> Original Distance Matrix




Heat map of the samples with clustering results

The samples in the distance matrix are reordered according to the clustering results. Visualization -> Heat Map -> After Clustering

The distance matrix has been reordered by the clustering results.
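
The reordering itself is a simple permutation of rows and columns. The Python sketch below illustrates the idea under the assumption that samples are grouped by ascending cluster id and keep their original order within a cluster; the function name is hypothetical and ACES may order samples differently.

```python
def reorder_by_cluster(dist, labels, clusters):
    """Permute a distance matrix so samples of the same cluster are adjacent.

    dist     -- square distance matrix (list of lists)
    labels   -- sample label for each row/column
    clusters -- cluster id for each sample
    Returns (reordered_labels, reordered_matrix).
    """
    # sorted() is stable, so the original order is kept inside each cluster.
    order = sorted(range(len(clusters)), key=lambda i: clusters[i])
    reordered_labels = [labels[i] for i in order]
    reordered_matrix = [[dist[i][j] for j in order] for i in order]
    return reordered_labels, reordered_matrix
```

After reordering, well-separated clusters appear as blocks of small distances along the diagonal of the heat map.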




Check the recommended format for attributes files

File -> Formats -> Attributes




Open a SampleInfo file

File -> Open -> Attributes

The files named "samplesInfo_AML" and "samplesInfo_brain" in the test samples folder can be used for testing.


Based on your distance matrix, ACES automatically checks whether your SampleInfo file is correct and then reminds you to sort it. Please open the file "samplesInfo_brain" to follow the example below.

To sort the SampleInfo file, first choose the SampleInfo labels.




Then, ACES will remind you to reformat the SampleInfo or Distance Matrix labels.




To make them consistent, you can simply change the Distance Matrix labels.



Or, change both types of labels.



The new labels will be shown on the screen.
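
Conceptually, the sorting step lines up the SampleInfo rows with the order of the distance matrix labels. The Python sketch below shows one way to do this once the two sets of labels are consistent. It is an illustration only: the function name, the dictionary-per-row representation and the column name used in the example are hypothetical, not part of ACES.

```python
def sort_sample_info(records, label_key, matrix_labels):
    """Reorder SampleInfo rows to follow the distance matrix label order.

    records       -- list of dicts, one per sample
    label_key     -- name of the column holding the sample labels
    matrix_labels -- labels in the order used by the distance matrix
    """
    by_label = {record[label_key]: record for record in records}
    missing = [label for label in matrix_labels if label not in by_label]
    if missing:
        # The labels must be made consistent before sorting can succeed.
        raise KeyError(f"labels missing from SampleInfo: {missing}")
    return [by_label[label] for label in matrix_labels]
```

For example, `sort_sample_info(rows, "ID", ["s2", "s1"])` returns the row for "s2" first, and raises an error if either label is absent from the rows.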




Show all the Attributes in the SampleInfo file

View -> All attributes

....



Show the discriminative power of each Attribute

View -> Discriminative power

The discriminative power of each attribute is estimated by our proposed algorithm.




Select an Attribute to plot

Edit -> Select an attribute to plot



For example, the Gender attribute is selected.





Add Clusters Info to the SampleInfo file

The clustering results can be added to the SampleInfo file. Edit -> Add clusters to SampleInfo
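
In effect, this appends one extra column to the sample information. The Python sketch below illustrates the idea with the standard `csv` module. The column name "Cluster", the tab-separated output and the function names are assumptions made for the example; the layout ACES actually writes is not specified here.

```python
import csv
import io


def add_clusters(records, clusters, column="Cluster"):
    """Return copies of the SampleInfo rows with a cluster column appended."""
    if len(records) != len(clusters):
        raise ValueError("need exactly one cluster id per sample")
    return [{**record, column: cluster} for record, cluster in zip(records, clusters)]


def to_tsv(records):
    """Serialise rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

The rows must already be in the same order as the clustering results, which is why the SampleInfo file is sorted against the distance matrix first.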





Save the SampleInfo file

The user can save the sorted SampleInfo file. File -> Save the SampleInfo






Plot samples with the selected attribute (3D and 2D)

Visualization -> 3D plot -> Selected Attribute


Visualization -> 2D plot -> Selected Attribute




Heat map of the samples with the selected attribute

The selected attribute values are shown at the bottom of the heat map, using the same colours as in the PCA plot. Visualization -> Heat Map -> with Selected Attribute

