The gCluster algorithm is a general clustering method that predicts clusters of any biological word, or of any combination of words, relying only on the DNA sequence and on the statistical significance of the predicted clusters. When CG is used as the word, gCluster works similarly to CpGcluster [1], our method for predicting CpG islands. More broadly, gCluster has much in common with wordCluster [2] but uses an improved distance model.
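
To make the idea of clustering word occurrences by distance more concrete, here is a minimal Python sketch. It is an illustration only, not gCluster's distance model or its significance test, neither of which is described in this text. The function names (word_positions, cluster_by_distance), the fixed max_gap threshold, and the min_words cutoff are all assumptions introduced for the example.

```python
def word_positions(seq, word):
    """Return the 0-based start positions of every occurrence of `word`
    in `seq`, including overlapping occurrences (case-insensitive)."""
    seq, word = seq.upper(), word.upper()
    positions = []
    i = seq.find(word)
    while i != -1:
        positions.append(i)
        i = seq.find(word, i + 1)
    return positions


def cluster_by_distance(positions, max_gap, min_words=2):
    """Group sorted positions into clusters in which consecutive
    occurrences start at most `max_gap` bases apart.

    ASSUMPTION: a fixed `max_gap` stands in for a real distance model,
    and no statistical significance is computed.
    Returns a list of (first_start, last_start, word_count) tuples.
    """
    clusters = []
    current = []
    for pos in positions:
        # A gap larger than the threshold closes the current cluster.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_words:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    # Close the last open cluster, if it is large enough.
    if len(current) >= min_words:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


if __name__ == "__main__":
    # Toy sequence: two dense CG groups followed by one isolated CG.
    seq = "A" + "CG" + "CG" + "T" * 10 + "CG" + "A" + "CG" + "T" * 12 + "CG"
    cg = word_positions(seq, "CG")
    print(cg)
    print(cluster_by_distance(cg, max_gap=3))
```

In this toy run the isolated CG at the end of the sequence is not reported, because a single occurrence does not reach min_words. A real predictor would additionally derive the distance threshold from the data and score each candidate cluster statistically.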

