The gCluster [1] algorithm is a general clustering method that predicts clusters of any biological word, or combination of words, relying only on the DNA sequence itself and on statistical significance. When CG is used as the word, gCluster behaves similarly to CpGcluster [2], our method for predicting CpG islands. More broadly, gCluster has much in common with WordCluster [3] but uses an improved distance model.
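To make the distance-based idea concrete, here is a minimal Python sketch in the spirit of CpGcluster. It finds every occurrence of a word, chains consecutive occurrences that lie no farther apart than a threshold distance, and keeps a cluster only if its word count would be improbable by chance. Several pieces are illustrative assumptions: the threshold is taken as the median inter-word distance, significance is a simple binomial tail probability, the p-value cutoff is arbitrary, and the function names (`find_word_positions`, `binom_tail`, `word_clusters`) are hypothetical. This is not gCluster's improved distance model, and it handles a single word rather than combinations of words.

```python
import math
from statistics import median


def find_word_positions(seq: str, word: str) -> list[int]:
    """Start positions of every occurrence of `word` in `seq`, overlaps included."""
    positions = []
    start = seq.find(word)
    while start != -1:
        positions.append(start)
        start = seq.find(word, start + 1)
    return positions


def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def word_clusters(seq: str, word: str, max_pvalue: float = 1e-5):
    """Hypothetical sketch of distance-based word clustering.

    Returns (start, end, n_words, p_value) tuples; end is exclusive.
    """
    seq, word = seq.upper(), word.upper()
    pos = find_word_positions(seq, word)
    if len(pos) < 2:
        return []

    # Assumption: the threshold is the median distance between consecutive words.
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    threshold = median(gaps)

    # Background probability that any given site starts an occurrence of the word.
    p = len(pos) / (len(seq) - len(word) + 1)

    # Chain consecutive occurrences whose distance does not exceed the threshold.
    clusters, current = [], [pos[0]]
    for prev, nxt in zip(pos, pos[1:]):
        if nxt - prev <= threshold:
            current.append(nxt)
        else:
            clusters.append(current)
            current = [nxt]
    clusters.append(current)

    results = []
    for c in clusters:
        if len(c) < 2:
            continue  # a lone occurrence is not a cluster
        start, end = c[0], c[-1] + len(word)
        n_sites = c[-1] - c[0] + 1  # possible start sites inside the cluster
        # Assumption: significance = binomial tail probability of seeing at least
        # len(c) occurrences among n_sites sites at the background rate p.
        p_value = binom_tail(len(c), n_sites, p)
        if p_value <= max_pvalue:
            results.append((start, end, len(c), p_value))
    return results


if __name__ == "__main__":
    # Toy sequence: sparse CG background around a dense run of 30 CG words.
    demo = ("CG" + "A" * 18) * 10 + "CG" * 30 + ("A" * 18 + "CG") * 10
    for start, end, n_words, p_value in word_clusters(demo, "CG"):
        print(f"cluster [{start}, {end}): {n_words} words, p = {p_value:.2e}")
```

On the toy sequence, the script should report a single cluster spanning positions 200 to 260 with 30 CG words. The sparse background CGs stay unclustered because their spacing (20) exceeds the median gap (2). Using a percentile of the observed distances keeps the threshold adaptive to each sequence's own word density, rather than relying on a fixed window size.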
[1] Gómez-Martín C, Lebrón R, Oliver JL, Hackenberg M. Prediction of CpG Islands as an Intrinsic Clustering Property Found in Many Eukaryotic DNA Sequences and Its Relation to DNA Methylation. In: Vavouri T, Peinado M, editors. CpG Islands. Methods in Molecular Biology, vol 1766. New York, NY: Humana Press; 2018. doi: 10.1007/978-1-4939-7768-0_3
[2] Hackenberg M, Previti C, Luque-Escamilla PL, Carpena P, Martínez-Aroza J, Oliver JL. CpGcluster: a distance-based algorithm for CpG-island detection. BMC Bioinformatics. 2006;7:446.
[3] Hackenberg M, Carpena P, Bernaola-Galvan P, Barturen G, Alganza AM, Oliver JL. WordCluster: detecting clusters of DNA words and genomic elements. Algorithms Mol Biol. 2011;6:2.