GenomeCluster

GenomeCluster website

Early global measures of genome complexity (power spectra, the analysis of fluctuations in DNA walks, or compositional segmentation) uncovered a high degree of complexity in the genome. To evaluate the role of DNA clustering in genome complexity, we developed GenomeCluster, an algorithm that detects clusters of any genome element identified by chromosome coordinates. This allows us to obtain a detailed description of clusters for ten categories of genome elements, including functional, regulatory, variant and repeat elements. For each category, we locate the clusters in the genome, quantify cluster length and composition, and estimate the proportion of genome elements that are clustered. On average, 27% of elements are clustered, although this proportion varies considerably between element categories. Genes form the fewest clusters, but these are the longest, both in bp and in the average number of components, while the shortest clusters are formed by SNPs. Functional and regulatory elements (genes, CpG islands, TFBSs, enhancers) show a higher proportion of clustered elements than DNase sites, repeats (Alus, LINE1) or SNPs. Many of the clusters we obtained, in all categories, are in turn composed of clusters of lower-level entities (i.e. domains within domains), thus uncovering a complex genome landscape dominated by hierarchical clustering.
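To make the idea of clustering elements by chromosome coordinates concrete, here is a minimal Python sketch. It is not the GenomeCluster algorithm, whose details are not given here. It assumes a simple fixed-distance rule: neighbouring elements on the same chromosome are joined when the gap between them is at most `max_gap` bp. The function names and the `max_gap` and `min_size` parameters are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def find_clusters(elements, max_gap, min_size=2):
    """Group elements on one chromosome whose gaps are <= max_gap bp.

    elements: iterable of (chrom, start, end) tuples.
    max_gap, min_size: hypothetical parameters of this sketch,
    not settings taken from GenomeCluster.
    Returns a list of dicts with chrom, start, end and count.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in elements:
        by_chrom[chrom].append((start, end))

    clusters = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        count = 1
        for start, end in intervals[1:]:
            if start - cur_end <= max_gap:
                # Close enough to the running cluster: extend it.
                cur_end = max(cur_end, end)
                count += 1
            else:
                # Gap too large: close the running cluster, start a new one.
                if count >= min_size:
                    clusters.append({"chrom": chrom, "start": cur_start,
                                     "end": cur_end, "count": count})
                cur_start, cur_end, count = start, end, 1
        if count >= min_size:
            clusters.append({"chrom": chrom, "start": cur_start,
                             "end": cur_end, "count": count})
    return clusters


def clustered_fraction(clusters, n_elements):
    """Proportion of input elements that fall inside some cluster."""
    if n_elements == 0:
        return 0.0
    return sum(c["count"] for c in clusters) / n_elements


# Toy coordinates, invented for illustration only.
elements = [
    ("chr1", 100, 200), ("chr1", 250, 300), ("chr1", 320, 400),
    ("chr1", 5000, 5100), ("chr2", 10, 50),
]
clusters = find_clusters(elements, max_gap=100)
for c in clusters:
    print(c["chrom"], c["start"], c["end"],
          "length_bp:", c["end"] - c["start"], "elements:", c["count"])
print("clustered fraction:", clustered_fraction(clusters, len(elements)))
```

With these toy coordinates, the three nearby chr1 elements form a single cluster and the two isolated elements remain unclustered. Cluster length in bp, number of components and the clustered proportion are the same kinds of summary described above. A fixed distance threshold is only the simplest possible choice of clustering rule.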

[1] Dios F., G. Barturen, R. Lebrón, A. Rueda, M. Hackenberg and J.L. Oliver. 2014. DNA clustering and genome complexity. Computational Biology and Chemistry (in preparation).