Analysis of the correlation structure of sequences, in both the spatial and the frequency domains, has revealed short-range and long-range correlations in nucleotide sequences, uncovering a complex fractal structure in DNA. Segmentation algorithms are well suited to revealing this structure. One of these, conceptually simple and computationally efficient, was proposed last year by our group. It decomposes a DNA sequence into compositionally homogeneous subsequences (patches or domains). Varying the threshold yields different partitions of the sequence at different levels of statistical significance.

In sequences with a simple domain structure, segmentation consistently finds homogeneous domains once purely random fluctuations are excluded. In more complex, long-range correlated sequences, however, this homogeneity vanishes: relaxing the threshold reveals new domains inside domains that appeared homogeneous under a higher threshold. This domains-within-domains phenomenon points to complex compositional heterogeneity in DNA sequences, consistent with the hierarchical nature of biological complexity.
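The recursive segmentation described above can be illustrated with a minimal Python sketch. It is not the published implementation. It assumes a sequence over the alphabet A, C, G, T, and it replaces the statistical significance level with a plain cutoff on the length-weighted Jensen-Shannon divergence. The names `best_cut`, `segment`, `threshold` and `min_len` are hypothetical and chosen only for illustration.

```python
import math

ALPHABET = "ACGT"  # assumption: the sequence contains only these symbols
INDEX = {a: k for k, a in enumerate(ALPHABET)}


def entropy(counts, total):
    """Shannon entropy (in bits) of a composition given as raw counts."""
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def best_cut(seq):
    """Return (position, score) of the cut maximizing n * D_JS(left, right).

    Returns (None, 0.0) when no cut gives a positive divergence.
    """
    n = len(seq)
    total = [0] * len(ALPHABET)
    for ch in seq:
        total[INDEX[ch]] += 1
    h_all = entropy(total, n)

    left = [0] * len(ALPHABET)
    best_pos, best_score = None, 0.0
    for i in range(1, n):
        left[INDEX[seq[i - 1]]] += 1
        right = [t - l for t, l in zip(total, left)]
        # Jensen-Shannon divergence with weights proportional to length
        d_js = h_all - (i / n) * entropy(left, i) - ((n - i) / n) * entropy(right, n - i)
        score = n * d_js
        if score > best_score:
            best_pos, best_score = i, score
    return best_pos, best_score


def segment(seq, threshold, offset=0, min_len=2):
    """Recursively split seq; return the sorted list of cut positions.

    Simplification: a segment is split whenever its best score reaches
    `threshold`, instead of testing a statistical significance level.
    """
    if len(seq) < 2 * min_len:
        return []
    pos, score = best_cut(seq)
    if pos is None or score < threshold:
        return []
    return (segment(seq[:pos], threshold, offset, min_len)
            + [offset + pos]
            + segment(seq[pos:], threshold, offset + pos, min_len))


if __name__ == "__main__":
    seq = "A" * 40 + "AG" * 30 + "ACGT" * 25 + "G" * 40
    for threshold in (40, 20, 5):
        print(threshold, segment(seq, threshold))
```

In this sketch, every cut found at a higher threshold is also found at a lower one, because the recursion is deterministic. The partitions obtained by relaxing the threshold are therefore nested, which is a simplified picture of the domains-within-domains behaviour.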

Bernaola-Galván P, Román-Roldán R, Oliver JL. 1996.
Compositional segmentation and long-range fractal correlations in DNA sequences.
Physical Review E 53: 5181-5189 [PDF]

Oliver JL, Román-Roldán R, Pérez J, Bernaola-Galván P. 1999.
SEGMENT: identifying compositional domains in DNA sequences.
Bioinformatics 15: 974-979 [PDF]

Grosse I, Bernaola-Galván P, Carpena P, Román-Roldán R, Oliver J, Stanley HE. 2002.
Analysis of symbolic sequences using the Jensen-Shannon divergence.
Physical Review E 65: 041905 [PDF]

Bernaola-Galván P, Oliver JL, Hackenberg M, Coronado AV, Ivanov PCh, Carpena P. 2012.
Segmentation of time series with long-range fractal correlations.
The European Physical Journal B 85: 211 [PDF]