gClusterVM (64-bit Xubuntu 16.04 LTS)
- commons-primitives-1.0.jar: mandatory dependency; it must be in the same folder as the other executables.
- makeSeqObj.jar: indexes the FASTA input files so that they can be accessed faster. From an input file ‘assembly.fa’ it generates a file ‘assembly.zip’, which is in turn the input file for the programs described below.
- gCluster.jar: determines the local clusters of a given DNA word (CpG islands if the word is ‘CG’) as well as its global clustering properties. For non-palindromic words it can search both strands, and it accepts any combination of DNA words (e.g. CAG:CTG:CCG for the CHG methylation context).
- randomizer.jar: randomizes DNA sequences while preserving the dinucleotide frequencies. Because the theoretical framework does not account for N-runs, the program i) merges the contigs, ii) shuffles the merged sequence, and iii) reinserts the N-runs at their original positions. As a consequence, dinucleotides are not strictly conserved: the last nucleotide of one contig and the first nucleotide of the next form an ‘artificial’ dinucleotide. However, this ‘error’ is negligible even for highly fragmented assemblies (those with a large number of contigs).
- prepareAssembly.py: i) obtains the genome sequences and ii) splits the assembly into canonical sequences (the reference sequences of the chromosomes) and alternative sequences (alternative assemblies, unassembled sequences, etc.). Normally, predictions are carried out on the canonical sequences only. The script can work with local files or retrieve the corresponding files directly from UCSC.
- GenomeCluster.pl: determines the local clusters of any genome element identified by chromosome coordinates.
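makeSeqObj.jar stores its index in ‘assembly.zip’, whose internal layout is not described here. To illustrate the general idea only, the Python sketch below builds a samtools-faidx-style offset index (sequence length, byte offset of the first base, bases per line, bytes per line) so that a subsequence can be read with a single seek instead of parsing the whole file. It assumes that each sequence uses a fixed line width except possibly on its last line; `build_index` and `fetch` are hypothetical names, not part of gClusterVM.

```python
import io
from typing import BinaryIO, Dict, Tuple

# name -> (length, byte offset of first base, bases per line, bytes per line)
FastaIndex = Dict[str, Tuple[int, int, int, int]]


def build_index(fh: BinaryIO) -> FastaIndex:
    """Scan a FASTA file once and record where each sequence starts (faidx-style)."""
    index: Dict[str, list] = {}
    name = None
    offset = 0
    for raw in fh:
        if raw.startswith(b">"):
            name = raw[1:].split()[0].decode()
            index[name] = [0, offset + len(raw), 0, 0]
        elif name is not None:
            rec = index[name]
            bases = len(raw.rstrip(b"\r\n"))
            if rec[2] == 0:  # first sequence line fixes the line width (assumed constant)
                rec[2], rec[3] = bases, len(raw)
            rec[0] += bases
        offset += len(raw)
    return {k: tuple(v) for k, v in index.items()}


def fetch(fh: BinaryIO, index: FastaIndex, name: str, start: int, end: int) -> str:
    """Return bases [start, end) of sequence `name` (0-based, half-open) with one seek."""
    length, first, line_bases, line_bytes = index[name]
    end = min(end, length)
    if start >= end:
        return ""

    def byte_pos(i: int) -> int:
        # Skip whole lines, then move within the line
        return first + (i // line_bases) * line_bytes + i % line_bases

    fh.seek(byte_pos(start))
    raw = fh.read(byte_pos(end - 1) - byte_pos(start) + 1)
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()


if __name__ == "__main__":
    fasta = io.BytesIO(b">chr1 test\nACGTA\nCGTAC\nGG\n>chr2\nTTTT\nAA\n")
    idx = build_index(fasta)
    print(fetch(fasta, idx, "chr1", 3, 8))  # TACGT
```

For a real file, open it in binary mode (`open(path, "rb")`) and pass the handle to both functions.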
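The strand and word-combination options of gCluster.jar can be illustrated with the following Python sketch. It only locates the occurrences of a colon-separated word list (the same CAG:CTG:CCG notation as above) and, optionally, of their reverse complements; it does not reproduce gCluster's clustering statistics. Reporting reverse-strand hits by their forward-strand start coordinate is an assumption, and `word_positions` is a hypothetical name.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    return word.translate(_COMPLEMENT)[::-1]


def word_positions(seq: str, words_spec: str, both_strands: bool = True) -> List[int]:
    """0-based start positions (forward-strand coordinates) of any word in `words_spec`.

    `words_spec` uses colon-separated words, e.g. "CG" or "CAG:CTG:CCG".
    With both_strands=True the reverse complements are searched too; for a
    palindromic word such as CG this adds nothing new.
    """
    words = {w for w in words_spec.upper().split(":") if w}
    if both_strands:
        words |= {reverse_complement(w) for w in words}
    seq = seq.upper()
    hits = set()
    for w in words:
        pos = seq.find(w)
        while pos != -1:  # advance by one so overlapping occurrences are found
            hits.add(pos)
            pos = seq.find(w, pos + 1)
    return sorted(hits)


print(word_positions("ACGTTCGA", "CG"))  # [1, 5]
print(word_positions("ACGGA", "CCG"))  # [1], because CGG is the reverse complement of CCG
```

Note that searching CAG:CTG:CCG on both strands adds CGG, the minus-strand form of CCG, while CAG and CTG are each other's reverse complement.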
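The three randomizer.jar steps (merge contigs, shuffle, reinsert N-runs) can be sketched in Python as follows. The dinucleotide-preserving shuffle used here is the Altschul-Erickson Eulerian-walk method, swapped in for illustration; the source does not say which algorithm randomizer.jar implements. The sketch assumes an upper-case sequence in which N-runs are the only gaps, and all function names are hypothetical.

```python
import random
import re
from collections import defaultdict
from typing import Dict, List


def _reaches_end(v: str, end: str, edges: Dict[str, List[str]]) -> bool:
    """Follow each vertex's final edge from v; True if we arrive at `end` without cycling."""
    for _ in range(len(edges) + 1):
        if v == end:
            return True
        v = edges[v][-1]
    return False


def dinucleotide_shuffle(seq: str, rng: random.Random) -> str:
    """Random permutation of seq with exactly the same dinucleotide counts (Eulerian walk)."""
    if len(seq) < 3:
        return seq
    graph = defaultdict(list)
    for a, b in zip(seq, seq[1:]):  # one edge a->b per dinucleotide
        graph[a].append(b)
    edges = dict(graph)
    end = seq[-1]
    while True:
        for targets in edges.values():
            rng.shuffle(targets)
        # The last edge used out of every vertex must lead towards `end` (a tree rooted
        # at `end`); otherwise the walk would get stuck before using all edges.
        if all(_reaches_end(v, end, edges) for v in edges if v != end):
            break
    used = {v: 0 for v in edges}
    out, cur = [seq[0]], seq[0]
    for _ in range(len(seq) - 1):
        nxt = edges[cur][used[cur]]
        used[cur] += 1
        out.append(nxt)
        cur = nxt
    return "".join(out)


def shuffle_with_n_runs(seq: str, rng: random.Random) -> str:
    """i) merge contigs, ii) shuffle, iii) put the N-runs back at their original positions."""
    runs = [(m.start(), m.end()) for m in re.finditer(r"N+", seq)]
    shuffled = dinucleotide_shuffle(re.sub(r"N+", "", seq), rng)
    out, k, prev = [], 0, 0
    for start, stop in runs:
        take = start - prev  # length of the contig preceding this N-run
        out.append(shuffled[k:k + take])
        out.append("N" * (stop - start))
        k, prev = k + take, stop
    out.append(shuffled[k:])
    return "".join(out)


print(shuffle_with_n_runs("ACGTTGCAGGNNNNCATTGACG", random.Random(42)))
```

Because the shuffle runs on the merged sequence, the junction dinucleotides (last base of one contig, first base of the next) are counted during shuffling and then split by the reinserted N-runs, which is exactly the small ‘error’ described above.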
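The canonical/alternative split performed by prepareAssembly.py can be approximated as below. The sketch assumes UCSC naming, where non-canonical sequences carry an underscore in their names (e.g. chr1_KI270706v1_random or chrUn_GL000195v1) and canonical ones do not (chr1, chrX, chrM). This is a heuristic for illustration, not necessarily the rule the script applies, and it does not cover the download from UCSC; `split_assembly` is a hypothetical name.

```python
import io
import re
from typing import Iterator, TextIO, Tuple

# Assumption: UCSC-style names; canonical = "chr" followed by a name without an underscore
CANONICAL = re.compile(r"^chr[0-9A-Za-z]+$")


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; the name is the first word of the header line."""
    name, chunks = None, []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_assembly(handle: TextIO, canonical: TextIO, other: TextIO, width: int = 60) -> None:
    """Write canonical chromosomes to `canonical` and all other sequences to `other`."""
    for name, seq in read_fasta(handle):
        out = canonical if CANONICAL.match(name) else other
        out.write(f">{name}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    src = io.StringIO(">chr1\nACGT\n>chr1_KI270706v1_random\nGG\n")
    can, alt = io.StringIO(), io.StringIO()
    split_assembly(src, can, alt)
    print(can.getvalue(), alt.getvalue(), sep="")
```

This version holds one whole sequence in memory at a time, which is simple but may be heavy for large chromosomes; a streaming variant would copy lines directly to the chosen output.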
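For GenomeCluster.pl, the basic notion of grouping genome elements given by chromosome coordinates into local clusters can be sketched with a simple gap threshold: adjacent elements on the same chromosome belong to the same cluster when the distance between them is at most `max_gap`. Both the single-linkage rule and the `max_gap` parameter are assumptions made for illustration; the source does not describe how GenomeCluster.pl defines or evaluates its clusters. Coordinates are treated as 0-based, half-open intervals.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Element = Tuple[str, int, int]       # (chrom, start, end), assumed 0-based half-open
Cluster = Tuple[str, int, int, int]  # (chrom, start, end, number of elements)


def cluster_elements(elements: Iterable[Element], max_gap: int) -> List[Cluster]:
    """Single-linkage clustering: join sorted neighbours whose gap is <= max_gap."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in elements:
        by_chrom[chrom].append((start, end))
    clusters: List[Cluster] = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        c_start, c_end = intervals[0]
        count = 1
        for start, end in intervals[1:]:
            if start - c_end <= max_gap:  # close enough: extend the current cluster
                c_end = max(c_end, end)
                count += 1
            else:  # gap too large: close the current cluster and open a new one
                clusters.append((chrom, c_start, c_end, count))
                c_start, c_end, count = start, end, 1
        clusters.append((chrom, c_start, c_end, count))
    return clusters


elements = [("chr1", 100, 110), ("chr1", 115, 120), ("chr1", 500, 510), ("chr2", 10, 20)]
print(cluster_elements(elements, max_gap=10))
# [('chr1', 100, 120, 2), ('chr1', 500, 510, 1), ('chr2', 10, 20, 1)]
```

A fixed threshold keeps the example short; in practice the threshold would have to be derived from the data, for instance from the distribution of distances between neighbouring elements.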