{"id":2348,"date":"2020-06-18T12:27:21","date_gmt":"2020-06-18T10:27:21","guid":{"rendered":"http:\/\/bioinfo2.ugr.es\/gCluster\/?page_id=2348"},"modified":"2020-06-18T12:27:22","modified_gmt":"2020-06-18T10:27:22","slug":"manual-2","status":"publish","type":"page","link":"https:\/\/bioinfo2.ugr.es\/gCluster\/manual-2\/","title":{"rendered":"gCluster Manual"},"content":{"rendered":"<p><a href=\"http:\/\/bioinfo2.ugr.es\/gCluster\/downloads\/MANUAL-gCluster.pdf\">Download as PDF<\/a><\/p>\n<h1><span style=\"font-size: 16pt; color: #003366;\"><strong>Preparation of the protocol<\/strong><\/span><\/h1>\n<p>You can follow the protocol in two ways:<\/p>\n<ol>\n<li>Install the executable files.<\/li>\n<li>Use gClusterVM, a virtual machine with everything you need.<\/li>\n<\/ol>\n<p>Both options work on most operating systems (Windows, Mac OS X, Linux and Solaris).<\/p>\n<h2><span style=\"color: #003366; font-size: 14pt;\">Using gClusterVM<\/span><\/h2>\n<ul>\n<li>Download and install <a href=\"https:\/\/www.virtualbox.org\/wiki\/Downloads\">VirtualBox<\/a> (Windows, Mac OS X, Linux and Solaris are supported). Also install the VirtualBox Extension Pack.<\/li>\n<li>Download the gClusterVM OVA file and import it into VirtualBox by double-clicking it.<\/li>\n<li>Configure the virtual machine: allocated memory (minimum: 4\u00a0GB), assigned CPUs (minimum: 2) and a shared folder for exchanging files with the virtual machine (highly recommended).<\/li>\n<li>Run the virtual machine.<\/li>\n<\/ul>\n<h2><span style=\"font-size: 14pt;\"><strong><span style=\"color: #003366;\">Using standalone executables<\/span><\/strong><\/span><\/h2>\n<p>To use the standalone executables, first complete the following steps:<\/p>\n<ul>\n<li>Download and install <a href=\"https:\/\/www.java.com\/en\/download\/\">Java<\/a> 8 or higher.<\/li>\n<li>Download and install <a href=\"https:\/\/www.python.org\/downloads\/\">Python 3.4<\/a> or higher. 
Important: on Windows, check the option \u2018Add to the PATH\u2019 during the installation.<\/li>\n<li>Download and extract the standalone executables.<\/li>\n<li>Open a terminal to execute the commands of the protocol or of this manual. If you are using Windows, you can use CMD or PowerShell.<\/li>\n<\/ul>\n<p>In the following sections we describe in detail the features of all programs, with usage examples.<\/p>\n<p><span style=\"font-size: 14pt;\"><strong><span style=\"color: #ff0000;\">IMPORTANT:\u00a0<\/span><\/strong><\/span>All commands in this manual are for the standalone executables. If you use gClusterVM you must omit the interpreter name (i.e. \u201cpython\u201d, \u201cjava\u201d) and the extension of the script (i.e. \u201c.py\u201d, \u201c.jar\u201d).<\/p>\n<p>Example command using the standalone executables:<\/p>\n<pre style=\"text-align: center;\"><code>python prepareAssembly.py -u hg38 -o \/home\/gcluster\/sequences -l hg38<\/code><\/pre>\n<p>Example command using gClusterVM:<\/p>\n<pre style=\"text-align: center;\"><code>prepareAssembly -u hg38 -o \/home\/gcluster\/sequences -l hg38<\/code><\/pre>\n<p><span style=\"font-size: 16pt;\"><strong><span style=\"color: #003366;\">Prepare the sequences: prepareAssembly.py, makeSeqObj.jar and randomizer.jar<\/span><\/strong><\/span><\/p>\n<h2><span style=\"font-size: 14pt; color: #003366;\"><strong><em>prepareAssembly.py<\/em><\/strong><\/span><\/h2>\n<p>It obtains an assembly (from UCSC) and extracts the canonical sequences from it. 
It also works with local assembly files.<\/p>\n<p><span style=\"font-size: 10pt;\"><em>Input parameters<\/em><\/span><\/p>\n<ul>\n<li><span style=\"font-size: 10pt;\"><strong>-i &lt;path or URL&gt;:<\/strong> Path or URL to a multi-FASTA file (.fa, .fa.gz, .fa.tar.gz, .bz2)<\/span><\/li>\n<li><span style=\"font-size: 10pt;\"><strong>-u &lt;UCSC id&gt;:<\/strong> UCSC assembly ID (e.g. hg38)<\/span><\/li>\n<li><span style=\"font-size: 10pt;\"><strong>-l &lt;label&gt;:<\/strong> Label for output files (default: &#8216;assembly&#8217;)<\/span><\/li>\n<li><span style=\"font-size: 10pt;\"><strong>-o &lt;outdir&gt;:<\/strong> Path to the output directory (default: current directory)<\/span><\/li>\n<li><span style=\"font-size: 10pt;\"><strong>-r &lt;regex&gt;:<\/strong> REGEX to filter canonical or non-canonical chromosomes. REGEX to filter canonical chromosomes must be precede$<\/span><\/li>\n<\/ul>\n<p><span style=\"font-size: 10pt;\"><em>Output files<\/em><\/span><\/p>\n<p><span style=\"font-size: 10pt;\">Two files are generated in the output folder:<\/span><\/p>\n<ul>\n<li><span style=\"font-size: 10pt;\"><strong>&#8216;label&#8217;_canonical.fa:<\/strong> multi-FASTA file that contains all canonical chromosome sequences.<\/span><\/li>\n<li><span style=\"font-size: 10pt;\"><strong>&#8216;label&#8217;_noncanonical.fa:<\/strong> multi-FASTA file that contains all non-canonical chromosome sequences.<\/span><\/li>\n<\/ul>\n<h3><strong><span style=\"text-decoration: underline; color: #003366; font-size: 12pt;\">Extract canonical sequences from assembly (from local file)<\/span><\/strong><\/h3>\n<pre style=\"padding-left: 30px; text-align: center;\"><code>python prepareAssembly.py -i \/home\/gcluster\/sequences\/hg38.fa -o \/home\/gcluster\/sequences -l hg38<\/code><\/pre>\n<h3><span style=\"color: #003366;\"><strong><span style=\"text-decoration: underline; font-size: 12pt;\">Obtain assembly and extract canonical sequences (from UCSC)<\/span><\/strong><\/span><\/h3>\n<pre 
style=\"padding-left: 30px; text-align: center;\"><code>python prepareAssembly.py -u hg38 -o \/home\/gcluster\/sequences -l hg38<\/code><\/pre>\n<h2><span style=\"font-size: 14pt; color: #003366;\"><strong><em>makeSeqObj.jar<\/em><\/strong><\/span><\/h2>\n<p><span style=\"font-size: 12pt;\">It indexes the FASTA input files to make them faster to access. From an \u2018assembly.fa\u2019 input file it generates an \u2018assembly.zip\u2019 file which, in turn, is the input file for the programs described below.<\/span><\/p>\n<p><em><span style=\"font-size: 10pt;\">Input parameters<\/span><\/em><\/p>\n<ul>\n<li><span style=\"font-size: 10pt;\"><strong>\u2018assembly\u2019_canonical.fa:<\/strong> Multi-FASTA file of the genome assembly.<\/span><\/li>\n<\/ul>\n<p><em><span style=\"font-size: 10pt;\">Output files<\/span><\/em><\/p>\n<ul>\n<li><span style=\"font-size: 10pt;\"><strong>\u2018assembly\u2019.zip file<\/strong>, which should be used for all subsequent analyses.<\/span><\/li>\n<li><span style=\"font-size: 10pt;\"><strong>\u2018assembly\u2019_canonical.N<\/strong>: file with the coordinates of the N-runs in the assembly.<\/span><\/li>\n<li><span style=\"font-size: 10pt;\"><strong>\u2018assembly\u2019_canonical.chromSize<\/strong>: chromosome size file.<\/span><\/li>\n<\/ul>\n<p style=\"padding-left: 30px;\"><span style=\"font-size: 10pt;\">All of them are created in the same folder as the input file.<\/span><\/p>\n<pre style=\"text-align: center;\"><code>java -jar makeSeqObj.jar \/home\/gcluster\/sequences\/hg38_canonical.fa<\/code><\/pre>\n<h2><span style=\"font-size: 14pt; color: #003366;\"><strong><em>randomizer.jar<\/em><\/strong><\/span><\/h2>\n<p>Randomizes DNA sequences while preserving the dinucleotide frequencies.<\/p>\n<p><span style=\"font-size: 10pt;\"><em>Input parameters<\/em><\/span><\/p>\n<ul>\n<li><span style=\"font-size: 10pt;\"><strong>\u2018assembly\u2019.zip<\/strong> file generated by makeSeqObj.jar<\/span><\/li>\n<\/ul>\n<p><span 
style=\"font-size: 10pt;\"><em>Output files<\/em><\/span><\/p>\n<ul>\n<li><span style=\"font-size: 10pt;\"><strong>\u2018assembly\u2019_random.zip<\/strong> file. It is created in the same folder as the input file.<\/span><\/li>\n<\/ul>\n<p style=\"text-align: center;\"><code>java -jar randomizer.jar \/home\/gcluster\/sequences\/hg38_canonical.zip<br \/>\n<\/code><\/p>\n<h1><span style=\"font-size: 16pt; color: #003366;\">Determine local clusters of DNA words and their global clustering properties: <span style=\"font-size: 14pt;\">gCluster<\/span><\/span><\/h1>\n<p><span style=\"font-size: 12pt;\">gCluster determines the local clusters of a given DNA word (CpG islands if the DNA word is \u2018CG\u2019) and its global clustering properties. It can work on both strands (for non-palindromic words) and accepts any combination of DNA words (like CAG:CTG:CCG for the methylation context CHG).<\/span><\/p>\n<p>The following parameters are all the possible options of gCluster. The sections below detail which parameters must be set for each analysis.<\/p>\n<p><span style=\"font-size: 10pt;\"><em>Input parameters:<\/em><\/span><\/p>\n<ul>\n<li><span style=\"font-size: 10pt;\"><strong>genome=&lt;path&gt;:<\/strong> the indexed multi-FASTA file (see \u2018Prepare the sequences\u2019 section above).<\/span><\/li>\n<li><span style=\"font-size: 10pt;\"><strong>output=&lt;output folder&gt;:<\/strong> the output folder where the results will be written.<\/span><\/li>\n<li><span style=\"font-size: 10pt;\"><strong>pattern=&lt;the k-mers&gt;: <\/strong>the k-mers (DNA words) that should be analysed, e.g. pattern=CG to obtain CpG islands. 
To detect CWG clusters, pattern=CAG:CTG should be used.<\/span><\/li>\n<li><span style=\"font-size: 10pt;\"><strong>writedistribution=&lt;boolean&gt;:<\/strong> write out the observed and expected distance distributions (default: writedistribution=false)<\/span><\/li>\n<li><span style=\"font-size: 10pt;\"><strong>chromStat=&lt;boolean&gt;:<\/strong> if true, the program writes out additional information about the chromosome sequences, such as G+C content, CpG frequencies (observed\/expected ratios), lengths, etc. (default: chromStat=false)<\/span><\/li>\n<\/ul>\n<p><span style=\"font-size: 10pt;\"><em>Output files:<\/em><\/span><\/p>\n<ul>\n<li><span style=\"font-size: 10pt;\"><strong>txt:<\/strong> The clusters, identified by their chromosomal coordinates and p-value, together with basic compositional statistics such as G+C content and O\/E ratios of the pattern (the number of observed patterns divided by the number of expected patterns). In the case of a compound pattern like CAG:CTG, the mean O\/E ratio is calculated.<\/span><\/li>\n<li><span style=\"font-size: 10pt;\"><strong>txt: <\/strong>The file holds the normalized coefficient of variation for each chromosome.<\/span><\/li>\n<li><span style=\"font-size: 10pt;\"><strong>*.distr files: <\/strong>The distance distribution: the observed and expected frequencies as a function of next-neighbour distance.<\/span><\/li>\n<li><span style=\"font-size: 10pt;\"><strong>txt: <\/strong>A log file.<\/span><\/li>\n<li><span style=\"font-size: 10pt;\"><strong>txt: <\/strong>Basic statistics per chromosome, i.e. pattern frequency, G+C content, length and contig length (the sequence length minus the sum of all Ns).<\/span><\/li>\n<\/ul>\n<h2><span style=\"font-size: 14pt; color: #003366;\">Calculate clustering properties<\/span><\/h2>\n<p>First create the \u2018results\u2019 output directory:<\/p>\n<pre style=\"text-align: center;\"><code>mkdir \/home\/gcluster\/results<\/code><\/pre>\n<pre style=\"text-align: center;\"><code>java -jar 
gCluster.jar genome=\/home\/gcluster\/sequences\/hg38_canonical.zip pattern=CG output=\/home\/gcluster\/results\/hg38_CG writedistribution=true chromStat=true<\/code><\/pre>\n<p><span style=\"color: #ff0000;\"><strong>IMPORTANT:<\/strong><\/span> the writedistribution and chromStat parameters must be set to \u2018true\u2019.<\/p>\n<h2><span style=\"font-size: 14pt; color: #003366;\">Detect the clusters of DNA words<\/span><\/h2>\n<h3><span style=\"text-decoration: underline; color: #003366;\"><span style=\"font-size: 12pt;\">CpG Islands<\/span><\/span><\/h3>\n<p>With the following command (same as above), the clusters are automatically determined and written to the cluster.txt file. The distance threshold is determined by the genome intersection.<\/p>\n<pre style=\"text-align: center;\"><code>java -jar gCluster.jar genome=\/home\/gcluster\/sequences\/hg38_canonical.zip pattern=CG output=\/home\/gcluster\/results\/hg38_CG writedistribution=true chromStat=true<\/code><\/pre>\n<h3 style=\"text-align: left;\"><span style=\"text-decoration: underline;\"><span style=\"font-size: 12pt; color: #003366;\"><strong>Clusters of other biologically relevant k-mers<\/strong><\/span><\/span><\/h3>\n<p>As mentioned before, in plants cytosine can also be methylated in other contexts, such as CWG (CAG, CTG), CCG or CHH.<\/p>\n<p>The clustering of these contexts can be calculated with the following command.<\/p>\n<pre style=\"text-align: center;\"><code>java -jar gCluster.jar genome=\/home\/gcluster\/sequences\/hg38_canonical.zip pattern=CAG:CTG output=\/home\/gcluster\/results\/hg38_CWG writedistribution=true chromStat=true<\/code><\/pre>\n<h1 style=\"text-align: left;\"><span style=\"font-size: 16pt; color: #003366;\">Clusters of clusters: <\/span><span style=\"font-size: 14pt; color: #003366;\">GenomeCluster.pl<\/span><\/h1>\n<p>GenomeCluster determines the local clusters of genome elements identified by their chromosome coordinates. 
It can be used to cluster the clusters predicted by gCluster, producing clusters of clusters.<\/p>\n<p><span style=\"font-size: 10pt;\"><em>Input parameters (the arguments must be given in this order):<\/em><\/span><\/p>\n<ul>\n<li><span style=\"font-size: 10pt;\"><strong>Argument 1:<\/strong> the distance model &lt;element; start; middle; end&gt;.<\/span><\/li>\n<li><span style=\"font-size: 10pt;\"><strong>Argument 2:<\/strong> the BED file with the chromosome coordinates of any genome element (e.g. CpG islands).<\/span><\/li>\n<li><span style=\"font-size: 10pt;\"><strong>Argument 3<\/strong>: the distance threshold model &lt;gi: genome intersection; ci: chromosome intersection; N (integer): percentile&gt;.<\/span><\/li>\n<li><span style=\"font-size: 10pt;\"><strong>Argument 4: <\/strong>the p-value threshold (by default 1E-5 as in gCluster).<\/span><\/li>\n<li><span style=\"font-size: 10pt;\"><strong>Argument 5:<\/strong> the file with the N-runs created by makeSeqObj.jar (see \u2018Prepare the sequences\u2019 section above).<\/span><\/li>\n<li><span style=\"font-size: 10pt;\"><strong>Argument 6:<\/strong> the maximum number of Ns allowed between two elements. It must be an integer greater than or equal to 0 (default: 0).<\/span><\/li>\n<\/ul>\n<p><span style=\"font-size: 10pt;\"><em>Output files:<\/em><\/span><\/p>\n<ul>\n<li><span style=\"font-size: 10pt;\"><strong>*_genomeIntersec_start_GenomeCluster.txt:<\/strong> there are as many of these files as chromosomes in the assembly. 
They are tabular files with the columns described below.<\/span><\/li>\n<\/ul>\n<table class=\" aligncenter\" style=\"height: 385px; width: 519px;\">\n<tbody>\n<tr style=\"height: 48px;\">\n<td style=\"width: 88px; text-align: center; height: 48px;\"><span style=\"font-size: 10pt;\">\u00a0<strong>Columns in outfile<\/strong><\/span><\/td>\n<td style=\"width: 408.5px; text-align: center; height: 48px;\"><strong><span style=\"font-size: 10pt;\">\u00a0Description<\/span><\/strong><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 88px; height: 24px;\"><span style=\"font-size: 10pt;\">Chrom<\/span><\/td>\n<td style=\"width: 408.5px; height: 24px;\"><span style=\"font-size: 10pt;\">Chromosome where the local cluster belongs.<\/span><\/td>\n<\/tr>\n<tr style=\"height: 23px;\">\n<td style=\"width: 88px; height: 23px;\"><span style=\"font-size: 10pt;\">From\u00a0<\/span><\/td>\n<td style=\"width: 408.5px; height: 23px;\"><span style=\"font-size: 10pt;\">Start chromosome coordinate of the local cluster.<\/span><\/td>\n<\/tr>\n<tr style=\"height: 19px;\">\n<td style=\"width: 88px; height: 19px;\"><span style=\"font-size: 10pt;\">To\u00a0<\/span><\/td>\n<td style=\"width: 408.5px; height: 19px;\"><span style=\"font-size: 10pt;\">End chromosome coordinate of the local cluster.\u00a0<\/span><\/td>\n<\/tr>\n<tr style=\"height: 3px;\">\n<td style=\"width: 88px; height: 3px;\"><span style=\"font-size: 10pt;\">Length\u00a0<\/span><\/td>\n<td style=\"width: 408.5px; height: 3px;\"><span style=\"font-size: 10pt;\">Length of the local cluster.\u00a0<\/span><\/td>\n<\/tr>\n<tr style=\"height: 17px;\">\n<td style=\"width: 88px; height: 17px;\"><span style=\"font-size: 10pt;\">Count\u00a0<\/span><\/td>\n<td style=\"width: 408.5px; height: 17px;\"><span style=\"font-size: 10pt;\">Number of genome elements within the local cluster.\u00a0<\/span><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 88px; height: 24px;\"><span style=\"font-size: 
10pt;\">PValue\u00a0<\/span><\/td>\n<td style=\"width: 408.5px; height: 24px;\"><span style=\"font-size: 10pt;\">P-value of the local cluster.\u00a0<\/span><\/td>\n<\/tr>\n<tr style=\"height: 12.9141px;\">\n<td style=\"width: 88px; height: 12.9141px;\"><span style=\"font-size: 10pt;\">logPValue<\/span><\/td>\n<td style=\"width: 408.5px; height: 12.9141px;\"><span style=\"font-size: 10pt;\">Decimal logarithm of the p-value\u00a0<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<pre style=\"text-align: center;\"><code>perl GenomeCluster.pl start \/home\/gcluster\/results\/hg38_CG\/cluster.txt gi 1E-5 \/home\/gcluster\/sequences\/hg38_canonical.N 0<\/code><\/pre>\n<h1><span style=\"font-size: 16pt; color: #003366;\"><strong>Determine the methylation of CpG Islands: <\/strong><\/span><span style=\"font-size: 14pt; color: #003366;\">NGSmethDB_API_client.py<\/span><\/h1>\n<p>To identify differentially methylated CGIs, we first obtain the methylation values for the CpG islands from our <a href=\"http:\/\/bioinfo2.ugr.es\/NGSmethDB\/\">database<\/a> and then compare the methylation values between two samples. To obtain the methylomes, we will use the NGSmethDB_API_client.py executable.<\/p>\n<p><span style=\"font-size: 10pt;\"><em>Input parameters:<\/em><\/span><\/p>\n<ul>\n<li><span style=\"font-size: 10pt;\"><strong>-i &lt;input bed file&gt;:<\/strong> in general, BED3 files are accepted (only the first 3 columns are considered), including the \u2018cluster.txt\u2019 output files generated by gCluster<\/span><\/li>\n<li><span style=\"font-size: 10pt;\"><strong>-o &lt;output folder&gt;:<\/strong> output folder<\/span><\/li>\n<\/ul>\n<p><span style=\"font-size: 10pt;\"><em>Output files:<\/em><\/span><\/p>\n<ul>\n<li><span style=\"font-size: 10pt;\"><strong>&lt;sample&gt;.CG.meth.tsv:<\/strong> Contains the CG methylation data for all regions defined in the input BED file. 
There is one &lt;sample&gt;.CG.meth.tsv file for each selected sample. Each line corresponds to one region.<\/span><\/li>\n<\/ul>\n<table class=\" aligncenter\" style=\"width: 558px; height: 636px;\">\n<tbody>\n<tr style=\"height: 48px;\">\n<td style=\"width: 97px; text-align: center; height: 48px;\"><strong>Columns in outfile<\/strong><\/td>\n<td style=\"width: 422px; text-align: center; height: 48px;\"><strong>Description<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 97px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">ID<\/span><\/td>\n<td style=\"width: 422px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">chrom_start_end<\/span><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 97px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">refPatt<\/span><\/td>\n<td style=\"width: 422px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">Number of CpGs within the region in the reference genome<\/span><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 97px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">indPatt<\/span><\/td>\n<td style=\"width: 422px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">Number of CpGs in the region for the genotype of the sample<\/span><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 97px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">meanMR<\/span><\/td>\n<td style=\"width: 422px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">Mean of CpG methylation ratio distribution<\/span><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 97px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">sdMR<\/span><\/td>\n<td style=\"width: 422px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">Standard Deviation of CpG methylation ratio distribution<\/span><\/td>\n<\/tr>\n<tr 
style=\"height: 24px;\">\n<td style=\"width: 97px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">p10MR<\/span><\/td>\n<td style=\"width: 422px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">10th percentile of CpG methylation ratio distribution<\/span><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 97px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">q1MR<\/span><\/td>\n<td style=\"width: 422px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">First quartile of CpG methylation ratio distribution<\/span><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 97px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">q2MR<\/span><\/td>\n<td style=\"width: 422px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">Median of CpG methylation ratio distribution<\/span><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 97px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">q3MR<\/span><\/td>\n<td style=\"width: 422px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">Third quartile of CpG methylation ratio distribution<\/span><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 97px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">p90MR<\/span><\/td>\n<td style=\"width: 422px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">90th percentile of CpG methylation ratio distribution<\/span><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 97px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">totalMC<\/span><\/td>\n<td style=\"width: 422px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">Total number of methylcytosines in the region<\/span><\/td>\n<\/tr>\n<tr style=\"height: 24.7891px;\">\n<td style=\"width: 97px; text-align: left; height: 24.7891px;\"><span style=\"font-size: 10pt;\">totalC<\/span><\/td>\n<td 
style=\"width: 422px; text-align: left; height: 24.7891px;\"><span style=\"font-size: 10pt;\">Total number of mapped reads within the region<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<ul>\n<li><span style=\"font-size: 10pt;\"><strong>&lt;sample&gt;.CHG.meth.tsv<\/strong> (plants only): Same as above, but for CHG.<\/span><\/li>\n<\/ul>\n<pre style=\"text-align: center;\"><code>python3 NGSmethDB_API_client.py -i \/home\/gcluster\/results\/hg38_CG\/cluster.txt -o \/home\/gcluster\/results\/methylationData<\/code><\/pre>\n<h1><span style=\"font-size: 18pt; color: #003366;\">Calculate differential methylation: <\/span><span style=\"font-size: 14pt; color: #003366;\">calcDMIs.py<\/span><\/h1>\n<p>Takes the output files from NGSmethDB_API_client.py for two samples and identifies the CpG islands that are differentially methylated between them at a statistically significant level. The statistical significance is assessed by means of a Fisher exact test.<\/p>\n<p><span style=\"font-size: 10pt;\"><em>Input parameters:<\/em><\/span><\/p>\n<ul>\n<li><span style=\"font-size: 10pt;\"><strong>-a &lt;sample1 file&gt; and -b &lt;sample2 file&gt;: <\/strong>output methylation files from NGSmethDB_API_client.py<\/span><\/li>\n<li><span style=\"font-size: 10pt;\"><strong>-o &lt;outfile&gt;:<\/strong> output file. Default is \u2018outputDMIs.txt\u2019<\/span><\/li>\n<li><span style=\"font-size: 10pt;\"><strong>-p &lt;pvalue&gt;:<\/strong> p-value threshold for DMIs. 
Default is 1E-5.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-size: 10pt;\"><em>Output files:<\/em><\/span><\/p>\n<ul>\n<li><span style=\"font-size: 10pt;\"><strong>&lt;outfile&gt;: <\/strong>a file containing the differentially methylated CpG islands.<\/span><\/li>\n<\/ul>\n<table class=\" aligncenter\" style=\"width: 559px; height: 549px;\">\n<tbody>\n<tr style=\"height: 48px;\">\n<td style=\"width: 103.5px; text-align: center; height: 48px;\"><strong>Columns in outfile<\/strong><\/td>\n<td style=\"width: 422px; text-align: center; height: 48px;\"><strong>Description<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 103.5px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">ID<\/span><\/td>\n<td style=\"width: 422px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">chrom_start_end<\/span><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 103.5px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">mC_Sample1<\/span><\/td>\n<td style=\"width: 422px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">Number of methylcytosines for each cluster in sample 1<\/span><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 103.5px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">reads_Sample1<\/span><\/td>\n<td style=\"width: 422px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">Number of reads for each cluster in sample 1<\/span><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 103.5px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">mRatio_Sample1<\/span><\/td>\n<td style=\"width: 422px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">methRatio (mC_Sample1\/reads_Sample1) for each cluster in sample 1<\/span><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 103.5px; text-align: left; height: 24px;\"><span style=\"font-size: 
13.3333px;\">mC_Sample2<\/span><\/td>\n<td style=\"width: 422px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">Number of methylcytosines for each cluster in sample 2<\/span><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 103.5px; text-align: left; height: 24px;\"><span style=\"font-size: 13.3333px;\">reads_Sample2<\/span><\/td>\n<td style=\"width: 422px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">Number of reads for each cluster in sample 2<\/span><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 103.5px; text-align: left; height: 24px;\"><span style=\"font-size: 13.3333px;\">mRatio_Sample2<\/span><\/td>\n<td style=\"width: 422px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">methRatio (mC_Sample2\/reads_Sample2) for each cluster in sample 2<\/span><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 103.5px; text-align: left; height: 24px;\"><span style=\"font-size: 13.3333px;\">diffMeth<\/span><\/td>\n<td style=\"width: 422px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">Absolute difference between mRatio_Sample1 and mRatio_Sample2<\/span><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 103.5px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">pvalue<\/span><\/td>\n<td style=\"width: 422px; text-align: left; height: 24px;\"><span style=\"font-size: 10pt;\">Statistical significance assessed by means of the Fisher exact test<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<pre style=\"text-align: center;\"><code>python calc_DMIs.py -a \/home\/gcluster\/results\/methylationData\/STL003.gastric.CG.meth.tsv -b \/home\/gcluster\/results\/methylationData\/STL003.pancreas.CG.meth.tsv -o \/home\/gcluster\/results\/methylationData\/STL003.gastric.STL003.pancreas.DMIs.txt -p 1E-5<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Download in pdf Preparation of the protocol You\u00a0can 
follow the protocol in two ways: Install\u00a0the executables files. Use gClusterVM, a virtual machine with all that you need. Both options work on most of operating system (Windows, Mac OS X, Linux &hellip;<\/p>\n<p class=\"read-more\"> <a class=\"more-link\" href=\"https:\/\/bioinfo2.ugr.es\/gCluster\/manual-2\/\"> <span class=\"screen-reader-text\">gCluster Manual<\/span> Read More &raquo;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/bioinfo2.ugr.es\/gCluster\/wp-json\/wp\/v2\/pages\/2348"}],"collection":[{"href":"https:\/\/bioinfo2.ugr.es\/gCluster\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/bioinfo2.ugr.es\/gCluster\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/bioinfo2.ugr.es\/gCluster\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/bioinfo2.ugr.es\/gCluster\/wp-json\/wp\/v2\/comments?post=2348"}],"version-history":[{"count":1,"href":"https:\/\/bioinfo2.ugr.es\/gCluster\/wp-json\/wp\/v2\/pages\/2348\/revisions"}],"predecessor-version":[{"id":2349,"href":"https:\/\/bioinfo2.ugr.es\/gCluster\/wp-json\/wp\/v2\/pages\/2348\/revisions\/2349"}],"wp:attachment":[{"href":"https:\/\/bioinfo2.ugr.es\/gCluster\/wp-json\/wp\/v2\/media?parent=2348"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}