miRanalyzer manual


1 Release notes

1.1 miRanalyzer 0.3: 28/02/2012

1.1.1 New features

1.1.2 Changes

1.1.3 Bug fixes

2 Introduction

miRanalyzer is a free web-server tool and standalone application for processing small-RNA data obtained from next-generation sequencing platforms such as Illumina or SOLiD. The tool requires unique reads in read-count format (i.e., a list of sequences together with the number of times each has been sequenced in the experiment) or in fasta format (both redundant and non-redundant files are accepted), in either sequence space (Illumina) or color space (SOLiD). The standalone version also accepts fastq input files (the file extension must be 'fastq' and an adapter sequence must be provided). The main features of miRanalyzer are:

3 Workflow

3.1 Analysis steps

3.2 IsomiR classification

The following isomiR classes are defined:
This classification scheme is non-redundant, i.e. a read can be a member of only one class. Some reads, however, show both length variation and nucleotide substitutions; in the above classification, length variation takes precedence. Therefore, apart from this classification, we also report separately the total number of nucleotide substitutions, including those that occur within length variants ('All substitutions' row in the figure below).
figure pics/classi.jpg
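The precedence rule above can be illustrated with a minimal sketch. It assumes an ungapped alignment of each read to the precursor and known coordinates of the annotated mature sequence; the class labels and the functions `classify_isomir` and `summarize_isomirs` are hypothetical and do not reproduce miRanalyzer's own class names or implementation. Per-class totals are weighted by read count, which is also an assumption.

```python
from collections import Counter


def classify_isomir(read, start, precursor, mature_start, mature_end):
    """Classify one read aligned (ungapped) to a precursor (hypothetical labels).

    start: 0-based start of the read on the precursor.
    mature_start, mature_end: 0-based, end-exclusive coordinates of the
    annotated mature sequence on the same precursor.
    Returns (class_label, number_of_substitutions).
    """
    end = start + len(read)
    reference = precursor[start:end]
    # Substitutions are counted over the whole read, including length variants.
    n_subs = sum(1 for r, g in zip(read, reference) if r != g)

    five_prime_shift = start != mature_start
    three_prime_shift = end != mature_end
    # Length variation takes precedence, so every read gets exactly one label.
    if five_prime_shift and three_prime_shift:
        label = "length variant 5' and 3'"
    elif five_prime_shift:
        label = "length variant 5'"
    elif three_prime_shift:
        label = "length variant 3'"
    elif n_subs:
        label = "nucleotide substitution"
    else:
        label = "exact match"
    return label, n_subs


def summarize_isomirs(aligned_reads, precursor, mature_start, mature_end):
    """aligned_reads: iterable of (read, start, count) tuples.

    Returns count-weighted totals per class plus a separate 'All substitutions'
    total that also includes length variants carrying substitutions.
    """
    per_class = Counter()
    all_substitutions = 0
    for read, start, count in aligned_reads:
        label, n_subs = classify_isomir(read, start, precursor, mature_start, mature_end)
        per_class[label] += count
        if n_subs:
            all_substitutions += count
    return per_class, all_substitutions
```

A read such as `ACCACGGGG` starting one base upstream of the mature sequence is counted only as a 5' length variant, yet it still contributes to the separate 'All substitutions' total.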

3.3 Classification of novel microRNAs

The predicted microRNAs are classified into four groups based on the secondary structure and the read alignments. The most expressed read is defined as the putative mature microRNA sequence.
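Choosing the putative mature sequence of a candidate can be sketched as below. The input layout (a list of `(sequence, start, count)` tuples for reads aligned to one candidate precursor) and the tie-breaking rule are assumptions made for illustration; the manual only states that the most expressed read is taken.

```python
def putative_mature(aligned_reads):
    """Return the most expressed read aligned to one candidate precursor.

    aligned_reads: list of (sequence, start, count) tuples (hypothetical layout).
    Ties on count are broken by the leftmost start and then by sequence, only
    to make the result deterministic; this tie-breaking is an assumption.
    """
    if not aligned_reads:
        raise ValueError("candidate precursor has no aligned reads")
    # Highest count first (negated), then smallest start, then sequence.
    return min(aligned_reads, key=lambda r: (-r[2], r[1], r[0]))
```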

4 How to use the web server

4.1 Input formats

miRanalyzer requires a single file containing the unique reads and their counts. The application accepts two different input formats:

4.1.1 A tab- or space-separated file (read-count format), as in the following example:

GAGGTAGTAGGTTGTA 49862
ACCCGTAGAACCGACC 15490
... ...
GGAGCATCTCTCGGTC 13762

4.1.2 A multifasta file

>ID1 49862
GAGGTAGTAGGTTGTA
>ID2 15490
ACCCGTAGAACCGACC
....
>ID 13762
GGAGCATCTCTCGGTC
The description field must hold the read count; if it is missing, a count of 1 is assumed. The file must have the extension 'fa', 'fasta' or 'mfa'.
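Both input formats reduce to the same mapping from unique sequence to read count. The sketch below is an illustrative reader, not miRanalyzer's parser: the function names are hypothetical, and summing duplicate sequences (which turns a redundant fasta file into unique reads) is an assumption about how redundancy is resolved. The extension check follows the rule stated above.

```python
import os
from collections import Counter

FASTA_EXTENSIONS = {".fa", ".fasta", ".mfa"}


def parse_read_count(lines):
    """Read 'SEQUENCE<whitespace>COUNT' lines into {sequence: count}."""
    counts = Counter()
    for line in lines:
        fields = line.split()  # splits on tabs or spaces
        if not fields:
            continue
        seq, count = fields[0], int(fields[1])
        counts[seq] += count
    return counts


def parse_multifasta(lines):
    """Read a (redundant or non-redundant) multifasta into {sequence: count}.

    The description after the ID holds the read count; if absent, 1 is used.
    """
    counts = Counter()
    count, seq_parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if count is not None:
                counts["".join(seq_parts)] += count
            fields = line[1:].split()  # fields[0] is the ID
            count = int(fields[1]) if len(fields) > 1 else 1
            seq_parts = []
        else:
            seq_parts.append(line)
    if count is not None:
        counts["".join(seq_parts)] += count
    return counts


def load_reads(path):
    """Dispatch on the file extension ('fa', 'fasta', 'mfa' => multifasta)."""
    ext = os.path.splitext(path)[1]
    with open(path) as handle:
        if ext in FASTA_EXTENSIONS:
            return parse_multifasta(handle)
        return parse_read_count(handle)
```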
A Perl script to generate the tab-separated input format from sequence-space or color-space fastq files can be downloaded here. Launching the script without options displays the help page.
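What such a conversion does can be sketched in a few lines for sequence-space data: clip the 3' adapter from every read, collapse identical inserts and write one `sequence<TAB>count` line per unique read. This is not the Perl script mentioned above. The clipping rule (full adapter anywhere, otherwise an adapter prefix of at least `min_overlap` nt at the read end), the `min_len` filter, the assumption of unwrapped 4-line fastq records and all function names are hypothetical; color-space reads would need a color-encoded adapter and are not handled.

```python
from collections import Counter


def clip_adapter(seq, adapter, min_overlap=6):
    """Remove a 3' adapter (hypothetical rule, exact matching only)."""
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx]
    # Otherwise look for the longest adapter prefix sitting at the very end of the read.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq


def fastq_to_read_counts(lines, adapter, min_len=15):
    """Collapse a 4-line-per-record fastq into {clipped sequence: count}."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of each record holds the sequence
            insert = clip_adapter(line.strip(), adapter)
            if len(insert) >= min_len:
                counts[insert] += 1
    return counts


def write_read_counts(counts, out):
    """Write the tab-separated read-count format, most abundant first."""
    for seq, n in counts.most_common():
        out.write(f"{seq}\t{n}\n")
```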

4.2 Parameters

Apart from the input file in read-count or multifasta format, the following parameters/options are available:

5 Differential expression

First, all individual samples must be processed with miRanalyzer. The miRanalyzer job IDs are then used to indicate the samples on which the differential expression analysis should be carried out. This module uses DESeq and R/Bioconductor to calculate statistical significance. The module reports three output files. Two of them are derived directly from DESeq (see here and here for their interpretation). The third file essentially reports the consensus sequences of the predicted microRNAs, based on the sequences from all samples used. The file has the following columns:
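Before testing, DESeq normalizes for sequencing depth with per-sample size factors computed by the median-of-ratios method: each count is divided by the geometric mean of that feature across samples, and the size factor of a sample is the median of these ratios. The sketch below re-implements only that normalization step in plain Python for illustration; the function name and the row-per-microRNA input layout are assumptions, and the negative binomial test that DESeq performs afterwards is not shown.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (as used by DESeq), re-implemented for illustration.

    counts: list of rows, one per microRNA, each a list of raw counts per sample.
    """
    n_samples = len(counts[0])
    # Only rows without zeros have a finite geometric mean across samples.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("every row contains a zero count")
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors
```

For two samples where the second was sequenced twice as deeply, the factors come out as about 0.71 and 1.41, so dividing each sample's counts by its factor puts both on a common scale.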

6 Output

The main miRanalyzer output page shows a summary box with the current status of the job (pending, running or finished), a summary of the parameters used and a brief summary of the results, followed by the four main result boxes:

6.1 pre-microRNA mapping, isomiR analysis and detection of novel mature* sequences

There are two summary boxes: one for the mappings to the pre-microRNAs and one for the isomiR analysis.

6.1.1 pre-microRNA mapping summary

Summary of the mapping to the pre-microRNA (hairpin) sequences from miRBase:
figure pics/preMicroRNA.jpg
The columns are:
There are 4 links to detailed output files:
pre-microRNA: Mappings To Pre-microRNAs
figure pics/preMicroRNA_detail.jpg
The pre-microRNA detail page has the following columns:
isomiR details mature: the read counts of all mature microRNA isomiR variants
isomiR details mature*: the read counts of all mature* microRNA isomiR variants
figure pics/isomiR_detail.jpg
The isomiR output (same format for mature and mature*) has the following columns:
novel mature*: detected mature* sequences that are not in miRBase
figure pics/novelMicroRNA.jpg
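One plausible way to spot a novel mature* is to look at reads on the hairpin arm opposite the annotated mature sequence. The sketch below illustrates that idea only; assigning a read to the 5' or 3' arm by its midpoint, the `min_count` threshold and both function names are hypothetical and are not taken from miRanalyzer.

```python
def arm_of(start, end, hairpin_len):
    """Assign a region to the 5' or 3' arm by its midpoint (hypothetical rule)."""
    return "5p" if (start + end) / 2 < hairpin_len / 2 else "3p"


def candidate_mature_star(reads, hairpin_len, mature_start, mature_end, min_count=1):
    """Return the most expressed read on the arm opposite the annotated mature.

    reads: list of (start, end, count) on the hairpin, 0-based, end-exclusive.
    Returns None when no read on the opposite arm reaches min_count.
    """
    mature_arm = arm_of(mature_start, mature_end, hairpin_len)
    opposite = [
        (s, e, c) for s, e, c in reads
        if arm_of(s, e, hairpin_len) != mature_arm and c >= min_count
    ]
    if not opposite:
        return None
    return max(opposite, key=lambda r: r[2])
```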

6.1.2 IsomiR summary

Summary of the isomiR analysis with the following columns:
figure pics/isomiR.jpg

6.2 Mapping to known microRNAs

This box summarizes the mappings to known mature and pre-microRNAs, and the mappings to all other (putatively homologous) microRNAs from other species. The alignments to each library are divided into unique and ambiguous mappings.
figure pics/mature.jpg
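The split into unique and ambiguous mappings can be sketched as follows, assuming the aligner reports one `(read, reference_id)` pair per hit and that a read counts as unique when it hits exactly one reference entry of a library. The function name and the input layout are assumptions, not miRanalyzer's internals.

```python
from collections import defaultdict


def split_unique_ambiguous(hits):
    """hits: iterable of (read, reference_id) pairs for one library.

    A read hitting exactly one reference is 'unique'; a read hitting more
    than one is 'ambiguous' (assumed definition). Duplicate hits are merged.
    """
    refs = defaultdict(set)
    for read, ref in hits:
        refs[read].add(ref)
    unique = {read: next(iter(s)) for read, s in refs.items() if len(s) == 1}
    ambiguous = {read: sorted(s) for read, s in refs.items() if len(s) > 1}
    return unique, ambiguous
```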
The summary box has the following rows:
Detailed output pages:
figure pics/mature_detail.jpg
The detailed output page for known microRNAs has the following columns:

6.3 Mapping to transcribed libraries

figure pics/transLibs.jpg
The summary box gives the following information:
Detailed pages:

6.4 Predicted candidate microRNAs

figure pics/novel.jpg
The columns are:
The detail pages have the following format:
figure pics/novelDetail.jpg