miRanalyzer standalone version


April 26, 2011

Contents

1 Introduction
2 Downloads
3 Installation
 3.1 General structure of the database
 3.2 Populate the database
  3.2.1 Bowtie indexes
  3.2.2 Genome sequence data
  3.2.3 Models
4 Using miRanalyzer
 4.1 Mandatory parameters
 4.2 Optional parameters
 4.3 An example
5 miRanalyzer output
 5.1 *_unique.txt
 5.2 *_ambig.txt
 5.3 *_reads.txt
 5.4 *_parsed.txt
 5.5 *_grouped.txt
 5.6 newMicroRNA.txt & newMicroRNA_pure.txt
 5.7 Candidates.txt and Candidates_pure.txt

1 Introduction

The miRanalyzer standalone version requires the following third-party software, which must be installed first:

2 Downloads

In this section the available programs and Perl scripts can be downloaded. Please read the next section on how to install them.

3 Installation

miRanalyzer relies on a large number of libraries, such as mature microRNAs, chromosome sequences and Bowtie indexes. These data need to be stored in a local file-based database, which must be generated before miRanalyzer can be used.

3.1 General structure of the database

The easiest way to generate the basic structure of the database is to download the miRanalyzer start-up DB following these steps:

Now you will see a directory named miRanalyzerDB, which is the base directory. Inside the base directory you will find 4 directories, the miRanalyzer jar file, the makeSeqObj jar file, the fastq conversion Perl script and an example input file from Drosophila melanogaster. Please note that all folders must be named exactly as in this manual; otherwise miRanalyzer cannot access the data. The folders are the following:

The bowtie folder needs several subfolders which need to be named exactly as follows:

3.2 Populate the database

3.2.1 Bowtie indexes

The Bowtie indexes for miRBase can be downloaded above. Please note that the mature microRNA sequences are too short to work with normal read lengths. Therefore, 25 Gs are added to each sequence so that Bowtie can be used. Please make sure to use the latest version of the miRBase indexes provided here, or take the 25 added Gs into account when preparing your own microRNA references. For the genome assemblies and transcribed libraries, please build the indexes with the bowtie-build program or download them from the Bowtie page.
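The padding step for self-made microRNA references could be sketched as follows. This is only an illustration: the function name pad_fasta and the example record are hypothetical, and appending the Gs at the 3' end is an assumption, since the manual does not state on which side they are added. Compare the result with the provided miRBase indexes before relying on it.

```python
# Sketch: add 25 Gs to every sequence of a FASTA file of mature microRNAs.
# Assumption: the Gs are appended at the 3' end (not confirmed by the manual).
PAD = "G" * 25


def pad_fasta(lines, pad=PAD):
    """Yield FASTA lines (header, sequence) with `pad` appended to each sequence."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if line.startswith(">"):
            if header is not None:
                # emit the previous record before starting a new one
                yield header
                yield "".join(seq) + pad
            header, seq = line, []
        else:
            seq.append(line)  # sequences may span several lines
    if header is not None:
        yield header
        yield "".join(seq) + pad


# Hypothetical record, for illustration only.
example = [">example-mature-1", "ACGTACGTACG", "TACGTACGTAC"]
padded = list(pad_fasta(example))
print("\n".join(padded))
```

The padded FASTA file can then be passed to bowtie-build in the usual way.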

3.2.2 Genome sequence data

For the prediction of new microRNAs, miRanalyzer needs to extract sequences from the genome assemblies. To allow rapid access to this information, miRanalyzer uses preprocessed fasta files. Table 1 shows the assemblies which are currently available and provides the download links. The user can generate these files for other species or unpublished assemblies using the makeSeqObj Java program. The program takes a (multi)fasta file as input and generates the corresponding zip file.





Table 1: Genome assemblies currently available.

Species (short name)            Assembly/Version             Database                       Download link
Homo sapiens (hsa)              hg18, hg19                   UCSC                           hg18, hg19
Mus musculus (mmu)              mm8, mm9                     UCSC                           mm8, mm9
Rattus norvegicus (rno)         rn4                          UCSC                           rn4
Pan troglodytes (ptr)           panTro2                      UCSC                           pantro2
Macaca mulatta (rma)            rheMac2                      UCSC                           rhemac2
Bos taurus (bta)                bosTau4                      UCSC                           bostau4
Canis familiaris (cfa)          canFam2                      UCSC                           canfam2
Gallus gallus (gga)             galGal3                      UCSC                           galgal3
Gasterosteus aculeatus (gac)    gasAcu1                      UCSC                           gasacu1
Xenopus tropicalis (xtr)        xenTro2                      UCSC                           xentro2
Danio rerio (dre)               danRer6                      UCSC                           danrer6
Taeniopygia guttata (tgu)       taeGut1                      UCSC                           taegut1
Tetraodon nigroviridis (tni)    tetNig2                      UCSC                           tetnig2
Monodelphis domestica (mdo)     monDom5                      UCSC                           mondom5
Anopheles gambiae (aga)         anoGam1                      UCSC                           anogam1
Apis mellifera (ame)            apiMel3                      UCSC                           apimel3
Drosophila melanogaster (dme)   dm3                          UCSC                           dm3
Caenorhabditis elegans (cel)    ce6                          UCSC                           ce6
Bombyx mori (bmo)               bm2 (silkworm_genome_v2.0)   SilkDB                         bm2
Pea Aphid (pap)                 peaAph2                      peaAph2                        peaaph2
Arabidopsis thaliana (ath)      tair9                        TAIR                           tair9
Zea mays (zma)                  zm1 (ZmB73_AGPv1_genome)     PlantGDB                       zm1
Vitis vinifera (vvi)            vv12x (Genoscope 12x)        PlantGDB                       vv12x
Oryza sativa (osa)              OSgenomeV6.1 (osa6)          PlantGDB                       osv61
Medicago truncatula (mtr)       mt3                          M. truncatula Genome Project   mt3





3.2.3 Models

The latest models for animal and plant prediction of new microRNAs can be found here (if not downloaded along with the start-up DB). They must be placed in the “models” folder within the miRanalyzer directory.

4 Using miRanalyzer

Once the database is generated, miRanalyzer can be used. In general, the command line parameters must be given in the following format: parameterName=value. The program uses the following input parameters:

4.1 Mandatory parameters

4.2 Optional parameters

4.3 An example

The following example runs miRanalyzer on the example data file provided with the start-up DB. Note that you need to adapt dbPath and bowtiePath to your local values. For very large input files, it might be necessary to increase the available memory (-Xmx option of the Java VM).

java -Xmx2000m -jar miRanalyzer.jar input=SRR069503.rc dbPath=/home/user/miRanalyzer species=dm3 speciesShort=dme kingdom=animal bowtiePath=/usr/local/bin/bowtie64 translibs=rfam:RFam:15:5
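When miRanalyzer is called from a script, the parameterName=value format can be assembled from a dictionary. The following is a minimal sketch that only reuses the parameters and values from the example above; the helper name build_command is hypothetical, and the paths must be adapted to the local installation.

```python
import shlex


def build_command(jar, params, max_heap_mb=2000):
    """Return the argument list for a miRanalyzer run.

    Each entry of `params` becomes one parameterName=value argument.
    """
    args = ["java", f"-Xmx{max_heap_mb}m", "-jar", jar]
    args += [f"{name}={value}" for name, value in params.items()]
    return args


# Values taken from the example command; adapt dbPath and bowtiePath locally.
params = {
    "input": "SRR069503.rc",
    "dbPath": "/home/user/miRanalyzer",
    "species": "dm3",
    "speciesShort": "dme",
    "kingdom": "animal",
    "bowtiePath": "/usr/local/bin/bowtie64",
    "translibs": "rfam:RFam:15:5",
}

cmd = build_command("miRanalyzer.jar", params)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to launch the program; a larger max_heap_mb corresponds to a larger -Xmx value.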

5 miRanalyzer output

miRanalyzer produces several different output files. In the following, * can be mature, maturestar, matureunobs, hairpin or any of the transcribed libraries.

5.1 *_unique.txt

A summary of the reference sequences which have been uniquely matched, that is, all of their reads map only to this reference sequence and to no other. The following columns exist:

5.2 *_ambig.txt

Sometimes a read maps with the same quality (number of mismatches and length) to more than one reference sequence. In such cases, miRanalyzer groups those reference sequences together and reports them as ambiguous matches. We do so because the members of a family sometimes have very similar sequences and cannot be readily distinguished. However, by means of these ambiguous matches, at least the microRNA family can be detected. The format of the file is the same as that of the “*_unique” file.
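The distinction between unique and ambiguous matches can be illustrated with a small sketch. It is not miRanalyzer's actual implementation: the function names, the input tuples and the reference names are hypothetical, and ranking hits by alignment length first and number of mismatches second is an assumption.

```python
from collections import Counter, defaultdict


def best_references(hits):
    """Return the set of references tied for the best hit of one read.

    hits: list of (reference, mismatches, length).
    Assumption: longer alignments win; ties are broken by fewer mismatches.
    """
    best = min((-length, mm) for _, mm, length in hits)
    return frozenset(ref for ref, mm, length in hits if (-length, mm) == best)


def group_reads(alignments):
    """Count reads per unique reference and per group of ambiguous references."""
    per_read = defaultdict(list)
    for read, ref, mm, length in alignments:
        per_read[read].append((ref, mm, length))
    unique, ambiguous = Counter(), Counter()
    for hits in per_read.values():
        refs = best_references(hits)
        if len(refs) == 1:
            unique[next(iter(refs))] += 1  # read maps best to one reference only
        else:
            ambiguous[refs] += 1           # equally good hits: report the group
    return unique, ambiguous


# Hypothetical alignments: (read, reference, mismatches, alignment length)
alignments = [
    ("read1", "miR-a", 0, 22),
    ("read2", "miR-b-1", 0, 21),
    ("read2", "miR-b-2", 0, 21),
    ("read3", "miR-a", 1, 22),
    ("read3", "miR-c", 0, 22),
]
unique, ambiguous = group_reads(alignments)
print(dict(unique))
print({tuple(sorted(group)): n for group, n in ambiguous.items()})
```

In this toy input, read2 matches two family members equally well, so they are reported together as one ambiguous group, while read3 is assigned to the single reference with the better alignment.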

5.3 *_reads.txt

A summary at the read level.

5.4 *_parsed.txt

The parsed Bowtie output for the library. It holds only the longest alignments with the lowest number of mismatches (see the Bowtie manual for more details).

5.5 *_grouped.txt

All reference sequences mapped by at least one read (no distinction between unique and ambiguous matches!).

5.6 newMicroRNA.txt & newMicroRNA_pure.txt

The “newMicroRNA.txt” and “newMicroRNA_pure.txt” files hold a summary of the newly predicted microRNAs. Depending on the configuration, some reads which mapped to the transcriptome or RFam are also used for the prediction of new microRNAs. Those reads might be more prone to cause false positive predictions; therefore, the file “newMicroRNA_pure.txt” reports only those microRNAs that are composed of previously unmapped reads. The columns of the files are:

5.7 Candidates.txt and Candidates_pure.txt

The “Candidates.txt” and “Candidates_pure.txt” files contain additional information on the newly predicted microRNAs. The first column is the name of the candidate. It is followed by a string which holds, separated by “;”: the precursor sequence, the secondary structure, the minimum free energy of the structure, the chromosome, the chromosomal start coordinate, the chromosomal end coordinate, the number of times it was predicted and the reads which compose this new microRNA. The different reads are separated by ’:’. The information for each read is the following (separated by ’@’): read sequence, position in precursor (0-based), read count and alignment length.
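The nested format described above can be read with a few split operations. The following sketch rests on assumptions that should be checked against a real file: the candidate name is assumed to be separated from the rest of the line by a tab, the read counts are assumed to be integers, and the example line, the class names and parse_candidate are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Read:
    sequence: str
    position: int          # 0-based position in the precursor
    count: int
    alignment_length: int


@dataclass
class Candidate:
    name: str
    precursor: str
    structure: str
    free_energy: float
    chromosome: str
    start: int
    end: int
    times_predicted: int
    reads: List[Read]


def parse_candidate(line):
    """Parse one line of Candidates.txt (column separator assumed to be a tab)."""
    name, info = line.rstrip("\n").split("\t", 1)
    # The first seven ';'-separated fields are scalars, the eighth holds the reads.
    precursor, structure, energy, chrom, start, end, n_pred, reads_field = info.split(";", 7)
    reads = []
    for chunk in reads_field.split(":"):
        if not chunk:
            continue
        seq, pos, count, length = chunk.split("@")
        reads.append(Read(seq, int(pos), int(count), int(length)))
    return Candidate(name, precursor, structure, float(energy), chrom,
                     int(start), int(end), int(n_pred), reads)


# Hypothetical line, for illustration only.
line = "cand_1\tGGCUAGCUAGCC;((((....))));-3.2;chr2L;100;111;3;GGCUAG@0@12@6:CUAGCU@2@5@6\n"
candidate = parse_candidate(line)
print(candidate.name, candidate.chromosome, len(candidate.reads))
```

Keeping the reads as a list of small records makes it easy to sum read counts or to locate the most abundant read within the precursor.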