Quick start

This quick start shows how to try MethFlow on the MethFlow virtual machine without going into details. For further information, see the reference manual.

Install MethFlowVM

  1. Install VirtualBox.
  2. Install VirtualBox Extension Pack.
  3. Download MethFlowVM (mirror).
  4. Import MethFlowVM into VirtualBox by double-clicking the downloaded file.
  5. Optional: add a shared folder (strongly recommended).
  6. Run MethFlowVM.

The local database

Set your working folder

At first startup, you will be asked which working folder you want to use. If you skip this question, your home folder, /home/methflow, is used as the working folder.

If you want to change the working folder, open a terminal and type the following command:

MethFlow_manager working_folder

Set your assembly collection

Tell MethFlow where the assembly collection is by typing:

MethFlow_manager assembly_collection Assemblies

This command looks for a folder named Assemblies inside the working folder.

Set your adapter collection

Tell MethFlow where the adapter collection is by typing:

MethFlow_manager adapter_collection Adapters

This command looks for a folder named Adapters inside the working folder.

Set your root input folder

Tell MethFlow where to look for the input folders:

MethFlow_manager root_input_folder Inputs

This command looks for a folder named Inputs inside the working folder.

Set your root output folder

Tell MethFlow where to keep the output folders:

MethFlow_manager root_output_folder Outputs

This command creates a folder named Outputs inside the working folder.
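
The four commands above can be run in one pass once the folders they look for exist. The following is a minimal sketch, not part of MethFlow: prepare_layout is a hypothetical helper, and its argument is assumed to be the working folder you have already configured. It creates the three folders that the commands look for and registers them only if MethFlow_manager is found on the PATH.

```shell
# Hypothetical helper: prepare the folder layout used in this quick start.
# $1 is assumed to be the working folder already configured in MethFlow.
prepare_layout() {
    workdir="$1"

    # Create the folders that the collection and input commands look for.
    # Outputs is not created here because root_output_folder creates it.
    mkdir -p "$workdir/Assemblies" "$workdir/Adapters" "$workdir/Inputs"

    # Register the folders only where MethFlow_manager is installed.
    if command -v MethFlow_manager >/dev/null 2>&1; then
        MethFlow_manager assembly_collection Assemblies
        MethFlow_manager adapter_collection Adapters
        MethFlow_manager root_input_folder Inputs
        MethFlow_manager root_output_folder Outputs
    fi
}
```

With the default working folder, the call would be: prepare_layout /home/methflow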

Get test datasets

Open a terminal and type the following command to get test datasets:

MethFlow_manager get_test_datasets

Test datasets are then downloaded and unpacked.

These test datasets contain:

  • The Trimmomatic adapter collection. It goes to the adapter collection.
  • A small assembly (chromosomes 12 and 19 of hg38). It goes to the assembly collection.
  • Data from nine samples (from three individuals and three tissues). It goes to the root input folder.
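
After the download, one quick sanity check is to count the sample folders under the root input folder. This sketch assumes that each sample is unpacked into its own folder directly under Inputs, which the description above does not state explicitly; count_sample_folders is a hypothetical helper, not a MethFlow command.

```shell
# Hypothetical helper: count the folders directly under a given folder.
# Files and nested subfolders are not counted.
count_sample_folders() {
    find "$1" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}
```

With the default working folder, the call would be: count_sample_folders /home/methflow/Inputs. If the one-folder-per-sample assumption holds, the count for the test datasets would be nine.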

Launch MethFlow

Now, launch MethFlow with default options:

MethFlow

This command looks inside the working folder and asks you for:

  1. Assembly 1 folder. This folder should contain FASTA or multiFASTA files and must be inside the assembly collection folder. Optionally, it can also contain Bismark Bowtie2 indexes.
  2. Adapter file. This file must be a multiFASTA file inside the adapter collection.
  3. Input data folder(s). Each of these folders should contain all the input datasets of one sample in SRA, FASTQ, SAM or BAM format (all files must be in the same format) and must be inside the root input folder.
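
The rule that all files of a sample must share one format can be checked before launching. The following is a minimal sketch, not part of MethFlow: sample_formats and check_sample_folder are hypothetical helpers, and the mapping from file extensions (.sra, .fastq, .fq, .sam, .bam) to formats is an assumption. Files with other extensions are ignored.

```shell
# Hypothetical helper: print the distinct formats found in a sample folder.
# The extension-to-format mapping below is an assumption.
sample_formats() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        case "$f" in
            *.sra) echo SRA ;;
            *.fastq|*.fq) echo FASTQ ;;
            *.sam) echo SAM ;;
            *.bam) echo BAM ;;
        esac
    done | sort -u
}

# Hypothetical helper: succeed only if exactly one format is present.
check_sample_folder() {
    formats=$(sample_formats "$1")
    # Count non-empty lines; "|| true" keeps a zero count from aborting.
    count=$(printf '%s\n' "$formats" | grep -c . || true)
    if [ "$count" -eq 1 ]; then
        echo "$1: $formats"
    else
        echo "$1: expected one format, found $count" >&2
        return 1
    fi
}
```

A folder that mixes formats, or that contains no recognized files, makes check_sample_folder return a non-zero status, so the check can guard a launch script.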

If you want to use a second assembly, launch MethFlow as follows:

MethFlow --assembly2

In this case, MethFlow asks you for:

  1. Assembly 1 folder. This folder should contain FASTA or multiFASTA files and must be inside the assembly collection folder. Optionally, it can also contain Bismark Bowtie2 indexes.
  2. Assembly 2 folder. This folder should contain FASTA or multiFASTA files and must be inside the assembly collection folder. Optionally, it can also contain Bismark Bowtie2 indexes.
  3. The type of reads you want to align against assembly 2: multiple-mapped reads, unmapped reads or both.
  4. Adapter file. This file must be a multiFASTA file inside the adapter collection.
  5. Input data folder(s). Each of these folders should contain all the input datasets of one sample in SRA, FASTQ, SAM or BAM format (all files must be in the same format) and must be inside the root input folder.

Additionally, use the options --enable_api and --enable_diffmeth to activate the NGSmethDB API client and the differential methylation analysis, respectively. MethFlow then asks which samples you want to download from NGSmethDB and which samples to compare in the differential methylation analysis.

A look at the output

  • Intermediates folder: all the intermediate files generated during the analysis. It contains a folder for each analyzed sample.
  • Plots folder: plots generated by FastQC, BSeQC and methylKit. It contains a folder for each analyzed sample.
  • Meth folder: all methylation maps calculated by MethFlow. It contains a folder for each analyzed sample.

If you use the NGSmethDB API client functionality, each of the downloaded methylation maps will be stored in a folder inside the Meth folder.

If you use the differential methylation analysis functionality, you will have an additional output folder:

  • Diffmeth folder: all differential methylation maps calculated by MethFlow. It contains a folder for each pair of samples used in the differential methylation analysis.
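
To get a quick overview of a finished run, the folders inside each output folder can be counted. The following is a minimal sketch, not part of MethFlow: summarize_outputs is a hypothetical helper, and the literal folder names Intermediates, Plots, Meth and Diffmeth are assumptions based on the descriptions above. The argument is the output folder of one run.

```shell
# Hypothetical helper: count the folders inside each expected output folder.
# The folder names below are assumptions taken from the descriptions above.
summarize_outputs() {
    for sub in Intermediates Plots Meth Diffmeth; do
        dir="$1/$sub"
        if [ -d "$dir" ]; then
            n=$(find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')
            echo "$sub: $n"
        else
            echo "$sub: missing"
        fi
    done
}
```

Diffmeth is reported as missing when the differential methylation analysis was not enabled, which is expected in that case.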