DROMPA Manual

See also the reference of DROMPA.

Install

DROMPA and its associated program parse2wig are written in ANSI-C, and can be executed on a Linux OS.
DROMPA and parse2wig require programs and libraries below: These are all freely available. For example, on the Ubuntu OS workstation, these programs can be installed using the apt-get command as follows:
  % sudo apt-get install gcc
  % sudo apt-get install libgtk2.0-dev
  % sudo apt-get install pdftk

To build DROMPA from the source, first download the archive file and extract it with:

  % tar -xvzf drompa-1.*.*.tar.gz

Then, go to the new directory and type:
  % make

Add the above directory to your PATH environment variable. If you download DROMPA in $HOME/my_chipseq_exp directory, type:
  % export PATH = $PATH:$HOME/my_chipseq_exp/drompa_1.*.*

Annotation data

DROMPA can include several kinds of genomic annotations downloaded from public database listed below.
The annotation data files can be downloaded from this site, and latest version can be downloaded from original websites.

Several versions of genome build often exist for a given species, e.g., for the human genome, genome build 18 and 19 are usually used.
The same genome build should be used for genome sequence and annotation data.

parse2wig: make wig files

The input data of DROMPA is SAM or Bowtie formatted map file.
Before running DROMPA, each map file to be analyzed should be preprocessed by parse2wig. Parse2wig converts a map file into wig files.
For example, the command:
  % parse2wig -i ChIP.bowtie -o ChIP -binsize 10 -F3 50 -gt genome_table-hg19.txt

makes wig files for ChIP.bowtie with bin size of 10 bp.
When converting paired-end file, supply "-pair" option:
  % parse2wig -i ChIP.bowtie -o ChIP -binsize 10 -F3 50 -gt genome_table-hg19.txt -pair

The wig files for chromosomes are created in the directory ./(output directory)/ (default: parse2wigdir).
Parse2wig can output wig files in binary (.bin), compressed (.wig.gz) and uncompressed (.wig) format.
Compressed wig files generated by parse2wig can be used for other visualizing programs and browser.
When using DROMPA with different bin size, make wig files for the bin size.

Options

DROMPA: peak-calling and visualization

After making wig files, you can invoke DROMPA as follows:
  % drompa -i parse2wigdir/ChIP -w parse2wigdir/Input -p ChIP -optype3 -gt genome_table-hg19.txt

Then the peak-list file "ChIP.xls" is generated. This list is output in a tab-delimited text file that can be handled by a text editor or Microsoft Excel.

Options

Examples

Genome mapping

Map the read file of each sample onto the reference genome (human build hg19) with a mapping tool such as Bowtie:
  % bowtie UCSC-hg19 ChIPsample1.fastq -n3 -m1 > ChIPsample1-n3-m1-hg19.bowtie
  % bowtie UCSC-hg19 ChIPsample2.fastq -n3 -m1 > ChIPsample2-n3-m1-hg19.bowtie
  ....
  % bowtie UCSC-hg19 ChIPsampleN.fastq -n3 -m1 > ChIPsampleN-n3-m1-hg19.bowtie
  % bowtie UCSC-hg19 WCEsample.fastq -n3 -m1 > WCEsample-n3-m1-hg19.bowtie

Making wig files

Convert each map file into wig files with parse2wig:
  % parse2wig -i ChIPsample1-n3-m1-hg19.bowtie -o ChIPsample1-n3-m1-hg19 -binsize 10 -F3 50 -gt genome_table-hg19.txt
  % parse2wig -i ChIPsample2-n3-m1-hg19.bowtie -o ChIPsample2-n3-m1-hg19 -binsize 10 -F3 50 -gt genome_table-hg19.txt
  ....
  % parse2wig -i ChIPsampleN-n3-m1-hg19.bowtie -o ChIPsampleN-n3-m1-hg19 -binsize 10 -F3 50 -gt genome_table-hg19.txt
  % parse2wig -i WCEsample-n3-m1-hg19.bowtie -o WCEsample-n3-m1-hg19 -binsize 10 -F3 50 -gt genome_table-hg19.txt
The output directory "parse2wigdir" (default name) is created automatically, and the chromosome-separated wig data are placed in that directory.
In the default mode, binary wig files are generated. Use the "-if1" option if compressed wig files are desired, e.g. uploading the wig data to some browsers.

Peak-calling

Obtain peak list of ChIPsample1 with RPKM threshold 10 and binsize 10 bp:
  % drompa -i parse2wigdir/ChIPsample1-n3-m1-hg19 -w parse2wigdir/WCEsample-n3-m1-hg19 -p ChIPsample1-n3-m1-hg19-10 -optype3 -gt genome_table-hg19.txt -ipm 10 -binsize 10

Obtain peak list of ChIPsample1 without smoothing and RPKM threshold 30:
  % drompa -i parse2wigdir/ChIPsample1-n3-m1-hg19 -w parse2wigdir/WCEsample-n3-m1-hg19 -p ChIPsample1-n3-m1-hg19-10 -optype3 -gt genome_table-hg19.txt -ipm 30 -nosm

Making pdf files

For small genome such as yeast

For species that have small genome, due to enough sequencing depth (>10 fold), the statistical reliability at every genomic position is large in each case.
Therefore we recommend the protein-binding profile map (the ratio of sequence reads from the ChIP fraction and the WCE fraction) which reflects the actual protein-binding probability at every position in the genome.
To make a pdf file of protein-binding profile map, type:
  % drompa -g_scer gene_scer.txt $s1 $s2 ... $sN -p ChIPseq_yeast-n3-k1-10 -LPP3 -binsize 10 -showratio -notag -LS50 -optype2 -gt genome_table_scer -nosig -scale_ratio 2 -rmchr

It is possible to change the scale of each ratio line with "-scr" option. If you focus replication origin of yeast, this parameter set is appropriate:
  % drompa -showars -ars ARS-oriDB_scer.txt $s1 $s2 ... $sN -p ARS$postfix -gt genome_table_scer -optype2 -LS100 -LPP2 -binsize 10 -notag -showratio -scale_ratio 1 -bn 3 -ystep 14 -rmchr -divide -ethre 2 -ethre2 1.5 -ipm 1

For large genome such as human

For species that have large genome, we recommend to visualize the ChIP-read distribution map.
Obtain pdf files (chromosome-separated files and whole-genome file) of all ChIP samples with binsize 10 bp and 1-Mbp (1000-kbp) length for one line:
  % s1="-i parse2wigdir/ChIPsample1-n3-m1-hg19 -w parse2wigdir/WCEsample-n3-m1-hg19 -name ChIPsample1"
  % s2="-i parse2wigdir/ChIPsample2-n3-m1-hg19 -w parse2wigdir/WCEsample-n3-m1-hg19 -name ChIPsample2"
  ....
  % sN="-i parse2wigdir/ChIPsampleN-n3-m1-hg19 -w parse2wigdir/WCEsample-n3-m1-hg19 -name ChIPsampleN"
  % drompa -g_ref refFlat_hg19.txt $s1 $s2 ... $sN -p ChIPseq-n3-m1-hg19-10 -optype2 -gt genome_table-hg19.txt -LS1000 -binsize 10

It is possible to change the scale of each read line with "-sct" option.
To make the figure focusing some region (here MACF1 gene region), make "MACF1.txt", tab-separated region file and supply the option "-r MACF1.txt" as follows:
  % echo "chr1	39500000	40000000" > MACF1.txt
  % drompa -g_ref refFlat_hg19.txt $s1 $s2 ... $sN -p MACF1-n3-m1-hg19-10 -optype2 -gt genome_table-hg19.txt -r MACF1.txt -LS300 -binsize 10

When the user wants to include annotation data with BED format, e.g., known enhancer regions (here called genhancer_hg19.txth), use the "-bed" and "-bedname" options as follows (Sample):
  % drompa -g_ref refFlat_hg19.txt $s1 $s2 ... $sN -p MACF1-n3-m1-hg19-10 -optype2 -gt genome_table-hg19.txt -r MACF1.txt -LS300 -binsize 10 -bed enhancer_hg19.txt -bedname enhancer

To identify broadly enriched region, use 1-kbp bin size and modified parameter set (Sample), e.g. as follows:
  % drompa -g_ref refFlat_hg19.txt $s1 $s2 ... $sN -p ChIPseq_broad-n3-m1-hg19 -optype2 -gt genome_table-hg19.txt -LS2000 -scale_tag 25 -binsize 1000 -sw 2000 -ethre 2.0

To make a chromosome-wide overview of the ChIP-seq data (Sample), specify the "-wg" option and use a larger bin size (e.g., 100 kbp). Type:
  % drompa -wg $s1 $s2 ... $sN -p ChIPseq-wholegenome-n3-m1-hg19 -binsize 100000 -optype2 -gt genome_table-hg19.txt -showratio -scale_ratio 1 -graph GCcontents -gcsize 500000 -notag

For chromosome-wide figure, we generally set bin size 100-kbp.
When specifying the "-wg" option, DROMPA does not perform the significance test but simply highlights the bins containing ChIP/WCE enrichments above the y axis (the value specified with option "-scale_ratio") in red.

Making averaged read distribution (require R)

Optionally, DROMPA can show the averaged reads distribution around transcription start sites (TSS), transcription termination sites (TTS), gene bodies and specified peak list by supplying "-profile" option. For instance, to show read density around TSS (Sample), type as follows:
  % drompa -profile1 -g_ref refFlat.txt $s1 $s2 ... $sN -p aroundTSS -gt genome_table-hg19.txt

Then DROMPA outputs a script for R. Do the R command:
  % R --vanilla ƒ aroundTSS.R

Then the pdf file "aroundTSS.pdf" is output.

Ryuichiro Nakato
(rnakato AT iam.u-tokyo.ac.jp)