The input data of DROMPA is SAM or Bowtie formatted map file.
Before running DROMPA, each map file to be analyzed should be preprocessed by parse2wig.
Parse2wig converts a map file into wig files.
For example, the command:
% parse2wig -i ChIP.bowtie -o ChIP -binsize 10 -F3 50 -gt genome_table-hg19.txt
makes wig files for ChIP.bowtie with bin size of 10 bp.
When converting paired-end file, supply "-pair" option:
% parse2wig -i ChIP.bowtie -o ChIP -binsize 10 -F3 50 -gt genome_table-hg19.txt -pair
The wig files for chromosomes are created in the directory ./(output directory)/ (default: parse2wigdir).
Parse2wig can output wig files in binary (.bin), compressed (.wig.gz) and uncompressed (.wig) format.
Compressed wig files generated by parse2wig can be used for other visualizing programs and browser.
When using DROMPA with different bin size, make wig files for the bin size.
Options
file formats: -bowtie / -sam (default:bowtie)
-pair: for paired-end file
-odir: output directory name (default:parse2wig)
-binsize: bin size (default: 10bp)
-tf: PCRbias threshold (default: over max(1, 10*exp(r)) reads)
-F3: forward read length (default: 50)
-flen: expected flagment length (default: 150)
-of: 0;binary file, 1;compressed wig file (.wig.gz), 2;uncompressed wig file (.wig) (default: 0)
-norm: 0;do not normlize 1;normlize for genome (RPM) 2;normlize for chromosome (RPKM) (default:2)
-nofilter: do not filter PCR bias
After making wig files, you can invoke DROMPA as follows:
% drompa -i parse2wigdir/ChIP -w parse2wigdir/Input -p ChIP -optype3 -gt genome_table-hg19.txt
Then the peak-list file "ChIP.xls" is generated. This list is output in a tab-delimited text file that can be handled by a text editor or Microsoft Excel.
Options
-binsize : bin size (default: 10)
-bs : change bin size of each sample
-nosm: do not smooth reads distribution
-sw : smoothing width (default: 500bp)
-optype: output type 1:all (default) 2:only figure 3:only peak file
Thresholds
-cont : bin number of each window (default: 30 bins)
-ipm : maximum read intensity normalized in each ChIP bin (default: 6)
-pthre : p-value threshold (default: 1e-4)
-ethre : threshold of IP/WCE enrichment (default: 3.0)
-ethre2 : second threshold of IP/WCE enrichment (default: 1.5)
-ithre : threshold of IP peak/IP genome average (default: 3.0)
-nosig: do not implement significant test (default: off)
show line
-showratio: display enrichment ratio in liner scale (default: off)
-showlogratio: display enrichment ratio in log10 scale (default: off)
-notag: do not display ChIP and WCE tag line (default: off)
-wtag: display all WCE read lines (default: off)
-wtag1: display first WCE read line only (default: off)
Drawing parameters
-r : specify the regions to visualize
-LS: length per one line (kb) (default: 100 kb)
-ystep : height of tag line (default: 20)
-bn : number of separations for y-axis (default: 2)
-chr: output only a chromosome specified
-LPP: number of lines per one page (kb) (default: 3)
-scale_tag : scale of tag line
-scale_ratio : scale of ratio line
-scale_pvalue : scale of p_value line
-wg: whole-genome view
-overlook: overlook bed files
Showing profile
-profile: (1; around TSS, 2; around TES, 3; divide gene into 100 subregions 4; around BED sites)
-cw : width from the center (default: 5 kb)
Annotations
-g_ref : RefSeq gene list
-g_ens : Ensembl gene list
-g_scer : S.cerevisiae gene list
-g_spom : S.pombe gene list
-repeat : Repeat list
-ars : ARS list (for S.cerevisiae)
-showars: display ARS only (do not display genes)
-ter : TER list (for S.cerevisiae)
-bed : specify the bedfile
-bedname : specify the name of bedfile -graph : show graph such as GCcontents (default: none)
-gcsize : resolution of GCcontents graph (default: 0)
Input and output
-rmchr: do not output each chromosome pdf file
-if: 0;binary file, 1;compressed wig file (.wig.gz), 2;uncompressed wig file (.wig) (default: 0)
-png: output with png format (Note: output file is not merged)
Genome mapping
Map the read file of each sample onto the reference genome (human build hg19) with a mapping tool such as Bowtie:
% bowtie UCSC-hg19 ChIPsample1.fastq -n3 -m1 > ChIPsample1-n3-m1-hg19.bowtie
% bowtie UCSC-hg19 ChIPsample2.fastq -n3 -m1 > ChIPsample2-n3-m1-hg19.bowtie
....
% bowtie UCSC-hg19 ChIPsampleN.fastq -n3 -m1 > ChIPsampleN-n3-m1-hg19.bowtie
% bowtie UCSC-hg19 WCEsample.fastq -n3 -m1 > WCEsample-n3-m1-hg19.bowtie
Making wig files
Convert each map file into wig files with parse2wig:
% parse2wig -i ChIPsample1-n3-m1-hg19.bowtie -o ChIPsample1-n3-m1-hg19 -binsize 10 -F3 50 -gt genome_table-hg19.txt
% parse2wig -i ChIPsample2-n3-m1-hg19.bowtie -o ChIPsample2-n3-m1-hg19 -binsize 10 -F3 50 -gt genome_table-hg19.txt
....
% parse2wig -i ChIPsampleN-n3-m1-hg19.bowtie -o ChIPsampleN-n3-m1-hg19 -binsize 10 -F3 50 -gt genome_table-hg19.txt
% parse2wig -i WCEsample-n3-m1-hg19.bowtie -o WCEsample-n3-m1-hg19 -binsize 10 -F3 50 -gt genome_table-hg19.txt
The output directory "parse2wigdir" (default name) is created automatically, and the chromosome-separated wig data are placed in that directory.
In the default mode, binary wig files are generated. Use the "-if1" option if compressed wig files are desired, e.g. uploading the wig data to some browsers.
Peak-calling
Obtain peak list of ChIPsample1 with RPKM threshold 10 and binsize 10 bp:
% drompa -i parse2wigdir/ChIPsample1-n3-m1-hg19 -w parse2wigdir/WCEsample-n3-m1-hg19 -p ChIPsample1-n3-m1-hg19-10 -optype3 -gt genome_table-hg19.txt -ipm 10 -binsize 10
Obtain peak list of ChIPsample1 without smoothing and RPKM threshold 30:
% drompa -i parse2wigdir/ChIPsample1-n3-m1-hg19 -w parse2wigdir/WCEsample-n3-m1-hg19 -p ChIPsample1-n3-m1-hg19-10 -optype3 -gt genome_table-hg19.txt -ipm 30 -nosm
Making pdf files
For small genome such as yeast
For species that have small genome, due to enough sequencing depth (>10 fold), the statistical reliability at every genomic position is large in each case.
Therefore we recommend the protein-binding profile map (the ratio of sequence reads from the ChIP fraction and the WCE fraction) which reflects the actual protein-binding probability at every position in the genome.
To make a pdf file of protein-binding profile map, type:
% drompa -g_scer gene_scer.txt $s1 $s2 ... $sN -p ChIPseq_yeast-n3-k1-10 -LPP3 -binsize 10 -showratio -notag -LS50 -optype2 -gt genome_table_scer -nosig -scale_ratio 2 -rmchr
It is possible to change the scale of each ratio line with "-scr" option.
If you focus replication origin of yeast, this parameter set is appropriate:
% drompa -showars -ars ARS-oriDB_scer.txt $s1 $s2 ... $sN -p ARS$postfix -gt genome_table_scer -optype2 -LS100 -LPP2 -binsize 10 -notag -showratio -scale_ratio 1 -bn 3 -ystep 14 -rmchr -divide -ethre 2 -ethre2 1.5 -ipm 1
For large genome such as human
For species that have large genome, we recommend to visualize the ChIP-read distribution map.
Obtain pdf files (chromosome-separated files and whole-genome file) of all ChIP samples with binsize 10 bp and 1-Mbp (1000-kbp) length for one line:
% s1="-i parse2wigdir/ChIPsample1-n3-m1-hg19 -w parse2wigdir/WCEsample-n3-m1-hg19 -name ChIPsample1"
% s2="-i parse2wigdir/ChIPsample2-n3-m1-hg19 -w parse2wigdir/WCEsample-n3-m1-hg19 -name ChIPsample2"
....
% sN="-i parse2wigdir/ChIPsampleN-n3-m1-hg19 -w parse2wigdir/WCEsample-n3-m1-hg19 -name ChIPsampleN"
% drompa -g_ref refFlat_hg19.txt $s1 $s2 ... $sN -p ChIPseq-n3-m1-hg19-10 -optype2 -gt genome_table-hg19.txt -LS1000 -binsize 10
It is possible to change the scale of each read line with "-sct" option.
To make the figure focusing some region (here MACF1 gene region), make "MACF1.txt", tab-separated region file and supply the option "-r MACF1.txt" as follows:
% echo "chr1 39500000 40000000" > MACF1.txt
% drompa -g_ref refFlat_hg19.txt $s1 $s2 ... $sN -p MACF1-n3-m1-hg19-10 -optype2 -gt genome_table-hg19.txt -r MACF1.txt -LS300 -binsize 10
When the user wants to include annotation data with BED format, e.g., known enhancer regions (here called genhancer_hg19.txth), use the "-bed" and "-bedname" options as follows (
Sample):
% drompa -g_ref refFlat_hg19.txt $s1 $s2 ... $sN -p MACF1-n3-m1-hg19-10 -optype2 -gt genome_table-hg19.txt -r MACF1.txt -LS300 -binsize 10 -bed enhancer_hg19.txt -bedname enhancer
To identify broadly enriched region, use 1-kbp bin size and modified parameter set (
Sample), e.g. as follows:
% drompa -g_ref refFlat_hg19.txt $s1 $s2 ... $sN -p ChIPseq_broad-n3-m1-hg19 -optype2 -gt genome_table-hg19.txt -LS2000 -scale_tag 25 -binsize 1000 -sw 2000 -ethre 2.0
To make a chromosome-wide overview of the ChIP-seq data (
Sample), specify the "-wg" option and use a larger bin size (e.g., 100 kbp). Type:
% drompa -wg $s1 $s2 ... $sN -p ChIPseq-wholegenome-n3-m1-hg19 -binsize 100000 -optype2 -gt genome_table-hg19.txt -showratio -scale_ratio 1 -graph GCcontents -gcsize 500000 -notag
For chromosome-wide figure, we generally set bin size 100-kbp.
When specifying the "-wg" option, DROMPA does not perform the significance test but simply highlights the bins containing ChIP/WCE enrichments above the y axis (the value specified with option "-scale_ratio") in red.
Making averaged read distribution (require R)
Optionally, DROMPA can show the averaged reads distribution around transcription start sites (TSS), transcription termination sites (TTS), gene bodies and specified peak list by supplying "-profile" option. For instance, to show read density around TSS (
Sample), type as follows:
% drompa -profile1 -g_ref refFlat.txt $s1 $s2 ... $sN -p aroundTSS -gt genome_table-hg19.txt
Then DROMPA outputs a script for R. Do the R command:
% R --vanilla aroundTSS.R
Then the pdf file "aroundTSS.pdf" is output.
Ryuichiro Nakato (rnakato AT iam.u-tokyo.ac.jp)