
How ChIP-Array v2.0 works

Figure 1. ChIP-Array v2.0 workflow and results

 

Workflow

The main workflow of ChIP-Array 2 is shown in Figure 1A. ChIP-X and transcriptome data are combined to detect direct targets, using either the “direct” method (the intersection of DEGs and TFBS-enriched genes) or the “Rank Product” method. To detect TF-target relations that involve enhancer-promoter interactions, long-range chromatin interaction data are applied, if available, so that TSSs whose distal interacting regions contain peaks are also considered. If a TF appears among the direct targets, the same method and the same transcriptome data are then applied to detect its indirect targets. For binding data, users can choose either our curated ChIP-X data or putative TFBSs obtained by motif scanning. If putative TFBSs are used, open chromatin regions and histone modifications are considered so that only the relevant TFBSs take part in target detection.

Motif scanning

If ChIP-X data are not available for detecting either direct or indirect targets, users can instead use putative TFBSs obtained by motif scanning in the conserved regions around TSSs (±100 kb). Conservation is measured by phastCons scores, which are downloaded from UCSC for all species supported in ChIP-Array 2 except Arabidopsis. The conservation scores for Arabidopsis are taken from http://gregorylab.bio.upenn.edu/ArabidopsisStructure/. The p value is calculated against a null distribution of conservation scores generated from randomly picked sequences in intergenic regions. The motifs in our database, represented as position weight matrices (PWMs), have been extended from 1,151 motifs of 894 TFs to 6,584 motifs of 4,481 TFs. The number of supported species has also increased from 5 to 7. MISP (http://www.bytebucket.org/hanfeisun/misp), which implements the algorithm from MOODS, is used to perform the motif scanning.
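One common way to turn a conservation score into a p value against a null distribution is an empirical tail fraction. The sketch below assumes that approach, with a +1 pseudocount so the p value is never zero; the function name and the exact estimator are illustrative assumptions, not the server's documented implementation.

```python
from bisect import bisect_left

def empirical_p_value(observed, null_scores):
    """Empirical p value: share of null scores >= observed.

    null_scores stands in for the conservation scores of randomly
    picked intergenic sequences (hypothetical input). A +1 pseudocount
    is added to numerator and denominator.
    """
    ordered = sorted(null_scores)
    # Number of null scores greater than or equal to the observed score.
    n_ge = len(ordered) - bisect_left(ordered, observed)
    return (n_ge + 1) / (len(ordered) + 1)
```

A site would then be kept when its p value does not exceed the "Conservation Filtering P-Value" chosen by the user.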

Target detection method

In the previous version, target genes were detected directly as the intersection of TFBS-enriched genes and DEGs, with equal weight for each target. The new version additionally offers a method called “Rank Product”, based on the ranks of the peak concentration and the differential expression. The peak concentration (PC) is defined as follows:

$$PC = \sum_{i} p_i \, e^{-(0.5 + 4 d_i / d_0)},$$

where d0 is a constant distance (e.g. 10 kb) around the TSS that defines the region in which peaks are considered to contribute to binding, di is the distance of the ith peak from the TSS within this region, and pi is the intensity of the peak. As di increases, the contribution of the peak to the peak concentration decreases exponentially. If the intensity is not provided in the input binding data, pi is set to 1 and the method falls back to the one used in BETA. Suppose there are n significantly differentially expressed genes with PC > 0. Each gene has two ranks: its rank by peak concentration in descending order (Rpc) and its rank by false discovery rate (FDR) or P-value from the expression profile in ascending order (Rde). The rank product of each gene, RP = (Rpc/n)*(Rde/n), can be regarded as a P-value: the probability of observing a gene whose peak concentration rank is at least as high as Rpc and whose differential expression rank is at least as high as Rde. With a suitable cutoff, users are more likely to select the true targets. If the cutoff is set to 1, the method becomes “direct”, that is, the target gene set is the intersection of TFBS-enriched genes and DEGs.
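The ranking described above can be sketched in Python. This is a minimal illustration rather than the server's code: the function names and input layout are hypothetical, the decay term exp(-(0.5 + 4*d/d0)) is assumed from the BETA-style form that the fallback with an intensity of 1 suggests, and ties are broken by input order.

```python
import math

def peak_concentration(peaks, d0=10_000):
    """Sum intensity-weighted, distance-decayed peak contributions.

    peaks: list of (distance_to_tss, intensity_or_None) tuples.
    Assumed decay: exp(-(0.5 + 4 * d / d0)); peaks farther than d0
    from the TSS are ignored.
    """
    total = 0.0
    for distance, intensity in peaks:
        d = abs(distance)
        if d > d0:
            continue
        p = 1.0 if intensity is None else intensity  # missing intensity -> 1
        total += p * math.exp(-(0.5 + 4.0 * d / d0))
    return total

def rank_product(genes, cutoff=1.0):
    """genes: dict name -> (peak_concentration, fdr) for the DEGs.

    Returns {name: RP} for genes with PC > 0 and RP <= cutoff,
    where RP = (Rpc / n) * (Rde / n).
    """
    kept = {g: v for g, v in genes.items() if v[0] > 0}
    n = len(kept)
    by_pc = sorted(kept, key=lambda g: kept[g][0], reverse=True)  # descending PC
    by_de = sorted(kept, key=lambda g: kept[g][1])                # ascending FDR
    r_pc = {g: i + 1 for i, g in enumerate(by_pc)}
    r_de = {g: i + 1 for i, g in enumerate(by_de)}
    rp = {g: (r_pc[g] / n) * (r_de[g] / n) for g in kept}
    return {g: v for g, v in rp.items() if v <= cutoff}
```

With `cutoff=1.0` every gene with PC > 0 is returned, which mirrors the statement that a cutoff of 1 reduces the method to “direct”.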

To detect TF-target relations that involve enhancer-promoter interactions, long-range interaction data are applied when calculating the peak concentration. Peaks are mapped to the interacting regions, and the mapped peaks also contribute to the peak concentration and hence affect the target ranking. If TFBSs from motif scanning are used instead of ChIP-X data, open chromatin region and histone modification data are used to keep only the sites located in open chromatin regions and in regions with the required histone modifications.
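The filtering of motif-scanned sites by open chromatin or histone modification regions amounts to an interval-overlap test. The sketch below assumes half-open coordinates as in BED and uses hypothetical function names; a real implementation would likely use an indexed structure rather than this quadratic scan.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end

def filter_sites(sites, regions):
    """Keep sites (chrom, start, end) that overlap any region on the same chromosome."""
    return [
        s for s in sites
        if any(s[0] == r[0] and overlaps(s[1], s[2], r[1], r[2]) for r in regions)
    ]
```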

Input

General Options

Job Name

The job name identifies your job in ChIP-Array v2.0. We will add a unique suffix so that the same job name remains available to other users. The default job name is “job”. Your input should consist of letters or underscores and be no more than 6 characters long.
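The naming rule above can be checked before submitting a job. The following is a small sketch of such a check; the function name is hypothetical, and restricting "letters" to ASCII is an assumption.

```python
import re

# Letters or underscores, 1 to 6 characters (ASCII letters assumed).
JOB_NAME = re.compile(r"[A-Za-z_]{1,6}")

def normalize_job_name(name):
    """Return a valid job name, falling back to the default "job"."""
    name = name.strip()
    if not name:
        return "job"  # default stated in the text
    if JOB_NAME.fullmatch(name) is None:
        raise ValueError("use letters or underscores, at most 6 characters")
    return name
```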

Email

The email address used to notify you when your job starts and ends. It is optional.

Genome Assembly

The genome assembly used for your input

Involved TF Name

The involved transcription factor/chromatin modifier name

Perturbation

How you perturb your TF or chromatin modifier

Detection method

“Direct” or “Rank Product” as described above

Cutoff

Used for the “Rank Product” method.

Promoter Definition

The promoter region defined from TSS

Binding Information

Distance

The distance within which the peaks are taken into account

Peaks Considered

The number of peaks that take part in the calculation

Binding Data Source

You can either upload/paste the binding data, use the TFBSs scanned by motifs, or select our curated ChIP-X data.

Input Format

The file format of the binding file (see Binding Data File formats)

Select Data file

You can select the TF first and filter the data files in different aspects.

Cutoff

The cutoff of the statistical value to keep the significant peaks only.

Transcriptome Data

Input Format

The format of the expression file (see Expression File Formats)

Gene Name Format

The gene name format of the input expression file.

Statistical Value Cutoff

The statistical value cutoff to keep the significantly differentially expressed genes.

Options For Detecting Indirect Targets

Motif database

The databases of the motifs that you want to use to detect indirect targets

PWM Scan P-Value

The P-Value cutoff used when scanning the motifs along the genome

Conservation Filtering P-Value

The conservation P-Value cutoff. Only TFBSs in conserved regions are kept.

Use builtin ChIP-seq/chip data if available

You can use our curated ChIP-X binding data, if available, to detect indirect targets. You have to select tissues/cell lines to use the data.

Other Types Of Omics Data

Long Range Chromatin Interactions

This type of data is used to investigate distal cis-regulatory interactions. Please see the format in Long Range Chromatin Interaction Data Format.

Open Chromatin Regions

Please see the format: BED format. Name is used as the track name in JBrowse

ChIP-Seq For Histone Modification

Please see the format: BED format. Name is used as the track name in JBrowse

Database

Curated data, please select a tissue/cell line to use it.

File formats

Binding Data File Formats

The main fields are chromosome, start, end, name, and intensity. The first three are essential; the last two are optional. We extract this information from the different formats.

BED format

Specification: BED format. Sometimes the first 3 columns (BED3) are sufficient.

Example:

#chrom	start	end	name	score	strand
chr7	127471196	127472363	Pos1	0	+
chr7	127472363	127473530	Pos2	0	+
chr7	127473530	127474697	Pos3	0	+
chr7	127474697	127475864	Pos4	0	+
chr7	127475864	127477031	Neg1	0	-
chr7	127477031	127478198	Neg2	0	-
chr7	127478198	127479365	Neg3	0	-
chr7	127479365	127480532	Pos5	0	+
chr7	127480532	127481699	Neg4	0	-

chromosome = chrom
start = start
end = end
name = name
intensity = score
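The field mapping above can be illustrated with a small BED line parser. This is a sketch under stated assumptions: the function name and the returned dictionary layout are hypothetical, columns are assumed to be tab-separated, and lines starting with "#", "track" or "browser" are treated as headers.

```python
def parse_bed_line(line):
    """Return a dict for one BED line, or None for header/blank lines."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith(("#", "track", "browser")):
        return None
    fields = line.split("\t")
    if len(fields) < 3:
        raise ValueError(f"BED needs at least 3 columns: {line!r}")
    return {
        "chromosome": fields[0],
        "start": int(fields[1]),
        "end": int(fields[2]),
        # name and intensity (score) are optional
        "name": fields[3] if len(fields) > 3 else None,
        "intensity": float(fields[4]) if len(fields) > 4 else None,
    }
```

A BED3 line yields `None` for both optional fields, in which case the intensity would default to 1 in the peak concentration calculation.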

GFF format

Specification: GFF format

Example:

#seqname	source	feature	start	end	score	strand	frame	group
chr22	TeleGene	enhancer	10000000	10001000	500	+	.	touch1
chr22	TeleGene	promoter	10010000	10010100	900	+	.	touch1
chr22	TeleGene	promoter	10020000	10025000	800	-	.	touch2

chromosome = seqname
start = start
end = end
intensity = score

Summit format

Example:

#id	chrom	strand	summit
Peak1	chr10	+	28822077
Peak1	chr10	+	28823037

chromosome = chrom
start = summit
end = summit

Cisgenome format

Example:

#rank	chrom	start	end	strand	length
1	chr10	28820931	28823205	+	2275
2	chr10	28720931	28723205	+	2275

chromosome = chrom
start = start
end = end

MACS format

Example:

#chrom	start	end	length	summit	tags
chr10	28820931	28823205	2275	288	10
chr10	28720931	28723205	2275	288	10

chromosome = chrom
start = start
end = end

PePr format

Specification: PePr format

PePr: consensus binding of replicates

Output Files:

A tab-delimited file containing chromosomal position of the peak, peak width, p-value and Benjamini-Hochberg FDR.

Example columns for peaks:

#chrom	start	end	peak width	p-value	q-value
chr1	837200	837500	300	1.29969869177e-18	3.12684869127e-15
chr1	960800	961100	300	1.58064616217e-09	1.93155073885e-06
chr1	1224000	1225400	1400	3.45692322029e-86	2.9886376305e-80

chromosome = chrom
start = start
end = end

Cutoff applied column: q-value

DBChIP format

DBChIP: differential binding analysis of ChIP-seq

http://pages.cs.wisc.edu/~kliang/DBChIP/DBChIP.pdf

#chr	pos	nsig	origin	ori.pos	FC.L1	pval	FDR
chrI	260346	1	emb	260346	0.1129558	7.073049e-05	0.002140829
chrI	673122	1	emb	673122	0.1168590	8.920121e-05	0.002140829
chrI	454904	1	emb	454904	0.1632294	6.755146e-04	0.010237900
chrI	757094	1	emb	757094	0.1717831	8.531584e-04	0.010237900
chrI	547611	1	L1	547611	5.6827777	1.207511e-03	0.011592101
chrI	41410	1	L1	41410	4.2984821	4.851749e-03	0.038813991
chrI	546710	1	L1	546710	4.3025538	6.129874e-03	0.042033423
chrI	43116	1	emb	43116	0.2581921	1.179393e-02	0.070763563
chrI	159188	1	L1	159188	2.9888809	3.639110e-02	0.194085880
chrI	3907	1	emb	3907	0.3718658	5.128119e-02	0.246149708

chromosome = chr
start = pos
end = pos

Cutoff applied column: FDR

DiffBind format

DiffBind: Differential Binding Analysis of ChIP-Seq peak data

http://bioconductor.org/packages/release/bioc/manuals/DiffBind/man/DiffBind.pdf

Columns:

1.       Chr Chromosome of binding site

2.       Start Starting base position of binding site

3.       End End base position of binding site

4.       Conc Concentration – mean (log) reads across all samples in both groups

5.       Conc_group1 Group 1 Concentration – mean (log) reads across all samples first group

6.       Conc_group2 Group 2 Concentration – mean (log) reads across all samples in second group

7.       Fold Fold difference – mean fold difference of binding affinity of group 1 over group 2 (Conc1 - Conc2). Absolute value indicates magnitude of the difference, and sign indicates which one is bound with higher affinity, with a positive value indicating higher affinity in the first group

8.       p-value p-value calculation – statistic indicating significance of difference (likelihood difference is not attributable to chance)

9.       FDR adjusted p-value calculation – p-value subjected to multiple-testing correction

chromosome = Chr
start = Start
end = End

Cutoff applied column: FDR
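The per-format column mappings listed in this section can be collected in one lookup table. The sketch below is illustrative only: the format keys, the function name and the inclusive cutoff comparison (rows are kept when the value is less than or equal to the cutoff) are assumptions, and the header is expected as a list of column names without the leading "#".

```python
# Column names per format, as listed in this section (keys are hypothetical).
COLUMN_MAP = {
    "summit":    {"chromosome": "chrom", "start": "summit", "end": "summit"},
    "cisgenome": {"chromosome": "chrom", "start": "start", "end": "end"},
    "macs":      {"chromosome": "chrom", "start": "start", "end": "end"},
    "pepr":      {"chromosome": "chrom", "start": "start", "end": "end",
                  "cutoff": "q-value"},
    "dbchip":    {"chromosome": "chr", "start": "pos", "end": "pos",
                  "cutoff": "FDR"},
    "diffbind":  {"chromosome": "Chr", "start": "Start", "end": "End",
                  "cutoff": "FDR"},
}

def extract_peaks(fmt, header, rows, cutoff=None):
    """Return (chromosome, start, end) tuples from already split rows."""
    cols = COLUMN_MAP[fmt]
    index = {name: i for i, name in enumerate(header)}
    peaks = []
    for row in rows:
        # Apply the statistical cutoff only to formats that define one.
        if cutoff is not None and "cutoff" in cols:
            if float(row[index[cols["cutoff"]]]) > cutoff:
                continue
        peaks.append((
            row[index[cols["chromosome"]]],
            int(row[index[cols["start"]]]),
            int(row[index[cols["end"]]]),
        ))
    return peaks
```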

Expression File Formats

LIMMA Standard output

Example:

#ID	RefseqID	logFC	AveExpre	T	Pvalue	Adj.P.Value	B
12196	NM_001548_at	-6.945783684	9.632803007	-138.2402671	6.92E-10	2.08E-05	11.83285762
15675	NM_005409_at	-6.11280866	6.322508161	-117.5664651	1.51E-09	2.08E-05	11.57790488

Cutoff applied column: Adj.P.Value

Cuffdiff standard output

Example:

#Test_id	gene_id	gene	locus	sample1	sample2	status	value_1	value_2	log2(foldchange)	test_stat	p_value	q_value	significant
NM_000014	NM_000014	-	chr12:9217772-9268558	q1	q2	NOTEST	0.102845	0.0820513	-0.325878	0.498271	0.618293	1	no
NM_000015	NM_000015	-	chr8:18248754-18258723	q1	q2	NOTEST	0.127358	0.30975	1.28221	-1.32328	0.185744	1	no
NM_000016	NM_000016	-	chr1:76190042-76229355	q1	q2	NOTEST	0	0	0	0	1	1	no
NM_000017	NM_000017	-	chr12:121163570-121177811	q1	q2	NOTEST	3.47702	3.62422	0.0598207	-0.195815	0.844755	1	no

Cutoff applied column: q_value

ChIP-Array specific format

This format contains 3 columns:

1.       Gene name

2.       Fold change (or “+” for up-regulated gene, “-” for down-regulated)

3.       Statistical value

Example:

NM_000014	+	6.92E-10
NM_000015	+	1.51E-09
NM_000016	-	2.08E-05
NM_000017	-	2.08E-05
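A line in this three-column format could be read as sketched below. The function name is hypothetical, and treating a numeric second column as a signed (log) fold change whose sign gives the direction is an assumption, since the text does not say whether the value is a ratio or a log ratio.

```python
def parse_expression_line(line):
    """Return (gene, direction, statistic) from one three-column line."""
    gene, change, stat = line.split()
    if change in ("+", "-"):
        direction = change
    else:
        # Assumption: numeric values are signed (log) fold changes.
        direction = "+" if float(change) > 0 else "-"
    return gene, direction, float(stat)
```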

Long Range Chromatin Interaction Data Format

This data format is BED-like, with 4 main columns.

Example:

#chrom	start	end	interacting region
chr1	4847311	4848041	chr1:5072982-5073656
chr1	5072982	5073656	chr1:4847311-4848041
chr1	4847511	4848273	chr1:5012745-5013247
chr1	5012745	5013247	chr1:4847511-4848273
chr1	9535344	9536256	chr1:9737558-9738479
chr1	9737558	9738479	chr1:9535344-9536256
chr1	9689756	9690445	chr1:9737778-9738318
chr1	9737778	9738318	chr1:9689756-9690445
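The format above pairs each anchor region with its interacting region, so a peak that overlaps an anchor can be looked up against its partner. The sketch below shows one way to parse a line and find the partner regions of a peak; the function names are hypothetical and half-open coordinates are assumed.

```python
import re

# Fourth column, e.g. "chr1:5072982-5073656".
REGION = re.compile(r"^(\S+):(\d+)-(\d+)$")

def parse_interaction(line):
    """Return (anchor, partner), each a (chrom, start, end) tuple."""
    chrom, start, end, partner = line.split()[:4]
    m = REGION.match(partner)
    if m is None:
        raise ValueError(f"cannot parse interacting region: {partner!r}")
    anchor = (chrom, int(start), int(end))
    return anchor, (m.group(1), int(m.group(2)), int(m.group(3)))

def partner_regions(peak, interactions):
    """Partner regions of every anchor that overlaps the peak."""
    chrom, start, end = peak
    return [
        partner for anchor, partner in interactions
        if anchor[0] == chrom and start < anchor[2] and anchor[1] < end
    ]
```

A peak mapped this way can then be counted for genes whose TSS lies in the partner region.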

Application scenarios

This section describes which parameters apply to each scenario, that is, to the kinds of data you have. Generally, you may have ChIP-X data and transcriptome data to construct a Gene Regulatory Network (GRN). Additionally, you can have other omics data, including long-range chromatin interaction, open chromatin region and histone modification data. A scenario may involve anywhere from none to all of the above data.

1. Data you have: none

Yes, we do not require any data to infer a GRN. All you need to know is the factor name, so that we can fetch the putative TFBSs for you from our database.

You just need to select "Use builtin motifs" in the "Binding information" section, AND ALSO "Motif Database" in the "Options For Detecting Indirect Targets". You may also adjust the "Conservation Filtering P-Value" and "PWM Scan P-Value" if you like.

Or you may try our builtin (curated) ChIP-X data by selecting "Use builtin ChIP-X data" in the "Binding information" section.

In this situation, we just infer direct targets.

2. Data you have: ChIP-X

Then you just select "Upload/Paste binding data" in the "Binding information" section and upload or paste your binding data.

In this situation, we just infer direct targets.

3. Data you have: Transcriptome

You just need to select "Use builtin motifs" in the "Binding information" section, AND ALSO "Motif Database" in the "Options For Detecting Indirect Targets". You may also adjust the "Conservation Filtering P-Value" and "PWM Scan P-Value" if you like.

Or you may try our builtin (curated) ChIP-X data by selecting "Use builtin ChIP-X data" in the "Binding information" section.

In this situation, we will also try to infer indirect targets once you tick either "Motif Database" or "Use builtin ChIP-X data if available" in "Options For Detecting Indirect Targets".

4. Data you have: ChIP-X and Transcriptome

You can upload or paste your ChIP-X data in the "Binding information" section and your transcriptome data in the "Transcriptome Data" section. Select the proper file format and, if available, a cutoff so that only part of the records in the file take part in the calculation.

In this situation, we will also try to infer indirect targets once you tick either "Motif Database" or "Use builtin ChIP-X data if available" in "Options For Detecting Indirect Targets".

5. Data you have: any of long-range chromatin interaction, open chromatin region or histone modification data

This is like situation 1, but with other omics data available. Proceed as in situation 1, and upload or paste your omics data in the "Other Types Of Omics Data" section.

Here, TFBSs will be mapped to the paired regions using the long-range chromatin interaction data, and the other omics data (open chromatin region and histone modification data) will be used as filters if you choose to use motif-scanned TFBSs.

6. Data you have: ChIP-X + any of long-range chromatin interaction, open chromatin region or histone modification data

This is like situation 2, but with other omics data available. Proceed as in situation 2, and upload or paste your omics data in the "Other Types Of Omics Data" section.

Here, TFBSs will be mapped to the paired regions using the long-range chromatin interaction data, and the other omics data (open chromatin region and histone modification data) will be used as annotations.

7. Data you have: Transcriptome + any of long-range chromatin interaction, open chromatin region or histone modification data

This is like situation 3, but with other omics data available. Proceed as in situation 3, and upload or paste your omics data in the "Other Types Of Omics Data" section.

Here, TFBSs will be mapped to the paired regions using the long-range chromatin interaction data, and the other omics data (open chromatin region and histone modification data) will be used as filters if you choose to use motif-scanned TFBSs to infer indirect targets.

8. Data you have: ChIP-X and Transcriptome + any of long-range chromatin interaction, open chromatin region or histone modification data

This is like situation 4, but with other omics data available. Proceed as in situation 4, and upload or paste your omics data in the "Other Types Of Omics Data" section.

Here, TFBSs will be mapped to the paired regions using the long-range chromatin interaction data, and the other omics data (open chromatin region and histone modification data) will be used as filters if you choose to use motif-scanned TFBSs to infer indirect targets.

Results

Regulatory Network

When your job is done, you may see a network above without any indirect targets.

However, you can click “show the entire network” to show the full network.

You can check the legend of the network to see how the different types of nodes and edges are represented.

You can also export the network to different file types: PNG, PDF, XGMML, GRAPHML, SIF.

To check the detailed information, you can either click the nodes or edges.

When you click a node, a table of its targets and regulators will be shown. Red represents down-regulation, while green represents up-regulation.

Clicking a regulation in the table is just like clicking the corresponding edge.

If the node is a TF and its targets are detected by ChIP-X data, we will try to perform motif enrichment analysis using MEME.

It will show you the discovered motifs and the similar motifs in our database, as well as the associated TFs. When you hover over a discovered motif, its detailed information, including the PWM, is shown in a tooltip.

When you click a matched motif in our database, its detailed information, including the weblogo, is also shown. All weblogos on our website can be clicked to show the reverse-complement motif.

 

If the node is a target, it will show you the common events in its promoter region.

When you click a region, the page jumps to the JBrowse tab and shows the region in JBrowse.

When you click an edge, a similar popup window will be shown.

The difference is that it shows the binding events of the regulator in the target’s promoter region. You can also click those regions to show them in JBrowse.

 

GO & Pathway enrichment analysis

In the right sidebar of the result page, we list the top 5 enriched terms so that users can take a quick look at the functions of this network.

Users can also click the details to show them in the tab.

The vertical text at the top gives the names of the terms. P-values are in the middle and are sorted in ascending order from left to right. If a gene appears in a term, the square is filled with dark green; otherwise it is left unfilled.

Users can also download the job to check further details.

Co-occupancy Analysis

Users can check 2 or 3 jobs in the job list and combine them to perform the co-occupancy analysis. A job name with square brackets “[]” indicates a combined job.

We only allow jobs that use the same genome assembly to be combined.

The data used in the individual jobs are also incorporated into the combined job, and functional enrichment is also performed on the combined targets. Other functions are similar to those of individual jobs.

JBrowse

Moving

  • Move the view by clicking and dragging in the track area, or by clicking the arrows in the navigation bar, or by pressing the left and right arrow keys.
  • Center the view at a point by clicking on either the track scale bar or overview bar, or by shift-clicking in the track area.

Zooming

  • Zoom in and out by clicking "+" or "-" in the navigation bar, or by pressing the up and down arrow keys while holding down "shift".
  • Select a region and zoom to it ("rubber-band" zoom) by clicking and dragging in the overview or track scale bar, or shift-clicking and dragging in the track area.
  • [Double-Click] to zoom in, [Shift] + [Double-Click] to zoom out.

Full screen mode

  • Click "fullscreen" at the upper-right corner to switch to fullscreen mode.
  • If it loads slowly, try zooming in first and then switching to fullscreen mode, so that fewer elements are loaded.

Showing Tracks

  • Turn a track on by dragging its track label from the "Available Tracks" area into the genome area, or double-clicking it.
  • Turn a track off by dragging its track label from the genome area back into the "Available Tracks" area.

Searching

  • Jump to a feature or reference sequence by typing its name in the location box and pressing Enter.
  • Jump to a specific region by typing the region into the location box as: ref:start..end.

Example Searches

uc0031k.2
searches for the feature named uc0031k.2.
chr4
jumps to chromosome 4
chr4:79,500,000..80,000,000
jumps to the region on chromosome 4 between 79.5Mb and 80Mb.
5678
centers the display at base 5,678 on the current sequence

JBrowse Documentation
