Lp Regularization for Gene Network Inference


Inference of gene regulatory networks from gene expression data at the whole-genome level is a grand challenge, especially in higher organisms, where the number of genes is large but the number of experimental samples is small. The accuracy of current methods at genome scale has been reported to drop dramatically from E. coli to S. cerevisiae because of the increase in the number of genes. This limits the applicability of current methods to more complex genomes, such as human and mouse. The least absolute shrinkage and selection operator (LASSO) is widely used for gene regulatory network inference from gene expression profiles. However, the accuracy of LASSO in large genomes is not satisfactory. Here, we apply two extensions of LASSO, the L0 and L1/2 regularization models, to infer gene regulatory networks from both high-throughput gene expression data and transcription factor binding data (ChIP-seq/chip) in large genomes. Both the L0 and L1/2 regularization models significantly outperform LASSO for network inference, and incorporating interactions between transcription factors and their targets remarkably improves the prediction accuracy. This demonstrates the efficiency and applicability of these two models for gene regulatory network inference by integrating ChIP-seq/chip and transcriptome data in large genomes. These two models help biologists study gene regulation in higher model organisms on a genome-wide scale.

The LpRGNI source code can be downloaded by clicking the hyperlink: download here


Figure: Workflow of gene regulatory network inference with three regularization models. A: matrices A and B, containing the expression profiles of TFs and targets respectively, are generated from transcriptome data, while ChIP-X-identified TF-target interactions are converted into an initial X0; B: the Lp (p = 1, 1/2, 0) regularization models are solved; C: the output is a sparse matrix X* that describes the TF-target relationships and is evaluated against two sets of gold standards.
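The workflow in the figure can be illustrated with a minimal sketch. It is not the LpRGNI implementation; it is a generic iterative thresholding scheme for the problem min over X of (1/2)||AX - B||_F^2 + lam * ||X||_p^p, written under several assumptions: A is a samples-by-TFs matrix, B is a samples-by-targets matrix, X is TFs-by-targets, and the ChIP-X derived X0 is used only as the starting point. The function names, the scaling of lam, and the fixed iteration count are hypothetical choices made for this example.

```python
import numpy as np

def soft_threshold(z, t):
    """p = 1 (LASSO): minimizer of 0.5*(x - z)**2 + t*|x|, elementwise."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def hard_threshold(z, t):
    """p = 0: minimizer of 0.5*(x - z)**2 + t*[x != 0], elementwise."""
    z = np.asarray(z, dtype=float)
    return np.where(np.abs(z) > np.sqrt(2.0 * t), z, 0.0)

def half_threshold(z, t):
    """p = 1/2: half thresholding for 0.5*(x - z)**2 + t*|x|**0.5, elementwise."""
    z = np.asarray(z, dtype=float)
    lam = 2.0 * t  # same problem written as (x - z)**2 + lam*|x|**0.5
    cutoff = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    keep = np.abs(z) > cutoff  # entries at or below the cutoff are set to zero
    out = np.zeros_like(z)
    zk = z[keep]
    phi = np.arccos((lam / 8.0) * (np.abs(zk) / 3.0) ** -1.5)
    out[keep] = (2.0 / 3.0) * zk * (1.0 + np.cos(2.0 * np.pi / 3.0 - (2.0 / 3.0) * phi))
    return out

def lp_network_inference(A, B, lam, p, X0=None, n_iter=500):
    """Iterative thresholding sketch for the Lp models with p in {1, 0.5, 0}.

    A: (samples x TFs), B: (samples x targets); both layouts are assumptions.
    X0: optional initial TF-target matrix (TFs x targets), e.g. from ChIP-X data.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    prox = {1: soft_threshold, 0.5: half_threshold, 0: hard_threshold}[p]
    # Step size 1/L, where L = ||A||_2^2 bounds the curvature of the data term.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    if X0 is None:
        X = np.zeros((A.shape[1], B.shape[1]))
    else:
        X = np.array(X0, dtype=float)
    for _ in range(n_iter):
        grad = A.T @ (A @ X - B)                   # gradient of 0.5*||AX - B||_F^2
        X = prox(X - step * grad, lam * step)      # gradient step, then thresholding
    return X
```

A call such as `lp_network_inference(A, B, lam=0.1, p=0.5, X0=X0)` would return a sparse estimate playing the role of X*. Because the p = 1/2 and p = 0 problems are non-convex, the result can depend on the starting point, which is one reason an informed X0 may matter in practice.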



Usage: Output:


This work was supported by funding from the Research Grants Council, Hong Kong SAR, China (grant number 781511M), and the National Natural Science Foundation of China (grant numbers 11101186 and 91229105).


*Correspondence should be addressed to Junwen Wang
(Tel: +852 2819 2809; Fax: +852 2855 1254)
(Email: junwen@hku.hk)
(Office: L3-80, Laboratory Block, 21 Sassoon Road, Hong Kong).