This tutorial is desgined for users do not have pre-build gene regulatory networks.
Step0. Select motif database based on input species
IReNA contains DNA motif datasets for four species (Homo sapiens, Mus musculus, Zebrafish and Chicken) derived from TRANSFAC version 201803. Following codes are used to get the motif dataset from TRANSFAC or user-defined motif dataset which should have the same format as these from TRANSFAC database. Selected motif database is used to infer gene regulatory networks.
library(IReNA)
###call Mus musculus motif database
motif1 <- Tranfac201803_Mm_MotifTFsF
###call Homo sapiens motif database
motif1 <- Tranfac201803_Hs_MotifTFsF
###call Zebrafish motif database
motif1 <- Tranfac201803_Zf_MotifTFsF
###call Chicken motif database
motif1 <- Tranfac201803_Ch_MotifTFsFThe species of test data used here is Homo sapiens, so we select Tranfac201803_Hs_MotifTFsF as motif database
motif1 <- Tranfac201803_Hs_MotifTFsF
candidate_tfs = unlist(strsplit(motif1$TFs,';'))Step1. Build moduliazed pseudo-bulk expression profile
IReNA utilizes modularized gene regulatory networks to identify enriched transcription factors and simplified networks to represent biological processes. Therefore, before inferring gene regulatory networks, we need to modularize all input features (genes) to obtain a modularized gene regulatory network in subsequent steps. In addition, because scRNA-seq data are highly sparse, this may inhibit the accurate inference of regulatory relationships; therefore, we need to construct pseudo-bulk expression profiles (metacells).
To run this part, you need to prepare a seurat object with ‘Pseudotime’ column in metadata.
obj = readRDS("E:/IReNA/SSC_seurat_with_time.rds")
exp = irena_step1(obj,50,4)## [1] "Perform k-means"
##
## 1 2 3 4
## 557 649 1021 457
## [1] "Sort genes"
## [1] "Revise outlier"
## [1] "Number of outlier: 0" "Number of outlier: 0"
Then, we can visualize the moduliazed pseudo-bulk expression profile
plot_kmeans_pheatmap(exp,ModuleColor1 = c('#67C7C1','#5BA6DA','#FFBF0F','#C067A9'))Step2. infer gene regulatory networks
infer co-expression networks
regulatory_relationships = irena_step2(as.matrix(exp),candidate_tfs=candidate_tfs)## [1] "Using pearson correlation to infer basic grn"
### Then we can filter weak regualtory relationships based on regulatory weight
regulatory_relationships = regulatory_relationships[abs(regulatory_relationships$Correlation)>0.8,]Refine regulatory relationships based on binding motifs of target genes (optional)
For ATAC-seq data is not available. IReNA uses fimo to check whether motifs of transcription factors that regulate the target gene occur in the upstream of target genes. If motifs of transcription factors that regulate the target gene exist, this gene pair will be retained.
Firstly select candidate gene tss database.
### select hg38 gene tss for human data
gene_tss = hg38_gene_tss
### you can also using the following code to select mm10 gene tss for mouse data
# gene_tss = mm10_gene_tssThen calculate potential binding TFs of target genes throguh R package motifmatchr.
library(BSgenome.Hsapiens.UCSC.hg38)
gene_tss = gene_tss[gene_tss$symbol %in% rownames(exp),]
gene_tss_GR <- GRanges(paste0(gene_tss[,1],':',gene_tss[,2],'-',gene_tss[,3]))
fimo_regulation <- identify_region_tfs(gene_tss_GR,gene_tss[,6],motif1,BSdb=BSgenome.Hsapiens.UCSC.hg38)
regulatory_relationships <- filter_regulation_fimo(fimo_regulation, regulatory_relationships)Step3. Decode gene regulatory networks
The primary novelty of IReNA lies in its ability to decode regulatory relationships among modules using a hypergeometric test. Here, you can use the following code to perform IReNA analysis. The output of IReNA analysis contains three most important part:
Enriched TFs that regulate other module
Enriched TFs networks: regulatory network of enriched TFs.
Simplified networks: regualtory network among modules.
IReNA_result <- network_analysis(regulatory_relationships,exp[,1:2],TFFDR1=10,TFFDR2 = 10,ModuleFDR = 0.01)## [1] "Total TFs: 145"
## [1] "Enriched TFs: 69"
## [1] "Significant regulations: 4"
Browse enriched TFs that regulate other module
IReNA_result$TF_module_regulation[1:3,]## TF TFGroup LogFDR TargetGroup RegulationType
## out2 CCNT2 1 -0.744319429562015 Group1 Positive
## out2.1 REST 1 -0.815215941738212 Group1 Positive
## out2.2 APEX1 1 -0.617368646095004 Group1 Positive
Browse enriched TFs network
IReNA_result$TF_network[1:3,]## TF TFGroup TFMinNlogfdr TFMinGroup SigActModules SigRepModules Target
## 34906 CCNT2 1 Inf P1 1 NA CCNT2
## 37591 REST 1 Inf P1 1 NA REST
## 64441 APEX1 1 Inf P1 1 NA APEX1
## Correlation TFGroup TargetGroup Regulation
## 34906 1 1 1 Positive
## 37591 1 1 1 Positive
## 64441 1 1 1 Positive
Browse simplified networks
IReNA_result$intramodular_network[1:3,]## TFGroup TargetGroup Regulation Correlation
## Regulation12Pnum 1 1 Positive 1
## Regulation12Pnum.4 2 2 Positive 0.952059736819138
## Regulation12Pnum.7 3 3 Positive 0.851203572257657
## NumberRegulation Pvalue NlogFdr
## Regulation12Pnum 12;12;42;12 0 Inf
## Regulation12Pnum.4 7;7;47;7 0 Inf
## Regulation12Pnum.7 29;29;25;29 0 Inf
In addition, you can also use our function to visualize enriched tf network
plot_tf_network(IReNA_result)