This tutorial is desgined for users do not have pre-build gene regulatory networks.

Step0. Select motif database based on input species

IReNA contains DNA motif datasets for four species (Homo sapiens, Mus musculus, Zebrafish and Chicken) derived from TRANSFAC version 201803. Following codes are used to get the motif dataset from TRANSFAC or user-defined motif dataset which should have the same format as these from TRANSFAC database. Selected motif database is used to infer gene regulatory networks.

library(IReNA)
###call Mus musculus motif database
motif1 <- Tranfac201803_Mm_MotifTFsF
###call Homo sapiens motif database
motif1 <- Tranfac201803_Hs_MotifTFsF
###call Zebrafish motif database
motif1 <- Tranfac201803_Zf_MotifTFsF
###call Chicken motif database
motif1 <- Tranfac201803_Ch_MotifTFsF

The species of test data used here is Homo sapiens, so we select Tranfac201803_Hs_MotifTFsF as motif database

motif1 <- Tranfac201803_Hs_MotifTFsF
candidate_tfs = unlist(strsplit(motif1$TFs,';'))

Step1. Build moduliazed pseudo-bulk expression profile

IReNA utilizes modularized gene regulatory networks to identify enriched transcription factors and simplified networks to represent biological processes. Therefore, before inferring gene regulatory networks, we need to modularize all input features (genes) to obtain a modularized gene regulatory network in subsequent steps. In addition, because scRNA-seq data are highly sparse, this may inhibit the accurate inference of regulatory relationships; therefore, we need to construct pseudo-bulk expression profiles (metacells).

To run this part, you need to prepare a seurat object with ‘Pseudotime’ column in metadata.

obj = readRDS("E:/IReNA/SSC_seurat_with_time.rds")
exp = irena_step1(obj,50,4)

## [1] "Perform k-means"
## 
##    1    2    3    4 
##  557  649 1021  457 
## [1] "Sort genes"
## [1] "Revise outlier"
## [1] "Number of outlier: 0" "Number of outlier: 0"

Then, we can visualize the moduliazed pseudo-bulk expression profile

plot_kmeans_pheatmap(exp,ModuleColor1 = c('#67C7C1','#5BA6DA','#FFBF0F','#C067A9'))

Step2. infer gene regulatory networks

infer co-expression networks

regulatory_relationships = irena_step2(as.matrix(exp),candidate_tfs=candidate_tfs)

## [1] "Using pearson correlation to infer basic grn"

### Then we can filter weak regualtory relationships based on regulatory weight
regulatory_relationships = regulatory_relationships[abs(regulatory_relationships$Correlation)>0.8,]

Refine regulatory relationships based on binding motifs of target genes (optional)

For ATAC-seq data is not available. IReNA uses fimo to check whether motifs of transcription factors that regulate the target gene occur in the upstream of target genes. If motifs of transcription factors that regulate the target gene exist, this gene pair will be retained.

Firstly select candidate gene tss database.

### select hg38 gene tss for human data
gene_tss = hg38_gene_tss
### you can also using the following code to select mm10 gene tss for mouse data
# gene_tss = mm10_gene_tss

Then calculate potential binding TFs of target genes throguh R package motifmatchr.

library(BSgenome.Hsapiens.UCSC.hg38)
gene_tss = gene_tss[gene_tss$symbol %in% rownames(exp),]
gene_tss_GR <- GRanges(paste0(gene_tss[,1],':',gene_tss[,2],'-',gene_tss[,3]))
fimo_regulation <- identify_region_tfs(gene_tss_GR,gene_tss[,6],motif1,BSdb=BSgenome.Hsapiens.UCSC.hg38)
regulatory_relationships <- filter_regulation_fimo(fimo_regulation, regulatory_relationships)

Step3. Decode gene regulatory networks

The primary novelty of IReNA lies in its ability to decode regulatory relationships among modules using a hypergeometric test. Here, you can use the following code to perform IReNA analysis. The output of IReNA analysis contains three most important part:

Enriched TFs that regulate other module
Enriched TFs networks: regulatory network of enriched TFs.
Simplified networks: regualtory network among modules.

IReNA_result <- network_analysis(regulatory_relationships,exp[,1:2],TFFDR1=10,TFFDR2 = 10,ModuleFDR = 0.01)

## [1] "Total TFs: 145"
## [1] "Enriched TFs: 69"
## [1] "Significant regulations: 4"

Browse enriched TFs that regulate other module

IReNA_result$TF_module_regulation[1:3,]

##           TF TFGroup             LogFDR TargetGroup RegulationType
## out2   CCNT2       1 -0.744319429562015      Group1       Positive
## out2.1  REST       1 -0.815215941738212      Group1       Positive
## out2.2 APEX1       1 -0.617368646095004      Group1       Positive

Browse enriched TFs network

IReNA_result$TF_network[1:3,]

##          TF TFGroup TFMinNlogfdr TFMinGroup SigActModules SigRepModules Target
## 34906 CCNT2       1          Inf         P1             1            NA  CCNT2
## 37591  REST       1          Inf         P1             1            NA   REST
## 64441 APEX1       1          Inf         P1             1            NA  APEX1
##       Correlation TFGroup TargetGroup Regulation
## 34906           1       1           1   Positive
## 37591           1       1           1   Positive
## 64441           1       1           1   Positive

Browse simplified networks

IReNA_result$intramodular_network[1:3,]

##                    TFGroup TargetGroup Regulation       Correlation
## Regulation12Pnum         1           1   Positive                 1
## Regulation12Pnum.4       2           2   Positive 0.952059736819138
## Regulation12Pnum.7       3           3   Positive 0.851203572257657
##                    NumberRegulation Pvalue NlogFdr
## Regulation12Pnum        12;12;42;12      0     Inf
## Regulation12Pnum.4         7;7;47;7      0     Inf
## Regulation12Pnum.7      29;29;25;29      0     Inf

In addition, you can also use our function to visualize enriched tf network

plot_tf_network(IReNA_result)

Run IReNA based on seurat object (without pre-build GRN and gene modules)

Jiang junyao