# Gh_circRNAseq **Repository Path**: shawnmagic/Gh_circRNAseq ## Basic Information - **Project Name**: Gh_circRNAseq - **Description**: analysis and visualization R scripts of my research 「Genome-wide profiling of circular RNAs during the hybridization of two elite varieties of Gossypium hirsutum」 - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2021-01-22 - **Last Updated**: 2022-07-15 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # R Code for Data Analysis and Visualization Analysis and visualization R scripts of my research 「Genome-wide profiling of circular RNAs in the hybridization of two elite inbred lines of *Gossypium hirsutum*」 ## Part 1 Identification and characterization of candidate circRNAs **Relationships of expressed circRNA in fiber leaf and ovule** - The upset plot was generated by [TBtools](https://github.com/CJ-Chen/TBtools/releases). - The histogram representing the size of the each set is transformed into a stack barplot produced by ggplot. - The origination of circRNA: `pie diagram`. - CircRNA length density: `ggplot2::geom_density`. - CircRNA distribution on upland cotton chromosome: `ggplot2::geom_bar` - All circRNA expression profile: `pheatmap` - data: `log10(TPM+1)` - scale: `z-score by "row"` **Tissue specific of circRNA expression** - Method: `WGCNA` - Input data:`readcount of junction_site` - Normalization method:`varianceStabilizingTransformation` - Considering `datExpr` included informations from different tissue, sample heterogeneity is inevitable, but our goal is to classify genes into different tissue, and we are not interested with "*hub genes*",so there is no need to demand the network should be `scale-free`. - However, the final result demonstrate that we did get the appropriate power, and GS and MM also showed a significant correlation。 - GO enrichment results from TBtools. ## Part 2 SPE and ENAE circRNA analysis **Method:** ```R ## 2. function to judge expressed or not with cutoff cpm < 0 & cpm > 1 in at least two of three samples of each cultivar AddTags = function(cpm){ for (i in 1:nrow(cpm)) { ## Hybrid if (sum(cpm[i,1:3] > 1) > 1) { cpm[i,10] <- "Ture" }else if (sum(cpm[i,1:3] < ) > 1) { cpm[i,10] <- "False" }else { cpm[i,10] <- "None" } ## Maternal if (sum(cpm[i,4:6] > 1) > 1) { cpm[i,11] <- "Ture" }else if (sum(cpm[i,4:6] < ) > 1) { cpm[i,11] <- "False" }else { cpm[i,11] <- "None" } ## Paternal if (sum(cpm[i,7:9] > 1) > 1) { cpm[i,12] <- "Ture" }else if (sum(cpm[i,7:9] < ) > 1) { cpm[i,12] <- "False" }else { cpm[i,12] <- "None" } } return(cpm) } ## 3. based on the sample tag, divide the circRNAs into SPE and NAE categories. SPEandNAEfilter = function(cpm){ ## 01. add ture or false tag cpm = AddTags(cpm) colnames(cpm)[(ncol(cpm)-2):ncol(cpm)] = c("H.tag","M.tag","P.tag") # change column names ## 02. add tags for each circRNA if it belongs to SPE. cpm[,13] = case_when( cpm$M.tag == "Ture" & cpm$P.tag == "False" ~ "SPE-Maternal", cpm$M.tag == "False" & cpm$P.tag == "Ture" ~ "SPE-Paternal" ) ## 03. same as SPE for NAE cpm[,14] = case_when( cpm$M.tag == "False" & cpm$P.tag == "False" & cpm$H.tag == "Ture" ~ "NAE-Hybrid", cpm$M.tag == "Ture" & cpm$P.tag == "Ture" & cpm$H.tag == "False" ~ "NAE-Parent", cpm$M.tag == "False" & cpm$P.tag == "Ture" & cpm$H.tag == "False" ~ "NAE-Parent", cpm$M.tag == "Ture" & cpm$P.tag == "False" & cpm$H.tag == "False" ~ "NAE-Parent", ) ## finish work, add rownames and circID tag colnames(cpm)[(ncol(cpm)-1):ncol(cpm)] = c("SPE","NAE") cpm = data.frame(row.names = rawcount$circID, circID = rawcount$circID, cpm) return(cpm) } ``` **Figure 2** - UpsetPlot:`TBtools` - Heatmap:`pheatmap` - Half-box-half-jitter: `ggpol::geom_boxjitter` - Gene Structure and CircRNA Structure: `ggplot2::geom_segment` - qRT-PCR result boxplot: `ggplot2::geom_boxplot` ## Part 3 Expression Pattern of different expressed circRNAs **Expression Pattern from [Guanjing Hu-Wendellab](https://github.com/Wendellab/CisTransRegulation)** ```R classDominance<-function(TvsMid, TvsP1, TvsP2, P1vsP2, log2fc.threshold=0, reverseTvsP =FALSE, Pnames=c("Maternal","Paternal")) { # Hybrid/polyploid vs Mid parental val TvsMid <- data.frame(TvsMid[,c("log2FoldChange", "lfcSE", "pvalue")]) names(TvsMid) <- c("TvsMid", "TvsMid.SE", "TvsMid.pvalue") # Hybrid/polyploid vs Parent 1 TvsP1 <- data.frame(TvsP1[,c("log2FoldChange", "lfcSE", "pvalue")]) names(TvsP1) <- c("TvsP1", "TvsP1.SE", "TvsP1.pvalue") # Hybrid/polyploid vs Parent 2 TvsP2 <- data.frame(TvsP2[,c("log2FoldChange", "lfcSE", "pvalue")]) names(TvsP2) <- c("TvsP2", "TvsP2.SE", "TvsP2.pvalue") # Parent 1 vs Parent 2 P1vsP2 <- data.frame(P1vsP2[,c("log2FoldChange", "lfcSE", "pvalue")]) names(P1vsP2) <- c("P1vsP2", "P1vsP2.SE", "P1vsP2.pvalue") if(reverseTvsP==TRUE){ TvsP1$TvsP1 = -TvsP1$TvsP1 TvsP2$TvsP2 = -TvsP2$TvsP2 } tbl = cbind(TvsMid, TvsP1, TvsP2, P1vsP2) tbl$TvsMid[is.na(tbl$TvsMid)] =0 tbl$TvsP1[is.na(tbl$TvsP1)] =0 tbl$TvsP2[is.na(tbl$TvsP2)] =0 tbl$P1vsP2[is.na(tbl$P1vsP2)] =0 # judge tbl$additivity <- ifelse(abs(tbl$TvsMid)>log2fc.threshold & tbl$TvsMid.pvalue<0 & !is.na(tbl$TvsMid.pvalue), "T!=Mid", "T=Mid") tbl$TvsP1.reg <- ifelse(abs(tbl$TvsP1)>log2fc.threshold & tbl$TvsP1.pvalue<0 & !is.na(tbl$TvsP1.pvalue), "T!=P1", "T=P1") tbl$TvsP2.reg <- ifelse(abs(tbl$TvsP2)>log2fc.threshold & tbl$TvsP2.pvalue<0 & !is.na(tbl$TvsP2.pvalue), "T!=P2", "T=P2") tbl$P1vsP2.reg <- ifelse(abs(tbl$P1vsP2)>log2fc.threshold & tbl$P1vsP2.pvalue<0 & !is.na(tbl$P1vsP2.pvalue), ifelse(tbl$P1vsP2>log2fc.threshold & tbl$P1vsP2.pvalue<0 & !is.na(tbl$P1vsP2.pvalue), "P1>P2","P1P2;T!=Mid;T=P1;T!=P2",tbl$class)] = paste0("2.",Pnames[1],"-dominant, higher") tbl$category[grep("P1P2;T!=Mid;T!=P1;T=P2",tbl$class)] = paste0("3.",Pnames[2],"-dominant, lower") tbl$category[grep("P10 & tbl$TvsP2>0 & tbl$P1vsP2>0] = paste0("4.Transgressive Up: ",Pnames[1]," higher") tbl$category[grepl("T!=Mid;T!=P1;T!=P2",tbl$class) & tbl$TvsP1>0 & tbl$TvsP2>0 & tbl$P1vsP2<0] = paste0("4.Transgressive Up: ",Pnames[2]," higher") tbl$category[grepl("T!=Mid;T!=P1;T!=P2",tbl$class) & tbl$TvsP1<0 & tbl$TvsP2<0 & tbl$P1vsP2>0] = paste0("5.Transgressive Down: ",Pnames[1]," higher") tbl$category[grepl("T!=Mid;T!=P1;T!=P2",tbl$class) & tbl$TvsP1<0 & tbl$TvsP2<0 & tbl$P1vsP2<0] = paste0("5.Transgressive Down: ",Pnames[2]," higher") return(tbl) } ``` **Notice:** It seems Transgressive up/down p1=p2 was missing. but it has been included. because when judge `Transgress pattern` The rank of P1 and P2 is only judged by the TPM value, but not corrected by pvalue or log2fc. - FourQuadrantChart: `ggplot2::gemo_point` non-additive circRNAs were labeled. ## Part 5 ceRNA network (as miRNA-sponge) ![Function of circRNA](https://www.embopress.org/cms/asset/9bf961e0-b7cc-4e9a-8a6b-33a953b19846/embj2018100836-fig-0001-m.jpg) From: [Past, present, and future of circRNAs](https://www.embopress.org/doi/full/105252/embj018100836) by *Ines Lucia Patop, Stas Wüst and Sebastian Kadener* **miRNA was the bridge of the network, connected the circRNA and mRNA** - Circos plot: `circlize` - ceRNA network: `gephi` - Protein-protein interaction: [STRING](https://string-db.org/)