The Methylome in Female Adolescent Conduct Disorder: Neural Pathomechanisms and Environmental Risk Factors

Last updated: 2021-07-31

Checks: 7 0

Knit directory: femNATCD_MethSeq/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20210128)

The command set.seed(20210128) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: b452d2f

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version b452d2f. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.Rhistory
    Ignored:    code/.Rhistory
    Ignored:    data/Epicounts.csv
    Ignored:    data/Epimeta.csv
    Ignored:    data/Epitpm.csv
    Ignored:    data/KangUnivers.txt
    Ignored:    data/Kang_DataPreprocessing.RData
    Ignored:    data/Kang_dataset_genesMod_version2.txt
    Ignored:    data/PatMeta.csv
    Ignored:    data/ProcessedData.RData
    Ignored:    data/RTrawdata/
    Ignored:    data/SNPCommonFilt.csv
    Ignored:    data/femNAT_PC20.txt
    Ignored:    output/4A76FA10
    Ignored:    output/BrainMod_Enrichemnt.pdf
    Ignored:    output/Brain_Module_Heatmap.pdf
    Ignored:    output/DMR_Results.csv
    Ignored:    output/GOres.xlsx
    Ignored:    output/LME_GOplot.pdf
    Ignored:    output/LME_GOplot.svg
    Ignored:    output/LME_Results.csv
    Ignored:    output/LME_Results_Sig.csv
    Ignored:    output/LME_Results_Sig_mod.csv
    Ignored:    output/LME_tophit.svg
    Ignored:    output/ProcessedData.RData
    Ignored:    output/RNAvsMETplots.pdf
    Ignored:    output/Regions_GOplot.pdf
    Ignored:    output/Regions_GOplot.svg
    Ignored:    output/ResultsgroupComp.txt
    Ignored:    output/SEM_summary_groupEpi_M15.txt
    Ignored:    output/SEM_summary_groupEpi_M2.txt
    Ignored:    output/SEM_summary_groupEpi_M_all.txt
    Ignored:    output/SEM_summary_groupEpi_TopHit.txt
    Ignored:    output/SEM_summary_groupEpi_all.txt
    Ignored:    output/SEMplot_Epi_M15.html
    Ignored:    output/SEMplot_Epi_M15.png
    Ignored:    output/SEMplot_Epi_M15_files/
    Ignored:    output/SEMplot_Epi_M2.html
    Ignored:    output/SEMplot_Epi_M2.png
    Ignored:    output/SEMplot_Epi_M2_files/
    Ignored:    output/SEMplot_Epi_M_all.html
    Ignored:    output/SEMplot_Epi_M_all.png
    Ignored:    output/SEMplot_Epi_M_all_files/
    Ignored:    output/SEMplot_Epi_TopHit.html
    Ignored:    output/SEMplot_Epi_TopHit.png
    Ignored:    output/SEMplot_Epi_TopHit_files/
    Ignored:    output/SEMplot_Epi_all.html
    Ignored:    output/SEMplot_Epi_all.png
    Ignored:    output/SEMplot_Epi_all_files/
    Ignored:    output/barplots.pdf
    Ignored:    output/circos_DMR_tags.svg
    Ignored:    output/circos_LME_tags.svg
    Ignored:    output/clinFact.RData
    Ignored:    output/dds_filt_analyzed.RData
    Ignored:    output/designh0.RData
    Ignored:    output/designh1.RData
    Ignored:    output/envFact.RData
    Ignored:    output/functional_Enrichemnt.pdf
    Ignored:    output/gostres.pdf
    Ignored:    output/modelFact.RData
    Ignored:    output/resdmr.RData
    Ignored:    output/resultsdmr_table.RData
    Ignored:    output/table1_filtered.Rmd
    Ignored:    output/table1_filtered.docx
    Ignored:    output/table1_unfiltered.Rmd
    Ignored:    output/table1_unfiltered.docx
    Ignored:    setup_built.R

Unstaged changes:
    Deleted:    analysis/RNAvsMETplots.pdf

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/03_01_Posthoc_analyses_Gene_Enrichment.Rmd) and HTML (docs/03_01_Posthoc_analyses_Gene_Enrichment.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	b452d2f	achiocch	2021-07-30	wflow_publish(c(“analysis/", "code/”, "docs/*"))
html	b452d2f	achiocch	2021-07-30	wflow_publish(c(“analysis/", "code/”, "docs/*"))
Rmd	1a9f36f	achiocch	2021-07-30	reviewed analysis
html	2734c4e	achiocch	2021-05-08	Build site.
html	a847823	achiocch	2021-05-08	wflow_publish(c(“analysis/", "code/”, "docs/*"))
html	9cc52f7	achiocch	2021-05-08	Build site.
html	158d0b4	achiocch	2021-05-08	wflow_publish(c(“analysis/", "code/”, "docs/*"))
html	0f262d1	achiocch	2021-05-07	Build site.
html	5167b90	achiocch	2021-05-07	wflow_publish(c(“analysis/", "code/”, "docs/*"))
html	05aac7f	achiocch	2021-04-23	Build site.
Rmd	5f070a5	achiocch	2021-04-23	wflow_publish(c(“analysis/", "code/”, "docs/*"))
html	f5c5265	achiocch	2021-04-19	Build site.
Rmd	dc9e069	achiocch	2021-04-19	wflow_publish(c(“analysis/", "code/”, "docs/*"))
html	dc9e069	achiocch	2021-04-19	wflow_publish(c(“analysis/", "code/”, "docs/*"))
html	17f1eec	achiocch	2021-04-10	Build site.
Rmd	4dee231	achiocch	2021-04-10	wflow_publish(c(“analysis/", "code/”, "docs/*"))
html	91de221	achiocch	2021-04-05	Build site.
Rmd	b6c6b33	achiocch	2021-04-05	updated GO function, and model def
html	4ea1bba	achiocch	2021-02-25	Build site.
Rmd	6c21638	achiocch	2021-02-25	wflow_publish(c(“analysis/", "code/”, "docs/*"), update = F)
html	6c21638	achiocch	2021-02-25	wflow_publish(c(“analysis/", "code/”, "docs/*"), update = F)

Genomic location enrichment

Significant loci with a p-value <= 0.01 and a absolute log2 fold-change lager 0.5 were tested for enrichment in annotated genomic feature using fisher exact test.

Ranges=rowData(dds_filt)

TotTagsofInterest=sum(Ranges$WaldPvalue_groupCD<=thresholdp & abs(Ranges$groupCD)>thresholdLFC)

Resall=data.frame()
index = Ranges$WaldPvalue_groupCD<=thresholdp& abs(Ranges$groupCD)>thresholdLFC
for (feat in unique(Ranges$feature)){
  tmp=table(Ranges$feature == feat, signif=index)
  resfish=fisher.test(tmp)
  res = c(resfish$estimate, unlist(resfish$conf.int), resfish$p.value)
  Resall = rbind(Resall, res)
}
tmp=table(Ranges$tf_binding!="", signif=index)
resfish=fisher.test(tmp)
res = c(resfish$estimate, unlist(resfish$conf.int), resfish$p.value)
Resall = rbind(Resall, res)
tmp=table(Ranges$cpg=="cpg", signif=index)
resfish=fisher.test(tmp)
res = c(resfish$estimate, unlist(resfish$conf.int), resfish$p.value)
Resall = rbind(Resall, res)
colnames(Resall)=c("OR", "CI95L", "CI95U", "P")
rownames(Resall)=c(unique(Ranges$feature), "TF-binding", "CpG-island")
Resall$Beta = log(Resall$OR)
Resall$SE = (log(Resall$OR)-log(Resall$CI95L))/1.96
Resall$Padj=p.adjust(Resall$P, method = "bonferroni")

Resdown=data.frame()
index = Ranges$WaldPvalue_groupCD<=thresholdp & Ranges$groupCD<thresholdLFC
for (feat in unique(Ranges$feature)){
  tmp=table(Ranges$feature == feat, signif=index)
  resfish=fisher.test(tmp)
  res = c(resfish$estimate, unlist(resfish$conf.int), resfish$p.value)
  Resdown = rbind(Resdown, res)
}
tmp=table(Ranges$tf_binding!="", signif=index)
resfish=fisher.test(tmp)
res = c(resfish$estimate, unlist(resfish$conf.int), resfish$p.value)
Resdown = rbind(Resdown, res)
tmp=table(Ranges$cpg=="cpg", signif=index)
resfish=fisher.test(tmp)
res = c(resfish$estimate, unlist(resfish$conf.int), resfish$p.value)
Resdown = rbind(Resdown, res)
colnames(Resdown)=c("OR", "CI95L", "CI95U", "P")
rownames(Resdown)=c(unique(Ranges$feature), "TF-binding", "CpG-island")
Resdown$Beta = log(Resdown$OR)
Resdown$SE = (log(Resdown$OR)-log(Resdown$CI95L))/1.96
Resdown$Padj=p.adjust(Resdown$P, method = "bonferroni")

Resup=data.frame()
index = Ranges$WaldPvalue_groupCD<=thresholdp & Ranges$groupCD>thresholdLFC
for (feat in unique(Ranges$feature)){
  tmp=table(Ranges$feature == feat, signif=index)
  resfish=fisher.test(tmp)
  res = c(resfish$estimate, unlist(resfish$conf.int), resfish$p.value)
  Resup = rbind(Resup, res)
}
tmp=table(Ranges$tf_binding!="", signif=index)
resfish=fisher.test(tmp)
res = c(resfish$estimate, unlist(resfish$conf.int), resfish$p.value)
Resup = rbind(Resup, res)
tmp=table(Ranges$cpg=="cpg", signif=index)
resfish=fisher.test(tmp)
res = c(resfish$estimate, unlist(resfish$conf.int), resfish$p.value)
Resup = rbind(Resup, res)
colnames(Resup)=c("OR", "CI95L", "CI95U", "P")
rownames(Resup)=c(unique(Ranges$feature), "TF-binding", "CpG-island")
Resup$Beta = log(Resup$OR)
Resup$SE = (log(Resup$OR)-log(Resup$CI95L))/1.96
Resup$Padj=p.adjust(Resup$P, method = "bonferroni")

multiORplot(Resall, Pval = "P", Padj = "Padj", beta="Beta",SE = "SE", pheno="All diff. methylated loci")

multiORplot(Resup, Pval = "P", Padj = "Padj", beta="Beta",SE = "SE", pheno="hypomethylated loci")

multiORplot(Resdown, Pval = "P", Padj = "Padj", beta="Beta",SE = "SE", pheno="Hypermethylated loci")

pdf(paste0(Home, "/output/functional_Enrichemnt.pdf"))
multiORplot(Resall, Pval = "P", Padj = "Padj", beta="Beta",SE = "SE", pheno="All diff. methylated loci")
multiORplot(Resup, Pval = "P", Padj = "Padj", beta="Beta",SE = "SE", pheno="hypomethylated loci")
multiORplot(Resdown, Pval = "P", Padj = "Padj", beta="Beta",SE = "SE", pheno="Hypermethylated loci")
dev.off()

png 
  2

GO-term Enrichment

Significant loci and differentially methylated regions with a p-value <= 0.01 and an absolute log2 fold-change lager 0.5 were tested for enrichment among GO-terms Molecular Function, Cellular Compartment and Biological Processes, KEGG pathways, Transcription factor Binding sites, Human Protein Atlas Tissue Expression, Human Phenotypes.

getGOresults = function(geneset, genereference){
  resgo = gost(geneset, organism = "hsapiens",
               correction_method = "g_SCS",
               domain_scope = "custom",
               sources = c("GO:BP", "GO:MF", "GO:CC"),
               custom_bg = genereference)
  if(length(resgo) != 0){
    return(resgo)
  } else {
    print("no significant results")
    return(NULL)
  }  
}

gene_univers = getuniquegenes(as.data.frame(rowRanges(dds_filt))$gene)


idx = (results_Deseq$pvalue <= thresholdp & 
         (abs(results_Deseq$log2FoldChange) > thresholdLFC))

genes_reg = getuniquegenes(as.data.frame(rowRanges(dds_filt))$gene[idx])


dmr_genes = unique(resultsdmr_table$name[resultsdmr_table$p.value<=thresholdp & 
                   abs(resultsdmr_table$value)>=thresholdLFC])


Genes_of_interset = list("01_dmregions" = dmr_genes,  
                         "02_dmtag" = genes_reg
                         )

gostres = getGOresults(Genes_of_interset, gene_univers)

gostplot(gostres, capped = TRUE, interactive = T)

p = gostplot(gostres, capped = TRUE, interactive = F)

toptab = gostres$result

pp = publish_gostplot(p, filename = paste0(Home,"/output/gostres.pdf"))

The image is saved to C:/Users/chiocchetti/Projects/femNATCD_MethSeq/output/gostres.pdf

write.xlsx2(toptab, file = paste0(Home,"/output/GOres.xlsx"), sheetName = "GO_enrichment")

Kang Brain Module Enrichments

Gene sets identified to be deferentially methylated with a p-value <= 0.01 and an absolute log2 fold-change larger 0.5 were tested for enrichment among gene-modules coregulated during Brain expression.

# define Reference Universe 

KangUnivers<- read.table(paste0(Home,"/data/KangUnivers.txt"), sep="\t", header=T)
colnames(KangUnivers)<-c("EntrezId","Symbol")

Kang_genes<-read.table(paste0(Home,"/data/Kang_dataset_genesMod_version2.txt"),sep="\t",header=TRUE)

#3)Generate Gene universe to be used for single gene lists
tmp=merge(KangUnivers,Kang_genes,by.y="EntrezGene",by.x="EntrezId",all=TRUE) #18826
KangUni_Final<-tmp[duplicated(tmp$EntrezId)==FALSE,] #18675


# Local analysis gene universe
Annotation_list<-data.frame(Symbol = gene_univers)

# match modules 
Annotation_list$Module = Kang_genes$Module[match(Annotation_list$Symbol,Kang_genes$symbol)]

# check if overlapping in gene universes
Annotation_list$univers = Annotation_list$Symbol %in% KangUni_Final$Symbol

# drop duplicates 
Annotation_list = Annotation_list[duplicated(Annotation_list$Symbol)==FALSE,]

# selct only genes that have been detected on both datasets
Annotation_list = Annotation_list[Annotation_list$univers==T,] 

# final reference 
UniversalGeneset=Annotation_list$Symbol

# define Gene lists to test 
# sort and order Modules to be tested

Modules=unique(Annotation_list$Module)
Modules = Modules[! Modules %in% c(NA, "")]
Modules = Modules[order(as.numeric(gsub("M","",Modules)))]

GL_all=list()

for(i in Modules){
  GL_all[[i]]=Annotation_list$Symbol[Annotation_list$Module%in%i]
}
GL_all[["M_all"]]=Kang_genes$symbol[Kang_genes$Module %in% Modules]


GOI1 = Genes_of_interset

Resultsall=list()
for(j in names(GOI1)){
  Res = data.frame()
  for(i in names(GL_all)){
    Modulegene=GL_all[[i]]
    Factorgene=GOI1[[j]]
    Testframe<-fisher.test(table(factor(UniversalGeneset %in% Factorgene,levels=c("TRUE","FALSE")),
                                 factor(UniversalGeneset %in% Modulegene,levels=c("TRUE","FALSE"))))
    beta=log(Testframe$estimate)
    Res[i, "beta"] =beta
    Res[i, "SE"]=abs(beta-log(Testframe$conf.int[1]))/1.96
    Res[i, "Pval"]=Testframe$p.value
    Res[i, "OR"]=(Testframe$estimate)
    Res[i, "ORL"]=(Testframe$conf.int[1])
    Res[i, "ORU"]=(Testframe$conf.int[2])
  }
  Res$Padj = p.adjust(Res$Pval, method = "bonferroni")
  Resultsall[[j]] = Res
  
}
par(mfrow = c(2,1))
for (i in names(Resultsall)){ 
  multiORplot(datatoplot = Resultsall[[i]], pheno=i)
}

par(mfrow = c(1,1))
pdf(paste0(Home, "/output/BrainMod_Enrichemnt.pdf"))
for (i in names(Resultsall)){
  multiORplot(datatoplot = Resultsall[[i]], pheno=i)
}
dev.off()

png 
  2

Modsig = c()

for(r in names(Resultsall)){
  a=rownames(Resultsall[[r]])[Resultsall[[r]]$Padj<=0.05]
  Modsig = c(Modsig,a)
}

# show brains and expression
Modsig2=unique(Modsig[Modsig!="M_all"])

load(paste0(Home,"/data/Kang_DataPreprocessing.RData")) #Load the Kang expression data of all genes 
datExprPlot=matriz #Expression data of Kang loaded as Rdata object DataPreprocessing.RData


Genes = GL_all[names(GL_all)!="M_all"]

Genes_expression<-list()

pcatest<-list()
for (i in names(Genes)){
  Genes_expression[[i]]<-matriz[,which(colnames(matriz) %in% Genes[[i]])]
  pcatest[[i]]=prcomp(t(as.matrix(Genes_expression[[i]])),retx=TRUE)
}

# PCA test
PCA<-data.frame(pcatest[[1]]$rotation)
PCA$donor_name<-rownames(PCA)
PC1<-data.frame(PCA[,c(1,ncol(PCA))])

#Combining the age with expression data
list <- strsplit(sampleInfo$age, " ")
library("plyr")

------------------------------------------------------------------------------

You have loaded plyr after dplyr - this is likely to cause problems.
If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
library(plyr); library(dplyr)

------------------------------------------------------------------------------


Attache Paket: 'plyr'

The following object is masked from 'package:matrixStats':

    count

The following object is masked from 'package:IRanges':

    desc

The following object is masked from 'package:S4Vectors':

    rename

The following objects are masked from 'package:dplyr':

    arrange, count, desc, failwith, id, mutate, rename, summarise,
    summarize

The following object is masked from 'package:purrr':

    compact

df <- ldply(list)
colnames(df) <- c("Age", "time")

sampleInfo<-cbind(sampleInfo[,1:9],df)
sampleInfo$Age<-as.numeric(sampleInfo$Age)

sampleInfo$period<-ifelse(sampleInfo$time=="pcw",sampleInfo$Age*7,ifelse(sampleInfo$time=="yrs",sampleInfo$Age*365+270,ifelse(sampleInfo$time=="mos",sampleInfo$Age*30+270,NA)))

#We need it just for the donor names

PCA_matrix<-merge.with.order(PC1,sampleInfo,by.y="SampleID",by.x="donor_name",keep_order=1)

#Select which have phenotype info present 
matriz2<-matriz[which(rownames(matriz) %in% PCA_matrix$donor_name),]
FactorGenes_expression<-list()
#Factors here mean modules
for (i in names(Genes)){
  FactorGenes_expression[[i]]<-matriz2[,which(colnames(matriz2) %in% Genes[[i]])]
}


FactorseGE<-list()
for (i in names(Genes)){
  FactorseGE[[i]]<-FactorGenes_expression[[i]]
}

allModgenes=NULL
colors=vector()
for ( i in names(Genes)){
  allModgenes=cbind(allModgenes,FactorseGE[[i]])
  colors=c(colors, rep(i, ncol(FactorseGE[[i]])))
}


lengths=unlist(lapply(FactorGenes_expression, ncol), use.names = F)

MEorig=moduleEigengenes(allModgenes, colors)

PCA_matrixfreeze=PCA_matrix

index=!PCA_matrix$structure_acronym %in% c("URL", "DTH", "CGE","LGE", "MGE",  "Ocx", "PCx", "M1C-S1C","DIE", "TCx", "CB")
PCA_matrix=PCA_matrix[index,]
ME = MEorig$eigengenes[index,]
matsel = matriz2[index,]

colnames(ME) = gsub("ME", "", colnames(ME))

timepoints=seq(56,15000, length.out=1000)
matrix(c("CB", "THA", "CBC", "MD"), ncol=2 ) -> cnm


brainheatmap=function(Module){
  MEmod=ME[,Module]
  toplot=data.frame(matrix(NA, nrow=length(table(PCA_matrix$structure_acronym)), ncol=998))
  rownames(toplot)=unique(PCA_matrix$structure_acronym)
  target <- c("OFC", "DFC", "VFC", "MFC","M1C","S1C","IPC","A1C","STC","ITC","V1C","HIP","AMY","STR","MD","CBC")
  toplot<-toplot[c(6,2,8,5,11,12,10,9,7,4,14,3,1,13,16,15),]
  
  
  for ( i in unique(PCA_matrix$structure_acronym)){
    index=PCA_matrix$structure_acronym==i
    LOESS=loess(MEmod[index]~PCA_matrix$period[index])
    toplot[i,]=predict(LOESS,newdata = round(exp(seq(log(56),log(15000), length.out=998)),2))
    colnames(toplot)[c(1,77,282,392,640,803,996)]<-
      c("1pcw","21pcw","Birth","1.3years","5.4years","13.6years","40.7years")
  }
  
  
  
  cols=viridis(100)
  labvec <- c(rep(NA, 1000))
  
  labvec[c(1,77,282,392,640,803,996)] <- c("1pcw","21pcw","Birth","1.3years","5.4years","13.6years","40.7years")
  
  
  toplot<-toplot[,1:998]
  date<-c(1:998)
  dateY<-paste0(round(date/365,2),"_Years")
  
  names(toplot)<-dateY
  
  par(xpd=FALSE) 
  heatmap.2(as.matrix(toplot), col = cols, 
            main=Module,
            trace = "none", 
            na.color = "grey",
            Colv = F, Rowv = F,
            labCol = labvec,
            #breaks = seq(-0.1,0.1, length.out=101),
            symkey = T,
            scale = "row",
            key.title = "",
            dendrogram = "none",
            key.xlab = "eigengene",
            density.info = "none",
            #main=paste("Module",1),
            srtCol=90,
            tracecol = "none", 
            cexRow = 1,
            add.expr=eval.parent(abline(v=282),
                                 axis(1,at=c(1,77,282,392,640,803,996),
                                      labels =FALSE)),cexCol = 1)
  
 
}

brainheatmap_gene=function(Genename){
  MEmod=matsel[,Genename]
  toplot=data.frame(matrix(NA, nrow=length(table(PCA_matrix$structure_acronym)), ncol=998))
  rownames(toplot)=unique(PCA_matrix$structure_acronym)
  target <- c("OFC", "DFC", "VFC", "MFC","M1C","S1C","IPC","A1C","STC","ITC","V1C","HIP","AMY","STR","MD","CBC")
  toplot<-toplot[c(6,2,8,5,11,12,10,9,7,4,14,3,1,13,16,15),]


  for ( i in unique(PCA_matrix$structure_acronym)){
    index=PCA_matrix$structure_acronym==i
    LOESS=loess(MEmod[index]~PCA_matrix$period[index])
    toplot[i,]=predict(LOESS,newdata = round(exp(seq(log(56),log(15000), length.out=998)),2))
    colnames(toplot)[c(1,77,282,392,640,803,996)]<-
      c("1pcw","21pcw","Birth","1.3years","5.4years","13.6years","40.7years")
  }



  cols=viridis(100)
  labvec <- c(rep(NA, 1000))

  labvec[c(1,77,282,392,640,803,996)] <- c("1pcw","21pcw","Birth","1.3years","5.4years","13.6years","40.7years")


  toplot<-toplot[,1:998]
  date<-c(1:998)
  dateY<-paste0(round(date/365,2),"_Years")

  names(toplot)<-dateY

  par(xpd=FALSE)
  heatmap.2(as.matrix(toplot), col = cols,
            main=Genename,
            trace = "none",
            na.color = "grey",
            Colv = F, Rowv = F,
            labCol = labvec,
            #breaks = seq(-0.1,0.1, length.out=101),
            symkey = F,
            scale = "none",
            key.title = "",
            dendrogram = "none",
            key.xlab = "eigengene",
            density.info = "none",
            #main=paste("Module",1),
            #srtCol=90,
            tracecol = "none",
            cexRow = 1,
            add.expr=eval.parent(abline(v=282),
                                 axis(1,at=c(1,77,282,392,640,803,996),
                                      labels =FALSE))
            ,cexCol = 1)
}


brainheatmap_gene("SLITRK5")

for(Module in Modsig2){
  brainheatmap(Module)
}

pdf(paste0(Home, "/output/Brain_Module_Heatmap.pdf"))

brainheatmap_gene("SLITRK5")
for(Module in Modsig2){
  brainheatmap(Module)
}
dev.off()

png 
  2

SEM

dropfact=c("site")


Patdata=as.data.frame(colData(dds_filt))

load(paste0(Home, "/output/envFact.RData"))
load(paste0(Home, "/output/modelFact.RData"))

envFact=envFact[!envFact %in% dropfact] 
modelFact=modelFact[!modelFact %in% dropfact] 

EpiMarker = c()

# TopHit
Patdata$Epi_TopHit=log2_cpm[base::which.min(results_Deseq$pvalue),]

# 1PC of all diff met
tmp=glmpca(log2_cpm[base::which(results_Deseq$pvalue<=thresholdp),], 1)

Patdata$Epi_all= tmp$factors$dim1
  
EpiMarker = c(EpiMarker, "Epi_TopHit", "Epi_all")

#Brain Modules

Epitestset=GL_all[Modsig]

for(n in names(Epitestset)){
  index=gettaglistforgenelist(genelist = Epitestset[[n]], dds_filt)
  index = base::intersect(index, base::which(results_Deseq$pvalue<=thresholdp))
  # get eigenvalue
  epiname=paste0("Epi_",n)
  tmp=glmpca(log2_cpm[index,], 1)
  Patdata[,epiname]= tmp$factors$dim1
  EpiMarker = c(EpiMarker, epiname)
}

cormat = cor(apply(Patdata[,c("group", envFact, modelFact, EpiMarker)] %>% mutate_all(as.numeric), 2, minmax_scaling),
             use = "pairwise.complete.obs")

par(mfrow=c(1,2))
corrplot(cormat, main="correlations")
corrplot(cormat, order = "hclust", main="scaled correlations")

fullmodEnv=paste(envFact,modelFact, sep = "+", collapse = "+")

Dataset = Patdata[,c("group", envFact,modelFact,EpiMarker)]

model = paste0("group ~ Epi+", fullmodEnv,"
Epi ~" , fullmodEnv,"
Epi~~Epi
group~~group
")

Netlist = list()

for (marker in EpiMarker) {
  Dataset$Epi = Dataset[,marker]
  Datasetscaled = Dataset %>% mutate_if(is.numeric, minmax_scaling)
  Datasetscaled = Datasetscaled %>% mutate_if(is.factor, ~ as.numeric(.)-1)
  
  fit<-lavaan(model,data=Datasetscaled)
  
  sink(paste0(Home,"/output/SEM_summary_group",marker,".txt"))
  summary(fit)
  print(fitMeasures(fit))
  print(parameterEstimates(fit))
  sink()
  
  #SOURCE FOR PLOT https://stackoverflow.com/questions/51270032/how-can-i-display-only-significant-path-lines-on-a-path-diagram-r-lavaan-sem
  restab=lavaan::standardizedSolution(fit) %>% dplyr::filter(!is.na(pvalue)) %>% 
    arrange(desc(pvalue)) %>% mutate_if("is.numeric","round",3) %>% 
    dplyr::select(-ci.lower,-ci.upper,-z)
  
  pvalue_cutoff <- 0.05
  
  obj <- semPlot:::semPlotModel(fit)
  original_Pars <- obj@Pars
  check_Pars <- obj@Pars %>% dplyr:::filter(!(edge %in% c("int","<->") | lhs == rhs)) # this is the list of parameter to sift thru
  keep_Pars <- obj@Pars %>% dplyr:::filter(edge %in% c("int","<->") | lhs == rhs) # this is the list of parameter to keep asis
  
  test_against <- lavaan::standardizedSolution(fit) %>% dplyr::filter(pvalue < pvalue_cutoff, rhs != lhs)
  
  # for some reason, the rhs and lhs are reversed in the standardizedSolution() output, for some of the values
  # I'll have to reverse it myself, and test against both orders
  
  test_against_rev <- test_against %>% dplyr::rename(rhs2 = lhs, lhs = rhs) %>% dplyr::rename(rhs = rhs2)
  
  checked_Pars <-
    check_Pars %>% semi_join(test_against, by = c("lhs", "rhs")) %>% bind_rows(
      check_Pars %>% semi_join(test_against_rev, by = c("lhs", "rhs"))
    )
  
  obj@Pars <- keep_Pars %>% bind_rows(checked_Pars) %>% 
    mutate_if("is.numeric","round",3) %>% 
    mutate_at(c("lhs","rhs"),~gsub("Epi", marker,.))
  
  obj@Vars = obj@Vars %>% mutate_at(c("name"),~gsub("Epi", marker,.))
  
  DF = obj@Pars
  DF = DF[DF$lhs!=DF$rhs,]
  DF = DF[abs(DF$std)>0.1,]
  
  DF = DF[DF$edge == "~>",] # only include directly modelled effects in figure 
  
  nodes <- data.frame(id=obj@Vars$name, label = obj@Vars$name)
  nodes$color<-Dark8[8]
  nodes$color[nodes$label == "group"] = Dark8[3]
  nodes$color[nodes$label == marker] = Dark8[4]
  nodes$color[nodes$label %in% envFact] = Dark8[5]
  
  edges <- data.frame(from = DF$lhs, 
                      to = DF$rhs, 
                      width=abs(DF$std), 
                      arrows ="to")
  
  edges$dashes = F
  edges$label =  DF$std
  edges$color=c("firebrick", "forestgreen")[1:2][factor(sign(DF$std), levels=c(-1,0,1),labels=c(1,2,2))]
  edges$width=2
  cexlab = 18
  plotnet<- visNetwork(nodes, edges,  
                       main=list(text=marker,
                                 style="font-family:arial;font-size:20px;text-align:center"),
                       submain=list(text="significant paths", 
                                    style="font-family:arial;text-align:center")) %>% 
    visEdges(arrows =list(to = list(enabled = TRUE, scaleFactor = 0.7)),
                       font=list(size=cexlab, style="font-family:arial;text-align:center")) %>%
    visNodes(font=list(size=cexlab, style="font-family:arial;text-align:center")) %>%
    
    visPhysics(enabled = T, solver = "forceAtlas2Based")
  
  
  Netlist[[marker]] = plotnet
  htmlfile = paste0(Home,"/output/SEMplot_",marker,".html")
  visSave(plotnet, htmlfile)
  webshot(paste0(Home,"/output/SEMplot_",marker,".html"), zoom = 1, 
          file = paste0(Home,"/output/SEMplot_",marker,".png"))
}

rmd_paths <-paste0(tempfile(c(names(Netlist))),".Rmd")
names(rmd_paths) <- names(Netlist)

for (n in names(rmd_paths)) {
    sink(file = rmd_paths[n])
    cat("  \n",
        "```{r, echo = FALSE}",
            "Netlist[[n]]",
        "```",
        sep = "  \n")
    sink()
}

Interactive results SEM analysis

only direct effects with a significant standardized effect of p<0.05 are shown.

    for (n in names(rmd_paths)) {
        cat(knitr::knit_child(rmd_paths[[n]],
                              quiet= TRUE))
        file.remove(rmd_paths[[n]])
    }

sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 
 
locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] plyr_1.8.6                  scales_1.1.1               
 [3] RCircos_1.2.1               compareGroups_4.4.6        
 [5] webshot_0.5.2               visNetwork_2.0.9           
 [7] org.Hs.eg.db_3.12.0         AnnotationDbi_1.52.0       
 [9] xlsx_0.6.5                  gprofiler2_0.2.0           
[11] BiocParallel_1.24.1         kableExtra_1.3.1           
[13] glmpca_0.2.0                knitr_1.30                 
[15] DESeq2_1.30.0               SummarizedExperiment_1.20.0
[17] Biobase_2.50.0              MatrixGenerics_1.2.0       
[19] matrixStats_0.57.0          GenomicRanges_1.42.0       
[21] GenomeInfoDb_1.26.2         IRanges_2.24.1             
[23] S4Vectors_0.28.1            BiocGenerics_0.36.0        
[25] forcats_0.5.0               stringr_1.4.0              
[27] dplyr_1.0.2                 purrr_0.3.4                
[29] readr_1.4.0                 tidyr_1.1.2                
[31] tibble_3.0.4                tidyverse_1.3.0            
[33] semPlot_1.1.2               lavaan_0.6-7               
[35] viridis_0.5.1               viridisLite_0.3.0          
[37] WGCNA_1.69                  fastcluster_1.1.25         
[39] dynamicTreeCut_1.63-1       ggplot2_3.3.3              
[41] gplots_3.1.1                corrplot_0.84              
[43] RColorBrewer_1.1-2          workflowr_1.6.2            

loaded via a namespace (and not attached):
  [1] coda_0.19-4            bit64_4.0.5            DelayedArray_0.16.0   
  [4] data.table_1.13.6      rpart_4.1-15           RCurl_1.98-1.2        
  [7] doParallel_1.0.16      generics_0.1.0         preprocessCore_1.52.1 
 [10] callr_3.5.1            RSQLite_2.2.2          mice_3.12.0           
 [13] chron_2.3-56           bit_4.0.4              xml2_1.3.2            
 [16] lubridate_1.7.9.2      httpuv_1.5.5           assertthat_0.2.1      
 [19] d3Network_0.5.2.1      xfun_0.20              hms_1.0.0             
 [22] rJava_0.9-13           evaluate_0.14          promises_1.1.1        
 [25] fansi_0.4.1            caTools_1.18.1         dbplyr_2.0.0          
 [28] readxl_1.3.1           igraph_1.2.6           DBI_1.1.1             
 [31] geneplotter_1.68.0     tmvnsim_1.0-2          Rsolnp_1.16           
 [34] htmlwidgets_1.5.3      ellipsis_0.3.1         crosstalk_1.1.1       
 [37] backports_1.2.0        pbivnorm_0.6.0         annotate_1.68.0       
 [40] vctrs_0.3.6            abind_1.4-5            cachem_1.0.1          
 [43] withr_2.4.1            HardyWeinberg_1.7.1    checkmate_2.0.0       
 [46] fdrtool_1.2.16         mnormt_2.0.2           cluster_2.1.0         
 [49] mi_1.0                 lazyeval_0.2.2         crayon_1.3.4          
 [52] genefilter_1.72.0      pkgconfig_2.0.3        nlme_3.1-151          
 [55] nnet_7.3-15            rlang_0.4.10           lifecycle_0.2.0       
 [58] kutils_1.70            modelr_0.1.8           cellranger_1.1.0      
 [61] rprojroot_2.0.2        flextable_0.6.2        Matrix_1.2-18         
 [64] regsem_1.6.2           carData_3.0-4          boot_1.3-26           
 [67] reprex_1.0.0           base64enc_0.1-3        processx_3.4.5        
 [70] whisker_0.4            png_0.1-7              rjson_0.2.20          
 [73] bitops_1.0-6           KernSmooth_2.23-18     blob_1.2.1            
 [76] arm_1.11-2             jpeg_0.1-8.1           rockchalk_1.8.144     
 [79] memoise_2.0.0          magrittr_2.0.1         zlibbioc_1.36.0       
 [82] compiler_4.0.3         lme4_1.1-26            cli_2.2.0             
 [85] XVector_0.30.0         pbapply_1.4-3          ps_1.5.0              
 [88] htmlTable_2.1.0        Formula_1.2-4          MASS_7.3-53           
 [91] tidyselect_1.1.0       stringi_1.5.3          lisrelToR_0.1.4       
 [94] sem_3.1-11             yaml_2.2.1             OpenMx_2.18.1         
 [97] locfit_1.5-9.4         latticeExtra_0.6-29    grid_4.0.3            
[100] tools_4.0.3            matrixcalc_1.0-3       rstudioapi_0.13       
[103] uuid_0.1-4             foreach_1.5.1          foreign_0.8-81        
[106] git2r_0.28.0           gridExtra_2.3          farver_2.0.3          
[109] BDgraph_2.63           digest_0.6.27          shiny_1.6.0           
[112] Rcpp_1.0.5             broom_0.7.3            later_1.1.0.1         
[115] writexl_1.3.1          httr_1.4.2             gdtools_0.2.3         
[118] psych_2.0.12           colorspace_2.0-0       rvest_0.3.6           
[121] XML_3.99-0.5           fs_1.5.0               truncnorm_1.0-8       
[124] splines_4.0.3          statmod_1.4.35         xlsxjars_0.6.1        
[127] plotly_4.9.3           systemfonts_0.3.2      xtable_1.8-4          
[130] jsonlite_1.7.2         nloptr_1.2.2.2         corpcor_1.6.9         
[133] glasso_1.11            R6_2.5.0               Hmisc_4.4-2           
[136] mime_0.9               pillar_1.4.7           htmltools_0.5.1.1     
[139] glue_1.4.2             fastmap_1.1.0          minqa_1.2.4           
[142] codetools_0.2-18       lattice_0.20-41        huge_1.3.4.1          
[145] gtools_3.8.2           officer_0.3.16         zip_2.1.1             
[148] GO.db_3.12.1           openxlsx_4.2.3         survival_3.2-7        
[151] rmarkdown_2.6          qgraph_1.6.5           munsell_0.5.0         
[154] GenomeInfoDbData_1.2.4 iterators_1.0.13       impute_1.64.0         
[157] haven_2.3.1            reshape2_1.4.4         gtable_0.3.0

The Methylome in Female Adolescent Conduct Disorder: Neural Pathomechanisms and Environmental Risk Factors

Post-hoc Analyses

AG Chiocchetti

29 Dezember 2020

Genomic location enrichment

GO-term Enrichment

Kang Brain Module Enrichments

SEM

Interactive results SEM analysis