Last updated: 2021-02-01

Checks: 6 1

Knit directory: myproject/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: uncommitted changes

The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20200813)

The command set.seed(20200813) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 34bf8fb

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 34bf8fb. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .RData
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.DS_Store
    Ignored:    analysis/11.24.2020.2.png
    Ignored:    analysis/11.24.2020.3.png
    Ignored:    analysis/11.24.2020.4.png
    Ignored:    analysis/11.24.2020.5.png
    Ignored:    analysis/11.24.2020.6.png
    Ignored:    analysis/11.24.2020.7.png
    Ignored:    analysis/11.24.2020.8.png
    Ignored:    analysis/11.24.2020.9.png
    Ignored:    genotype/

Untracked files:
    Untracked:  ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
    Untracked:  ALL.chr2.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
    Untracked:  analysis/1.21.WIP.Rmd
    Untracked:  analysis/12.WIP.Rmd
    Untracked:  getfastqtest2.csv

Unstaged changes:
    Modified:   analysis/12.20.WIP.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/12.20.WIP.Rmd) and HTML (docs/12.20.WIP.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
html	34bf8fb	mariesaitou	2020-12-26	Build site.
html	7f1d2f2	mariesaitou	2020-12-26	Build site.
html	0abb133	mariesaitou	2020-12-26	Build site.
html	9eb5658	mariesaitou	2020-12-25	Build site.
Rmd	d211d79	mariesaitou	2020-12-25	Add my first analysis

Make LD matrix and eQTL zscore material from the genotype files and fastQTL results

cd /project2/xuanyao/marie/E-GEUV-1/LDSC/PESCA
module load plink
csvfile=/project2/xuanyao/marie/E-GEUV-1/finemap/DEG.susie.1025.BH.csv
for line in `cat ${csvfile} | grep -v ^#`
do
  gene=`echo ${line} |cut -d ',' -f 1`
  chr=`echo ${line} |cut -d ',' -f 9`

  grep '${gene}' /project2/xuanyao/marie/E-GEUV-1/FastQTL/GEUV/EUR.chr${chr}.nominals.all.txt >> /project2/xuanyao/marie/E-GEUV-1/FastQTL/GEUV/PESCA/${gene}.zscore.EUR.txt
  grep '${gene}' /project2/xuanyao/marie/E-GEUV-1/FastQTL/GEUV/EUR.chr${chr}.nominals.all.txt >> /project2/xuanyao/marie/E-GEUV-1/FastQTL/GEUV/PESCA/${gene}.zscore.EUR.txt
  plink --vcf /project2/xuanyao/marie/E-GEUV-1/finemap/EUR/${gene}.EUR.genotype.recode.vcf --r --matrix --out ${gene}.EUR.LD.matrix
  plink --vcf /project2/xuanyao/marie/E-GEUV-1/finemap/YRI/${gene}.Yoruba.genotype.recode.vcf --r --matrix --out ${gene}.Yoruba.LD.matrix
done


csvfile=/project2/xuanyao/marie/E-GEUV-1/finemap/DEG.susie.1025.BH.csv
for line in `cat ${csvfile} | grep -v ^#`
do
  gene=`echo ${line} |cut -d ',' -f 1`
  chr=`echo ${line} |cut -d ',' -f 9`
  grep "${gene}" /project2/xuanyao/marie/E-GEUV-1/FastQTL/GEUV/Yoruba.chr${chr}.nominals.all.txt >> ${gene}.zscore.Yoruba.txt
  grep "${gene}" /project2/xuanyao/marie/E-GEUV-1/FastQTL/GEUV/EUR.chr${chr}.nominals.all.txt >> ${gene}.zscore.EUR.txt
done

Paintor and MsCaviar input file preparation (LD, Z-score, and file index)

# make input dataset of the interested genes
library(dplyr)

genelist=read.csv("/project2/xuanyao/marie/E-GEUV-1/PESCA/input.1025.BH.csv", stringsAsFactors = F)
file.geno.EUR <-paste("/project2/xuanyao/marie/E-GEUV-1/finemap/EUR/",genelist[,"gene"], ".EUR.genotype.recode.vcf", sep="")
file.geno.Yoruba<-paste("/project2/xuanyao/marie/E-GEUV-1/finemap/YRI/",genelist[,"gene"], ".Yoruba.genotype.recode.vcf", sep="")
file.LD.EUR <-paste("/project2/xuanyao/marie/E-GEUV-1/LDSC/PESCA/",genelist[,"gene"], ".EUR.LD.matrix.ld", sep="")
file.LD.Yoruba<-paste("/project2/xuanyao/marie/E-GEUV-1/LDSC/PESCA/",genelist[,"gene"], ".Yoruba.LD.matrix.ld", sep="")
file.p.EUR <-paste("/project2/xuanyao/marie/E-GEUV-1/LDSC/PESCA/",genelist[,"gene"], ".zscore.EUR.txt", sep="")
file.p.Yoruba<-paste("/project2/xuanyao/marie/E-GEUV-1/LDSC/PESCA/",genelist[,"gene"], ".zscore.Yoruba.txt", sep="")

################### make all the files automatically!
# put geno/LD/zscore in each box
EURgeno <- lapply(file.geno.EUR, FUN=read.table, header = FALSE, sep = "\t", skip = 252, stringsAsFactors = F)
EURLD<-lapply(file.LD.EUR,FUN=read.table,stringsAsFactors = F, header = FALSE, sep = " ")

YRIgeno <- lapply(file.geno.Yoruba, FUN=read.table, header = FALSE, sep = "\t", skip = 252, stringsAsFactors = F)
YRILD<-lapply(file.LD.Yoruba,FUN=read.table,stringsAsFactors = F, header = FALSE, sep = " ")


for(i in 1:length(EURgeno)){
  EURgeno[[i]][["V6"]]<- make.names(EURgeno[[i]][["V3"]], unique=TRUE)}
for(i in 1:length(YRIgeno)){
  YRIgeno[[i]][["V6"]]<- make.names(YRIgeno[[i]][["V3"]], unique=TRUE)}


### name the col and row of the LD matrix from the vcf annotation
for(i in 1:length(EURLD)){
  rownames(EURLD[[i]]) <- make.names(EURgeno[[i]][["V3"]], unique=TRUE)
  colnames(EURLD[[i]]) <- make.names(EURgeno[[i]][["V3"]], unique=TRUE)}

for(i in 1:length(YRILD)){
  rownames(YRILD[[i]]) <- make.names(EURgeno[[i]][["V3"]], unique=TRUE)
  colnames(YRILD[[i]]) <- make.names(EURgeno[[i]][["V3"]], unique=TRUE)}

#make.names(nams, unique=TRUE)
EURp<-lapply(file.p.EUR,FUN=read.table,stringsAsFactors = F, header = FALSE, sep = " ")
YRIp<-lapply(file.p.Yoruba,FUN=read.table,stringsAsFactors = F, header = FALSE, sep = " ")

for(i in 1:length(EURp)){
  EURp[[i]][["V6"]]<- make.names(EURz[[i]][["V2"]], unique=TRUE) }
for(i in 1:length(YRIp)){
  YRIp[[i]][["V6"]]<- make.names(YRIz[[i]][["V2"]], unique=TRUE)}


#(1) Extract the shared SNPs from EURLD and YRILD 
#(2) Extract the shared SNPs of (1) from EURz and YRIz 
#(3) make a new matrix by pasting EURgeno$V1, EURgeno$V2, EURgeno$V3, EURz$V5, YRIz$V5
#(4) make two new matrices by pasting EURgeno$V3, EURgeno$V2,  (EURgeno$V4 and  EURgeno$V5) , EURz$V5, 



remrm <- function(mat){
  dat <- mat[!apply(is.na(mat), 1, all),]
  dat[ ,!apply(is.na(dat), 2, all) ]
}

#(1) LD file
EURLD2 <- lapply(EURLD, remrm)
YRILD2 <- lapply(YRILD, remrm)

shared <- function(mat1, mat2){
   inter <- intersect(names(mat1), names(mat2))
   mat1[inter,inter]
}

#### LD matrix with only shared SNPs in 2 pops
EURLD3<- mapply(FUN=shared, EURLD2, YRILD2)
YRILD3<- mapply(FUN=shared, YRILD2, EURLD2)


#(2) shared gene extraction
subs <- function(mat1, mat2){
  mat1[mat1[["V6"]] %in% names(mat2),]
}

EURp2 <- mapply(FUN=subs, EURp, EURLD3,SIMPLIFY=FALSE)
YRIp2 <- mapply(FUN=subs, YRIp, YRILD3,SIMPLIFY=FALSE)

## add zscore to the p-value table 
for(i in 1:length(EURp2)){
  EURp2[[i]][["V7"]]<-ifelse(sign(EURp2[[i]][["V5"]]) >= 0, -qnorm(EURp2[[i]][["V4"]]/2), qnorm(EURp2[[i]][["V4"]]/2)) }

for(i in 1:length(YRIp2)){
  YRIp2[[i]][["V7"]]<-ifelse(sign(YRIp2[[i]][["V5"]]) >= 0, -qnorm(YRIp2[[i]][["V4"]]/2), qnorm(YRIp2[[i]][["V4"]]/2)) }


subs2 <- function(mat1, mat2){
  mat1[mat1[["V6"]] %in% names(mat2),]
}
#(3) PAINTOR z-score with chrmosome coodinate
EURgeno2<-mapply(FUN=subs2, EURgeno, EURLD3,SIMPLIFY=FALSE)
YRIgeno2<-mapply(FUN=subs2, YRIgeno, EURLD3,SIMPLIFY=FALSE)

### combine everything 
#  Paintor input file (content)

combine <- function(mat1, mat2,mat3){
  temp1<-data.frame(mat1$V1, mat1$V2, mat1$V6, mat2$V7, mat3$V7)
  names(temp1) <- c("CHR","POS","RSID","ZSCORE.EUR","ZSCORE.YRI")
  temp1
}

combined<-mapply(FUN=combine, EURgeno2, EURp2,YRIp2,SIMPLIFY=FALSE)

# paintor annotation file
anno <- function(mat1){
  anno<-data.frame(rep(1,length(mat1$V1)))
  names(anno) <- c("dummy")
  anno
}
anno_dummy<-mapply(FUN=anno, EURgeno2,SIMPLIFY=FALSE)

# paintor LD file
EURLD<-lapply(file.LD.EUR,FUN=read.table,stringsAsFactors = F, header = FALSE, sep = " ")

# set file names
filename.z<-paste(genelist[,"gene"], "Paintor", sep=".")
filename.EUR.LD<-paste(genelist[,"gene"],"Paintor.EUR.LD", sep=".")
filename.YRI.LD<-paste(genelist[,"gene"],"Paintor.YRI.LD", sep=".")
filename.anno<-paste(genelist[,"gene"],"Paintor.annotations", sep=".")
filename.input<-paste("input.files.",genelist[,"gene"],".txt", sep=".")
input.content<-paste(genelist[,"gene"],"Paintor", sep=".")



## output PAINTOR
mapply(FUN=write.table,combined, filename.z,SIMPLIFY=FALSE,sep = " ",quote=FALSE,row.names=FALSE)
mapply(FUN=write.table,EURLD3, filename.EUR.LD,SIMPLIFY=FALSE,sep = " ",quote=FALSE,row.names = FALSE,col.names = FALSE)
mapply(FUN=write.table,YRILD3, filename.YRI.LD,SIMPLIFY=FALSE,sep = " ",quote=FALSE,row.names = FALSE,col.names = FALSE)
mapply(FUN=write.table,anno_dummy, filename.anno,SIMPLIFY=FALSE,sep = " ",quote=FALSE,row.names = FALSE)
mapply(FUN=write.table,input.content, filename.input,SIMPLIFY=FALSE,sep = " ",quote=FALSE,row.names = FALSE,col.names = FALSE)


#(5) Zscore, only three rows (SNO, EUR, YRI)
#(6) input file index of MsCaviar, ldfiles.txt -z zfiles.txt

Reformatt MsCaviar from Paintor files

### 12.23
## Output $gene.EUR.zscores,  $gene.YRI.zscores, $gene.zscores.txt, $gene.LD.txt

#setwd("/Users/saitoumarie/Dropbox/Chicago/RCC/eQTL.practice/LDSC/PESCA")
setwd("/project2/xuanyao/marie/E-GEUV-1/LDSC/PAINTOR_V3.0/files")


# cd /project2/xuanyao/marie/E-GEUV-1/LDSC/PAINTOR_V3.0/files
library(dplyr)
# input gene names
genelist=read.csv("/Users/saitoumarie/Dropbox/Chicago/RCC/eQTL.practice/LDSC/PESCA/finemapped.location.csv", stringsAsFactors = F)
#genelist=read.csv("/project2/xuanyao/marie/E-GEUV-1/LDSC/PAINTOR_V3.0/files/both.and.diff.genes.csv", stringsAsFactors = F)

# define file names for each gene
file<-paste("/project2/xuanyao/marie/E-GEUV-1/LDSC/PAINTOR_V3.0/files/",genelist[,"gene"], ".Paintor", sep="")
# read each input tiles
input <- lapply(file, FUN=read.table, header = FALSE, sep = " ", stringsAsFactors = F)


# (1) output files
for(i in 1:length(input)){
  variant <- apply(input[[i]][-1,1:3], 1, paste, collapse=".")
  zscore <- sapply(input[[i]][-1,4], toString, USE.NAMES=F)
  output <- paste(variant, zscore, sep="\t")
  write.table(output, paste(genelist[i,"gene"], "EUR.zscores", sep="."), quote=FALSE, row.names=FALSE, col.names=FALSE)
}


for(i in 1:length(input)){
  variant <- apply(input[[i]][-1,1:3], 1, paste, collapse=".")
  zscore <- sapply(input[[i]][-1,5], toString, USE.NAMES=F)
  output <- paste(variant, zscore, sep="\t")
  write.table(output, paste(genelist[i,"gene"], "YRI.zscores", sep="."), quote=FALSE, row.names=FALSE, col.names=FALSE)
}

### 



#(2)  # file content
library(dplyr)
# input gene names
genelist=read.csv("/Users/saitoumarie/Dropbox/Chicago/RCC/eQTL.practice/LDSC/PESCA/finemapped.location.csv", stringsAsFactors = F)
#genelist=read.csv("/project2/xuanyao/marie/E-GEUV-1/LDSC/PAINTOR_V3.0/files/both.and.diff.genes.csv", stringsAsFactors = F)

# define file names for each gene
file<-paste("/project2/xuanyao/marie/E-GEUV-1/LDSC/PAINTOR_V3.0/files/",genelist[,"gene"], ".Paintor", sep="")
# read each input tiles
input <- lapply(file, FUN=read.table, header = FALSE, sep = " ", stringsAsFactors = F)


for(i in 1:length(input)){
  output <- c(paste(genelist[i,"gene"], "EUR_pruned.zscores", sep="."),
              paste(genelist[i,"gene"], "YRI_pruned.zscores", sep="."))
  write.table(output, paste(genelist[i,"gene"], "pruned.zscores.txt", sep="."), quote=FALSE, row.names=FALSE, col.names=FALSE)
}


for(i in 1:length(input)){
  output <- c(paste(genelist[i,"gene"], "Paintor.EUR_pruned.LD", sep="."),
              paste(genelist[i,"gene"], "Paintor.EUR_pruned.LD", sep="."))
  write.table(output, paste(genelist[i,"gene"], "pruned.LD.txt", sep="."), quote=FALSE, row.names=FALSE, col.names=FALSE)
}


#####  LD pruning  (perfect LD)

# input gene names
#genelist=read.csv("/Users/saitoumarie/Dropbox/Chicago/RCC/eQTL.practice/LDSC/PESCA/finemapped.location.csv", stringsAsFactors = F)
genelist=read.csv("/project2/xuanyao/marie/E-GEUV-1/LDSC/PAINTOR_V3.0/files/both.and.diff.genes.csv", stringsAsFactors = F)

# define file names for each gene
file<-paste("/project2/xuanyao/marie/E-GEUV-1/LDSC/PAINTOR_V3.0/files/",genelist[,"gene"], ".Paintor", sep="")


####  LD pruning (duplicated elements)

for (i in 1:length(genelist[,"gene"])){

  # EUR
  ld <- read.table(paste("./", genelist[i,"gene"], "Paintor.EUR.LD", sep=""))
  ld2 <- ld[!duplicated(ld),!duplicated(ld)] # no duplicate
  zsco <- read.table(paste("./", genelist[i,"gene"],".EUR.zscores", sep=""))
  zsco2 <- zsco[!duplicated(ld),]

  # YRI
  ld3 <- read.table(paste("./", genelist[i,"gene"], "Paintor.YRI.LD", sep=""))
  ld4 <- ld3[!duplicated(ld3),!duplicated(ld3)] # no duplicate
  zsco3 <- read.table(paste("./", genelist[i,"gene"],".YRI.zscores", sep=""))
  zsco4 <- zsco[!duplicated(ld3),]

  # get genes that are shared
  shared <- intersect(zsco2[,1],zsco4[,1])
  zsco5 <- zsco2[zsco2[,1] %in% shared,]
  ld5 <- ld2[zsco2[,1] %in% shared, zsco2[,1] %in% shared]
  zsco6 <- zsco4[zsco4[,1] %in% shared,]
  ld6 <- ld4[zsco4[,1] %in% shared, zsco4[,1] %in% shared]

  # output
  write.table(ld5, paste("./", genelist[i,"gene"], ".Paintor.EUR_pruned.LD", sep=""), quote=FALSE, row.names=FALSE, col.names=FALSE)
  write.table(zsco5, paste("./", genelist[i,"gene"],".EUR_pruned.zscores", sep=""), quote=FALSE, row.names=FALSE, col.names=FALSE, sep="\t")
  write.table(ld6, paste("./", genelist[i,"gene"], ".Paintor.YRI_pruned.LD", sep=""), quote=FALSE, row.names=FALSE, col.names=FALSE)
  write.table(zsco6, paste("./", genelist[i,"gene"],".YRI_pruned.zscores", sep=""), quote=FALSE, row.names=FALSE, col.names=FALSE, sep="\t")

}

MScaviar

# Slurm
#!/bin/bash
#SBATCH --job-name=MsCAVIAR_sbatch
#SBATCH --output=MsCAVIAR_sbatch.out
#SBATCH --error=MsCAVIAR_sbatch.err
#SBATCH --time=36:00:00
#SBATCH --partition=broadwl
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=20
#SBATCH --mem-per-cpu=2000

### 
csvfile=/project2/xuanyao/marie/E-GEUV-1/LDSC/PAINTOR_V3.0/files/both.and.diff.genes.csv
for line in `cat ${csvfile} | grep -v ^#`
do
  gene=`echo ${line} | cut -d ',' -f 1`
  /project2/xuanyao/marie/E-GEUV-1/MsCAVIAR/MsCAVIAR -l ${gene}.pruned.LD.txt -z ${gene}.pruned.zscores.txt -n 358,87 -o mscaviar/mscaviar_results_${gene}  -c 3 -f 1
done


##########

result # Questions and Comments.

It takes one hour to run it on one gene, so I am trying to make the pipeline more efficient.
I have several questions and am asking the developer about them:

In the example output, I found “Causal_Post._Prob.” = 2 in three SNPs. How can I interpret it? The “probability” looks larger than 2. What is “Causal_Post._Prob”? Are three SNPs in strong LD or polygenic and equally contributing to the trait?

Also, in my data, I set “causal SNP=3” but I got five SNPs.

Prob_in_pCausalSet Causal_Post._Prob. 0.0255851 0.151775 0.0877824 0.520739 0.0948189 0.56248 0.205362 1.21824 0.227096 1.34717 0.325682 1.932

Is that expected? I wonder how MSCaviar selects a credible set based on what threshold. I saw “with rho% probability, contains all causal SNPs. The default is 0.95” in the GitHub manual but could not find values around 0.95 in the output.

Heritability (GCTA) graph

setwd("/Users/saitoumarie/Dropbox/Chicago/RCC/eQTL.practice/LDSC")
X1=read.csv("/Users/saitoumarie/Dropbox/Chicago/RCC/eQTL.practice/LDSC/GCTA1224.csv",stringsAsFactors = F)
library(ggplot2)
library(ggpubr)

ggscatterhist (X1, x="V.G..Vp_tr1", y="V.G..Vp_tr2",color="susie.BH",alpha=0.7, shape="susie.BH",margin.params = list(color="susie.BH"))

result # Comments I am not allowed to browse and copy a file from /project2/gilad. Could you copy the Rscripts or the directory into somewhere in /project2/xuanyao? Alternatively, I will get permission of Yoav and will talk with IT team to give me permission to enter the directory.

GCTA cvblup

https://cnsgenomics.com/software/gcta/#BLUP

module load plink
export PATH="$PATH:/home/maries1/gcta_1.93.2beta"
# csvfile=/project2/xuanyao/marie/E-GEUV-1/LDSC/PAINTOR_V3.0/files/both.and.diff.genes.csv

# REML analysis with the --reml-pred-rand option to output the BLUP solutions of the individuals (i.e. estimate of total genetic value of each individual)
gcta64 --reml --grm ENSG00000067113 --pheno phen/ENSG00000067113.phen --cvblup  --out cvBLUP/ENSG00000067113
 # To obtain cvBLUP solutions for the SNP effects
gcta64 --bfile ENSG00000067113 --blup-snp cvBLUP/ENSG00000067113.indi.cvblp --out cvBLUP/ENSG00000067113
gcta64 --bfile ENSG00000067113.EandY --blup-snp cvBLUP/ENSG00000067113.indi.cvblp --out cvBLUP/ENSG00000067113

# remove duplicated SNP (sometimes multiallelic ) https://www.biostars.org/p/267255/
cut -f 2 ENSG00000067113.EandY.bim | sort | uniq -d > 1.dups
plink --bfile ENSG00000067113.EandY  --exclude 1.dups --make-bed --out ENSG00000067113.EandY;

# To compute the polygenic risk score (PRS)
plink --bfile ENSG00000067113.EandY --pheno phen/ENSG00000067113.phen --score cvBLUP/ENSG00000067113.snp.blp 1 2 3 --out cvBLUP/ENSG00000067113EUR

# reads variant IDs from column 1, allele codes from column 2, and scores from column 3.
plink --bfile test --score test.snp.blp 1 2 3

result result result

Comments

I am not sure how to interpret the output files. Can we go over the columns together?
In the previous GCTA analysis (genetic correletion), I userd a merged file of two populations’ gene expression as two phenotypes. But in this time, there seem to be no results for two traits (it seems to be a result for one trait). Should I separate the input file into EUR and YRI?

MsCAVIAR on array

#!/bin/bash
#SBATCH --job-name=MsCAVIAR_array.sbatch
#SBATCH --output=arrayMsCAVIAR-%j-%a.out
#SBATCH --error=MsCAVIAR_sbatch.err
#SBATCH --time=36:00:00
#SBATCH --partition=broadwl
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --mem-per-cpu=2000

# 12.26.2020
# sbatch --array=1-222 MsCAVIAR.array.slurm
echo This is array job number $SLURM_ARRAY_TASK_ID

# Read line number $SLURM_ARRAY_TASK_ID from an input file:
gene=$(awk ' NR=='$SLURM_ARRAY_TASK_ID'' /project2/xuanyao/marie/E-GEUV-1/LDSC/PAINTOR_V3.0/files/both.and.diff.genes.csv | awk -F","  '{print $1}')
echo Line number $SLURM_ARRAY_TASK_ID contains the gene ${gene}.
/project2/xuanyao/marie/E-GEUV-1/MsCAVIAR/MsCAVIAR -l ${gene}.pruned.LD.txt -z ${gene}.pruned.zscores.txt -n 358,87 -o mscaviar/mscaviar_results_${gene}  -c 3 -f 1

# summarize results

sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS  10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] workflowr_1.6.2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5       rstudioapi_0.11  whisker_0.4      knitr_1.30      
 [5] magrittr_1.5     R6_2.4.1         rlang_0.4.8      stringr_1.4.0   
 [9] tools_4.0.2      xfun_0.18        git2r_0.27.1     htmltools_0.5.0 
[13] ellipsis_0.3.1   rprojroot_1.3-2  yaml_2.2.1       digest_0.6.27   
[17] tibble_3.0.4     lifecycle_0.2.0  crayon_1.3.4     later_1.1.0.1   
[21] vctrs_0.3.4      promises_1.1.1   fs_1.5.0         glue_1.4.2      
[25] evaluate_0.14    rmarkdown_2.5    stringi_1.5.3    compiler_4.0.2  
[29] pillar_1.4.6     backports_1.1.10 httpuv_1.5.4     pkgconfig_2.0.3

WP12.20

Marie Saitou

12/25/2020