Last updated: 2018-09-24
workflowr checks: ✔ R Markdown file: up-to-date
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it's best to always run the code in an empty environment.
✔ Seed: set.seed(12345)
The command set.seed(12345) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
✔ Repository version: ed0a1eb
Great! You are using Git for version control. The version displayed above was the version of the Git repository at the time these results were generated. Note that you need to ensure all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: output/.DS_Store
Untracked files:
Untracked: analysis/ncbiRefSeq_sm.sort.mRNA.bed
Untracked: analysis/snake.config.notes.Rmd
Untracked: analysis/verifyBAM.Rmd
Untracked: data/18486.genecov.txt
Untracked: data/APApeaksYL.total.inbrain.bed
Untracked: data/NuclearApaQTLs.txt
Untracked: data/RNAkalisto/
Untracked: data/TotalApaQTLs.txt
Untracked: data/Totalpeaks_filtered_clean.bed
Untracked: data/YL-SP-18486-T-combined-genecov.txt
Untracked: data/YL-SP-18486-T_S9_R1_001-genecov.txt
Untracked: data/bedgraph_peaks/
Untracked: data/bin200.5.T.nuccov.bed
Untracked: data/bin200.Anuccov.bed
Untracked: data/bin200.nuccov.bed
Untracked: data/clean_peaks/
Untracked: data/comb_map_stats.csv
Untracked: data/comb_map_stats.xlsx
Untracked: data/combined_reads_mapped_three_prime_seq.csv
Untracked: data/gencov.test.csv
Untracked: data/gencov.test.txt
Untracked: data/gencov_zero.test.csv
Untracked: data/gencov_zero.test.txt
Untracked: data/gene_cov/
Untracked: data/joined
Untracked: data/leafcutter/
Untracked: data/merged_combined_YL-SP-threeprimeseq.bg
Untracked: data/nom_QTL/
Untracked: data/nom_QTL_opp/
Untracked: data/nuc6up/
Untracked: data/other_qtls/
Untracked: data/peakPerRefSeqGene/
Untracked: data/perm_QTL/
Untracked: data/perm_QTL_opp/
Untracked: data/reads_mapped_three_prime_seq.csv
Untracked: data/smash.cov.results.bed
Untracked: data/smash.cov.results.csv
Untracked: data/smash.cov.results.txt
Untracked: data/smash_testregion/
Untracked: data/ssFC200.cov.bed
Untracked: data/temp.file1
Untracked: data/temp.file2
Untracked: data/temp.gencov.test.txt
Untracked: data/temp.gencov_zero.test.txt
Untracked: output/picard/
Untracked: output/plots/
Untracked: output/qual.fig2.pdf
Unstaged changes:
Modified: analysis/28ind.peak.explore.Rmd
Modified: analysis/cleanupdtseq.internalpriming.Rmd
Modified: analysis/dif.iso.usage.leafcutter.Rmd
Modified: analysis/diff_iso_pipeline.Rmd
Modified: analysis/explore.filters.Rmd
Modified: analysis/overlap_qtls.Rmd
Modified: analysis/peakOverlap_oppstrand.Rmd
Modified: analysis/pheno.leaf.comb.Rmd
Modified: analysis/test.max2.Rmd
Modified: code/Snakefile
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
File | Version | Author | Date | Message
---|---|---|---|---
Rmd | ed0a1eb | Briana Mittleman | 2018-09-24 | add fix command to full pipeline
html | a0a73b5 | Briana Mittleman | 2018-09-24 | Build site.
Rmd | 6242ff6 | Briana Mittleman | 2018-09-24 | add filter command to full pipeline
html | 617d032 | brimittleman | 2018-08-08 | Build site.
Rmd | 6be219c | brimittleman | 2018-08-08 | add final pipeline
html | 5566fd6 | brimittleman | 2018-08-02 | Build site.
Rmd | 0f79304 | brimittleman | 2018-08-02 | fix cov to peak file problem
html | efad657 | Briana Mittleman | 2018-07-31 | Build site.
Rmd | 7c203e4 | Briana Mittleman | 2018-07-31 | format files for yangs peak script
html | 7fc2ce7 | Briana Mittleman | 2018-07-30 | Build site.
Rmd | 782320d | Briana Mittleman | 2018-07-30 | look at coverage in merged bw
html | e5a8da6 | Briana Mittleman | 2018-07-30 | Build site.
Rmd | 422a428 | Briana Mittleman | 2018-07-30 | add peak cove pipeline and combined lane qc
I need to create a processing pipeline that I can run each time I get more individuals. It will do the following:

- combine all total and nuclear libraries (as a bigwig/genome coverage)
- call peaks with Yang's script
- filter peaks with Yang's script
- clean peaks
- run featureCounts on these peaks for all of the individuals

I can do this in my Snakefile. First, I added the following to my environment.
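The exact additions are not recorded here; a minimal sketch of installing the tools the steps below use into the conda environment (the bioconda package names are my assumption):

# a sketch, assuming the three-prime-env environment and the bioconda channel
conda install -c bioconda bedtools ucsc-bedgraphtobigwig ucsc-bigwigmerge subread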
I want to create bedgraph for each file. I will add a rule to my snakefile that does this and puts them in the bedgraph directory.
I want to add more memory for these rules in cluster.json:
"bedgraph" :
{
"mem": 16000
},
"bedgraph_5" :
{
"mem": 16000
}
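These entries are keyed by rule name; when the Snakefile is submitted with a cluster config, the per-rule memory gets substituted into the sbatch call. A sketch of such an invocation (the project's actual submit script may differ):

snakemake -j 32 \
    --cluster-config cluster.json \
    --cluster "sbatch --partition=broadwl --mem={cluster.mem} --job-name={rule}"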
I will use the bedGraphToBigWig tool.
#add to directories
dir_bedgraph = dir_data + "bedgraph/"
dir_bigwig = dir_data + "bigwig/"
dir_sortbg = dir_data + "bedgraph_sort/"
dir_bedgraph_5 = dir_data + "bedgraph_5prime/"

#add to rule_all (note the trailing commas between the expand() entries)
expand(dir_bedgraph + "{samples}.split.bg", samples=samples),
expand(dir_sortbg + "{samples}.sort.bg", samples=samples),
expand(dir_bigwig + "{samples}.bw", samples=samples),
expand(dir_bedgraph_5 + "{samples}.5.bg", samples=samples),

#rules
rule bedgraph_5:
    input:
        bam = dir_sort + "{samples}-sort.bam"
    output: dir_bedgraph_5 + "{samples}.5.bg"
    shell: "bedtools genomecov -ibam {input.bam} -bg -5 > {output}"

rule bedgraph:
    input:
        bam = dir_sort + "{samples}-sort.bam"
    output: dir_bedgraph + "{samples}.split.bg"
    shell: "bedtools genomecov -ibam {input.bam} -bg -split > {output}"

rule sort_bg:
    input: dir_bedgraph + "{samples}.split.bg"
    output: dir_sortbg + "{samples}.sort.bg"
    shell: "sort -k1,1 -k2,2n {input} > {output}"

rule bg_to_bw:
    input:
        bg = dir_sortbg + "{samples}.sort.bg",
        len = chrom_length
    output: dir_bigwig + "{samples}.bw"
    shell: "bedGraphToBigWig {input.bg} {input.len} {output}"
This next step takes all of the files in the bigwig directory and merges them. To do this I will create a script that builds a list of all of the files and then uses that list in the merge command.
mergeBW.sh
#!/bin/bash
#SBATCH --job-name=mergeBW
#SBATCH --account=pi-yangili1
#SBATCH --time=24:00:00
#SBATCH --output=mergeBW.out
#SBATCH --error=mergeBW.err
#SBATCH --partition=broadwl
#SBATCH --mem=40G
#SBATCH --mail-type=END
module load Anaconda3
source activate three-prime-env
# list the bigwig files (skipping the first entry in the directory listing)
ls -d -1 /project2/gilad/briana/threeprimeseq/data/bigwig/* | tail -n +2 > /project2/gilad/briana/threeprimeseq/data/list_bw/list_of_bigwig.txt
# merge all listed bigwigs into a single bedgraph
bigWigMerge -inList /project2/gilad/briana/threeprimeseq/data/list_bw/list_of_bigwig.txt /project2/gilad/briana/threeprimeseq/data/mergedBW/merged_combined_YL-SP-threeprimeseq.bg
The result of this script will be a merged bedgraph of all of the files.
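Before converting it, a quick sanity check that the merge produced well-formed four-column bedgraph records (illustrative commands, same path as above):

# any line without exactly four fields is malformed
awk 'NF != 4' /project2/gilad/briana/threeprimeseq/data/mergedBW/merged_combined_YL-SP-threeprimeseq.bg | head
# number of merged intervals
wc -l /project2/gilad/briana/threeprimeseq/data/mergedBW/merged_combined_YL-SP-threeprimeseq.bg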
library(workflowr)
library(ggplot2)
library(dplyr)
bg_to_cov.py:

#!/usr/bin/env python
# expand a bedgraph (chrom, start, end, count) into per-base coverage
# (chrom, 1-based position, count)

def main(inFile, outFile):
    fout = open(outFile, 'w')
    for ind, ln in enumerate(open(inFile)):
        print(ind)  # progress: one line per input interval
        chrom, start, end, count = ln.split()
        i2 = int(start)
        while i2 < int(end):
            fout.write("%s\t%d\t%s\n" % (chrom, i2 + 1, count))
            fout.flush()
            i2 += 1
    fout.close()

if __name__ == "__main__":
    import sys
    inFile = sys.argv[1]
    outFile = sys.argv[2]
    main(inFile, outFile)
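To illustrate what the script does, a tiny example (hypothetical file names): a single bedgraph interval expands to one line per base, with 1-based positions:

printf "1\t100\t103\t5\n" > tiny.bg
python bg_to_cov.py tiny.bg tiny.cov
cat tiny.cov
# 1    101    5
# 1    102    5
# 1    103    5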
I created a bash script, run_bgtocov.sh, to run this; the input and output files are passed as arguments to the Python script.
#!/bin/bash
#SBATCH --job-name=run_bgtocov
#SBATCH --account=pi-yangili1
#SBATCH --time=24:00:00
#SBATCH --output=run_bgtocov.out
#SBATCH --error=run_bgtocov.err
#SBATCH --partition=broadwl
#SBATCH --mem=12G
#SBATCH --mail-type=END
module load Anaconda3
source activate three-prime-env
python bg_to_cov.py "/project2/gilad/briana/threeprimeseq/data/mergedBW/merged_combined_YL-SP-threeprimeseq.bg" "/project2/gilad/briana/threeprimeseq/data/mergedBW/merged_combined_YL-SP-threeprimeseq.coverage.txt"
Sort result with:
sort -k1,1 -k2,2n merged_combined_YL-SP-threeprimeseq.coverage.txt > merged_combined_YL-SP-threeprimeseq.coverage.sort.txt
callPeaksYL_GEN.py (Yang's peak caller, run once per chromosome):

def main(inFile, outFile, ctarget):
    fout = open(outFile, 'w')
    mincount = 10   # minimum peak height
    ov = 20         # overhang added to each refined peak
    current_peak = []
    currentChrom = None
    prevPos = 0
    for ln in open(inFile):
        chrom, pos, count = ln.split()
        if chrom != ctarget: continue
        count = float(count)
        if currentChrom == None:
            currentChrom = chrom
        # a zero, a gap in coverage, or a new chromosome closes the current peak
        if count == 0 or currentChrom != chrom or int(pos) > prevPos + 1:
            if len(current_peak) > 0:
                print(current_peak)
                M = max([x[1] for x in current_peak])
                if M > mincount:
                    all_peaks = refine_peak(current_peak, M, M * 0.1, M * 0.05)
                    rpeaks = [(int(x[0][0]) - ov, int(x[-1][0]) + ov, np.mean([y[1] for y in x])) for x in all_peaks]
                    if len(rpeaks) > 1:
                        # merge overlapping refined peaks that reach at least half the cluster maximum;
                        # cluster_intervals comes from misc_helper and groups overlapping intervals
                        for clu in cluster_intervals(rpeaks)[0]:
                            M = max([x[2] for x in clu])
                            merging = []
                            for x in clu:
                                if x[2] > M * 0.5:
                                    merging.append(x)
                            c, s, e, mean = chrom, min([x[0] for x in merging]) + ov, max([x[1] for x in merging]) - ov, np.mean([x[2] for x in merging])
                            fout.write("chr%s\t%d\t%d\t%d\t+\t.\n" % (c, s, e, mean))
                            fout.flush()
                    elif len(rpeaks) == 1:
                        s, e, mean = rpeaks[0]
                        fout.write("chr%s\t%d\t%d\t%f\t+\t.\n" % (chrom, s + ov, e - ov, mean))
                        print("chr%s" % chrom + "\t%d\t%d\t%f\t+\t.\n" % rpeaks[0])
            current_peak = [(pos, count)]
        else:
            current_peak.append((pos, count))
        currentChrom = chrom
        prevPos = int(pos)

def refine_peak(current_peak, M, thresh, noise, minpeaksize=30):
    # split a raw peak into sub-peaks: cpeak collects positions above thresh,
    # opeak collects positions between noise and thresh; opeaks are refined recursively
    cpeak = []
    opeak = []
    allcpeaks = []
    allopeaks = []
    for pos, count in current_peak:
        if count > thresh:
            cpeak.append((pos, count))
            opeak = []
            continue
        elif count > noise:
            opeak.append((pos, count))
        else:
            # coverage dropped to noise level: close any open sub-peaks
            if len(opeak) > minpeaksize:
                allopeaks.append(opeak)
            opeak = []
            if len(cpeak) > minpeaksize:
                allcpeaks.append(cpeak)
            cpeak = []
    if len(cpeak) > minpeaksize:
        allcpeaks.append(cpeak)
    if len(opeak) > minpeaksize:
        allopeaks.append(opeak)
    allpeaks = allcpeaks
    for opeak in allopeaks:
        M = max([x[1] for x in opeak])
        allpeaks += refine_peak(opeak, M, M * 0.3, noise)
    return allpeaks

if __name__ == "__main__":
    import numpy as np
    from misc_helper import *
    import sys
    chrom = sys.argv[1]
    inFile = "/project2/gilad/briana/threeprimeseq/data/mergedBW/merged_combined_YL-SP-threeprimeseq.coverage.sort.txt"
    outFile = "/project2/gilad/briana/threeprimeseq/data/mergedPeaks/APApeaks_chr%s.bed" % chrom
    main(inFile, outFile, chrom)
#!/bin/bash
#SBATCH --job-name=w_getpeakYLgen
#SBATCH --account=pi-yangili1
#SBATCH --time=24:00:00
#SBATCH --output=w_getpeakYLgen.out
#SBATCH --error=w_getpeakYLgen.err
#SBATCH --partition=broadwl
#SBATCH --mem=12G
#SBATCH --mail-type=END
module load Anaconda3
source activate three-prime-env
for i in $(seq 1 22); do
python callPeaksYL_GEN.py $i
done
Run the file with: sbatch w_getpeakYLGEN.sh
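A possible speed-up (a sketch, not what I ran): make the 22 chromosome calls a SLURM job array by adding to the header and replacing the loop:

#SBATCH --array=1-22
python callPeaksYL_GEN.py $SLURM_ARRAY_TASK_ID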
After I have the peaks, I will need to use Yang's filter peak function.
Update each of the following scripts:
cat /project2/gilad/briana/threeprimeseq/data/mergedPeaks/*.bed > /project2/gilad/briana/threeprimeseq/data/mergedPeaks_comb/APApeaks_merged_allchrom.bed
bed2saf.py (Python 2, hence file()):

fout = file("/project2/gilad/briana/threeprimeseq/data/mergedPeaks_comb/APApeaks_merged_allchrom.SAF", 'w')
fout.write("GeneID\tChr\tStart\tEnd\tStrand\n")
for ln in open("/project2/gilad/briana/threeprimeseq/data/mergedPeaks_comb/APApeaks_merged_allchrom.bed"):
    chrom, start, end, score, strand, score2 = ln.split()
    ID = "peak_%s_%s_%s" % (chrom, start, end)
    # write a "+" and a "-" SAF entry for each peak so both strands get quantified
    fout.write("%s\t%s\t%s\t%s\t+\n" % (ID + "_+", chrom.replace("chr", ""), start, end))
    fout.write("%s\t%s\t%s\t%s\t-\n" % (ID + "_-", chrom.replace("chr", ""), start, end))
fout.close()
Run this with run_bed2saf.sh. I wrote a separate script because it needs to load Python 2 rather than use the conda environment.
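For reference, the SAF ends up with one "+" and one "-" entry per peak, along these (illustrative) lines:

GeneID                 Chr  Start  End   Strand
peak_chr1_1000_1200_+  1    1000   1200  +
peak_chr1_1000_1200_-  1    1000   1200  -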
#!/bin/bash
#SBATCH --job-name=peak_fc
#SBATCH --account=pi-yangili1
#SBATCH --time=24:00:00
#SBATCH --output=peak_fc.out
#SBATCH --error=peak_fc.err
#SBATCH --partition=broadwl
#SBATCH --mem=12G
#SBATCH --mail-type=END
module load Anaconda3
source activate three-prime-env
featureCounts -a /project2/gilad/briana/threeprimeseq/data/mergedPeaks_comb/APApeaks_merged_allchrom.SAF -F SAF -o /project2/gilad/briana/threeprimeseq/data/mergedPeaks_comb/APAquant.fc /project2/gilad/briana/threeprimeseq/data/sort/*-sort.bam -s 1
This script is peak_fc.sh. The -s 1 flag makes featureCounts do forward-stranded counting, so reads are only assigned to SAF entries on the same strand.
filter_peaks.py
I should run this in a bash script with python 2 as well.
#!/bin/bash
#SBATCH --job-name=filter_peak
#SBATCH --account=pi-yangili1
#SBATCH --time=24:00:00
#SBATCH --output=filter_peak.out
#SBATCH --error=filter_peak.err
#SBATCH --partition=broadwl
#SBATCH --mem=12G
#SBATCH --mail-type=END
module load python
python filter_peaks.py
Name the peaks for the cleanup:

x=$(wc -l < /project2/gilad/briana/threeprimeseq/data/mergedPeaks_comb/filtered_APApeaks_merged_allchrom.bed)
seq 1 $x > peak.num.txt
paste /project2/gilad/briana/threeprimeseq/data/mergedPeaks_comb/filtered_APApeaks_merged_allchrom.bed peak.num.txt | column -s $'\t' -t > temp
awk '{print $1 "\t" $2 "\t" $3 "\t" $7 "\t" $4 "\t" $5 "\t" $6}' temp > /project2/gilad/briana/threeprimeseq/data/mergedPeaks_comb/filtered_APApeaks_merged_allchrom.named.bed
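The same numbering can be done in a single awk pass (a sketch that assumes the same six-column filtered bed and skips the temp files):

awk 'BEGIN{OFS="\t"}{print $1, $2, $3, NR, $4, $5, $6}' \
    /project2/gilad/briana/threeprimeseq/data/mergedPeaks_comb/filtered_APApeaks_merged_allchrom.bed \
    > /project2/gilad/briana/threeprimeseq/data/mergedPeaks_comb/filtered_APApeaks_merged_allchrom.named.bed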
#!/usr/bin/env Rscript
# usage: Rscript cleanupdtseq.R -f in_bedfile -o outfile -c cutoff
# This script takes a putative peak file, an output file name, and a classification
# cutoff, and writes out the peaks classified as true pA sites.
# optparse manages the input arguments: I want to be able to input the 6up nuc file and write out a filtered file.
# The script needs to run outside the conda env; module load R in the bash script that submits it.
library(optparse)
library(dplyr)
library(tidyr)
library(ggplot2)
library(cleanUpdTSeq)
library(GenomicRanges)
library(BSgenome.Hsapiens.UCSC.hg19)
option_list = list(
    make_option(c("-f", "--file"), action="store", default=NA, type='character',
        help="input file"),
    make_option(c("-o", "--output"), action="store", default=NA, type='character',
        help="output file"),
    make_option(c("-c", "--cutoff"), action="store", default=NA, type='double',
        help="assignment cutoff")
)

opt_parser <- OptionParser(option_list=option_list)
opt <- parse_args(opt_parser)

#interrupt execution if no file is supplied (the default is NA, so test with is.na)
if (is.na(opt$file)){
    print_help(opt_parser)
    stop("Need input file", call.=FALSE)
}
#input file for test data
testSet <- read.table(file = opt$file, sep="\t", col.names =c("chr", "start", "end", "PeakName", "Cov", "Strand", "score"))
peaks <- BED2GRangesSeq(testSet, withSeq=FALSE)
#build vector with human genome
testSet.NaiveBayes <- buildFeatureVector(peaks, BSgenomeName=Hsapiens,
upstream=40, downstream=30,
wordSize=6, alphabet=c("ACGT"),
sampleType="unknown",
replaceNAdistance=30,
method="NaiveBayes",
ZeroBasedIndex=1, fetchSeq=TRUE)
#classify sites with the built-in classifier
data(classifier)
testResults <- predictTestSet(testSet.NaiveBayes=testSet.NaiveBayes,
classifier=classifier,
outputFile=NULL,
assignmentCutoff=opt$cutoff)
true_peaks=testResults %>% filter(pred.class==1)
#write results
write.table(true_peaks, file=opt$output, quote = F, row.names = F, col.names = T)
I will create a bash script to run the cleanupdtseq.R code.
#!/bin/bash
#SBATCH --job-name=cleanup_comb
#SBATCH --account=pi-yangili1
#SBATCH --time=24:00:00
#SBATCH --output=cleanup_comb.out
#SBATCH --error=cleanup_comb.err
#SBATCH --partition=broadwl
#SBATCH --mem=12G
#SBATCH --mail-type=END
module load R
Rscript cleanupdtseq.R -f /project2/gilad/briana/threeprimeseq/data/mergedPeaks_comb/filtered_APApeaks_merged_allchrom.named.bed -o /project2/gilad/briana/threeprimeseq/data/clean.peaks_comb/truePeaks_clean.bed -c .5
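A quick way to inspect the classifier output before filtering (the column meanings match how filter_peaksClean.R reads the file below):

# header row, then: PeakName probFalse probTrue predClass UP Down
head /project2/gilad/briana/threeprimeseq/data/clean.peaks_comb/truePeaks_clean.bed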
After this, run filter_peaksClean.R (with run_filter_peaksClean.sh) to subset the named peak file to the peaks classified as true.
library(dplyr)

clean=read.table("/project2/gilad/briana/threeprimeseq/data/clean.peaks_comb/truePeaks_clean.bed", header=F, col.names=c("PeakName", "probFalse", "probTrue", "predClass", "UP", "Down"), skip=1)
peaks=read.table("/project2/gilad/briana/threeprimeseq/data/mergedPeaks_comb/filtered_APApeaks_merged_allchrom.named.bed", header=F, col.names=c("Chr", "Start", "End", "PeakName", "Cov", "Strand", "Score"))

true_peaks=clean %>% filter(predClass==1)
#keep only peaks classified as true (join on true_peaks, not the full clean table)
true_peak_bed=semi_join(peaks, true_peaks, by="PeakName")
write.table(true_peak_bed, file="/project2/gilad/briana/threeprimeseq/data/clean.peaks_comb/APApeaks_combined_clean.bed", row.names = F, col.names = F, quote = F)
#!/bin/bash
#SBATCH --job-name=filter_clean
#SBATCH --account=pi-yangili1
#SBATCH --time=24:00:00
#SBATCH --output=filter_clean.out
#SBATCH --error=filter_clean.err
#SBATCH --partition=broadwl
#SBATCH --mem=12G
#SBATCH --mail-type=END
module load R
Rscript filter_peaksClean.R
I may have to run bed-to-SAF again, with bed2saf.peaks.py:
from misc_helper import *

# Python 2 (uses file()); convert the clean peak bed to SAF for featureCounts
fout = file("/project2/gilad/briana/threeprimeseq/data/clean.peaks_comb/APApeaks_combined_clean.saf", 'w')
fout.write("GeneID\tChr\tStart\tEnd\tStrand\n")
for ln in open("/project2/gilad/briana/threeprimeseq/data/clean.peaks_comb/APApeaks_combined_clean.bed"):
    chrom, start, end, name, score, strand, score2 = ln.split()
    ID = "peak_%s_%s_%s" % (chrom, start, end)
    fout.write("%s\t%s\t%s\t%s\t+\n" % (ID + "_+", chrom.replace("chr", ""), start, end))
    fout.write("%s\t%s\t%s\t%s\t-\n" % (ID + "_-", chrom.replace("chr", ""), start, end))
fout.close()
#!/bin/bash
#SBATCH --job-name=bed2saf_peaks
#SBATCH --account=pi-yangili1
#SBATCH --time=24:00:00
#SBATCH --output=bed2saf_peak.out
#SBATCH --error=bed2saf_peak.err
#SBATCH --partition=broadwl
#SBATCH --mem=12G
#SBATCH --mail-type=END
module load python
python bed2saf.peaks.py
#!/bin/bash
#SBATCH --job-name=clean_peak_fc
#SBATCH --account=pi-yangili1
#SBATCH --time=24:00:00
#SBATCH --output=clean_peak_fc.out
#SBATCH --error=clean_peak_fc.err
#SBATCH --partition=broadwl
#SBATCH --mem=12G
#SBATCH --mail-type=END
module load Anaconda3
source activate three-prime-env
featureCounts -a /project2/gilad/briana/threeprimeseq/data/clean.peaks_comb/APApeaks_combined_clean.saf -F SAF -o /project2/gilad/briana/threeprimeseq/data/clean_peaks_comb_quant/APAquant.fc.cleanpeaks.fc /project2/gilad/briana/threeprimeseq/data/sort/*-sort.bam -s 1
I should make this a Snakefile! For now, these are the steps to run each time (a job-dependency sketch follows the list):
mergeBW.sh
run_bgtocov.sh
sort -k1,1 -k2,2n /project2/gilad/briana/threeprimeseq/data/mergedBW/merged_combined_YL-SP-threeprimeseq.coverage.txt > /project2/gilad/briana/threeprimeseq/data/mergedBW/merged_combined_YL-SP-threeprimeseq.coverage.sort.txt
w_getpeakYLGEN.sh
cat /project2/gilad/briana/threeprimeseq/data/mergedPeaks/*.bed > /project2/gilad/briana/threeprimeseq/data/mergedPeaks_comb/APApeaks_merged_allchrom.bed
run_bed2saf.sh
peak_fc.sh
run_filterPeak.sh
x=$(wc -l < /project2/gilad/briana/threeprimeseq/data/mergedPeaks_comb/filtered_APApeaks_merged_allchrom.bed)
seq 1 $x > /project2/gilad/briana/threeprimeseq/data/mergedPeaks_comb/peak.num.txt
paste /project2/gilad/briana/threeprimeseq/data/mergedPeaks_comb/filtered_APApeaks_merged_allchrom.bed /project2/gilad/briana/threeprimeseq/data/mergedPeaks_comb/peak.num.txt | column -s $'\t' -t > /project2/gilad/briana/threeprimeseq/data/mergedPeaks_comb/temp
awk '{print $1 "\t" $2 "\t" $3 "\t" $7 "\t" $4 "\t" $5 "\t" $6}' /project2/gilad/briana/threeprimeseq/data/mergedPeaks_comb/temp > /project2/gilad/briana/threeprimeseq/data/mergedPeaks_comb/filtered_APApeaks_merged_allchrom.named.bed
Then, if cleaning the peaks:
cleanup_comb.sh
run_filter_peaksClean.sh
run_bed2saf_peaks.sh
clean_peak_fc.sh
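Until this is a Snakefile, the scripted steps can be chained with SLURM job dependencies; a sketch (the inline sort/cat/paste/awk steps would need wrapping in small scripts of their own):

j1=$(sbatch --parsable mergeBW.sh)
j2=$(sbatch --parsable --dependency=afterok:$j1 run_bgtocov.sh)
# ...wrap the sort step, then continue:
j3=$(sbatch --parsable --dependency=afterok:$j2 w_getpeakYLGEN.sh)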
comb_gencov.sh
#!/bin/bash
#SBATCH --job-name=comb_gencov
#SBATCH --account=pi-yangili1
#SBATCH --time=24:00:00
#SBATCH --output=comb_gencov.out
#SBATCH --error=comb_gencov.err
#SBATCH --partition=bigmem2
#SBATCH --mem=100G
#SBATCH --mail-type=END
module load Anaconda3
source activate three-prime-env
samtools merge /project2/gilad/briana/threeprimeseq/data/comb_bam/all_total.nuc_comb.bam /project2/gilad/briana/threeprimeseq/data/sort/*.bam
bedtools genomecov -ibam /project2/gilad/briana/threeprimeseq/data/comb_bam/all_total.nuc_comb.bam -d -split > /project2/gilad/briana/threeprimeseq/data/comb_bam/all_total.nuc_comb.split.genomecov.bed
I will need to run mergeBW.sh and run_bgtocov.sh, then sort with:
sort -k1,1 -k2,2n merged_combined_YL-SP-threeprimeseq.coverage.txt > merged_combined_YL-SP-threeprimeseq.coverage.sort.txt
Then call peaks with the updated callpeaks script from Yang (get_APA_peaks.py), which I run with w_getpeakYLGEN.sh.
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.7.6 ggplot2_3.0.0 workflowr_1.1.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.18 compiler_3.5.1 pillar_1.3.0
[4] git2r_0.23.0 plyr_1.8.4 bindr_0.1.1
[7] R.methodsS3_1.7.1 R.utils_2.7.0 tools_3.5.1
[10] digest_0.6.16 evaluate_0.11 tibble_1.4.2
[13] gtable_0.2.0 pkgconfig_2.0.2 rlang_0.2.2
[16] yaml_2.2.0 bindrcpp_0.2.2 withr_2.1.2
[19] stringr_1.3.1 knitr_1.20 rprojroot_1.3-2
[22] grid_3.5.1 tidyselect_0.2.4 glue_1.3.0
[25] R6_2.2.2 rmarkdown_1.10 purrr_0.2.5
[28] magrittr_1.5 whisker_0.3-2 backports_1.1.2
[31] scales_1.0.0 htmltools_0.3.6 assertthat_0.2.0
[34] colorspace_1.3-2 stringi_1.2.4 lazyeval_0.2.1
[37] munsell_0.5.0 crayon_1.3.4 R.oo_1.22.0
This reproducible R Markdown analysis was created with workflowr 1.1.1