Last updated: 2021-02-12

Checks: 7 0

Knit directory: hesc-epigenomics/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20210202)

The command set.seed(20210202) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 9caed99

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 9caed99. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    data/bed/
    Ignored:    data/bw
    Ignored:    data/meta/
    Ignored:    data/peaks
    Ignored:    data/rnaseq/

Untracked files:
    Untracked:  output/Collier_2017/Naive_low.bed
    Untracked:  output/Collier_2017/Primed_low.bed
    Untracked:  output/Collier_2017/ensembl/Naive_low.bed
    Untracked:  output/Collier_2017/ensembl/Primed_low.bed

Unstaged changes:
    Modified:   analysis/gene_expression_groups.Rmd
    Deleted:    output/Collier_2017/Naive_bottom.bed
    Modified:   output/Collier_2017/Naive_med.bed
    Modified:   output/Collier_2017/Naive_top.bed
    Deleted:    output/Collier_2017/Primed_bottom.bed
    Modified:   output/Collier_2017/Primed_med.bed
    Modified:   output/Collier_2017/Primed_top.bed
    Deleted:    output/Collier_2017/ensembl/Naive_bottom.bed
    Modified:   output/Collier_2017/ensembl/Naive_med.bed
    Modified:   output/Collier_2017/ensembl/Naive_top.bed
    Deleted:    output/Collier_2017/ensembl/Primed_bottom.bed
    Modified:   output/Collier_2017/ensembl/Primed_med.bed
    Modified:   output/Collier_2017/ensembl/Primed_top.bed

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/rna_seq_collier_2017.Rmd) and HTML (docs/rna_seq_collier_2017.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	9caed99	cnluzon	2021-02-12	Smaller quantile groups
html	a28edd5	cnluzon	2021-02-07	RNA-seq analysis update
Rmd	d025037	cnluzon	2021-02-07	wflow_publish(“analysis/rna_seq_collier_2017.Rmd”)
Rmd	0481b52	cnluzon	2021-02-05	RNA seq first analysis
html	0481b52	cnluzon	2021-02-05	RNA seq first analysis

Summary

Data: Used public data from (Collier et al. 2017): H9 embryonic stem cells in Naïve and Primed state, three biological replicates.

Sample info
	dataset_id	gsm_id	description	filename	run_id	input	layout	experiment
26	Collier_2017	GSM2449055	hESC_H9_naive_1	hESC_H9_naive_1	SRR5151102	-	single	RNA-seq
27	Collier_2017	GSM2449056	hESC_H9_naive_2	hESC_H9_naive_2	SRR5151103	-	single	RNA-seq
28	Collier_2017	GSM2449057	hESC_H9_naive_3	hESC_H9_naive_3	SRR5151104	-	single	RNA-seq
29	Collier_2017	GSM2449058	hESC_H9_primed_1	hESC_H9_primed_1	SRR5151105	-	single	RNA-seq
30	Collier_2017	GSM2449059	hESC_H9_primed_2	hESC_H9_primed_2	SRR5151106	-	single	RNA-seq
31	Collier_2017	GSM2449060	hESC_H9_primed_3	hESC_H9_primed_3	SRR5151107	-	single	RNA-seq

Primary analysis

RNA-seq data was analysed by a standard RNA-seq pipeline v2.0, available on nf-core (Ewels et al. 2019) with mostly default settings. For more information about the specific steps involved in this analysis, refer to the repository documentation: https://nf-co.re/rnaseq/2.0.

Reference genome

Due to some difficulties in using hg38 with iGenomes, and since the rest of the analysis has been done on such reference (more specifically, GRCh38.p12). An annotation was downloaded from Ensembl and used as matching custom --fasta and --gtf parameters:

ftp://ftp.ensembl.org/pub/release-95/gtf/homo_sapiens/Homo_sapiens.GRCh38.95.gtf.gz

ftp://ftp.ensembl.org/pub/release-95/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz

Running environment

RNA-seq analysis was run on Uppmax HPC environment using standard parameters: nextflow run nf-core/rnaseq -profile uppmax -revision 2.0 --public_data_ids ids_file.txt. ids_file.txt contained only the GSM ids as plain text.

This run would produce the corresponding FASTQ files and then a further run with the generated samplesheet_grouped.tsv file, the pipeline was run.

Note on pipeline version

There is a newer RNA-seq version: https://github.com/nf-core/rnaseq/releases/tag/3.0. This affects the quantification, so it may be the case that results are better using Salmon instead of featureCounts. However at this point there is a problem with Singularity on Uppmax and this version cannot be easily run. If necessary, I will re-do this analysis in the future, once these problems are solved.

Downstream analysis

Helper functions

color_list <- c(Naive=global_palette$teal,
                Primed=global_palette$coral)

#' Select by quantile
#'
#' @param counts Counts table.
#' @param length_cutoff Gene length cutoff.
#' @param value_cutoff Value cutoff
#' @param min_q Minimum quantile to consider
#' @param max_q Maximum quantile to consider
#' @param field Which field to use (default = TPM).
#'
#' @return filtered data.frame
quantile_group <- function(counts, min_q, max_q, value_cutoff = 0, length_cutoff = 10000, field = "TPM") {
  counts$length <- (counts$End - counts$Start)
  counts <- dplyr::filter(counts, .data[[field]] > value_cutoff & length > length_cutoff)
  
  q <- quantile(counts[, field], probs = c(min_q, max_q))
  
  dplyr::filter(counts, .data[[field]] >= q[1] & .data[[field]] <= q[2])
}

#' Make a venn diagram out ouf a list of selected gene counts
#'
#' @param gene_counts of count tables
#' @param title Plot title
#' @param color Base color

#' @return ggplot object
genes_venn <- function(gene_counts, title, color) {
  field_list <- list(r1 = gene_counts[[1]]$Gene.Name, 
                     r2 = gene_counts[[2]]$Gene.Name,
                     r3 = gene_counts[[3]]$Gene.Name)
  
  clist <- rep(color, 3)
  ggvenn(field_list, fill_color = clist, fill_alpha = 0.5, text_size = 5, stroke_color = "#ffffff") + 
    ggtitle(title)
}

intersect_genes_3 <- function(glist) {
  intersect(intersect(glist[[1]]$Gene.Name, 
                             glist[[2]]$Gene.Name),
                             glist[[3]]$Gene.Name)
}

#' Subset a named granges by name
#'
#' @param names Names list
#' @param granges GRanges to subset.
#'
#' @return GRanges
select_genes <- function(names, granges) {
  sort(granges[granges$name %in% names, ])
}

#' Get GRanges with gene symbols for hg38
#'
#' @return A GRanges where name field is Gene symbol
genes_hg38 <- function() {
  # This matches gene IDs to symbols
  x <- org.Hs.egSYMBOL
  
  # Get the gene names that are mapped to an entrez gene identifier
  mapped_genes <- mappedkeys(x)
  # Convert to a list
  xx <- as.list(x[mapped_genes])
  
  # Apparently there are some pathway crap that puts NAs in the list
  xx <- xx[!is.na(xx)]
  
  gene_id_to_name <- unlist(xx)
  
  genes <- genes(TxDb.Hsapiens.UCSC.hg38.knownGene)
  genes$name <- gene_id_to_name[genes$gene_id]
  genes
}

#' Get GRanges with gene symbols for hg38
#'
#' @return A GRanges where name field is Gene symbol
genes_hg38_ensembl <- function() {
  # This matches gene IDs to symbols
  edb <- EnsDb.Hsapiens.v86
  genes(edb)
}

export_gene_set <- function(replicates, name, path, annot_func, name_field) {
  gene_names <- intersect_genes_3(replicates)
  
  genes_all <- annot_func()
  genes_all <- genes_all[, name_field]
  names(mcols(genes_all)) <- "name"
  
  result <- select_genes(gene_names, genes_all)
  
  out_file <- file.path(path, name)
  export(result, out_file)
}

Gene counts

Gene counts output file from featureCounts are used.

counts_file <- file.path(params$rnaseqdir,
                         "Collier_2017_featurecounts.tsv")

counts <- read.table(counts_file, sep = "\t", header = T)
kable(head(counts))

Geneid	gene_name	Naive_R1	Naive_R2	Naive_R3	Primed_R1	Primed_R2	Primed_R3
ENSG00000223972	DDX11L1	1	0	0	0	0	2
ENSG00000227232	WASH7P	14	14	10	18	20	24
ENSG00000278267	MIR6859-1	4	3	2	7	6	9
ENSG00000243485	MIR1302-2HG	3	0	0	4	0	4
ENSG00000284332	MIR1302-2	0	0	0	0	0	0
ENSG00000237613	FAM138A	0	0	0	0	0	0

Counts distribution

counts_long <- pivot_longer(counts,
                            cols = -c(Geneid, gene_name),
                            names_to = c("sample", "replicate"),
                            names_sep = "_",
                            values_to = "count",
                            names_ptypes = list(sample = factor(),
                                                replicate = factor()))

color_scale <- scale_color_manual(name = "sample",
                                  values = unlist(color_list))


ggplot(counts_long,
       aes(x = interaction(sample, replicate, lex.order =T),
           y = count, color = sample)) + 
  labs(x = "Sample", y = "counts",
     title = "featureCounts distribution per sample") +
  geom_boxplot() +  
  color_scale + 
  theme_default()

Version	Author	Date
a28edd5	cnluzon	2021-02-07
0481b52	cnluzon	2021-02-05

Log scaled + 1:

library(tidyr)
counts_long <- pivot_longer(counts,
                            cols = -c(Geneid, gene_name),
                            names_to = c("sample", "replicate"),
                            names_sep = "_",
                            values_to = "count",
                            names_ptypes = list(sample = factor(),
                                                replicate = factor()))

color_scale <- scale_color_manual(name = "sample",
                                  values = unlist(color_list))


ggplot(counts_long,
       aes(x = interaction(sample, replicate, lex.order =T),
           y = log10(count+1), color = sample)) + 
  geom_boxplot() +  
  color_scale + 
  labs(x = "Sample", y = "log10(counts + 1)",
       title = "featureCounts distribution per sample") +
  theme_default()

Version	Author	Date
a28edd5	cnluzon	2021-02-07
0481b52	cnluzon	2021-02-05

As expected, there are many zero values in all the distributions, and long tails since there is no upper limit in RNA-seq data.

Stringtie output

stringtie (Kovaka et al. 2019) https://ccb.jhu.edu/software/stringtie/) is a tool for quantitation of full-length transcripts (taking variants into account) per gene.

This output is also provided as standard output for RNA-seq pipeline v2.0.

stringtie_files <- list(
  naive = list.files(params$rnaseqdir,
                     pattern = "Naive.*gene_abundance.txt", full.names = T),
  primed = list.files(params$rnaseqdir,
                      pattern = "Primed.*gene_abundance.txt", full.names = T)
)

kable(head(read.csv(stringtie_files$primed[1], header = T, sep = "\t")))

Gene.ID	Gene.Name	Reference	Strand	Start	End	Coverage	FPKM	TPM
ENSG00000223972	DDX11L1	1	+	11869	14409	0.368065	0.080012	0.162290
ENSG00000227232	WASH7P	1	-	14404	29570	36.709293	6.948467	14.093770
ENSG00000278267	MIR6859-1	1	-	17369	17436	21.787683	4.124051	8.364929
ENSG00000243485	MIR1302-2HG	1	+	29554	31109	1.308571	0.379904	0.770571
ENSG00000284332	MIR1302-2	1	+	30366	30503	0.062616	0.011852	0.024040
ENSG00000237613	FAM138A	1	-	34554	36081	0.173298	0.067773	0.137466

Gene expression groups

To get sets of high - med -low expressed:

1. Remove genes that are shorter than 10kb.

2. Take top 5%, around-median 5% and lowest 5% genes.

In this case, we have 3 biological replicates per condition.

So one possible approach is do this per replicate and see how they intersect.

counts_naive <- lapply(stringtie_files$naive,
                       read.csv,
                       header = T,
                       sep = "\t")

counts_primed <- lapply(stringtie_files$primed,
                        read.csv,
                        header = T,
                        sep = "\t")

# Size of the quantile group
size = 0.1

min_qs <- c(0, 0.5 - (size/2), 1-size)
max_qs <- c(0+size, 0.5 + (size/2), 1)

quantile_low <- purrr::partial(
  quantile_group,
  min_q = min_qs[1],
  max_q = max_qs[1],
  length_cutoff = params$length_cutoff,
  value_cutoff = params$value_cutoff,
  field = params$deciding_stat
)

quantile_med <- purrr::partial(
  quantile_group,
  min_q = min_qs[2],
  max_q = max_qs[2],
  length_cutoff = params$length_cutoff,
  value_cutoff = params$value_cutoff,
  field = params$deciding_stat
)

quantile_high <- purrr::partial(
  quantile_group,
  min_q = min_qs[3],
  max_q = max_qs[3],
  length_cutoff = params$length_cutoff,
  value_cutoff = params$value_cutoff,
  field = params$deciding_stat
)

naive_top <- map(counts_naive, quantile_high)
naive_med <- map(counts_naive, quantile_med)
naive_low <- map(counts_naive, quantile_low)

primed_top <- map(counts_primed, quantile_high)
primed_med <- map(counts_primed, quantile_med)
primed_low <- map(counts_primed, quantile_low)

c <- color_list[["Naive"]]

Naïve expressed gene groups

Top expressed genes:

genes_venn(naive_top,
           "Naïve top expressed genes across replicates",
           color = c)

Version	Author	Date
a28edd5	cnluzon	2021-02-07
0481b52	cnluzon	2021-02-05

Medium expressed genes:

genes_venn(naive_med,
           "Naïve med expressed genes across replicates",
           color = c)

Version	Author	Date
a28edd5	cnluzon	2021-02-07
0481b52	cnluzon	2021-02-05

Low expressed genes:

genes_venn(naive_low,
           "Naïve low expressed genes across replicates",
           color = c)

Version	Author	Date
a28edd5	cnluzon	2021-02-07
0481b52	cnluzon	2021-02-05

Primed expressed gene groups

c <- color_list[["Primed"]]

genes_venn(primed_top,
           "Primed top expressed genes across replicates",
           color = c)

Version	Author	Date
a28edd5	cnluzon	2021-02-07
0481b52	cnluzon	2021-02-05

genes_venn(primed_med,
           "Primed med expressed genes across replicates",
           color = c)

Version	Author	Date
a28edd5	cnluzon	2021-02-07
0481b52	cnluzon	2021-02-05

genes_venn(primed_low,
           "Primed low expressed genes across replicates",
           color = c)

Version	Author	Date
a28edd5	cnluzon	2021-02-07
0481b52	cnluzon	2021-02-05

Primed vs Naive

I assume here there should be some overlap

naive_top_genes <- intersect_genes_3(naive_top)
primed_top_genes <- intersect_genes_3(primed_top)

field_list <- list(naive=naive_top_genes,
                   primed=primed_top_genes)

ggvenn(field_list, fill_color = c(color_list[["Naive"]], color_list[["Primed"]]), fill_alpha = 0.5, text_size = 5, stroke_color = "#ffffff") + 
    ggtitle("Naive vs primed top expressed genes")

Version	Author	Date
a28edd5	cnluzon	2021-02-07
0481b52	cnluzon	2021-02-05

TPM vs FPKM

There is maximum overlap in the top genes. On the lower side, they tend to disagree more.

quantile_high_tpm <- purrr::partial(
  quantile_group,
  min_q = min_qs[3],
  max_q = max_qs[3],
  length_cutoff = params$length_cutoff,
  value_cutoff = params$value_cutoff,
  field = "TPM"
)

quantile_high_fpkm <- purrr::partial(
  quantile_group,
  min_q = min_qs[3],
  max_q = max_qs[3],
  length_cutoff = params$length_cutoff,
  value_cutoff = params$value_cutoff,
  field = "FPKM"
)

naive_top_fpkm <-  intersect_genes_3(map(counts_naive, quantile_high_fpkm))
naive_top_tpm <-  intersect_genes_3(map(counts_naive, quantile_high_tpm))


field_list <- list(fpkm=naive_top_fpkm,
                   tpm=naive_top_tpm)

ggvenn(field_list, fill_color = c(color_list[["Naive"]], color_list[["Naive"]]), fill_alpha = 0.5, text_size = 5, stroke_color = "#ffffff") + 
    ggtitle("Naive top expressed genes (TPM vs FPKM)")

primed_top_fpkm <-  intersect_genes_3(map(counts_primed, quantile_high_fpkm))
primed_top_tpm <-  intersect_genes_3(map(counts_primed, quantile_high_tpm))

field_list <- list(fpkm=primed_top_fpkm,
                   tpm=primed_top_tpm)

ggvenn(field_list, fill_color = c(color_list[["Primed"]], color_list[["Primed"]]), fill_alpha = 0.5, text_size = 5, stroke_color = "#ffffff") + 
    ggtitle("Primed top expressed genes (TPM vs FPKM)")

quantile_tpm <- purrr::partial(
  quantile_group,
  min_q = min_qs[1],
  max_q = max_qs[1],
  length_cutoff = params$length_cutoff,
  value_cutoff = params$value_cutoff,
  field = "TPM"
)

quantile_fpkm <- purrr::partial(
  quantile_group,
  min_q = min_qs[1],
  max_q = max_qs[1],
  length_cutoff = params$length_cutoff,
  value_cutoff = params$value_cutoff,
  field = "FPKM"
)

naive_low_fpkm <-  intersect_genes_3(map(counts_naive, quantile_fpkm))
naive_low_tpm <-  intersect_genes_3(map(counts_naive, quantile_tpm))

field_list <- list(fpkm=naive_low_fpkm,
                   tpm=naive_low_tpm)

ggvenn(field_list, fill_color = c(color_list[["Naive"]], color_list[["Naive"]]), fill_alpha = 0.5, text_size = 5, stroke_color = "#ffffff") + 
    ggtitle("Naive low expressed genes (TPM vs FPKM)")

primed_low_fpkm <-  intersect_genes_3(map(counts_primed, quantile_fpkm))
primed_low_tpm <-  intersect_genes_3(map(counts_primed, quantile_tpm))

field_list <- list(fpkm=primed_low_fpkm,
                   tpm=primed_low_tpm)

ggvenn(field_list, fill_color = c(color_list[["Primed"]], color_list[["Primed"]]), fill_alpha = 0.5, text_size = 5, stroke_color = "#ffffff") + 
    ggtitle("Primed low expressed genes (TPM vs FPKM)")

Gene set enrichment

naive_only <- setdiff(naive_top_genes, primed_top_genes)
primed_only <- setdiff(primed_top_genes, naive_top_genes)
both <- intersect(naive_top_genes, primed_top_genes)

genes <- list(naive_only = naive_only,
              primed_only = primed_only,
              shared = both)

# Do a gprofiler analysis
gostres <- gost(query = genes, 
                organism = "hsapiens", ordered_query = FALSE, 
                multi_query = FALSE, significant = TRUE, exclude_iea = FALSE, 
                measure_underrepresentation = FALSE, evcodes = FALSE, 
                user_threshold = 0.05, correction_method = "g_SCS", 
                domain_scope = "annotated", custom_bg = NULL, 
                numeric_ns = "", sources = NULL, as_short_link = FALSE)

gostplot(gostres, capped = TRUE, interactive = TRUE)

This is a bit too much at the moment, we can probably refine this.

Annotation of gene groups

Now I annotate the intersection of these sets in the reference genome, to be able to look at chromatin marks there.

Download BED files

Download Collier_2017.zip

RNA-seq bigwig signal at selected groups

This is sort of a validation of the groups I get:

export_dir <- "./output/Collier_2017/ensembl/"
bedfiles_top <- list.files(export_dir, full.names = T, pattern = "_top")
bedfiles_med <- list.files(export_dir, full.names = T, pattern = "_med")
bedfiles_bottom <- list.files(export_dir, full.names = T, pattern = "_low")

# Ensures proper order
bedfiles <- c(bedfiles_top, bedfiles_med, bedfiles_bottom)
bwsignal <- list.files(file.path(params$datadir, "bw/Collier_2017"), full.names = T)

colors <- c(rep(color_list[["Naive"]], 3), rep(color_list[["Primed"]], 3))
# labels <- style_info[basename(c(bwsignal, bwinput)), "label"]

# Use tss for this
tss_window <- 2500

bed_ranges <- lapply(bedfiles, import)

# Take just downstream from TSS
tss_ranges <- lapply(bed_ranges, promoters, upstream = 0, downstream = tss_window)

this_plot <- partial(bw_loci, bwfiles = bwsignal)
# 
plots <- purrr::map(tss_ranges, this_plot)
names(plots) <- basename(bedfiles)

# Pivot 1 item:
library(tidyr)

pivot_bed <- function(granges, loci_name) {
  bin_ids <- c("seqnames", "start", "end", "width", "strand")
  pivoted <- pivot_longer(data = data.frame(granges),
                             cols = contains("_R"),
                             names_to = "sample",
                             values_to = "value")
  pivoted[["group"]] <- loci_name
  pivoted
}

pivoted_list <- map2(plots, names(plots), pivot_bed)
all_vals <- do.call(rbind, pivoted_list)

ggplot(all_vals, aes(x = sample, y = log2(value), color = sample, group = sample)) + geom_boxplot() + facet_wrap(group ~ .)  + theme_default() + theme(axis.text.x = element_text(angle=45, hjust = 1)) + scale_color_manual(name = "sample", values = colors)

Version	Author	Date
a28edd5	cnluzon	2021-02-07

Profile is not very informative, I think.

# Need to match ref names and so on -> ENSEMBL
export_dir <- "./output/Collier_2017/ensembl/"
bedfiles_top <- list.files(export_dir, full.names = T, pattern = "_top")
bedfiles_med <- list.files(export_dir, full.names = T, pattern = "_med")
bedfiles_bottom <- list.files(export_dir, full.names = T, pattern = "_low")

# Ensures proper order
bedfiles <- c(bedfiles_top, bedfiles_med, bedfiles_bottom)
bwsignal <- list.files(file.path(params$datadir, "bw/Collier_2017"), full.names = T)

colors <- c(rep(color_list[["Naive"]], 3), rep(color_list[["Primed"]], 3))

up <- 500
dw <- 500
bs <- 500
md <- "stretch"

this_plot <- partial(plot_bw_profile, bwfiles = bwsignal,
                     bin_size = bs,
                     mode = md,
                     upstream = up,
                     downstream = dw,
                     colors = colors,
                     verbose = TRUE)

plots <- purrr::map(bedfiles, this_plot)

plots[[1]] <- plots[[1]] + ggtitle("RNA-seq per gene class: Naive")
plots[[2]] <- plots[[2]] + ggtitle("- Primed")

plot_grid(plotlist=plots, nrow=3)

Version	Author	Date
a28edd5	cnluzon	2021-02-07

Bibliography

Collier, Amanda J, Sarita P Panula, John Paul Schell, Peter Chovanec, Alvaro Plaza Reyes, Sophie Petropoulos, Anne E Corcoran, et al. 2017. “Comprehensive Cell Surface Protein Profiling Identifies Specific Markers of Human Naive and Primed Pluripotent States.” Cell Stem Cell 20 (6): 874–90.

Ewels, Philip A, Alexander Peltzer, Sven Fillinger, Johannes Alneberg, Harshil Patel, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso, and Sven Nahnsen. 2019. “Nf-Core: Community Curated Bioinformatics Pipelines.” BioRxiv, 610741.

Kovaka, Sam, Aleksey V Zimin, Geo M Pertea, Roham Razaghi, Steven L Salzberg, and Mihaela Pertea. 2019. “Transcriptome Assembly from Long-Read RNA-Seq Alignments with StringTie2.” Genome Biology 20 (1): 1–13.

sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=sv_SE.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=sv_SE.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=sv_SE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=sv_SE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] stats4    parallel  grid      stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] EnsDb.Hsapiens.v86_2.99.0               
 [2] ensembldb_2.12.1                        
 [3] AnnotationFilter_1.12.0                 
 [4] gprofiler2_0.2.0                        
 [5] tidyr_1.1.2                             
 [6] cowplot_1.1.1                           
 [7] xfun_0.20                               
 [8] purrr_0.3.4                             
 [9] rtracklayer_1.50.0                      
[10] org.Hs.eg.db_3.11.4                     
[11] TxDb.Hsapiens.UCSC.hg38.knownGene_3.10.0
[12] GenomicFeatures_1.40.1                  
[13] AnnotationDbi_1.52.0                    
[14] Biobase_2.50.0                          
[15] GenomicRanges_1.42.0                    
[16] GenomeInfoDb_1.26.2                     
[17] IRanges_2.24.1                          
[18] S4Vectors_0.28.1                        
[19] BiocGenerics_0.36.0                     
[20] ggvenn_0.1.8                            
[21] dplyr_1.0.4                             
[22] knitr_1.31                              
[23] ggplot2_3.3.3                           
[24] wigglescout_0.12.8                      
[25] workflowr_1.6.2                         

loaded via a namespace (and not attached):
 [1] colorspace_2.0-0            ellipsis_0.3.1             
 [3] rprojroot_2.0.2             XVector_0.30.0             
 [5] fs_1.5.0                    farver_2.0.3               
 [7] listenv_0.8.0               furrr_0.2.2                
 [9] bit64_4.0.5                 xml2_1.3.2                 
[11] codetools_0.2-18            cachem_1.0.3               
[13] jsonlite_1.7.2              Rsamtools_2.6.0            
[15] dbplyr_2.1.0                shiny_1.6.0                
[17] compiler_4.0.3              httr_1.4.2                 
[19] assertthat_0.2.1            Matrix_1.3-2               
[21] fastmap_1.1.0               lazyeval_0.2.2             
[23] later_1.1.0.1               htmltools_0.5.1.1          
[25] prettyunits_1.1.1           tools_4.0.3                
[27] gtable_0.3.0                glue_1.4.2                 
[29] GenomeInfoDbData_1.2.4      reshape2_1.4.4             
[31] rappdirs_0.3.3              Rcpp_1.0.6                 
[33] vctrs_0.3.6                 Biostrings_2.58.0          
[35] crosstalk_1.1.1             stringr_1.4.0              
[37] globals_0.14.0              mime_0.9                   
[39] lifecycle_0.2.0             XML_3.99-0.5               
[41] future_1.21.0               zlibbioc_1.36.0            
[43] scales_1.1.1                hms_1.0.0                  
[45] promises_1.1.1              MatrixGenerics_1.2.0       
[47] ProtGenerics_1.20.0         SummarizedExperiment_1.20.0
[49] RColorBrewer_1.1-2          yaml_2.2.1                 
[51] curl_4.3                    memoise_2.0.0              
[53] biomaRt_2.44.4              stringi_1.5.3              
[55] RSQLite_2.2.3               highr_0.8                  
[57] BiocParallel_1.24.1         rlang_0.4.10               
[59] pkgconfig_2.0.3             matrixStats_0.58.0         
[61] bitops_1.0-6                evaluate_0.14              
[63] lattice_0.20-41             GenomicAlignments_1.26.0   
[65] htmlwidgets_1.5.3           labeling_0.4.2             
[67] bit_4.0.4                   tidyselect_1.1.0           
[69] parallelly_1.23.0           plyr_1.8.6                 
[71] magrittr_2.0.1              R6_2.5.0                   
[73] generics_0.1.0              DelayedArray_0.16.0        
[75] DBI_1.1.1                   pillar_1.4.7               
[77] whisker_0.4                 withr_2.4.1                
[79] RCurl_1.98-1.2              tibble_3.0.6               
[81] crayon_1.4.0                BiocFileCache_1.12.1       
[83] plotly_4.9.3                rmarkdown_2.6              
[85] progress_1.2.2              data.table_1.13.6          
[87] blob_1.2.1                  git2r_0.28.0               
[89] digest_0.6.27               xtable_1.8-4               
[91] httpuv_1.5.5                openssl_1.4.3              
[93] munsell_0.5.0               viridisLite_0.3.0          
[95] askpass_1.1

RNA-seq analysis on public data

Carmen Navarro

2021-02-05