Last updated: 2023-02-01
Checks: 7 0
Knit directory: muse/
This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20200712)
was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version fb91538. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish
or
wflow_git_commit
). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: r_packages_4.1.2/
Ignored: r_packages_4.2.0/
Untracked files:
Untracked: analysis/cell_ranger.Rmd
Untracked: analysis/tss_xgboost.Rmd
Untracked: data/HG00702_SH089_CHSTrio.chr1.vcf.gz
Untracked: data/HG00702_SH089_CHSTrio.chr1.vcf.gz.tbi
Untracked: data/ncrna_NONCODE[v3.0].fasta.tar.gz
Untracked: data/ncrna_noncode_v3.fa
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/fgsea.Rmd
) and HTML
(docs/fgsea.html
) files. If you’ve configured a remote Git
repository (see ?wflow_git_remote
), click on the hyperlinks
in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | fb91538 | Dave Tang | 2023-02-01 | fgsea example data |
From the original paper describing the Gene Set Enrichment Analysis (GSEA):
The goal of GSEA is to determine whether members of a gene set S tend to occur toward the top (or bottom) of the list L, in which case the gene set is correlated with the phenotypic class distinction.
First install fgsea.
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
if (!require("fgsea", quietly = TRUE))
BiocManager::install("fgsea")
library(fgsea)
The example pathways are packaged with fgsea
and can be
loaded with data()
. The example pathways are stored in a
list.
data("examplePathways", package = "fgsea")
class(examplePathways)
[1] "list"
There are 1457 example pathways.
length(examplePathways)
[1] 1457
The first pathway 1221633_Meiotic_Synapsis contains Entrez Gene IDs that belong to this gene set.
examplePathways[1]
$`1221633_Meiotic_Synapsis`
[1] "12189" "13006" "15077" "15078" "15270" "15512"
[7] "16905" "16906" "19357" "20842" "20843" "20957"
[13] "20962" "21749" "21750" "22196" "23856" "24061"
[19] "28113" "50878" "56739" "57321" "64009" "66654"
[25] "69386" "71846" "74075" "77053" "94244" "97114"
[31] "97122" "97908" "101185" "140557" "223697" "260423"
[37] "319148" "319149" "319150" "319151" "319152" "319153"
[43] "319154" "319155" "319156" "319157" "319158" "319159"
[49] "319160" "319161" "319565" "320332" "320558" "326619"
[55] "326620" "360198" "497652" "544973" "625328" "667250"
[61] "100041230" "102641229" "102641751" "102642045"
The gene ranks are also packaged with fgsea
but we will
re-generate the ranks based on the author’s
code to see how the ranks were created. Several other Bioconductor
packages are required.
bioc_pac <- c("GEOquery", "limma", "org.Mm.eg.db")
cran_pac <- c("data.table", "pheatmap")
install_pac <- function(x, repo){
if (!require(x, quietly = TRUE, character.only = TRUE)){
if(repo == "bioc"){
BiocManager::install(x, character.only = TRUE)
} else if (repo == "cran"){
install.packages(x, character.only = TRUE)
} else {
stop("Unknown repo")
}
}
}
sapply(bioc_pac, install_pac, repo = "bioc")
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:stats':
IQR, mad, sd, var, xtabs
The following objects are masked from 'package:base':
anyDuplicated, aperm, append, as.data.frame, basename, cbind,
colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
table, tapply, union, unique, unsplit, which.max, which.min
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Setting options('download.file.method.GEOquery'='auto')
Setting options('GEOquery.inmemory.gpl'=FALSE)
Attaching package: 'limma'
The following object is masked from 'package:BiocGenerics':
plotMA
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
expand.grid, I, unname
$GEOquery
NULL
$limma
NULL
$org.Mm.eg.db
NULL
sapply(cran_pac, install_pac, repo = "cran")
Attaching package: 'data.table'
The following object is masked from 'package:IRanges':
shift
The following objects are masked from 'package:S4Vectors':
first, second
$data.table
NULL
$pheatmap
NULL
Load packages.
library(GEOquery)
library(limma)
library(org.Mm.eg.db)
library(data.table)
# for collapseBy
source("https://raw.githubusercontent.com/assaron/r-utils/master/R/exprs.R")
Download mouse microarray data.
gse14308 <- getGEO("GSE14308")[[1]]
Found 1 file(s)
GSE14308_series_matrix.txt.gz
Add condition to the phenotypic data.
pData(gse14308)$condition <- sub("-.*$", "", gse14308$title)
pData(gse14308)[, c('platform_id', 'type', 'condition')]
platform_id type condition
GSM357839 GPL1261 RNA Th2
GSM357841 GPL1261 RNA Th2
GSM357842 GPL1261 RNA Th1
GSM357843 GPL1261 RNA Th17
GSM357844 GPL1261 RNA Th1
GSM357845 GPL1261 RNA Th17
GSM357847 GPL1261 RNA Naive
GSM357848 GPL1261 RNA Naive
GSM357849 GPL1261 RNA iTreg
GSM357850 GPL1261 RNA iTreg
GSM357852 GPL1261 RNA nTreg
GSM357853 GPL1261 RNA nTreg
fData
retrieves information on features.
feature_dat <- fData(gse14308)
colnames(feature_dat)
[1] "ID" "GB_ACC"
[3] "SPOT_ID" "Species Scientific Name"
[5] "Annotation Date" "Sequence Type"
[7] "Sequence Source" "Target Description"
[9] "Representative Public ID" "Gene Title"
[11] "Gene Symbol" "ENTREZ_GENE_ID"
[13] "RefSeq Transcript ID" "Gene Ontology Biological Process"
[15] "Gene Ontology Cellular Component" "Gene Ontology Molecular Function"
The collapseBy
function is sourced from exprs.R
and the source is shown below.
collapseBy_ <- function(es, factor, FUN=median) {
ranks <- apply(exprs(es), 1, FUN)
t <- data.frame(f=factor, i=seq_along(ranks), r=ranks)
t <- t[order(t$r, decreasing=T), ]
keep <- t[!duplicated(t$f) & !is.na(t$f),]$i
res <- es[keep, ]
fData(res)$origin <- rownames(res)
rownames(res) <- factor[keep]
res
}
ranks
contains the median probe intensity.
ranks <- apply(exprs(gse14308), 1, median)
head(ranks)
1415670_at 1415671_at 1415672_at 1415673_at 1415674_a_at 1415675_at
3324.740 4933.035 11202.750 3239.865 4007.050 1830.655
The ENTREZ_GENE_ID
, index, and median get saved into a
data frame. (I’ve used my_df
here because t
is
the name of the transpose function.) The data frame is then ordered from
highest intensity to lowest.
my_df <- data.frame(
f=fData(gse14308)$ENTREZ_GENE_ID,
i=seq_along(ranks),
r=ranks
)
my_df <- my_df[order(my_df$r, decreasing=TRUE), ]
head(my_df)
f i r
1438859_x_at 20090 23165 99737.85
1454859_a_at 65019 /// 100044627 /// 100862455 39154 99078.75
1416404_s_at 20055 /// 100039355 /// 100862433 735 97947.30
1455485_x_at 22121 39780 97510.75
1422475_a_at 20091 6781 97177.05
1435873_a_at 22121 /// 100504632 20179 96558.85
A vector called keep
is created to keep only rows with
non-duplicated and non-missing Entrez Gene IDs.
keep <- my_df[!duplicated(my_df$f) & !is.na(my_df$f),]$i
length(keep)
[1] 21603
Finally, keep
is used to subset the data;
origin
is created to store the original probe IDs before
replacing the row names with Entrez Gene IDs.
res <- gse14308[keep, ]
fData(res)$origin <- rownames(res)
rownames(res) <- fData(gse14308)$ENTREZ_GENE_ID[keep]
res
ExpressionSet (storageMode: lockedEnvironment)
assayData: 21603 features, 12 samples
element names: exprs
protocolData: none
phenoData
sampleNames: GSM357839 GSM357841 ... GSM357853 (12 total)
varLabels: title geo_accession ... condition (34 total)
varMetadata: labelDescription
featureData
featureNames: 20090 65019 /// 100044627 /// 100862455 ... 194227
(21603 total)
fvarLabels: ID GB_ACC ... origin (17 total)
fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
pubMedIds: 19144320
Annotation: GPL1261
Now that we know what collapseBy
does, we can use
it.
es <- collapseBy(gse14308, fData(gse14308)$ENTREZ_GENE_ID, FUN=median)
es
ExpressionSet (storageMode: lockedEnvironment)
assayData: 21603 features, 12 samples
element names: exprs
protocolData: none
phenoData
sampleNames: GSM357839 GSM357841 ... GSM357853 (12 total)
varLabels: title geo_accession ... condition (34 total)
varMetadata: labelDescription
featureData
featureNames: 20090 65019 /// 100044627 /// 100862455 ... 194227
(21603 total)
fvarLabels: ID GB_ACC ... origin (17 total)
fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
pubMedIds: 19144320
Annotation: GPL1261
Probe IDs that mapped to several Entrez Gene IDs and empty entries are removed.
es <- es[!grepl("///", rownames(es)), ]
es <- es[rownames(es) != "", ]
dim(exprs(es))
[1] 20770 12
Quantile normalisation is carried out.
exprs(es) <- normalizeBetweenArrays(log2(exprs(es)+1), method="quantile")
A design matrix is defined.
es.design <- model.matrix(~0+condition, data=pData(es))
es.design
conditioniTreg conditionNaive conditionnTreg conditionTh1
GSM357839 0 0 0 0
GSM357841 0 0 0 0
GSM357842 0 0 0 1
GSM357843 0 0 0 0
GSM357844 0 0 0 1
GSM357845 0 0 0 0
GSM357847 0 1 0 0
GSM357848 0 1 0 0
GSM357849 1 0 0 0
GSM357850 1 0 0 0
GSM357852 0 0 1 0
GSM357853 0 0 1 0
conditionTh17 conditionTh2
GSM357839 0 1
GSM357841 0 1
GSM357842 0 0
GSM357843 1 0
GSM357844 0 0
GSM357845 1 0
GSM357847 0 0
GSM357848 0 0
GSM357849 0 0
GSM357850 0 0
GSM357852 0 0
GSM357853 0 0
attr(,"assign")
[1] 1 1 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$condition
[1] "contr.treatment"
A linear model is fit given the design.
fit <- lmFit(es, es.design)
A contrasts matrix is used to compute contrasts using our fitted linear model. Here we’re contrasting naive T-cells to T-helper 1 cells.
makeContrasts(
conditionTh1-conditionNaive,
levels=es.design
)
Contrasts
Levels conditionTh1 - conditionNaive
conditioniTreg 0
conditionNaive -1
conditionnTreg 0
conditionTh1 1
conditionTh17 0
conditionTh2 0
fit2 <- contrasts.fit(
fit,
makeContrasts(
conditionTh1-conditionNaive,
levels=es.design
)
)
fit2
An object of class "MArrayLM"
$coefficients
Contrasts
conditionTh1 - conditionNaive
20090 0.011485524
22121 -0.003602942
20091 0.035573263
67671 0.092303053
19241 0.013452656
20765 more rows ...
$rank
[1] 6
$assign
[1] 1 1 1 1 1 1
$qr
$qr
conditioniTreg conditionNaive conditionnTreg conditionTh1
GSM357839 -1.414214 0.000000 0.000000 0.0000000
GSM357841 0.000000 -1.414214 0.000000 0.0000000
GSM357842 0.000000 0.000000 -1.414214 0.0000000
GSM357843 0.000000 0.000000 0.000000 -1.4142136
GSM357844 0.000000 0.000000 0.000000 0.7071068
conditionTh17 conditionTh2
GSM357839 0.000000 0
GSM357841 0.000000 0
GSM357842 0.000000 0
GSM357843 0.000000 0
GSM357844 1.414214 0
7 more rows ...
$qraux
[1] 1.0 1.0 1.0 1.0 1.5 1.0
$pivot
[1] 1 2 3 4 5 6
$tol
[1] 1e-07
$rank
[1] 6
$df.residual
[1] 6 6 6 6 6
20765 more elements ...
$sigma
20090 22121 20091 67671 19241
0.03820456 0.03984248 0.05377004 0.03935491 0.03240994
20765 more elements ...
$cov.coefficients
Contrasts
Contrasts conditionTh1 - conditionNaive
conditionTh1 - conditionNaive 1
$stdev.unscaled
Contrasts
conditionTh1 - conditionNaive
20090 1
22121 1
20091 1
67671 1
19241 1
20765 more rows ...
$pivot
[1] 1 2 3 4 5 6
$genes
ID GB_ACC SPOT_ID Species Scientific Name Annotation Date
20090 1438859_x_at AV037157 Mus musculus Oct 6, 2014
22121 1455485_x_at AI324936 Mus musculus Oct 6, 2014
20091 1422475_a_at NM_016959 Mus musculus Oct 6, 2014
67671 1433472_x_at AA050777 Mus musculus Oct 6, 2014
19241 1415906_at NM_021278 Mus musculus Oct 6, 2014
Sequence Type Sequence Source
20090 Consensus sequence GenBank
22121 Consensus sequence GenBank
20091 Consensus sequence GenBank
67671 Consensus sequence GenBank
19241 Consensus sequence GenBank
Target Description
20090 gb:AV037157 /DB_XREF=gi:4856822 /DB_XREF=AV037157 /CLONE=1600022O04 /FEA=EST /CNT=10 /TID=Mm.154915.5 /TIER=Stack /STK=9 /UG=Mm.154915 /LL=20090 /UG_GENE=Rps29 /UG_TITLE=ribosomal protein S29
22121 gb:AI324936 /DB_XREF=gi:4059365 /DB_XREF=mb49d08.x1 /CLONE=IMAGE:332751 /FEA=EST /CNT=24 /TID=Mm.13020.6 /TIER=Stack /STK=12 /UG=Mm.13020 /LL=22121 /UG_GENE=Rpl13a /UG_TITLE=ribosomal protein L13a
20091 gb:NM_016959.1 /DB_XREF=gi:8394217 /GEN=Rps3a /FEA=FLmRNA /CNT=122 /TID=Mm.6957.1 /TIER=FL+Stack /STK=78 /UG=Mm.6957 /LL=20091 /DEF=Mus musculus ribosomal protein S3a (Rps3a), mRNA. /PROD=ribosomal protein S3a /FL=gb:NM_016959.1
67671 gb:AA050777 /DB_XREF=gi:1530594 /DB_XREF=mg72a01.r1 /CLONE=IMAGE:438504 /FEA=EST /CNT=215 /TID=Mm.43330.3 /TIER=Stack /STK=151 /UG=Mm.43330 /LL=67671 /UG_GENE=0610025G13Rik /UG_TITLE=RIKEN cDNA 0610025G13 gene
19241 gb:NM_021278.1 /DB_XREF=gi:10946577 /GEN=Tmsb4x /FEA=FLmRNA /CNT=360 /TID=Mm.142729.1 /TIER=FL+Stack /STK=172 /UG=Mm.142729 /LL=19241 /DEF=Mus musculus thymosin, beta 4, X chromosome (Tmsb4x), mRNA. /PROD=thymosin, beta 4, X chromosome /FL=gb:NM_021278.1 gb:BC018286.1
Representative Public ID Gene Title Gene Symbol
20090 AV037157 ribosomal protein S29 Rps29
22121 AI324936 ribosomal protein L13A Rpl13a
20091 NM_016959 ribosomal protein S3A1 Rps3a1
67671 AA050777 ribosomal protein L38 Rpl38
19241 NM_021278 thymosin, beta 4, X chromosome Tmsb4x
ENTREZ_GENE_ID
20090 20090
22121 22121
20091 20091
67671 67671
19241 19241
RefSeq Transcript ID
20090 NM_009093
22121 NM_009438
20091 NM_016959
67671 NM_001048057 /// NM_001048058 /// NM_023372 /// XM_006534007 /// XM_006534008 /// XM_006534009
19241 NM_021278 /// XM_006528759
Gene Ontology Biological Process
20090 0006412 // translation // not recorded
22121 0006351 // transcription, DNA-templated // inferred from electronic annotation /// 0006355 // regulation of transcription, DNA-templated // inferred from electronic annotation /// 0006412 // translation // inferred from electronic annotation /// 0006417 // regulation of translation // inferred from electronic annotation /// 0017148 // negative regulation of translation // inferred from direct assay /// 0017148 // negative regulation of translation // not recorded /// 0032496 // response to lipopolysaccharide // inferred from mutant phenotype /// 0032844 // regulation of homeostatic process // inferred from mutant phenotype /// 0048246 // macrophage chemotaxis // inferred from mutant phenotype /// 0060425 // lung morphogenesis // inferred from mutant phenotype /// 0071346 // cellular response to interferon-gamma // inferred from direct assay /// 0071346 // cellular response to interferon-gamma // not recorded /// 1901194 // negative regulation of formation of translation preinitiation complex // not recorded
20091 0002181 // cytoplasmic translation // not recorded /// 0006412 // translation // inferred from electronic annotation /// 0030154 // cell differentiation // inferred from electronic annotation /// 0043066 // negative regulation of apoptotic process // inferred from sequence or structural similarity /// 0097194 // execution phase of apoptosis // inferred from sequence or structural similarity
67671 0001501 // skeletal system development // inferred from mutant phenotype /// 0001501 // skeletal system development // traceable author statement /// 0001503 // ossification // inferred from mutant phenotype /// 0006412 // translation // inferred from electronic annotation /// 0006417 // regulation of translation // inferred from mutant phenotype /// 0007605 // sensory perception of sound // inferred from mutant phenotype /// 0034463 // 90S preribosome assembly // inferred from direct assay /// 0042474 // middle ear morphogenesis // inferred from mutant phenotype /// 0048318 // axial mesoderm development // inferred from mutant phenotype
19241 0007010 // cytoskeleton organization // inferred from electronic annotation /// 0014911 // positive regulation of smooth muscle cell migration // traceable author statement /// 0030036 // actin cytoskeleton organization // inferred from electronic annotation /// 0030334 // regulation of cell migration // inferred from direct assay /// 0042989 // sequestering of actin monomers // inferred from electronic annotation /// 0045893 // positive regulation of transcription, DNA-templated // traceable author statement /// 0051152 // positive regulation of smooth muscle cell differentiation // traceable author statement
Gene Ontology Cellular Component
20090 0005622 // intracellular // inferred from electronic annotation /// 0005737 // cytoplasm // not recorded /// 0005840 // ribosome // inferred from electronic annotation /// 0015935 // small ribosomal subunit // not recorded /// 0022627 // cytosolic small ribosomal subunit // not recorded /// 0030529 // ribonucleoprotein complex // inferred from electronic annotation /// 0070062 // extracellular vesicular exosome // not recorded
22121 0005576 // extracellular region // inferred from electronic annotation /// 0005634 // nucleus // not recorded /// 0005840 // ribosome // inferred from electronic annotation /// 0015934 // large ribosomal subunit // inferred from electronic annotation /// 0016020 // membrane // inferred from electronic annotation /// 0022625 // cytosolic large ribosomal subunit // not recorded /// 0030529 // ribonucleoprotein complex // inferred from direct assay /// 0030529 // ribonucleoprotein complex // not recorded /// 0097452 // GAIT complex // inferred from direct assay /// 0097452 // GAIT complex // not recorded
20091 0005622 // intracellular // inferred from electronic annotation /// 0005634 // nucleus // inferred from direct assay /// 0005737 // cytoplasm // inferred from electronic annotation /// 0005829 // cytosol // inferred from direct assay /// 0005840 // ribosome // inferred from electronic annotation /// 0022627 // cytosolic small ribosomal subunit // not recorded /// 0030529 // ribonucleoprotein complex // not recorded
67671 0005622 // intracellular // inferred from electronic annotation /// 0005840 // ribosome // inferred from electronic annotation /// 0022625 // cytosolic large ribosomal subunit // not recorded /// 0030529 // ribonucleoprotein complex // inferred from electronic annotation /// 0033291 // eukaryotic 80S initiation complex // inferred from direct assay
19241 0005634 // nucleus // inferred from direct assay /// 0005737 // cytoplasm // inferred from electronic annotation /// 0005829 // cytosol // inferred from direct assay /// 0005856 // cytoskeleton // inferred from electronic annotation
Gene Ontology Molecular Function
20090 0003735 // structural constituent of ribosome // not recorded /// 0008270 // zinc ion binding // not recorded /// 0046872 // metal ion binding // inferred from electronic annotation
22121 0003676 // nucleic acid binding // inferred from electronic annotation /// 0003677 // DNA binding // inferred from electronic annotation /// 0003729 // mRNA binding // inferred from direct assay /// 0003735 // structural constituent of ribosome // inferred from electronic annotation /// 0005125 // cytokine activity // inferred from electronic annotation /// 0044822 // poly(A) RNA binding // not recorded /// 0046872 // metal ion binding // inferred from electronic annotation
20091 0003735 // structural constituent of ribosome // not recorded /// 0005515 // protein binding // inferred from physical interaction /// 0031369 // translation initiation factor binding // not recorded
67671 0003735 // structural constituent of ribosome // inferred from electronic annotation
19241 0003779 // actin binding // inferred from electronic annotation /// 0005515 // protein binding // inferred from physical interaction
origin
20090 1438859_x_at
22121 1455485_x_at
20091 1422475_a_at
67671 1433472_x_at
19241 1415906_at
20765 more rows ...
$Amean
20090 22121 20091 67671 19241
16.28272 16.27241 16.25759 16.23408 16.23061
20765 more elements ...
$method
[1] "ls"
$design
conditioniTreg conditionNaive conditionnTreg conditionTh1
GSM357839 0 0 0 0
GSM357841 0 0 0 0
GSM357842 0 0 0 1
GSM357843 0 0 0 0
GSM357844 0 0 0 1
conditionTh17 conditionTh2
GSM357839 0 1
GSM357841 0 1
GSM357842 0 0
GSM357843 1 0
GSM357844 0 0
7 more rows ...
$contrasts
Contrasts
Levels conditionTh1 - conditionNaive
conditioniTreg 0
conditionNaive -1
conditionnTreg 0
conditionTh1 1
conditionTh17 0
conditionTh2 0
Finally, the differential expression analysis is carried out and the results saved. The results are ranked by limma’s moderated t-statistic.
fit2 <- eBayes(fit2)
names(topTable(fit2, adjust.method="BH", number=12000, sort.by = "B"))
[1] "ID" "GB_ACC"
[3] "SPOT_ID" "Species.Scientific.Name"
[5] "Annotation.Date" "Sequence.Type"
[7] "Sequence.Source" "Target.Description"
[9] "Representative.Public.ID" "Gene.Title"
[11] "Gene.Symbol" "ENTREZ_GENE_ID"
[13] "RefSeq.Transcript.ID" "Gene.Ontology.Biological.Process"
[15] "Gene.Ontology.Cellular.Component" "Gene.Ontology.Molecular.Function"
[17] "origin" "logFC"
[19] "AveExpr" "t"
[21] "P.Value" "adj.P.Val"
[23] "B"
de <- data.table(topTable(fit2, adjust.method="BH", number=12000, sort.by = "B"), keep.rownames = TRUE)
ranks <- de[order(t), list(rn, t)]
ranks
rn t
1: 170942 -62.22877
2: 109711 -49.47829
3: 18124 -43.40540
4: 12775 -41.16952
5: 72148 -33.23463
---
11996: 58801 49.10222
11997: 13730 50.02863
11998: 15937 50.29120
11999: 12772 50.52651
12000: 80876 52.59930
Load exampleRanks
.
data("exampleRanks", package = "fgsea")
head(exampleRanks)
170942 109711 18124 12775 72148 16010
-63.33703 -49.74779 -43.63878 -41.51889 -33.26039 -32.77626
Compare with our results.
wanted <- head(names(exampleRanks))
ranks[rn %in% wanted]
rn t
1: 170942 -62.22877
2: 109711 -49.47829
3: 18124 -43.40540
4: 12775 -41.16952
5: 72148 -33.23463
6: 16010 -32.38200
Visualise the six most significantly down- and up-regulated genes.
library(pheatmap)
my_sample <- pData(es)$condition == "Th1" | pData(es)$condition == "Naive"
my_group <- data.frame(group = pData(es)$condition[my_sample])
row.names(my_group) <- colnames(exprs(es))[my_sample]
pheatmap(
mat = es[c(head(de[order(t), 1])$rn, tail(de[order(t), 1])$rn), my_sample],
annotation_col = my_group,
cluster_rows = FALSE,
cellwidth=25,
cellheight=15,
scale = "row"
)
The following section is based on the fgsea
tutorial but with my elaborations. The pathways are stored in
examplePathways
and the ranked gene list in
exampleRanks
.
TBC.
sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] pheatmap_1.0.12 data.table_1.14.6 org.Mm.eg.db_3.16.0
[4] AnnotationDbi_1.60.0 IRanges_2.32.0 S4Vectors_0.36.1
[7] limma_3.54.1 GEOquery_2.66.0 Biobase_2.58.0
[10] BiocGenerics_0.44.0 fgsea_1.24.0 BiocManager_1.30.19
[13] workflowr_1.7.0
loaded via a namespace (and not attached):
[1] bitops_1.0-7 fs_1.6.0 bit64_4.0.5
[4] RColorBrewer_1.1-3 httr_1.4.4 rprojroot_2.0.3
[7] GenomeInfoDb_1.34.7 tools_4.2.0 bslib_0.4.2
[10] utf8_1.2.2 R6_2.5.1 DBI_1.1.3
[13] colorspace_2.1-0 withr_2.5.0 tidyselect_1.2.0
[16] processx_3.8.0 curl_5.0.0 bit_4.0.5
[19] compiler_4.2.0 git2r_0.31.0 cli_3.6.0
[22] xml2_1.3.3 sass_0.4.5 scales_1.2.1
[25] readr_2.1.3 callr_3.7.3 stringr_1.5.0
[28] digest_0.6.31 R.utils_2.12.2 rmarkdown_2.20
[31] XVector_0.38.0 pkgconfig_2.0.3 htmltools_0.5.4
[34] highr_0.10 fastmap_1.1.0 rlang_1.0.6
[37] rstudioapi_0.14 RSQLite_2.2.20 farver_2.1.1
[40] jquerylib_0.1.4 generics_0.1.3 jsonlite_1.8.4
[43] BiocParallel_1.32.5 R.oo_1.25.0 dplyr_1.1.0
[46] RCurl_1.98-1.10 magrittr_2.0.3 GenomeInfoDbData_1.2.9
[49] Matrix_1.5-3 Rcpp_1.0.10 munsell_0.5.0
[52] fansi_1.0.4 R.methodsS3_1.8.2 lifecycle_1.0.3
[55] stringi_1.7.12 whisker_0.4.1 yaml_2.3.7
[58] zlibbioc_1.44.0 grid_4.2.0 blob_1.2.3
[61] parallel_4.2.0 promises_1.2.0.1 crayon_1.5.2
[64] lattice_0.20-45 Biostrings_2.66.0 cowplot_1.1.1
[67] hms_1.1.2 KEGGREST_1.38.0 knitr_1.42
[70] ps_1.7.2 pillar_1.8.1 codetools_0.2-18
[73] fastmatch_1.1-3 glue_1.6.2 evaluate_0.20
[76] getPass_0.2-2 png_0.1-8 vctrs_0.5.2
[79] tzdb_0.3.0 httpuv_1.6.8 gtable_0.3.1
[82] purrr_1.0.1 tidyr_1.3.0 cachem_1.0.6
[85] ggplot2_3.4.0 xfun_0.36 later_1.3.0
[88] tibble_3.1.8 memoise_2.0.1 ellipsis_0.3.2