Analysis of TCGA RPPA LIHC samples

Last updated: 2021-08-02

Checks: 7 0

Knit directory: Multispectral HCC/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20210728)

The command set.seed(20210728) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 21cf743

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 21cf743. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    code/
    Ignored:    data/
    Ignored:    old/
    Ignored:    output/

Untracked files:
    Untracked:  analysis/cores.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/tcga.Rmd) and HTML (docs/tcga.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	21cf743	Jovan Tanevski	2021-08-02	rank normalization, leiden analysis, diff exp
html	a386eaa	Jovan Tanevski	2021-07-28	Build site.
Rmd	9ab4acc	Jovan Tanevski	2021-07-28	Build site.
Rmd	bd05dbf	Jovan Tanevski	2021-07-28	add expression profiles
html	e45bf1d	Jovan Tanevski	2021-07-28	Build site.
Rmd	dcab1bf	Jovan Tanevski	2021-07-28	set figure output to svg
html	33e71a0	Jovan Tanevski	2021-07-28	Build site.
Rmd	bb998b8	Jovan Tanevski	2021-07-28	hclust with factoextra
html	1cb80af	Jovan Tanevski	2021-07-28	Build site.
Rmd	d0b6e0f	Jovan Tanevski	2021-07-28	add pca to tcga analysis
html	d73f586	Jovan Tanevski	2021-07-28	Build site.
Rmd	c4324f5	Jovan Tanevski	2021-07-28	add umap
html	0bfae68	Jovan Tanevski	2021-07-28	Build site.
Rmd	82e3712	Jovan Tanevski	2021-07-28	add basic hclust

Setup

Load required libraries.

library(tidyverse)
library(skimr)
library(uwot)
library(factoextra)
library(cowplot)
library(FNN)
library(igraph)
library(leiden)
library(limma)

Read filtered TCGA RRPA data and display summary statistics.

tcga.raw <- read_csv("data/TCGA-RPPA-LIHC_selected.csv", col_types = cols()) %>%
  select(-TumorType) %>%
  column_to_rownames("SampleID")

skim(tcga.raw)

Data summary
Name	tcga.raw
Number of rows	184
Number of columns	12
_______________________
Column type frequency:
numeric	12
________________________
Group variables	None

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
P53	1	-1.13	0.46	-2.54	-1.40	-0.96	-0.76	-0.51	▁▂▂▅▇
AKT_pS473	1	-0.63	0.71	-2.76	-1.02	-0.41	-0.15	0.96	▁▂▃▇▁
AKT_pT308	1	0.07	0.42	-1.16	-0.07	0.09	0.22	1.72	▁▃▇▁▁
BETACATENIN	1	1.50	0.72	-1.30	1.08	1.51	1.87	3.70	▁▁▇▅▁
JNK_pT183Y185	1	-0.21	0.29	-1.14	-0.36	-0.17	-0.04	0.48	▁▂▇▇▂
MEK1_pS217S221	1	-0.17	0.36	-0.82	-0.35	-0.23	-0.10	2.58	▇▃▁▁▁
P38_pT180Y182	1	0.49	0.57	-1.45	0.31	0.50	0.72	2.95	▁▂▇▁▁
P70S6K_pT389	1	-1.21	0.75	-3.22	-1.58	-0.95	-0.65	1.09	▂▃▇▃▁
PDK1_pS241	1	0.40	0.31	-0.56	0.23	0.37	0.55	1.62	▁▇▇▂▁
S6_pS235S236	1	-0.72	0.75	-3.53	-1.06	-0.67	-0.36	1.12	▁▁▅▇▂
YAP_pS127	1	2.21	0.61	0.68	1.78	2.10	2.54	4.05	▁▇▇▂▁
TRANSGLUTAMINASE	1	-0.42	0.56	-1.24	-0.83	-0.59	-0.20	2.72	▇▃▁▁▁

Rank normalize as suggested in https://gdac.broadinstitute.org/runs/analyses__2016_01_28/reports/cancer/LIHC/RPPA_Clustering_CNMF/nozzle.html

tcga <- mutate_all(tcga.raw, rank)

Dimensionality reduction

tcga.pca <- prcomp(tcga)
summary(tcga.pca)

Importance of components:
                            PC1     PC2      PC3      PC4      PC5      PC6
Standard deviation     112.2545 66.8808 57.91412 54.79530 52.30290 45.98226
Proportion of Variance   0.3702  0.1314  0.09853  0.08821  0.08036  0.06211
Cumulative Proportion    0.3702  0.5016  0.60012  0.68833  0.76869  0.83081
                           PC7      PC8      PC9     PC10     PC11    PC12
Standard deviation     41.5038 35.65454 34.96764 28.14257 23.00992 14.8757
Proportion of Variance  0.0506  0.03735  0.03592  0.02327  0.01555  0.0065
Cumulative Proportion   0.8814  0.91876  0.95468  0.97795  0.99350  1.0000

tcga.umap <- umap(tcga, n_neighbors = 5, n_epochs = 1000)

Hierarchical clustering

Perform hierarchical clustering of the data and plot the resulting dendrogram

tcga.hclust <- eclust(tcga, "hclust")

Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
"none")` instead.

fviz_dend(tcga.hclust, rect = TRUE)

Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
"none")` instead.

Version	Author	Date
a386eaa	Jovan Tanevski	2021-07-28

fviz_gap_stat(tcga.hclust$gap_stat)

fviz_silhouette(tcga.hclust) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

  cluster size ave.sil.width
1       1   43          0.05
2       2   23          0.25
3       3   17          0.34
4       4   40          0.03
5       5   35          0.02
6       6   12          0.16
7       7   14          0.01

Plot in 2D PCA and UMAP

fviz_pca_ind(tcga.pca, geom = "point", col.ind = as.factor(tcga.hclust$cluster)) +
  theme_classic()

tcga.umap.clus <-
  tcga.umap %>%
  cbind(tcga.hclust$cluster) %>%
  `colnames<-`(c("U1", "U2", "Cluster")) %>%
  as_tibble() %>%
  mutate_at("Cluster", as.factor)

ggplot(tcga.umap.clus, aes(x = U1, y = U2, color = Cluster)) +
  geom_point() +
  theme_classic()

Expression profiles per cluster

tcga.clustered <- tcga.raw %>%
  mutate(Cluster = as.factor(tcga.hclust$cluster)) %>%
  pivot_longer(names_to = "Marker", values_to = "Z", -Cluster)

profiles <- seq_len(tcga.hclust$nbclust) %>% map(~
ggplot(
  tcga.clustered %>% filter(Cluster == .x),
  aes(x = Marker, y = Z, color = Marker)
) +
  stat_summary(fun.data = mean_sdl, show.legend = FALSE) +
  ylim(-3, 3) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)))

plot_grid(plotlist = profiles, labels = paste("Cluster", seq_len(tcga.hclust$nbclust)))

Warning: Removed 17 rows containing non-finite values (stat_summary).

Warning: Removed 2 rows containing missing values (geom_segment).

Warning: Removed 2 rows containing non-finite values (stat_summary).

Warning: Removed 6 rows containing non-finite values (stat_summary).

Warning: Removed 2 rows containing missing values (geom_segment).

Warning: Removed 4 rows containing non-finite values (stat_summary).

Warning: Removed 1 rows containing missing values (geom_segment).

Warning: Removed 4 rows containing non-finite values (stat_summary).

Warning: Removed 1 rows containing missing values (geom_segment).

Differential expression analysis

design <- model.matrix(~0 + as.factor(tcga.hclust$cluster))
colnames(design) <- paste0("Cluster", seq_len(tcga.hclust$nbclust))

tcga.hclust.limma <- lmFit(t(tcga.raw), design = design)

tcga.hclust.limma$coefficients

                    Cluster1    Cluster2     Cluster3    Cluster4    Cluster5
P53              -0.83287575 -1.63930495 -0.686920525 -0.77840188 -1.45188350
AKT_pS473        -0.30863644 -1.33879677 -0.006580675 -0.21834276 -1.32477306
AKT_pT308         0.08732521 -0.27177766  0.218564880  0.18469631 -0.29203272
BETACATENIN       1.57155636  2.31044844  1.039230839  1.56262878  1.21849982
JNK_pT183Y185    -0.11316242 -0.53603513 -0.020171999 -0.06052635 -0.32400742
MEK1_pS217S221   -0.24976273 -0.31018141 -0.253137701 -0.23604609  0.05052373
P38_pT180Y182     0.51223883  0.02056018  0.769874692  0.58845685  0.63984603
P70S6K_pT389     -0.77571779 -1.94795561 -0.534009436 -0.69535853 -1.62072380
PDK1_pS241        0.38756470  0.59045467  0.116197084  0.16495428  0.63423716
S6_pS235S236     -0.67758055 -1.43636690 -0.361153007 -0.40006282 -0.97855764
YAP_pS127         2.03602902  3.00731625  1.850386691  2.08600134  2.04093606
TRANSGLUTAMINASE -0.74859832 -0.22338223 -0.911213400 -0.13653922 -0.45593106
                    Cluster6    Cluster7
P53              -1.59199405 -1.55125427
AKT_pS473        -1.36858380 -0.01857768
AKT_pT308         0.16397916  0.89903051
BETACATENIN       0.69812217  1.74893447
JNK_pT183Y185    -0.18383498 -0.40070734
MEK1_pS217S221    0.26605581 -0.29766097
P38_pT180Y182    -0.02873231  0.58069604
P70S6K_pT389     -2.07460770 -1.91867762
PDK1_pS241        0.31715543  0.63407898
S6_pS235S236     -0.18619595 -0.86118068
YAP_pS127         2.57126673  2.33398982
TRANSGLUTAMINASE  0.18560297 -0.40427359

tests.hclust <- decideTests(eBayes(tcga.hclust.limma))

tests.hclust@.Data

                 Cluster1 Cluster2 Cluster3 Cluster4 Cluster5 Cluster6 Cluster7
P53                    -1       -1       -1       -1       -1       -1       -1
AKT_pS473              -1       -1        0       -1       -1       -1        0
AKT_pT308               1       -1        1        1       -1        0        1
BETACATENIN             1        1        1        1        1        1        1
JNK_pT183Y185          -1       -1        0        0       -1       -1       -1
MEK1_pS217S221         -1       -1       -1       -1        0        1       -1
P38_pT180Y182           1        0        1        1        1        0        1
P70S6K_pT389           -1       -1       -1       -1       -1       -1       -1
PDK1_pS241              1        1        0        1        1        1        1
S6_pS235S236           -1       -1       -1       -1       -1        0       -1
YAP_pS127               1        1        1        1        1        1        1
TRANSGLUTAMINASE       -1       -1       -1        0       -1        0       -1

summary(tests.hclust)

       Cluster1 Cluster2 Cluster3 Cluster4 Cluster5 Cluster6 Cluster7
Down          7        8        5        5        7        4        6
NotSig        0        1        3        2        1        4        1
Up            5        3        4        5        4        4        5

Similarity and graph based clustering

Calculate leiden clustering based on weighted shared nearest neighbor graph

tcga.knn <- knn.index(tcga, 10)
jaccard.cuttoff <- 0.1

snn <- seq_len(nrow(tcga.knn) - 1) %>% map_dfr(\(id){
  to <- seq(id + 1, nrow(tcga.knn))
  jaccard <- to %>%
    map_dbl(~ length(intersect(tcga.knn[id, ], tcga.knn[.x, ])) /
      length(union(tcga.knn[id, ], tcga.knn[.x, ])))
  
  tibble(from = id, to = to, weight = jaccard) %>% filter(jaccard >= jaccard.cuttoff)
})

leiden.clusters <- graph_from_data_frame(snn, directed = FALSE) %>% leiden()