repeatmasker

Last updated: 2025-09-23

Checks: 7 0

Knit directory: DXR_continue/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20250701)

The command set.seed(20250701) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 6b65c26

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 6b65c26. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    data/Cormotif_data/
    Ignored:    data/DER_data/
    Ignored:    data/Other_paper_data/
    Ignored:    data/TE_annotation/
    Ignored:    data/alignment_summary.txt
    Ignored:    data/all_peak_final_dataframe.txt
    Ignored:    data/cell_line_info_.tsv
    Ignored:    data/full_summary_QC_metrics.txt
    Ignored:    data/motif_lists/
    Ignored:    data/number_frag_peaks_summary.txt

Untracked files:
    Untracked:  analysis/Top2a_Top2b_expression.Rmd
    Untracked:  analysis/maps_and_plots.Rmd
    Untracked:  analysis/proteomics.Rmd
    Untracked:  code/making_analysis_file_summary.R

Unstaged changes:
    Modified:   analysis/Outlier_removal.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/repeatmasker_data.Rmd) and HTML (docs/repeatmasker_data.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	6b65c26	reneeisnowhere	2025-09-23	wflow_publish("analysis/repeatmasker_data.Rmd")
html	fe42ebc	reneeisnowhere	2025-09-16	Build site.
Rmd	6e58d85	reneeisnowhere	2025-09-16	first setup
html	967b7a8	reneeisnowhere	2025-09-16	Build site.
Rmd	48f3b75	reneeisnowhere	2025-09-16	first setup
Rmd	06d2af1	reneeisnowhere	2025-09-16	wflow_git_commit("analysis/repeatmasker_data.Rmd")

libraries

library(tidyverse)
library(DT)
library(genomation)
library(GenomicRanges)
library(BiocParallel)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(ChIPseeker)
library(plyranges)
library(ggsignif)
library(ggpubr)

Loading data files

repeatmasker <- read_delim("data/Other_paper_data/repeatmasker_20250911.txt", 
    delim = "\t", escape_double = FALSE, 
    trim_ws = TRUE)

peakAnnoList_H3K27ac <- readRDS("data/motif_lists/H3K27ac_annotated_peaks.RDS")
peakAnnoList_H3K27me3 <- readRDS("data/motif_lists/H3K27me3_annotated_peaks.RDS")
peakAnnoList_H3K36me3 <- readRDS("data/motif_lists/H3K36me3_annotated_peaks.RDS")
peakAnnoList_H3K9me3 <- readRDS("data/motif_lists/H3K9me3_annotated_peaks.RDS")

rmskr_std_gr <- repeatmasker %>% 
   makeGRangesFromDataFrame(., seqnames="genoName", keep.extra.columns=TRUE,
                                      start.field="genoStart",
                                      end.field="genoEnd",
                                      starts.in.df.are.0based=TRUE) %>% 
  keepStandardChromosomes(., pruning.mode = "coarse")

##checking chromosomes
#seqlevels(rmskr_std_gr)

###making granges lists out of motifs

H3K27ac_sets_gr <- lapply(peakAnnoList_H3K27ac, function(df) {
  as_granges(df)
})

H3K27me3_sets_gr <- lapply(peakAnnoList_H3K27me3, function(df) {
  as_granges(df)
})

H3K36me3_sets_gr <- lapply(peakAnnoList_H3K36me3, function(df) {
  as_granges(df)
})

H3K9me3_sets_gr <- lapply(peakAnnoList_H3K9me3, function(df) {
  as_granges(df)
})

H3K27ac and TEs

## this is now taking the overlaps and widths of the TE and roi, then calculating percentage overlaps for each (pct_te_overlap and pct_roi_overlap)
# H3K27ac_ols <- lapply(H3K27ac_sets_gr, function(df) {
#     overlaps <- join_overlap_inner(df, rmskr_std_gr)
#    # overlap base pairs
#   })
H3K27ac_ols <- lapply(H3K27ac_sets_gr, function(df) {
  
   # 1️⃣ find overlaps
  hits <- findOverlaps(df, rmskr_std_gr)
  
  # 2️⃣ intersect the overlapping ranges
  overlaps_gr <- pintersect(df[queryHits(hits)], rmskr_std_gr[subjectHits(hits)])
  
  # 3️⃣ convert to data frame
  overlaps_df <- as.data.frame(overlaps_gr)
  
  # 4️⃣ add overlap length
  overlaps_df$overlap_bp <- width(overlaps_gr)
  
  # 5️⃣ ROI lengths & % coverage
  roi_widths <- width(df)
  names(roi_widths) <- df$Peakid
  overlaps_df$roi_len <- roi_widths[as.character(overlaps_df$Peakid)]
  overlaps_df$pct_roi_overlap <- (overlaps_df$overlap_bp / overlaps_df$roi_len) * 100
  
  # 6️⃣ TE lengths & % coverage
  te_widths <- width(rmskr_std_gr)
  overlaps_df$te_len <- te_widths[subjectHits(hits)]
  overlaps_df$pct_te_overlap <- (overlaps_df$overlap_bp / overlaps_df$te_len) * 100
  
  # 7️⃣ Add TE metadata from rmsk GRanges
  overlaps_df$repName   <- rmskr_std_gr$repName[subjectHits(hits)]
  overlaps_df$repClass  <- rmskr_std_gr$repClass[subjectHits(hits)]
  overlaps_df$repFamily <- rmskr_std_gr$repFamily[subjectHits(hits)]
  overlaps_df$milliDiv <- rmskr_std_gr$milliDiv[subjectHits(hits)]
  overlaps_df$milliDel <- rmskr_std_gr$milliDel[subjectHits(hits)]
  overlaps_df$milliIns <- rmskr_std_gr$milliIns[subjectHits(hits)]
  
  overlaps_df
})


H3K27ac_ols$Set_1 %>% 
  as.data.frame() %>% 
  group_by(Peakid,repClass) %>% 
  summarise() %>% 
  group_by(Peakid) %>% tally %>% 
  group_by(n) %>% tally

# A tibble: 8 × 2
      n    nn
  <int> <int>
1     1 40854
2     2 30144
3     3 12604
4     4  4120
5     5  1127
6     6   171
7     7     5
8     8     1

H3K27ac_ols$Set_1 %>% 
  as.data.frame() %>% 
  group_by(Peakid) %>%
  summarize(., 
            repClass=paste0(unique(repClass), collapse=";"))

# A tibble: 89,026 × 2
   Peakid                    repClass                       
   <chr>                     <chr>                          
 1 chr10:100005111-100005718 LINE;SINE                      
 2 chr10:100006009-100007728 SINE;LINE                      
 3 chr10:100008216-100011014 SINE;Simple_repeat;Unknown;LINE
 4 chr10:100043350-100044800 SINE;DNA;Low_complexity        
 5 chr10:100046129-100046853 LTR;SINE                       
 6 chr10:100184879-100187424 DNA                            
 7 chr10:100199091-100199540 SINE                           
 8 chr10:10020802-10021452   SINE;LTR                       
 9 chr10:100208189-100209822 SINE;LTR                       
10 chr10:100226355-100227311 LINE                           
# ℹ 89,016 more rows

H3K27ac_ols$Set_2 %>% 
  as.data.frame() %>% 
  group_by(Peakid,repClass) %>% 
  summarise() %>% 
  group_by(Peakid) %>% tally %>% 
  group_by(n) %>% tally

# A tibble: 6 × 2
      n    nn
  <int> <int>
1     1   513
2     2   411
3     3   195
4     4    38
5     5     8
6     6     1

H3K27ac_ols$Set_2 %>% 
  as.data.frame() %>% 
  group_by(Peakid) %>%
  summarize(., 
            repClass=paste0(unique(repClass), collapse=";"))

# A tibble: 1,166 × 2
   Peakid                    repClass                   
   <chr>                     <chr>                      
 1 chr10:103196972-103198089 LINE;SINE;Low_complexity   
 2 chr10:103898324-103899678 LINE;SINE                  
 3 chr10:104775314-104776084 SINE;LINE                  
 4 chr10:105604321-105604967 SINE;LINE;LTR;Simple_repeat
 5 chr10:11058814-11060583   SINE;DNA                   
 6 chr10:110845092-110847119 SINE;LINE;Simple_repeat    
 7 chr10:110865305-110866546 SINE                       
 8 chr10:110946871-110947695 LINE                       
 9 chr10:113326142-113327412 SINE                       
10 chr10:113524679-113525161 SINE                       
# ℹ 1,156 more rows

H3K27ac_ols$Set_3 %>% 
  as.data.frame() %>% 
  group_by(Peakid,repClass) %>% 
  summarise() %>% 
  group_by(Peakid) %>% tally %>% 
  group_by(n) %>% tally

# A tibble: 6 × 2
      n    nn
  <int> <int>
1     1  2303
2     2  2054
3     3   974
4     4   317
5     5    67
6     6     9

H3K27ac_ols$Set_3 %>% 
  as.data.frame() %>% 
  group_by(Peakid) %>%
  summarize(., 
            repClass=paste0(unique(repClass), collapse=";"))

# A tibble: 5,724 × 2
   Peakid                    repClass                    
   <chr>                     <chr>                       
 1 chr10:100738942-100739973 Simple_repeat               
 2 chr10:100790295-100791877 Low_complexity;Simple_repeat
 3 chr10:101132954-101134846 Simple_repeat               
 4 chr10:101157763-101159378 RC;LTR                      
 5 chr10:101227112-101229171 Low_complexity;Simple_repeat
 6 chr10:101230211-101230842 Simple_repeat               
 7 chr10:101557797-101558624 SINE                        
 8 chr10:101995380-101996279 LINE                        
 9 chr10:102143712-102144701 LINE                        
10 chr10:102158807-102161487 SINE;Simple_repeat;LTR      
# ℹ 5,714 more rows

All regions profile

H3K27ac_ols$all_H3K27ac %>% 
   as.data.frame() %>% 
  group_by(Peakid,repClass) %>% 
  summarise() %>% 
  group_by(Peakid) %>% tally %>% 
  group_by(n) %>% tally

# A tibble: 8 × 2
      n    nn
  <int> <int>
1     1 55458
2     2 41836
3     3 17564
4     4  5519
5     5  1416
6     6   203
7     7     7
8     8     1

H3K27ac_ols$all_H3K27ac %>% 
  as.data.frame() %>% 
  group_by(Peakid) %>%
  summarize(., 
            repClass=paste0(unique(repClass), collapse=";"))

# A tibble: 122,004 × 2
   Peakid                    repClass                       
   <chr>                     <chr>                          
 1 chr10:100005111-100005718 LINE;SINE                      
 2 chr10:100006009-100007728 SINE;LINE                      
 3 chr10:100008216-100011014 SINE;Simple_repeat;Unknown;LINE
 4 chr10:100043350-100044800 SINE;DNA;Low_complexity        
 5 chr10:100046129-100046853 LTR;SINE                       
 6 chr10:100179689-100180264 SINE                           
 7 chr10:100184879-100187424 DNA                            
 8 chr10:100199091-100199540 SINE                           
 9 chr10:10020802-10021452   SINE;LTR                       
10 chr10:100208189-100209822 SINE;LTR                       
# ℹ 121,994 more rows

length(H3K27ac_sets_gr$all_H3K27ac)

[1] 150071

H3K27me3 and TEs

H3K27me3_ols <- lapply(H3K27me3_sets_gr, function(df) {
   # 1️⃣ find overlaps
  hits <- findOverlaps(df, rmskr_std_gr)
  
  # 2️⃣ intersect the overlapping ranges
  overlaps_gr <- pintersect(df[queryHits(hits)], rmskr_std_gr[subjectHits(hits)])
  
  # 3️⃣ convert to data frame
  overlaps_df <- as.data.frame(overlaps_gr)
  
  # 4️⃣ add overlap length
  overlaps_df$overlap_bp <- width(overlaps_gr)
  
  # 5️⃣ ROI lengths & % coverage
  roi_widths <- width(df)
  names(roi_widths) <- df$Peakid
  overlaps_df$roi_len <- roi_widths[as.character(overlaps_df$Peakid)]
  overlaps_df$pct_roi_overlap <- (overlaps_df$overlap_bp / overlaps_df$roi_len) * 100
  
  # 6️⃣ TE lengths & % coverage
  te_widths <- width(rmskr_std_gr)
  overlaps_df$te_len <- te_widths[subjectHits(hits)]
  overlaps_df$pct_te_overlap <- (overlaps_df$overlap_bp / overlaps_df$te_len) * 100
  
  # 7️⃣ Add TE metadata from rmsk GRanges
  overlaps_df$repName   <- rmskr_std_gr$repName[subjectHits(hits)]
  overlaps_df$repClass  <- rmskr_std_gr$repClass[subjectHits(hits)]
  overlaps_df$repFamily <- rmskr_std_gr$repFamily[subjectHits(hits)]
  overlaps_df$milliDiv <- rmskr_std_gr$milliDiv[subjectHits(hits)]
  overlaps_df$milliDel <- rmskr_std_gr$milliDel[subjectHits(hits)]
  overlaps_df$milliIns <- rmskr_std_gr$milliIns[subjectHits(hits)]
  
  overlaps_df
})


H3K27me3_ols$Set_1 %>% 
  as.data.frame() %>% 
  group_by(Peakid,repClass) %>% 
  summarise() %>% 
  group_by(Peakid) %>% tally %>% 
  group_by(n) %>% tally

# A tibble: 8 × 2
      n    nn
  <int> <int>
1     1 59299
2     2 34090
3     3 12830
4     4  5071
5     5  1774
6     6   379
7     7    35
8     8     1

H3K27me3_ols$Set_1 %>% 
  as.data.frame() %>% 
  group_by(Peakid) %>%
  summarize(., 
            repClass=paste0(unique(repClass), collapse=";"))

# A tibble: 113,479 × 2
   Peakid                    repClass                    
   <chr>                     <chr>                       
 1 chr10:100046102-100046460 LTR;SINE                    
 2 chr10:100054231-100054520 SINE                        
 3 chr10:100078858-100079126 LINE                        
 4 chr10:100084784-100089581 SINE;LINE;LTR;Low_complexity
 5 chr10:100113114-100113630 SINE                        
 6 chr10:100113867-100114131 SINE                        
 7 chr10:100114436-100115458 LINE                        
 8 chr10:100127919-100128315 LTR;LINE                    
 9 chr10:100129022-100129787 LINE;SINE                   
10 chr10:100229003-100229494 Simple_repeat               
# ℹ 113,469 more rows

H3K27me3_ols$Set_2 %>% 
  as.data.frame() %>% 
  group_by(Peakid,repClass) %>% 
  summarise() %>% 
  group_by(Peakid) %>% tally %>% 
  group_by(n) %>% tally

# A tibble: 3 × 2
      n    nn
  <int> <int>
1     1   104
2     2    41
3     3     8

H3K27me3_ols$Set_2 %>% 
  as.data.frame() %>% 
  group_by(Peakid) %>%
  summarize(., 
            repClass=paste0(unique(repClass), collapse=";"))

# A tibble: 153 × 2
   Peakid                    repClass               
   <chr>                     <chr>                  
 1 chr10:104424708-104425325 LINE;SINE              
 2 chr10:113524664-113525075 SINE                   
 3 chr10:129076374-129076858 Simple_repeat          
 4 chr10:130134682-130135251 LINE                   
 5 chr10:132175296-132177004 LINE;Simple_repeat;tRNA
 6 chr10:133324189-133324746 Simple_repeat          
 7 chr10:1398800-1399241     LINE                   
 8 chr10:27256098-27256641   SINE;LTR               
 9 chr10:29968825-29969238   Simple_repeat          
10 chr10:30348778-30349159   Simple_repeat          
# ℹ 143 more rows

All regions profile

H3K27me3_ols$all_H3K27me3 %>% 
   as.data.frame() %>% 
  group_by(Peakid,repClass) %>% 
  summarise() %>% 
  group_by(Peakid) %>% tally %>% 
  group_by(n) %>% tally

# A tibble: 8 × 2
      n    nn
  <int> <int>
1     1 59987
2     2 34344
3     3 12893
4     4  5078
5     5  1775
6     6   379
7     7    35
8     8     1

H3K27me3_ols$all_H3K27me3 %>% 
  as.data.frame() %>% 
  group_by(Peakid) %>%
  summarize(., 
            repClass=paste0(unique(repClass), collapse=";"))

# A tibble: 114,492 × 2
   Peakid                    repClass                    
   <chr>                     <chr>                       
 1 chr10:100046102-100046460 LTR;SINE                    
 2 chr10:100054231-100054520 SINE                        
 3 chr10:100078858-100079126 LINE                        
 4 chr10:100084784-100089581 SINE;LINE;LTR;Low_complexity
 5 chr10:100113114-100113630 SINE                        
 6 chr10:100113867-100114131 SINE                        
 7 chr10:100114436-100115458 LINE                        
 8 chr10:100127919-100128315 LTR;LINE                    
 9 chr10:100129022-100129787 LINE;SINE                   
10 chr10:100229003-100229494 Simple_repeat               
# ℹ 114,482 more rows

length(H3K27me3_sets_gr$all_H3K27me3)

[1] 150464

H3K36me3 and TEs

H3K36me3_ols <- lapply(H3K36me3_sets_gr, function(df) {
  # 1️⃣ find overlaps
  hits <- findOverlaps(df, rmskr_std_gr)
  
  # 2️⃣ intersect the overlapping ranges
  overlaps_gr <- pintersect(df[queryHits(hits)], rmskr_std_gr[subjectHits(hits)])
  
  # 3️⃣ convert to data frame
  overlaps_df <- as.data.frame(overlaps_gr)
  
  # 4️⃣ add overlap length
  overlaps_df$overlap_bp <- width(overlaps_gr)
  
  # 5️⃣ ROI lengths & % coverage
  roi_widths <- width(df)
  names(roi_widths) <- df$Peakid
  overlaps_df$roi_len <- roi_widths[as.character(overlaps_df$Peakid)]
  overlaps_df$pct_roi_overlap <- (overlaps_df$overlap_bp / overlaps_df$roi_len) * 100
  
  # 6️⃣ TE lengths & % coverage
  te_widths <- width(rmskr_std_gr)
  overlaps_df$te_len <- te_widths[subjectHits(hits)]
  overlaps_df$pct_te_overlap <- (overlaps_df$overlap_bp / overlaps_df$te_len) * 100
  
  # 7️⃣ Add TE metadata from rmsk GRanges
  overlaps_df$repName   <- rmskr_std_gr$repName[subjectHits(hits)]
  overlaps_df$repClass  <- rmskr_std_gr$repClass[subjectHits(hits)]
  overlaps_df$repFamily <- rmskr_std_gr$repFamily[subjectHits(hits)]
  overlaps_df$milliDiv <- rmskr_std_gr$milliDiv[subjectHits(hits)]
  overlaps_df$milliDel <- rmskr_std_gr$milliDel[subjectHits(hits)]
  overlaps_df$milliIns <- rmskr_std_gr$milliIns[subjectHits(hits)]
  
  overlaps_df
})

H3K36me3_ols$Set_1 %>% 
  as.data.frame() %>% 
  group_by(Peakid,repClass) %>% 
  summarise() %>% 
  group_by(Peakid) %>% tally %>% 
  group_by(n) %>% tally

# A tibble: 7 × 2
      n    nn
  <int> <int>
1     1 61918
2     2 24560
3     3  5127
4     4  1062
5     5   192
6     6    22
7     7     1

H3K36me3_ols$Set_1 %>% 
  as.data.frame() %>% 
  group_by(Peakid) %>%
  summarize(., 
            repClass=paste0(unique(repClass), collapse=";"))

# A tibble: 92,882 × 2
   Peakid                    repClass          
   <chr>                     <chr>             
 1 chr10:100009339-100010339 SINE;Simple_repeat
 2 chr10:100139015-100139306 LINE              
 3 chr10:100147443-100147738 LINE              
 4 chr10:100196474-100197370 SINE;DNA;LINE     
 5 chr10:100198988-100199626 SINE              
 6 chr10:100201468-100201937 DNA;SINE          
 7 chr10:100202789-100203153 SINE;LINE         
 8 chr10:100203832-100204198 LTR               
 9 chr10:100208051-100208777 SINE;LTR          
10 chr10:100214861-100215206 LINE;SINE         
# ℹ 92,872 more rows

H3K36me3_ols$Set_2 %>% 
  as.data.frame() %>% 
  group_by(Peakid,repClass) %>% 
  summarise() %>% 
  group_by(Peakid) %>% tally %>% 
  group_by(n) %>% tally

# A tibble: 6 × 2
      n    nn
  <int> <int>
1     1   628
2     2   359
3     3   148
4     4    43
5     5    18
6     6     5

H3K36me3_ols$Set_2 %>% 
  as.data.frame() %>% 
  group_by(Peakid) %>%
  summarize(., 
            repClass=paste0(unique(repClass), collapse=";"))

# A tibble: 1,201 × 2
   Peakid                    repClass               
   <chr>                     <chr>                  
 1 chr10:110729433-110730934 LINE;SINE;Simple_repeat
 2 chr10:110803615-110804488 SINE;LTR               
 3 chr10:114151594-114151957 DNA                    
 4 chr10:117139090-117139567 Simple_repeat          
 5 chr10:117221071-117221394 DNA                    
 6 chr10:118698372-118698876 SINE;LTR               
 7 chr10:122207358-122208228 DNA                    
 8 chr10:123157127-123157564 LINE                   
 9 chr10:124613974-124614749 SINE                   
10 chr10:124670751-124671297 LINE;LTR               
# ℹ 1,191 more rows

All regions profile

H3K36me3_ols$all_H3K36me3 %>% 
   as.data.frame() %>% 
  group_by(Peakid,repClass) %>% 
  summarise() %>% 
  group_by(Peakid) %>% tally %>% 
  group_by(n) %>% tally

# A tibble: 7 × 2
      n    nn
  <int> <int>
1     1 86790
2     2 34976
3     3  7463
4     4  1502
5     5   284
6     6    39
7     7     3

H3K36me3_ols$all_H3K36me3 %>% 
  as.data.frame() %>% 
  group_by(Peakid) %>%
  summarize(., 
            repClass=paste0(unique(repClass), collapse=";"))

# A tibble: 131,057 × 2
   Peakid                    repClass          
   <chr>                     <chr>             
 1 chr10:100009339-100010339 SINE;Simple_repeat
 2 chr10:100139015-100139306 LINE              
 3 chr10:100147443-100147738 LINE              
 4 chr10:100153009-100153887 SINE;LINE         
 5 chr10:100173135-100173409 DNA               
 6 chr10:100186943-100187317 DNA               
 7 chr10:100196474-100197370 SINE;DNA;LINE     
 8 chr10:100198988-100199626 SINE              
 9 chr10:100201468-100201937 DNA;SINE          
10 chr10:100202789-100203153 SINE;LINE         
# ℹ 131,047 more rows

length(H3K36me3_sets_gr$all_H3K36me3)

[1] 186724

H3K9me3 and TEs

H3K9me3_ols <- lapply(H3K9me3_sets_gr, function(df) {
  # 1️⃣ find overlaps
  hits <- findOverlaps(df, rmskr_std_gr)
  
  # 2️⃣ intersect the overlapping ranges
  overlaps_gr <- pintersect(df[queryHits(hits)], rmskr_std_gr[subjectHits(hits)])
  
  # 3️⃣ convert to data frame
  overlaps_df <- as.data.frame(overlaps_gr)
  
  # 4️⃣ add overlap length
  overlaps_df$overlap_bp <- width(overlaps_gr)
  
  # 5️⃣ ROI lengths & % coverage
  roi_widths <- width(df)
  names(roi_widths) <- df$Peakid
  overlaps_df$roi_len <- roi_widths[as.character(overlaps_df$Peakid)]
  overlaps_df$pct_roi_overlap <- (overlaps_df$overlap_bp / overlaps_df$roi_len) * 100
  
  # 6️⃣ TE lengths & % coverage
  te_widths <- width(rmskr_std_gr)
  overlaps_df$te_len <- te_widths[subjectHits(hits)]
  overlaps_df$pct_te_overlap <- (overlaps_df$overlap_bp / overlaps_df$te_len) * 100
  
  # 7️⃣ Add TE metadata from rmsk GRanges
  overlaps_df$repName   <- rmskr_std_gr$repName[subjectHits(hits)]
  overlaps_df$repClass  <- rmskr_std_gr$repClass[subjectHits(hits)]
  overlaps_df$repFamily <- rmskr_std_gr$repFamily[subjectHits(hits)]
  overlaps_df$milliDiv <- rmskr_std_gr$milliDiv[subjectHits(hits)]
  overlaps_df$milliDel <- rmskr_std_gr$milliDel[subjectHits(hits)]
  overlaps_df$milliIns <- rmskr_std_gr$milliIns[subjectHits(hits)]
  
  overlaps_df
})

H3K9me3_ols$Set_1 %>% 
  as.data.frame() %>% 
  group_by(Peakid,repClass) %>% 
  summarise() %>% 
  group_by(Peakid) %>% tally %>% 
  group_by(n) %>% tally

# A tibble: 6 × 2
      n     nn
  <int>  <int>
1     1 107939
2     2  32414
3     3   4976
4     4    574
5     5     69
6     6      2

H3K9me3_ols$Set_1 %>% 
  as.data.frame() %>% 
  group_by(Peakid) %>%
  summarize(., 
            repClass=paste0(unique(repClass), collapse=";"))

# A tibble: 145,974 × 2
   Peakid                    repClass              
   <chr>                     <chr>                 
 1 chr10:100009524-100010017 Simple_repeat         
 2 chr10:100025478-100025829 LTR                   
 3 chr10:100027743-100029126 LTR;SINE              
 4 chr10:100046119-100046433 LTR;SINE              
 5 chr10:100086477-100087363 LINE;LTR;SINE         
 6 chr10:100095279-100095524 LINE                  
 7 chr10:100097281-100098436 SINE;LTR              
 8 chr10:100133860-100134292 LINE;Simple_repeat;LTR
 9 chr10:10020906-10021393   SINE;LTR              
10 chr10:100229039-100229275 Simple_repeat         
# ℹ 145,964 more rows

H3K9me3_ols$Set_2 %>% 
  as.data.frame() %>% 
  group_by(Peakid,repClass) %>% 
  summarise() %>% 
  group_by(Peakid) %>% tally %>% 
  group_by(n) %>% tally

# A tibble: 4 × 2
      n    nn
  <int> <int>
1     1  1362
2     2   266
3     3    26
4     4     2

H3K9me3_ols$Set_2 %>% 
  as.data.frame() %>% 
  group_by(Peakid) %>%
  summarize(., 
            repClass=paste0(unique(repClass), collapse=";"))

# A tibble: 1,656 × 2
   Peakid                    repClass      
   <chr>                     <chr>         
 1 chr10:101006128-101006429 SINE          
 2 chr10:101777876-101778344 LINE          
 3 chr10:101951337-101951612 SINE          
 4 chr10:102418095-102418389 Simple_repeat 
 5 chr10:104329400-104329651 LINE          
 6 chr10:1101897-1102240     LINE          
 7 chr10:110227445-110227710 Low_complexity
 8 chr10:114786610-114786850 Retroposon    
 9 chr10:117531528-117532237 SINE          
10 chr10:11893983-11894609   SINE          
# ℹ 1,646 more rows

H3K9me3_ols$Set_3 %>% 
  as.data.frame() %>% 
  group_by(Peakid,repClass) %>% 
  summarise() %>% 
  group_by(Peakid) %>% tally %>% 
  group_by(n) %>% tally

# A tibble: 5 × 2
      n    nn
  <int> <int>
1     1   405
2     2   119
3     3    26
4     4     4
5     5     1

H3K9me3_ols$Set_3 %>% 
  as.data.frame() %>% 
  group_by(Peakid) %>%
  summarize(., 
            repClass=paste0(unique(repClass), collapse=";"))

# A tibble: 555 × 2
   Peakid                    repClass          
   <chr>                     <chr>             
 1 chr10:114252483-114252701 SINE              
 2 chr10:115903919-115904555 LTR               
 3 chr10:124611053-124611641 SINE              
 4 chr10:1298991-1300057     SINE;LTR          
 5 chr10:1300214-1301077     LINE;SINE         
 6 chr10:132172535-132172962 LINE;Simple_repeat
 7 chr10:132279104-132279514 SINE              
 8 chr10:132363329-132364326 Simple_repeat     
 9 chr10:132757314-132757720 DNA               
10 chr10:24446759-24447078   SINE              
# ℹ 545 more rows

All regions profile

H3K9me3_ols$all_H3K9me3 %>% 
   as.data.frame() %>% 
  group_by(Peakid,repClass) %>% 
  summarise() %>% 
  group_by(Peakid) %>% tally %>% 
  group_by(n) %>% tally

# A tibble: 6 × 2
      n     nn
  <int>  <int>
1     1 117258
2     2  34507
3     3   5210
4     4    591
5     5     70
6     6      2

H3K9me3_ols$all_H3K9me3 %>% 
  as.data.frame() %>% 
  group_by(Peakid) %>%
  summarize(., 
            repClass=paste0(unique(repClass), collapse=";"))

# A tibble: 157,638 × 2
   Peakid                    repClass              
   <chr>                     <chr>                 
 1 chr10:100006461-100006612 SINE;LINE             
 2 chr10:100009524-100010017 Simple_repeat         
 3 chr10:100025478-100025829 LTR                   
 4 chr10:100027743-100029126 LTR;SINE              
 5 chr10:100046119-100046433 LTR;SINE              
 6 chr10:100086477-100087363 LINE;LTR;SINE         
 7 chr10:100095279-100095524 LINE                  
 8 chr10:100097281-100098436 SINE;LTR              
 9 chr10:100133860-100134292 LINE;Simple_repeat;LTR
10 chr10:10020906-10021393   SINE;LTR              
# ℹ 157,628 more rows

length(H3K9me3_sets_gr$all_H3K9me3)

[1] 218647

Counting TE types:

Below I am setting up the data frames, first I take all regions and annotate with the overlapping dataframe. Then I make new columns, TE_status, SINE_status, LINE_status, LTR_status, DNA_status, SVA_status. These columns mark Peakids if they have this attribute. Last, I am annotating a cluster column to add which set the Peakid exists in by histone. (field will be Set1, Set2, Set3, no_cluster)

All_H3K27ac <- H3K27ac_sets_gr$all_H3K27ac %>% as.data.frame()
All_H3K27me3 <- H3K27me3_sets_gr$all_H3K27me3 %>% as.data.frame()
All_H3K36me3 <- H3K36me3_sets_gr$all_H3K36me3 %>% as.data.frame()
All_H3K9me3 <- H3K9me3_sets_gr$all_H3K9me3 %>% as.data.frame()


H3K27ac_lookup <- imap_dfr(peakAnnoList_H3K27ac[1:3], ~
  tibble(Peakid = .x@anno$Peakid, cluster = .y)
)
H3K27me3_lookup <- imap_dfr(peakAnnoList_H3K27me3[1:2], ~
  tibble(Peakid = .x@anno$Peakid, cluster = .y)
)
H3K36me3_lookup <- imap_dfr(peakAnnoList_H3K36me3[1:2], ~
  tibble(Peakid = .x@anno$Peakid, cluster = .y)
)
H3K9me3_lookup <- imap_dfr(peakAnnoList_H3K9me3[1:3], ~
  tibble(Peakid = .x@anno$Peakid, cluster = .y)
)
  
annotated_H3K27ac <- All_H3K27ac %>% 
  left_join(., (H3K27ac_ols$all_H3K27ac %>% 
                  as.data.frame() %>% 
                  group_by(Peakid) %>% 
                  summarize(., 
                            repClass=paste0(unique(repClass), collapse=";"),
                            repFamily=paste0(unique(repFamily),collapse=";"),
                            repName=paste0(unique(repName),collapse=";"))))%>%
              mutate(TE_status = if_else(is.na(repClass),   "wnot_TE","TE")) %>%
              mutate(SINE_status = case_when(is.na(repClass)~ "wnot_SINE",
                                             str_detect(repClass, "SINE") ~ "SINE",
                                             TRUE ~"wnot_SINE")) %>% 
              mutate(LINE_status = case_when(is.na(repClass)~ "wnot_LINE",
                                             str_detect(repClass, "LINE") ~ "LINE",
                                             TRUE ~"wnot_LINE")) %>%
              mutate(LTR_status = case_when(is.na(repClass)~ "wnot_LTR",
                                            str_detect(repClass, "LTR") ~ "LTR",
                                            TRUE ~"wnot_LTR")) %>%
               mutate(DNA_status = case_when(is.na(repClass)~ "wnot_DNA",
                                 str_detect(repClass, "DNA") ~ "DNA",
                                 TRUE ~"wnot_DNA")) %>% 
          mutate(SVA_status = case_when(is.na(repClass)~ "wnot_SVA",
                                        str_detect(repClass, "Retroposon") ~ "SVA",
                                        TRUE ~"wnot_SVA")) %>% 
  left_join(H3K27ac_lookup, by = "Peakid") %>%
  mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster))


annotated_H3K27me3 <- All_H3K27me3 %>% 
  left_join(., (H3K27me3_ols$all_H3K27me3 %>% 
                  as.data.frame() %>% 
                  group_by(Peakid) %>% 
                  summarize(., 
                            repClass=paste0(unique(repClass), collapse=";"),
                            repFamily=paste0(unique(repFamily),collapse=";"),
                            repName=paste0(unique(repName),collapse=";"))))%>%
              mutate(TE_status = if_else(is.na(repClass),   "wnot_TE","TE")) %>%
              mutate(SINE_status = case_when(is.na(repClass)~ "wnot_SINE",
                                             str_detect(repClass, "SINE") ~ "SINE",
                                             TRUE ~"wnot_SINE")) %>% 
              mutate(LINE_status = case_when(is.na(repClass)~ "wnot_LINE",
                                             str_detect(repClass, "LINE") ~ "LINE",
                                             TRUE ~"wnot_LINE")) %>%
              mutate(LTR_status = case_when(is.na(repClass)~ "wnot_LTR",
                                            str_detect(repClass, "LTR") ~ "LTR",
                                            TRUE ~"wnot_LTR")) %>%
               mutate(DNA_status = case_when(is.na(repClass)~ "wnot_DNA",
                                 str_detect(repClass, "DNA") ~ "DNA",
                                 TRUE ~"wnot_DNA")) %>% 
          mutate(SVA_status = case_when(is.na(repClass)~ "wnot_SVA",
                                        str_detect(repClass, "Retroposon") ~ "SVA",
                                        TRUE ~"wnot_SVA")) %>% 
  left_join(H3K27me3_lookup, by = "Peakid") %>%
  mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster))


annotated_H3K36me3 <- All_H3K36me3 %>% 
  left_join(., (H3K36me3_ols$all_H3K36me3 %>% 
                  as.data.frame() %>% 
                  group_by(Peakid) %>% 
                  summarize(., 
                            repClass=paste0(unique(repClass), collapse=";"),
                            repFamily=paste0(unique(repFamily),collapse=";"),
                            repName=paste0(unique(repName),collapse=";"))))%>%
              mutate(TE_status = if_else(is.na(repClass),   "wnot_TE","TE")) %>%
              mutate(SINE_status = case_when(is.na(repClass)~ "wnot_SINE",
                                             str_detect(repClass, "SINE") ~ "SINE",
                                             TRUE ~"wnot_SINE")) %>% 
              mutate(LINE_status = case_when(is.na(repClass)~ "wnot_LINE",
                                             str_detect(repClass, "LINE") ~ "LINE",
                                             TRUE ~"wnot_LINE")) %>%
              mutate(LTR_status = case_when(is.na(repClass)~ "wnot_LTR",
                                            str_detect(repClass, "LTR") ~ "LTR",
                                            TRUE ~"wnot_LTR")) %>%
               mutate(DNA_status = case_when(is.na(repClass)~ "wnot_DNA",
                                 str_detect(repClass, "DNA") ~ "DNA",
                                 TRUE ~"wnot_DNA")) %>% 
          mutate(SVA_status = case_when(is.na(repClass)~ "wnot_SVA",
                                        str_detect(repClass, "Retroposon") ~ "SVA",
                                        TRUE ~"wnot_SVA")) %>% 
  left_join(H3K36me3_lookup, by = "Peakid") %>%
  mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster))


annotated_H3K9me3 <- All_H3K9me3 %>% 
  left_join(., (H3K9me3_ols$all_H3K9me3 %>% 
                  as.data.frame() %>% 
                  group_by(Peakid) %>% 
                  summarize(., 
                            repClass=paste0(unique(repClass), collapse=";"),
                            repFamily=paste0(unique(repFamily),collapse=";"),
                            repName=paste0(unique(repName),collapse=";"))))%>%
              mutate(TE_status = if_else(is.na(repClass),   "wnot_TE","TE")) %>%
              mutate(SINE_status = case_when(is.na(repClass)~ "wnot_SINE",
                                             str_detect(repClass, "SINE") ~ "SINE",
                                             TRUE ~"wnot_SINE")) %>% 
              mutate(LINE_status = case_when(is.na(repClass)~ "wnot_LINE",
                                             str_detect(repClass, "LINE") ~ "LINE",
                                             TRUE ~"wnot_LINE")) %>%
              mutate(LTR_status = case_when(is.na(repClass)~ "wnot_LTR",
                                            str_detect(repClass, "LTR") ~ "LTR",
                                            TRUE ~"wnot_LTR")) %>%
               mutate(DNA_status = case_when(is.na(repClass)~ "wnot_DNA",
                                 str_detect(repClass, "DNA") ~ "DNA",
                                 TRUE ~"wnot_DNA")) %>% 
          mutate(SVA_status = case_when(is.na(repClass)~ "wnot_SVA",
                                        str_detect(repClass, "Retroposon") ~ "SVA",
                                        TRUE ~"wnot_SVA")) %>% 
  left_join(H3K9me3_lookup, by = "Peakid") %>%
  mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster))

# annotated_tables <- list("annotated_H3K27ac"=annotated_H3K27ac,
#                          "annotated_H3K27me3"=annotated_H3K27me3,
#                          "annotated_H3K36me3"=annotated_H3K36me3,
#                          "annotated_H3K9me3"=annotated_H3K9me3)
# 
# saveRDS(annotated_tables,"data/TE_annotation/annotated_tables_1bp_ol.RDS")

50% counting histones

annotated_H3K27ac_50_per <- All_H3K27ac %>%
  left_join(., (H3K27ac_ols$all_H3K27ac %>% 
                  as.data.frame() %>%
                  dplyr::filter(pct_te_overlap>50) %>% 
                  group_by(Peakid) %>% 
                  summarize(., 
                            repClass=paste0(unique(repClass[pct_te_overlap>50]), collapse=";"),
                            repFamily=paste0(unique(repFamily[pct_te_overlap>50]),collapse=";"),
                            repName=paste0(unique(repName[pct_te_overlap>50]),collapse=";"))))%>%
              mutate(TE_status = if_else(is.na(repClass),   "wnot_TE","TE")) %>%
              mutate(SINE_status = case_when(is.na(repClass)~ "wnot_SINE",
                                             str_detect(repClass, "SINE") ~ "SINE",
                                             TRUE ~"wnot_SINE")) %>% 
              mutate(LINE_status = case_when(is.na(repClass)~ "wnot_LINE",
                                             str_detect(repClass, "LINE") ~ "LINE",
                                             TRUE ~"wnot_LINE")) %>%
              mutate(LTR_status = case_when(is.na(repClass)~ "wnot_LTR",
                                            str_detect(repClass, "LTR") ~ "LTR",
                                            TRUE ~"wnot_LTR")) %>%
               mutate(DNA_status = case_when(is.na(repClass)~ "wnot_DNA",
                                 str_detect(repClass, "DNA") ~ "DNA",
                                 TRUE ~"wnot_DNA")) %>% 
          mutate(SVA_status = case_when(is.na(repClass)~ "wnot_SVA",
                                        str_detect(repClass, "Retroposon") ~ "SVA",
                                        TRUE ~"wnot_SVA")) %>% 
  left_join(H3K27ac_lookup, by = "Peakid") %>%
  mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster))


annotated_H3K27me3_50_per <- All_H3K27me3 %>% 
  left_join(., (H3K27me3_ols$all_H3K27me3 %>% 
                  as.data.frame() %>%
                  dplyr::filter(pct_te_overlap>50) %>% 
                  group_by(Peakid) %>% 
                  summarize(., 
                            repClass=paste0(unique(repClass[pct_te_overlap>50]), collapse=";"),
                            repFamily=paste0(unique(repFamily[pct_te_overlap>50]),collapse=";"),
                            repName=paste0(unique(repName[pct_te_overlap>50]),collapse=";"))))%>%
              mutate(TE_status = if_else(is.na(repClass),   "wnot_TE","TE")) %>%
              mutate(SINE_status = case_when(is.na(repClass)~ "wnot_SINE",
                                             str_detect(repClass, "SINE") ~ "SINE",
                                             TRUE ~"wnot_SINE")) %>% 
              mutate(LINE_status = case_when(is.na(repClass)~ "wnot_LINE",
                                             str_detect(repClass, "LINE") ~ "LINE",
                                             TRUE ~"wnot_LINE")) %>%
              mutate(LTR_status = case_when(is.na(repClass)~ "wnot_LTR",
                                            str_detect(repClass, "LTR") ~ "LTR",
                                            TRUE ~"wnot_LTR")) %>%
               mutate(DNA_status = case_when(is.na(repClass)~ "wnot_DNA",
                                 str_detect(repClass, "DNA") ~ "DNA",
                                 TRUE ~"wnot_DNA")) %>% 
          mutate(SVA_status = case_when(is.na(repClass)~ "wnot_SVA",
                                        str_detect(repClass, "Retroposon") ~ "SVA",
                                        TRUE ~"wnot_SVA")) %>% 
  left_join(H3K27me3_lookup, by = "Peakid") %>%
  mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster))


annotated_H3K36me3_50_per <- All_H3K36me3 %>% 
  left_join(., (H3K36me3_ols$all_H3K36me3 %>% 
                  as.data.frame() %>% 
                  dplyr::filter(pct_te_overlap>50) %>% 
                  group_by(Peakid) %>% 
                  summarize(., 
                            repClass=paste0(unique(repClass[pct_te_overlap>50]), collapse=";"),
                            repFamily=paste0(unique(repFamily[pct_te_overlap>50]),collapse=";"),
                            repName=paste0(unique(repName[pct_te_overlap>50]),collapse=";"))))%>%
              mutate(TE_status = if_else(is.na(repClass),   "wnot_TE","TE")) %>%
              mutate(SINE_status = case_when(is.na(repClass)~ "wnot_SINE",
                                             str_detect(repClass, "SINE") ~ "SINE",
                                             TRUE ~"wnot_SINE")) %>% 
              mutate(LINE_status = case_when(is.na(repClass)~ "wnot_LINE",
                                             str_detect(repClass, "LINE") ~ "LINE",
                                             TRUE ~"wnot_LINE")) %>%
              mutate(LTR_status = case_when(is.na(repClass)~ "wnot_LTR",
                                            str_detect(repClass, "LTR") ~ "LTR",
                                            TRUE ~"wnot_LTR")) %>%
               mutate(DNA_status = case_when(is.na(repClass)~ "wnot_DNA",
                                 str_detect(repClass, "DNA") ~ "DNA",
                                 TRUE ~"wnot_DNA")) %>% 
          mutate(SVA_status = case_when(is.na(repClass)~ "wnot_SVA",
                                        str_detect(repClass, "Retroposon") ~ "SVA",
                                        TRUE ~"wnot_SVA")) %>% 
  left_join(H3K36me3_lookup, by = "Peakid") %>%
  mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster))


annotated_H3K9me3_50_per <- All_H3K9me3 %>% 
  left_join(., (H3K9me3_ols$all_H3K9me3 %>% 
                  as.data.frame() %>% 
                  dplyr::filter(pct_te_overlap>50) %>% 
                  group_by(Peakid) %>% 
                  summarize(., 
                            repClass=paste0(unique(repClass[pct_te_overlap>50]), collapse=";"),
                            repFamily=paste0(unique(repFamily[pct_te_overlap>50]),collapse=";"),
                            repName=paste0(unique(repName[pct_te_overlap>50]),collapse=";"))))%>%
              mutate(TE_status = if_else(is.na(repClass),   "wnot_TE","TE")) %>%
              mutate(SINE_status = case_when(is.na(repClass)~ "wnot_SINE",
                                             str_detect(repClass, "SINE") ~ "SINE",
                                             TRUE ~"wnot_SINE")) %>% 
              mutate(LINE_status = case_when(is.na(repClass)~ "wnot_LINE",
                                             str_detect(repClass, "LINE") ~ "LINE",
                                             TRUE ~"wnot_LINE")) %>%
              mutate(LTR_status = case_when(is.na(repClass)~ "wnot_LTR",
                                            str_detect(repClass, "LTR") ~ "LTR",
                                            TRUE ~"wnot_LTR")) %>%
               mutate(DNA_status = case_when(is.na(repClass)~ "wnot_DNA",
                                 str_detect(repClass, "DNA") ~ "DNA",
                                 TRUE ~"wnot_DNA")) %>% 
          mutate(SVA_status = case_when(is.na(repClass)~ "wnot_SVA",
                                        str_detect(repClass, "Retroposon") ~ "SVA",
                                        TRUE ~"wnot_SVA")) %>% 
  left_join(H3K9me3_lookup, by = "Peakid") %>%
  mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster))


# annotated_tables_50_per <- list("annotated_H3K27ac_50_per"=annotated_H3K27ac_50_per,
#                          "annotated_H3K27me3_50_per"=annotated_H3K27me3_50_per,
#                          "annotated_H3K36me3_50_per"=annotated_H3K36me3_50_per,
#                          "annotated_H3K9me3_50_per"=annotated_H3K9me3_50_per)
# 
# saveRDS(annotated_tables_50_per,"data/TE_annotation/annotated_tables_50per_ol.RDS")

te_cols <- c("TE_status","SINE_status","LINE_status","DNA_status","LTR_status","SVA_status")
H3K27ac_counts <- map(te_cols, ~ {
  annotated_H3K27ac %>%
    group_by(cluster, !!sym(.x)) %>%  # dynamically use column
    tally() %>%
    pivot_wider(
      id_cols = cluster,
      names_from = !!sym(.x),
      values_from = n
    )
}) %>% set_names(te_cols)


H3K27ac_counts_long <- map2_dfr(
  H3K27ac_counts,      # the named list of wide tables
  names(H3K27ac_counts), # names of TE columns
  ~ .x %>%
    pivot_longer(
      cols = -cluster,       # everything except 'cluster'
      names_to = "status",   # new column for TE status (e.g., TE / wnot_TE)
      values_to = "count"
    ) %>%
    mutate(TE_type = .y)      # add TE_type column based on list element name
)
H3K27ac_counts_long <- H3K27ac_counts_long %>% 
group_by(TE_type) %>%
  tidyr::complete(cluster, status, fill = list(count = 0)) %>%
  ungroup()


H3K27me3_counts <-  map(te_cols, ~ {
  annotated_H3K27me3 %>%
    group_by(cluster, !!sym(.x)) %>%  # dynamically use column
    tally() %>%
    pivot_wider(
      id_cols = cluster,
      names_from = !!sym(.x),
      values_from = n
    )
}) %>% set_names(te_cols)

H3K27me3_counts_long <- map2_dfr(
  H3K27me3_counts,      # the named list of wide tables
  names(H3K27me3_counts), # names of TE columns
  ~ .x %>%
    pivot_longer(
      cols = -cluster,       # everything except 'cluster'
      names_to = "status",   # new column for TE status (e.g., TE / not_TE)
      values_to = "count"
    ) %>%
    mutate(TE_type = .y)      # add TE_type column based on list element name
)

H3K27me3_counts_long <- H3K27me3_counts_long %>%
group_by(TE_type) %>%
  tidyr::complete(cluster, status, fill = list(count = 0)) %>%
  ungroup()

H3K36me3_counts <-  map(te_cols, ~ {
  annotated_H3K36me3 %>%
    group_by(cluster, !!sym(.x)) %>%  # dynamically use column
    tally() %>%
    pivot_wider(
      id_cols = cluster,
      names_from = !!sym(.x),
      values_from = n
    )
}) %>% set_names(te_cols)


H3K36me3_counts_long <- map2_dfr(
  H3K36me3_counts,      # the named list of wide tables
  names(H3K36me3_counts), # names of TE columns
  ~ .x %>%
    pivot_longer(
      cols = -cluster,       # everything except 'cluster'
      names_to = "status",   # new column for TE status (e.g., TE / not_TE)
      values_to = "count"
    ) %>%
    mutate(TE_type = .y)      # add TE_type column based on list element name
)

H3K36me3_counts_long <- H3K36me3_counts_long %>% 
group_by(TE_type) %>%
  tidyr::complete(cluster, status, fill = list(count = 0)) %>%
  ungroup()

H3K9me3_counts <- map(te_cols, ~ {
  annotated_H3K9me3 %>%
    group_by(cluster, !!sym(.x)) %>%  # dynamically use column
    tally() %>%
    pivot_wider(
      id_cols = cluster,
      names_from = !!sym(.x),
      values_from = n
    )
}) %>% set_names(te_cols)

H3K9me3_counts_long <- map2_dfr(
  H3K9me3_counts,      # the named list of wide tables
  names(H3K9me3_counts), # names of TE columns
  ~ .x %>%
    pivot_longer(
      cols = -cluster,       # everything except 'cluster'
      names_to = "status",   # new column for TE status (e.g., TE / wnot_TE)
      values_to = "count"
    ) %>%
    mutate(TE_type = .y)      # add TE_type column based on list element name
)

H3K9me3_counts_long <- H3K9me3_counts_long %>% 
group_by(TE_type) %>%
  tidyr::complete(cluster, status, fill = list(count = 0)) %>%
  ungroup()

te_cols <- c("TE_status","SINE_status","LINE_status","DNA_status","LTR_status","SVA_status")
H3K27ac_counts_50per <- map(te_cols, ~ {
  annotated_H3K27ac_50_per %>%
    group_by(cluster, !!sym(.x)) %>%  # dynamically use column
    tally() %>%
    pivot_wider(
      id_cols = cluster,
      names_from = !!sym(.x),
      values_from = n
    )
}) %>% set_names(te_cols)


H3K27ac_counts_long_50per <- map2_dfr(
  H3K27ac_counts_50per,      # the named list of wide tables
  names(H3K27ac_counts_50per), # names of TE columns
  ~ .x %>%
    pivot_longer(
      cols = -cluster,       # everything except 'cluster'
      names_to = "status",   # new column for TE status (e.g., TE / wnot_TE)
      values_to = "count"
    ) %>%
    mutate(TE_type = .y)      # add TE_type column based on list element name
)
H3K27ac_counts_long_50per <- H3K27ac_counts_long_50per %>% 
group_by(TE_type) %>%
  tidyr::complete(cluster, status, fill = list(count = 0)) %>%
  ungroup()


H3K27me3_counts_50per <-  map(te_cols, ~ {
  annotated_H3K27me3_50_per %>%
    group_by(cluster, !!sym(.x)) %>%  # dynamically use column
    tally() %>%
    pivot_wider(
      id_cols = cluster,
      names_from = !!sym(.x),
      values_from = n
    )
}) %>% set_names(te_cols)

H3K27me3_counts_long_50per <- map2_dfr(
  H3K27me3_counts_50per,      # the named list of wide tables
  names(H3K27me3_counts_50per), # names of TE columns
  ~ .x %>%
    pivot_longer(
      cols = -cluster,       # everything except 'cluster'
      names_to = "status",   # new column for TE status (e.g., TE / not_TE)
      values_to = "count"
    ) %>%
    mutate(TE_type = .y)      # add TE_type column based on list element name
)

H3K27me3_counts_long_50per <- H3K27me3_counts_long_50per %>% 
group_by(TE_type) %>%
  tidyr::complete(cluster, status, fill = list(count = 0)) %>%
  ungroup()

H3K36me3_counts_50per <-  map(te_cols, ~ {
  annotated_H3K36me3_50_per %>%
    group_by(cluster, !!sym(.x)) %>%  # dynamically use column
    tally() %>%
    pivot_wider(
      id_cols = cluster,
      names_from = !!sym(.x),
      values_from = n
    )
}) %>% set_names(te_cols)


H3K36me3_counts_long_50per <- map2_dfr(
  H3K36me3_counts_50per,      # the named list of wide tables
  names(H3K36me3_counts_50per), # names of TE columns
  ~ .x %>%
    pivot_longer(
      cols = -cluster,       # everything except 'cluster'
      names_to = "status",   # new column for TE status (e.g., TE / not_TE)
      values_to = "count"
    ) %>%
    mutate(TE_type = .y)      # add TE_type column based on list element name
)

H3K36me3_counts_long_50per <- H3K36me3_counts_long_50per %>% 
group_by(TE_type) %>%
  tidyr::complete(cluster, status, fill = list(count = 0)) %>%
  ungroup()

H3K9me3_counts_50per <- map(te_cols, ~ {
  annotated_H3K9me3_50_per %>%
    group_by(cluster, !!sym(.x)) %>%  # dynamically use column
    tally() %>%
    pivot_wider(
      id_cols = cluster,
      names_from = !!sym(.x),
      values_from = n
    )
}) %>% set_names(te_cols)

H3K9me3_counts_long_50per <- map2_dfr(
  H3K9me3_counts_50per,      # the named list of wide tables
  names(H3K9me3_counts_50per), # names of TE columns
  ~ .x %>%
    pivot_longer(
      cols = -cluster,       # everything except 'cluster'
      names_to = "status",   # new column for TE status (e.g., TE / wnot_TE)
      values_to = "count"
    ) %>%
    mutate(TE_type = .y)      # add TE_type column based on list element name
)

H3K9me3_counts_long_50per <- H3K9me3_counts_long_50per %>% 
group_by(TE_type) %>%
  tidyr::complete(cluster, status, fill = list(count = 0)) %>%
  ungroup()

making the function

# # function to test enrichment for one TE type, one cluster pair
# test_pair_TE <- function(df_long, te_name, cluster1, cluster2) {
#   
#   # filter for TE type
#   sub_df <- df_long %>% filter(TE_type == te_name)
#   
#   # get counts for cluster1 and cluster2
#   c1 <- sub_df %>% dplyr::filter(cluster == cluster1)
#   c2 <- sub_df %>% dplyr::filter(cluster == cluster2)
#   
#   # 2x2 table: rows = clusters, cols = TE / wnot_TE
#   mat <- matrix(
#     c(c2$count, sum(c2$count) - c2$count,
#       c1$count, sum(c1$count) - c1$count),
#     nrow = 2,
#     byrow = TRUE,
#     dimnames = list(
#       cluster = c(cluster2, cluster1),
#       category = c("TE", "wnot_TE")
#     )
#   )
#   
#   ft <- fisher.test(mat)
#   
#   tibble(
#     TE_type     = te_name,
#     comparison  = paste(cluster2, "vs", cluster1),
#     odds_ratio  = ft$estimate,
#     lower_CI    = ft$conf.int[1],
#     upper_CI    = ft$conf.int[2],
#     p_value     = ft$p.value
#   )
# }


# Generic pairwise Fisher test
test_pair_TE_generic <- function(df_long, te_name, cluster1, cluster2) {
  
  sub_df <- df_long %>% filter(TE_type == te_name)
  
  # assume "status" column has TE vs wnot_TE automatically
  statuses <- unique(sub_df$status)
  
  if(length(statuses) != 2) {
    # ensure we have exactly two categories, fill missing with 0
    sub_df <- sub_df %>%
      complete(cluster, status, fill = list(count = 0))
    statuses <- unique(sub_df$status)
  }
  
  # extract counts for cluster1
  c1_counts <- sub_df %>%
    filter(cluster == cluster1) %>%
    arrange(match(status, statuses)) %>%   # ensure same order
    pull(count)
  
  # extract counts for cluster2
  c2_counts <- sub_df %>%
    filter(cluster == cluster2) %>%
    arrange(match(status, statuses)) %>%
    pull(count)
  
  # build 2x2 matrix
  mat <- matrix(
    c(c2_counts, c1_counts),
    nrow = 2,
    byrow = TRUE,
    dimnames = list(
      cluster = c(cluster2, cluster1),
      category = statuses
    )
  )
  
  ft <- tryCatch(
    fisher.test(mat, workspace = 2e8),
    error = function(e) fisher.test(mat, simulate.p.value = TRUE, B = 1e5)
  )
  
  tibble(
    TE_type     = te_name,
    comparison  = paste(cluster2, "vs", cluster1),
    odds_ratio  = ft$estimate,
    lower_CI    = ft$conf.int[1],
    upper_CI    = ft$conf.int[2],
    p_value     = ft$p.value
  )
}

applying to dataframes

H3K27ac proportions

results for 1bp overlap H3K27ac

TE_types <- te_cols

cluster_pairs_3 <- list(
  c("Set_2","Set_1"),
  c("Set_3","Set_1")
  # add more pairs if needed or
)
cluster_pairs_2 <- list(
  c("Set_2","Set_1"))


H3K27ac_results_1bp <- map_dfr(TE_types, function(te){
  map_dfr(cluster_pairs_3, function(pair){
    test_pair_TE_generic(H3K27ac_counts_long, te, cluster1 = pair[2], cluster2 = pair[1])
  })
})

htmltools::tagList(
  htmltools::h3("H3K27ac_results_1bp"), 
datatable(H3K27ac_results_1bp,options = list(
    scrollY = "600px",
    scrollCollapse = TRUE,
    pageLength = 20,
    autoWidth = TRUE
  ),
  rownames = FALSE))

H3K27ac_results_1bp

H3K27ac_counts_long %>% 
  dplyr::filter(cluster != "not_assigned") %>% 
  ggplot(aes(x=cluster, y=count, fill=status)) +
  geom_bar(stat="identity", position = "fill") +
  facet_wrap(~TE_type, scales="free_y")+
  ggtitle("H3K27ac 1 bp overlap enrichment")+
  theme_bw() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    strip.text = element_text(face = "bold")
  )

Version	Author	Date
967b7a8	reneeisnowhere	2025-09-16

results for 50% TE overlap H3K27ac

###
H3K27ac_results_50per <- map_dfr(TE_types, function(te){
  map_dfr(cluster_pairs_3, function(pair){
    test_pair_TE_generic(H3K27ac_counts_long_50per, te, cluster1 = pair[2], cluster2 = pair[1])
  })
})

htmltools::tagList(
  htmltools::h3("H3K27ac_results 50% overlap"),
datatable(H3K27ac_results_50per,
          options = list(
    scrollY = "600px",
    scrollCollapse = TRUE,
    pageLength = 20,
    autoWidth = TRUE
  ),
  rownames = FALSE))

H3K27ac_results 50% overlap

H3K27ac_counts_long_50per %>% 
  dplyr::filter(cluster != "not_assigned") %>% 
  ggplot(aes(x=cluster, y=count, fill=status)) +
  geom_bar(stat="identity", position = "fill") +
  facet_wrap(~TE_type, scales="free_y")+
  ggtitle("H3K27ac 50% TE overlap enrichment")+
  theme_bw() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    strip.text = element_text(face = "bold")
  )

Version	Author	Date
967b7a8	reneeisnowhere	2025-09-16

H3K27me3 proportions

results for 1bp overlap H3K27me3

H3K27me3_results_1bp <- map_dfr(TE_types, function(te){
  map_dfr(cluster_pairs_2, function(pair){
    test_pair_TE_generic(H3K27me3_counts_long, te, cluster1 = pair[2], cluster2 = pair[1])
  })
})

htmltools::tagList(
  htmltools::h3("H3K27me3_results_1bp"), 
datatable(H3K27me3_results_1bp,options = list(
    scrollY = "600px",
    scrollCollapse = TRUE,
    pageLength = 20,
    autoWidth = TRUE
  ),
  rownames = FALSE))

H3K27me3_results_1bp

H3K27me3_counts_long %>% 
  dplyr::filter(cluster != "not_assigned") %>% 
  ggplot(aes(x=cluster, y=count, fill=status)) +
  geom_bar(stat="identity", position = "fill") +
  facet_wrap(~TE_type, scales="free_y")+
  ggtitle("H3K27me3 1 bp overlap enrichment")+
  theme_bw() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    strip.text = element_text(face = "bold")
  )

Version	Author	Date
967b7a8	reneeisnowhere	2025-09-16

results for 50% TE overlap H3K27me3

###
H3K27me3_results_50per <- map_dfr(TE_types, function(te){
  map_dfr(cluster_pairs_2, function(pair){
    test_pair_TE_generic(H3K27me3_counts_long_50per, te, cluster1 = pair[2], cluster2 = pair[1])
  })
})

htmltools::tagList(
  htmltools::h3("H3K27me3_results 50% overlap"),
datatable(H3K27me3_results_50per,options = list(
    scrollY = "600px",
    scrollCollapse = TRUE,
    pageLength = 20,
    autoWidth = TRUE
  ),
  rownames = FALSE))

H3K27me3_results 50% overlap

H3K27me3_counts_long_50per %>% 
  dplyr::filter(cluster != "not_assigned") %>% 
  ggplot(aes(x=cluster, y=count, fill=status)) +
  geom_bar(stat="identity", position = "fill") +
  facet_wrap(~TE_type, scales="free_y")+
  ggtitle("H3K27me3 50% TE overlap enrichment")+
  theme_bw() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    strip.text = element_text(face = "bold")
  )

Version	Author	Date
967b7a8	reneeisnowhere	2025-09-16

H3K36me3 proportions

results for 1bp overlap H3K36me3

H3K36me3_results_1bp <- map_dfr(TE_types, function(te){
  map_dfr(cluster_pairs_2, function(pair){
    test_pair_TE_generic(H3K36me3_counts_long, te, cluster1 = pair[2], cluster2 = pair[1])
  })
})

htmltools::tagList(
  htmltools::h3("H3K36me3_results_1bp"), 
datatable(H3K36me3_results_1bp,options = list(
    scrollY = "600px",
    scrollCollapse = TRUE,
    pageLength = 20,
    autoWidth = TRUE
  ),
  rownames = FALSE))

H3K36me3_results_1bp

H3K36me3_counts_long %>% 
  dplyr::filter(cluster != "not_assigned") %>% 
  ggplot(aes(x=cluster, y=count, fill=status)) +
  geom_bar(stat="identity", position = "fill") +
  facet_wrap(~TE_type, scales="free_y")+
  ggtitle("H3K36me3 1 bp overlap enrichment")+
  theme_bw() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    strip.text = element_text(face = "bold")
  )

Version	Author	Date
967b7a8	reneeisnowhere	2025-09-16

results for 50% TE overlap H3K36me3

###
H3K36me3_results_50per <- map_dfr(TE_types, function(te){
  map_dfr(cluster_pairs_2, function(pair){
    test_pair_TE_generic(H3K36me3_counts_long_50per, te, cluster1 = pair[2], cluster2 = pair[1])
  })
})

htmltools::tagList(
  htmltools::h3("H3K36me3_results 50% overlap"),
datatable(H3K36me3_results_50per,options = list(
    scrollY = "600px",
    scrollCollapse = TRUE,
    pageLength = 20,
    autoWidth = TRUE
  ),
  rownames = FALSE))

H3K36me3_results 50% overlap

H3K36me3_counts_long_50per %>% 
  dplyr::filter(cluster != "not_assigned") %>% 
  ggplot(aes(x=cluster, y=count, fill=status)) +
  geom_bar(stat="identity", position = "fill") +
  facet_wrap(~TE_type, scales="free_y")+
  ggtitle("H3K36me3 50% TE overlap enrichment")+
  theme_bw() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    strip.text = element_text(face = "bold")
  )

Version	Author	Date
967b7a8	reneeisnowhere	2025-09-16

H3K9me3 proportions

results for 1bp overlap H3K9me3

H3K9me3_results_1bp <- map_dfr(TE_types, function(te){
  map_dfr(cluster_pairs_3, function(pair){
    test_pair_TE_generic(H3K9me3_counts_long, te, cluster1 = pair[2], cluster2 = pair[1])
  })
})

htmltools::tagList(
  htmltools::h3("H3K9me3_results_1bp"), 
datatable(H3K9me3_results_1bp,options = list(
    scrollY = "600px",
    scrollCollapse = TRUE,
    pageLength = 20,
    autoWidth = TRUE
  ),
  rownames = FALSE))

H3K9me3_results_1bp

H3K9me3_counts_long %>% 
  dplyr::filter(cluster != "not_assigned") %>% 
  ggplot(aes(x=cluster, y=count, fill=status)) +
  geom_bar(stat="identity", position = "fill") +
  facet_wrap(~TE_type, scales="free_y")+
  ggtitle("H3K9me3 1 bp overlap enrichment")+
  theme_bw() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    strip.text = element_text(face = "bold")
  )

Version	Author	Date
967b7a8	reneeisnowhere	2025-09-16

results for 50% TE overlap H3K9me3

###
H3K9me3_results_50per <- map_dfr(TE_types, function(te){
  map_dfr(cluster_pairs_3, function(pair){
    test_pair_TE_generic(H3K9me3_counts_long_50per, te, cluster1 = pair[2], cluster2 = pair[1])
  })
})

htmltools::tagList(
  htmltools::h3("H3K9me3_results 50% overlap"),
datatable(H3K9me3_results_50per,options = list(
    scrollY = "600px",
    scrollCollapse = TRUE,
    pageLength = 20,
    autoWidth = TRUE
  ),
  rownames = FALSE))

H3K9me3_results 50% overlap

H3K9me3_counts_long_50per %>% 
  dplyr::filter(cluster != "not_assigned") %>% 
  ggplot(aes(x=cluster, y=count, fill=status)) +
  geom_bar(stat="identity", position = "fill") +
  facet_wrap(~TE_type, scales="free_y")+
  ggtitle("H3K9me3 50% TE overlap enrichment")+
  theme_bw() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    strip.text = element_text(face = "bold")
  )

Version	Author	Date
967b7a8	reneeisnowhere	2025-09-16

H3K27ac divergence from consensus

  H3K27ac_ols$all_H3K27ac %>% 
    left_join(., H3K27ac_lookup) %>% 
     mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>% 
    dplyr::filter(cluster != "not_assigned") %>% 
    mutate(special_type=case_when(repClass=="SINE"~"SINE",
                                  repClass=="LINE"~"LINE",
                                  repClass=="LTR"~"LTR",
                                  repClass=="DNA"~"DNA",
                                  repClass=="Retroposon"~"SVA",
                                  is.na(repClass)~"not_TE",
                                  TRUE~"Other")) %>% 
    mutate(per_div=milliDiv/10) %>% 
  ggplot(aes(x=per_div))+
  geom_histogram(aes(fill=cluster, alpha = 0.4) ) +
  facet_wrap(~special_type, scales="free_y") +
    theme_classic()

  H3K27ac_ols$all_H3K27ac %>% 
  as.data.frame() %>% 
    left_join(., H3K27ac_lookup) %>% 
     mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
      dplyr::filter(cluster != "not_assigned") %>% 
    mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
                                  str_detect(repClass, "LINE") ~ "LINE",
                                  str_detect(repClass, "LTR") ~ "LTR",
                                  str_detect(repClass, "DNA") ~ "DNA",
                                  str_detect(repClass, "Retroposon") ~ "SVA",
                                  is.na(repClass)~"not_TE",
                                  TRUE~"Other")) %>% 
  dplyr::filter(special_type != "not_TE") %>% 
    mutate(per_div=milliDiv/10) %>% 
  ggplot(aes(y=per_div,x=special_type))+
  geom_boxplot(aes(fill=cluster) ) +
  # facet_wrap(~special_type, scales="free_y") +
    theme_classic()+
    ggtitle("Divergence from overlapping TE H3K27ac 1bp")

  H3K27ac_ols$all_H3K27ac %>% 
  as.data.frame() %>% 
    dplyr::filter(pct_te_overlap> 50) %>% 
    left_join(., H3K27ac_lookup) %>% 
     mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>% 
      dplyr::filter(cluster != "not_assigned") %>% 
    mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
                                  str_detect(repClass, "LINE") ~ "LINE",
                                  str_detect(repClass, "LTR") ~ "LTR",
                                  str_detect(repClass, "DNA") ~ "DNA",
                                  str_detect(repClass, "Retroposon") ~ "SVA",
                                  is.na(repClass)~"not_TE",
                                  TRUE~"Other")) %>% 
  dplyr::filter(special_type != "not_TE") %>% 
    mutate(per_div=milliDiv/10) %>%
    ggplot(aes(y=per_div,x=special_type, fill=cluster))+
  # geom_boxplot(aes(fill=cluster) ) +
    geom_boxplot()+
stat_compare_means(
    aes(group = cluster),               # tell ggpubr we want to compare clusters
    method = "wilcox.test",             # same test as before
    comparisons = list(c("Set1", "Set2"), c("Set1", "Set3")),
    label = "p.signif",                 # show stars instead of p-values
    hide.ns = TRUE                      # hide non-significant comparisons
  ) +
    theme_classic()+
    # facet_wrap(~ special_type, scales = "free_y") +
    ggtitle("Divergence from overlapping TE H3K27ac 50% overlap")

### try to calculate wilcox.test
  comparisons <- list(
  c("Set_1", "Set_2"),
  c("Set_1", "Set_3")
)

  # 
  H3K27ac_wilcox_results <- H3K27ac_ols$all_H3K27ac %>%
    as.data.frame() %>%
    left_join(., H3K27ac_lookup) %>% 
     mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>% 
      dplyr::filter(cluster != "not_assigned") %>% 
    mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
                                  str_detect(repClass, "LINE") ~ "LINE",
                                  str_detect(repClass, "LTR") ~ "LTR",
                                  str_detect(repClass, "DNA") ~ "DNA",
                                  str_detect(repClass, "Retroposon") ~ "SVA",
                                  is.na(repClass)~"not_TE",
                                  TRUE~"Other")) %>% 
  dplyr::filter(special_type != "not_TE") %>% 
    mutate(per_div=milliDiv/10) %>% 
  filter(cluster %in% c("Set_1", "Set_2", "Set_3")) %>%
  group_by(special_type) %>%
  group_map(~ {
    stype <- unique(.x$special_type)  # capture the TE type
    map_dfr(comparisons, function(comp) {
      dat <- .x %>% filter(cluster %in% comp)
      test <- wilcox.test(per_div ~ cluster, data = dat)
      tibble(
        special_type = stype,
        comparison = paste(comp, collapse = "_vs_"),
        p.value = test$p.value,
        W = test$statistic,
        median_Set1 = median(dat$per_div[dat$cluster == "Set_1"]),
        median_other = median(dat$per_div[dat$cluster == comp[2]])
      )
    })
  }, .keep = TRUE) %>%
  bind_rows()
  
  H3K27ac_wilcox_results_per50 <- H3K27ac_ols$all_H3K27ac %>%
    as.data.frame() %>%
    dplyr::filter(pct_te_overlap> 50) %>%
    left_join(., H3K27ac_lookup) %>% 
     mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
      dplyr::filter(cluster != "not_assigned") %>% 
    mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
                                  str_detect(repClass, "LINE") ~ "LINE",
                                  str_detect(repClass, "LTR") ~ "LTR",
                                  str_detect(repClass, "DNA") ~ "DNA",
                                  str_detect(repClass, "Retroposon") ~ "SVA",
                                  is.na(repClass)~"not_TE",
                                  TRUE~"Other")) %>% 
  dplyr::filter(special_type != "not_TE") %>% 
    mutate(per_div=milliDiv/10) %>% 
  filter(cluster %in% c("Set_1", "Set_2", "Set_3")) %>%
  group_by(special_type) %>%
  group_map(~ {
    stype <- unique(.x$special_type)  # capture the TE type
    map_dfr(comparisons, function(comp) {
      dat <- .x %>% filter(cluster %in% comp)
      test <- wilcox.test(per_div ~ cluster, data = dat)
      tibble(
        special_type = stype,
        comparison = paste(comp, collapse = "_vs_"),
        p.value = test$p.value,
        W = test$statistic,
        median_Set1 = median(dat$per_div[dat$cluster == "Set_1"]),
        median_other = median(dat$per_div[dat$cluster == comp[2]])
      )
    })
  }, .keep = TRUE) %>%
  bind_rows()

H3K27me3 divergence from consensus

  H3K27me3_ols$all_H3K27me3 %>% 
    left_join(., H3K27me3_lookup) %>% 
     mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
    dplyr::filter(cluster != "not_assigned") %>% 
    mutate(special_type=case_when(repClass=="SINE"~"SINE",
                                  repClass=="LINE"~"LINE",
                                  repClass=="LTR"~"LTR",
                                  repClass=="DNA"~"DNA",
                                  repClass=="Retroposon"~"SVA",
                                  is.na(repClass)~"not_TE",
                                  TRUE~"Other")) %>% 
    mutate(per_div=milliDiv/10) %>% 
  ggplot(aes(x=per_div))+
  geom_histogram(aes(fill=cluster, alpha = 0.4) ) +
  facet_wrap(~special_type, scales="free_y") +
    theme_classic()

  H3K27me3_ols$all_H3K27me3 %>% 
  as.data.frame() %>% 
    left_join(., H3K27me3_lookup) %>% 
     mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>% 
      dplyr::filter(cluster != "not_assigned") %>% 
    mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
                                  str_detect(repClass, "LINE") ~ "LINE",
                                  str_detect(repClass, "LTR") ~ "LTR",
                                  str_detect(repClass, "DNA") ~ "DNA",
                                  str_detect(repClass, "Retroposon") ~ "SVA",
                                  is.na(repClass)~"not_TE",
                                  TRUE~"Other")) %>% 
  dplyr::filter(special_type != "not_TE") %>% 
    mutate(per_div=milliDiv/10) %>% 
  ggplot(aes(y=per_div,x=special_type))+
  geom_boxplot(aes(fill=cluster) ) +
  # facet_wrap(~special_type, scales="free_y") +
    theme_classic()+
    ggtitle("Divergence from overlapping TE H3K27me3 1bp")

  H3K27me3_ols$all_H3K27me3 %>% 
  as.data.frame() %>% 
    dplyr::filter(pct_te_overlap> 50) %>% 
    left_join(., H3K27me3_lookup) %>% 
     mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>% 
      dplyr::filter(cluster != "not_assigned") %>% 
    mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
                                  str_detect(repClass, "LINE") ~ "LINE",
                                  str_detect(repClass, "LTR") ~ "LTR",
                                  str_detect(repClass, "DNA") ~ "DNA",
                                  str_detect(repClass, "Retroposon") ~ "SVA",
                                  is.na(repClass)~"not_TE",
                                  TRUE~"Other")) %>% 
  dplyr::filter(special_type != "not_TE") %>% 
    mutate(per_div=milliDiv/10) %>% 
  ggplot(aes(y=per_div,x=special_type))+
  geom_boxplot(aes(fill=cluster) ) +
  # facet_wrap(~special_type, scales="free_y") +
    theme_classic()+
    ggtitle("Divergence from overlapping TE H3K27me3 50% overlap")

   H3K27me3_wilcox_results_per50 <- H3K27me3_ols$all_H3K27me3 %>%
    as.data.frame() %>%
    dplyr::filter(pct_te_overlap> 50) %>%
    left_join(., H3K27me3_lookup) %>% 
     mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
    dplyr::filter(cluster != "not_assigned") %>% 
    mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
                                  str_detect(repClass, "LINE") ~ "LINE",
                                  str_detect(repClass, "LTR") ~ "LTR",
                                  str_detect(repClass, "DNA") ~ "DNA",
                                  str_detect(repClass, "Retroposon") ~ "SVA",
                                  is.na(repClass)~"not_TE",
                                  TRUE~"Other")) %>% 
  dplyr::filter(special_type != "not_TE") %>% 
    mutate(per_div=milliDiv/10) %>% 
  filter(cluster %in% c("Set_1", "Set_2")) %>%
  group_by(special_type) %>%
  group_map(~ {
    stype <- unique(.x$special_type)  # capture the TE type
    map_dfr(comparisons, function(comp) {
      dat <- .x %>% filter(cluster %in% comp)
       # Skip if both clusters are not present
      if(length(unique(dat$cluster)) < 2) return(NULL)
      test <- wilcox.test(per_div ~ cluster, data = dat)
      tibble(
        special_type = stype,
        comparison = paste(comp, collapse = "_vs_"),
        p.value = test$p.value,
        W = test$statistic,
        median_Set1 = median(dat$per_div[dat$cluster == "Set_1"]),
        median_other = median(dat$per_div[dat$cluster == comp[2]])
      )
    })
  }, .keep = TRUE) %>%
  bind_rows()

H3K36me3 divergence from consensus

  H3K36me3_ols$all_H3K36me3 %>% 
    left_join(., H3K36me3_lookup) %>% 
     mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>% 
    dplyr::filter(cluster != "not_assigned") %>% 
    mutate(special_type=case_when(repClass=="SINE"~"SINE",
                                  repClass=="LINE"~"LINE",
                                  repClass=="LTR"~"LTR",
                                  repClass=="DNA"~"DNA",
                                  repClass=="Retroposon"~"SVA",
                                  is.na(repClass)~"not_TE",
                                  TRUE~"Other")) %>% 
    mutate(per_div=milliDiv/10) %>% 
  ggplot(aes(x=per_div))+
  geom_histogram(aes(fill=cluster, alpha = 0.4) ) +
  facet_wrap(~special_type, scales="free_y") +
    theme_classic()

  H3K36me3_ols$all_H3K36me3 %>% 
  as.data.frame() %>% 
    left_join(., H3K36me3_lookup) %>% 
     mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>% 
      dplyr::filter(cluster != "not_assigned") %>% 
    mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
                                  str_detect(repClass, "LINE") ~ "LINE",
                                  str_detect(repClass, "LTR") ~ "LTR",
                                  str_detect(repClass, "DNA") ~ "DNA",
                                  str_detect(repClass, "Retroposon") ~ "SVA",
                                  is.na(repClass)~"not_TE",
                                  TRUE~"Other")) %>% 
  dplyr::filter(special_type != "not_TE") %>% 
    mutate(per_div=milliDiv/10) %>% 
  ggplot(aes(y=per_div,x=special_type))+
  geom_boxplot(aes(fill=cluster) ) +
  # facet_wrap(~special_type, scales="free_y") +
    theme_classic()+
    ggtitle("Divergence from overlapping TE H3K36me3 1bp")

  H3K36me3_ols$all_H3K36me3 %>% 
  as.data.frame() %>% 
    dplyr::filter(pct_te_overlap> 50) %>% 
    left_join(., H3K36me3_lookup) %>% 
     mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>% 
      dplyr::filter(cluster != "not_assigned") %>% 
    mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
                                  str_detect(repClass, "LINE") ~ "LINE",
                                  str_detect(repClass, "LTR") ~ "LTR",
                                  str_detect(repClass, "DNA") ~ "DNA",
                                  str_detect(repClass, "Retroposon") ~ "SVA",
                                  is.na(repClass)~"not_TE",
                                  TRUE~"Other")) %>% 
  dplyr::filter(special_type != "not_TE") %>% 
    mutate(per_div=milliDiv/10) %>% 
  ggplot(aes(y=per_div,x=special_type))+
  geom_boxplot(aes(fill=cluster) ) +
  # facet_wrap(~special_type, scales="free_y") +
    theme_classic()+
    ggtitle("Divergence from overlapping TE H3K36me3 50% overlap")

  H3K36me3_ols$all_H3K36me3 %>% 
  as.data.frame() %>% 
    dplyr::filter(pct_te_overlap> 50) %>% 
    left_join(., H3K36me3_lookup) %>% 
     mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>% 
      dplyr::filter(cluster != "not_assigned") %>% 
    mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
                                  str_detect(repClass, "LINE") ~ "LINE",
                                  str_detect(repClass, "LTR") ~ "LTR",
                                  str_detect(repClass, "DNA") ~ "DNA",
                                  str_detect(repClass, "Retroposon") ~ "SVA",
                                  is.na(repClass)~"not_TE",
                                  TRUE~"Other")) %>% 
  dplyr::filter(special_type != "not_TE") %>% 
    mutate(per_div=milliDiv/10) %>% 
  ggplot(aes(y=per_div,x=special_type))+
  geom_boxplot(aes(fill=cluster) ) +
  # facet_wrap(~special_type, scales="free_y") +
    theme_classic()+
    ggtitle("Divergence from overlapping TE H3K36me3 50% overlap")

   H3K36me3_wilcox_results_per50 <- H3K36me3_ols$all_H3K36me3 %>%
    as.data.frame() %>%
    dplyr::filter(pct_te_overlap> 50) %>%
    left_join(., H3K36me3_lookup) %>% 
     mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
    dplyr::filter(cluster != "not_assigned") %>% 
    mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
                                  str_detect(repClass, "LINE") ~ "LINE",
                                  str_detect(repClass, "LTR") ~ "LTR",
                                  str_detect(repClass, "DNA") ~ "DNA",
                                  str_detect(repClass, "Retroposon") ~ "SVA",
                                  is.na(repClass)~"not_TE",
                                  TRUE~"Other")) %>% 
  dplyr::filter(special_type != "not_TE") %>% 
    mutate(per_div=milliDiv/10) %>% 
  filter(cluster %in% c("Set_1", "Set_2")) %>%
  group_by(special_type) %>%
  group_map(~ {
    stype <- unique(.x$special_type)  # capture the TE type
    map_dfr(comparisons, function(comp) {
      dat <- .x %>% filter(cluster %in% comp)
       # Skip if both clusters are not present
      if(length(unique(dat$cluster)) < 2) return(NULL)
      test <- wilcox.test(per_div ~ cluster, data = dat)
      tibble(
        special_type = stype,
        comparison = paste(comp, collapse = "_vs_"),
        p.value = test$p.value,
        W = test$statistic,
        median_Set1 = median(dat$per_div[dat$cluster == "Set_1"]),
        median_other = median(dat$per_div[dat$cluster == comp[2]])
      )
    })
  }, .keep = TRUE) %>%
  bind_rows()

H3K9me3 divergence from consensus

  H3K9me3_ols$all_H3K9me3 %>% 
    left_join(., H3K9me3_lookup) %>% 
     mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>% 
    mutate(special_type=case_when(repClass=="SINE"~"SINE",
                                  repClass=="LINE"~"LINE",
                                  repClass=="LTR"~"LTR",
                                  repClass=="DNA"~"DNA",
                                  repClass=="Retroposon"~"SVA",
                                  is.na(repClass)~"not_TE",
                                  TRUE~"Other")) %>% 
    mutate(per_div=milliDiv/10) %>% 
  ggplot(aes(x=per_div))+
  geom_histogram(aes(fill=cluster, alpha = 0.4) ) +
  facet_wrap(~special_type, scales="free_y") +
    theme_classic()

  H3K9me3_ols$all_H3K9me3 %>% 
  as.data.frame() %>% 
    left_join(., H3K9me3_lookup) %>% 
     mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
      dplyr::filter(cluster != "not_assigned") %>% 
    mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
                                  str_detect(repClass, "LINE") ~ "LINE",
                                  str_detect(repClass, "LTR") ~ "LTR",
                                  str_detect(repClass, "DNA") ~ "DNA",
                                  str_detect(repClass, "Retroposon") ~ "SVA",
                                  is.na(repClass)~"not_TE",
                                  TRUE~"Other")) %>% 
  dplyr::filter(special_type != "not_TE") %>% 
    mutate(per_div=milliDiv/10) %>% 
  ggplot(aes(y=per_div,x=special_type))+
  geom_boxplot(aes(fill=cluster) ) +
  # facet_wrap(~special_type, scales="free_y") +
    theme_classic()+
    ggtitle("Divergence from overlapping TE H3K9me3 1bp")

  H3K9me3_ols$all_H3K9me3 %>% 
  as.data.frame() %>% 
    dplyr::filter(pct_te_overlap> 50) %>% 
    left_join(., H3K9me3_lookup) %>% 
     mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
      dplyr::filter(cluster != "not_assigned") %>% 
    mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
                                  str_detect(repClass, "LINE") ~ "LINE",
                                  str_detect(repClass, "LTR") ~ "LTR",
                                  str_detect(repClass, "DNA") ~ "DNA",
                                  str_detect(repClass, "Retroposon") ~ "SVA",
                                  is.na(repClass)~"not_TE",
                                  TRUE~"Other")) %>% 
  dplyr::filter(special_type != "not_TE") %>% 
    mutate(per_div=milliDiv/10) %>% 
  ggplot(aes(y=per_div,x=special_type))+
  geom_boxplot(aes(fill=cluster) ) +
  # facet_wrap(~special_type, scales="free_y") +
    theme_classic()+
    ggtitle("Divergence from overlapping TE H3K9me3 50% overlap")

  H3K9me3_wilcox_results <- H3K9me3_ols$all_H3K9me3 %>%
    as.data.frame() %>%
    left_join(., H3K9me3_lookup) %>% 
     mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>% 
      dplyr::filter(cluster != "not_assigned") %>% 
    mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
                                  str_detect(repClass, "LINE") ~ "LINE",
                                  str_detect(repClass, "LTR") ~ "LTR",
                                  str_detect(repClass, "DNA") ~ "DNA",
                                  str_detect(repClass, "Retroposon") ~ "SVA",
                                  is.na(repClass)~"not_TE",
                                  TRUE~"Other")) %>% 
  dplyr::filter(special_type != "not_TE") %>% 
    mutate(per_div=milliDiv/10) %>% 
  filter(cluster %in% c("Set_1", "Set_2")) %>%
  group_by(special_type) %>%
  group_map(~ {
    stype <- unique(.x$special_type)  # capture the TE type
    map_dfr(comparisons, function(comp) {
      dat <- .x %>% filter(cluster %in% comp)
       # Skip if both clusters are not present
      if(length(unique(dat$cluster)) < 2) return(NULL)
      test <- wilcox.test(per_div ~ cluster, data = dat)
      tibble(
        special_type = stype,
        comparison = paste(comp, collapse = "_vs_"),
        p.value = test$p.value,
        W = test$statistic,
        median_Set1 = median(dat$per_div[dat$cluster == "Set_1"]),
        median_other = median(dat$per_div[dat$cluster == comp[2]])
      )
    })
  }, .keep = TRUE) %>%
  bind_rows()
  
  H3K9me3_wilcox_results_per50 <- H3K9me3_ols$all_H3K9me3 %>%
    as.data.frame() %>%
    dplyr::filter(pct_te_overlap> 50) %>%
    left_join(., H3K9me3_lookup) %>% 
     mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
    dplyr::filter(cluster != "not_assigned") %>% 
    mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
                                  str_detect(repClass, "LINE") ~ "LINE",
                                  str_detect(repClass, "LTR") ~ "LTR",
                                  str_detect(repClass, "DNA") ~ "DNA",
                                  str_detect(repClass, "Retroposon") ~ "SVA",
                                  is.na(repClass)~"not_TE",
                                  TRUE~"Other")) %>% 
  dplyr::filter(special_type != "not_TE") %>% 
    mutate(per_div=milliDiv/10) %>% 
  filter(cluster %in% c("Set_1", "Set_2")) %>%
  group_by(special_type) %>%
  group_map(~ {
    stype <- unique(.x$special_type)  # capture the TE type
    map_dfr(comparisons, function(comp) {
      dat <- .x %>% filter(cluster %in% comp)
       # Skip if both clusters are not present
      if(length(unique(dat$cluster)) < 2) return(NULL)
      test <- wilcox.test(per_div ~ cluster, data = dat)
      tibble(
        special_type = stype,
        comparison = paste(comp, collapse = "_vs_"),
        p.value = test$p.value,
        W = test$statistic,
        median_Set1 = median(dat$per_div[dat$cluster == "Set_1"]),
        median_other = median(dat$per_div[dat$cluster == comp[2]])
      )
    })
  }, .keep = TRUE) %>%
  bind_rows()

H3K27ac lengths of TE, TSS distance

H3K27ac_repclass_sum <- H3K27ac_ols$all_H3K27ac %>%
   left_join(., H3K27ac_lookup) %>% 
     mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>% 
    mutate(special_type=case_when(repClass=="SINE"~"SINE",
                                  repClass=="LINE"~"LINE",
                                  repClass=="LTR"~"LTR",
                                  repClass=="DNA"~"DNA",
                                  repClass=="Retroposon"~"SVA",
                                  is.na(repClass)~"not_TE",
                                  TRUE~"Other"))%>% 
  dplyr::filter(cluster != "not_assigned") %>% 
  group_by(Peakid, special_type) %>% 
  summarize(width= min(width),
            min_te_len=min(te_len),
            min_roi_len=min(roi_len),
            min_overlap=min(overlap_bp),
            min_dist_TSS=min(distanceToTSS),
            max_dist_TSS=max(distanceToTSS),
            cluster=paste0(unique(cluster), collapse = ":")) 
  

H3K27ac_repclass_sum %>% 
  ggplot(aes(x=special_type,y=min_te_len))+
  geom_boxplot() +
  facet_wrap(~cluster)+
  theme_classic()+
    ggtitle("Overlapping minimum TE length per class in H3K27ac")

H3K27ac_sets_gr$all_H3K27ac %>% 
  as.data.frame() %>% 
   left_join(., H3K27ac_lookup) %>% 
  mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>% 
  ggplot(aes(x=cluster, y=width))+
  geom_boxplot()+
  theme_classic()+
    ggtitle("Region of interest length in H3K27ac")

H3K27ac_repclass_sum %>% 
  ggplot(aes(x=cluster,y=width))+
  geom_boxplot() +
  facet_wrap(~special_type)+
  theme_classic()+
    ggtitle("Minimum overlapping TE & ROI width length per class in H3K27ac")

H3K27ac_repclass_sum %>% 
  ggplot(aes(x=special_type,y=min_dist_TSS))+
  geom_boxplot() +
  facet_wrap(~cluster)+
  theme_classic()+
    ggtitle("Distance to TSS per class in H3K27ac of overlapping ROIs")+
  coord_cartesian(ylim=c(-35000,35000))

H3K27me3 lengths of TE, TSS distance

H3K27me3_repclass_sum <- H3K27me3_ols$all_H3K27me3 %>%
   left_join(., H3K27me3_lookup) %>% 
     mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>% 
    mutate(special_type=case_when(repClass=="SINE"~"SINE",
                                  repClass=="LINE"~"LINE",
                                  repClass=="LTR"~"LTR",
                                  repClass=="DNA"~"DNA",
                                  repClass=="Retroposon"~"SVA",
                                  is.na(repClass)~"not_TE",
                                  TRUE~"Other"))%>% 
  dplyr::filter(cluster != "not_assigned") %>% 
  group_by(Peakid, special_type) %>% 
  summarize(width= min(width),
            min_te_len=min(te_len),
            min_roi_len=min(roi_len),
            min_overlap=min(overlap_bp),
            min_dist_TSS=min(distanceToTSS),
            max_dist_TSS=max(distanceToTSS),
            cluster=paste0(unique(cluster), collapse = ":")) 
  

H3K27me3_repclass_sum %>% 
  ggplot(aes(x=special_type,y=min_te_len))+
  geom_boxplot() +
  facet_wrap(~cluster)+
  theme_classic()+
    ggtitle("Overlapping minimum TE length per class in H3K27me3")

H3K27me3_sets_gr$all_H3K27me3 %>% 
  as.data.frame() %>% 
   left_join(., H3K27me3_lookup) %>% 
  mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>% 
  ggplot(aes(x=cluster, y=width))+
  geom_boxplot()+
  theme_classic()+
    ggtitle("Region of interest length in H3K27me3")

H3K27me3_repclass_sum %>% 
  ggplot(aes(x=special_type,y=width))+
  geom_boxplot() +
  facet_wrap(~cluster)+
  theme_classic()+
    ggtitle("Minimum overlapping TE & ROI width length per class in H3K27me3")

H3K27me3_repclass_sum %>% 
  ggplot(aes(x=special_type,y=min_dist_TSS))+
  geom_boxplot() +
  facet_wrap(~cluster)+
  theme_classic()+
    ggtitle("Distance to TSS per class in H3K27me3 of overlapping ROIs")

# H3K27me3_sets_gr$all_H3K27me3 %>% 
#   as.data.frame() %>% 
#   dplyr::filter(width>60000)
# 
H3K27me3_ols$all_H3K27me3 %>%
   as.data.frame() %>%
  # mutate(medi=median(roi_len))
  dplyr::filter(roi_len>60000)

   seqnames    start      end width strand                  Peakid
1     chr17 48584907 48584961    55      + chr17:48584861-48649905
2     chr17 48586185 48586249    65      + chr17:48584861-48649905
3     chr17 48586438 48586471    34      + chr17:48584861-48649905
4     chr17 48587783 48587832    50      + chr17:48584861-48649905
5     chr17 48590814 48590833    20      + chr17:48584861-48649905
6     chr17 48591679 48591702    24      + chr17:48584861-48649905
7     chr17 48591989 48592024    36      + chr17:48584861-48649905
8     chr17 48593811 48593839    29      + chr17:48584861-48649905
9     chr17 48595767 48595804    38      + chr17:48584861-48649905
10    chr17 48595805 48595831    27      + chr17:48584861-48649905
11    chr17 48600992 48601039    48      + chr17:48584861-48649905
12    chr17 48602377 48602409    33      + chr17:48584861-48649905
13    chr17 48603475 48603517    43      + chr17:48584861-48649905
14    chr17 48604820 48604896    77      + chr17:48584861-48649905
15    chr17 48605180 48605207    28      + chr17:48584861-48649905
16    chr17 48606351 48606392    42      + chr17:48584861-48649905
17    chr17 48606816 48606859    44      + chr17:48584861-48649905
18    chr17 48607704 48607742    39      + chr17:48584861-48649905
19    chr17 48607793 48607859    67      + chr17:48584861-48649905
20    chr17 48608141 48608452   312      + chr17:48584861-48649905
21    chr17 48611613 48611651    39      + chr17:48584861-48649905
22    chr17 48611652 48611670    19      + chr17:48584861-48649905
23    chr17 48612063 48612097    35      + chr17:48584861-48649905
24    chr17 48612615 48612756   142      + chr17:48584861-48649905
25    chr17 48613536 48613561    26      + chr17:48584861-48649905
26    chr17 48614113 48614148    36      + chr17:48584861-48649905
27    chr17 48614720 48614740    21      + chr17:48584861-48649905
28    chr17 48614758 48614797    40      + chr17:48584861-48649905
29    chr17 48614849 48614884    36      + chr17:48584861-48649905
30    chr17 48614991 48615014    24      + chr17:48584861-48649905
31    chr17 48615214 48615385   172      + chr17:48584861-48649905
32    chr17 48616131 48616167    37      + chr17:48584861-48649905
33    chr17 48617534 48617551    18      + chr17:48584861-48649905
34    chr17 48618425 48618448    24      + chr17:48584861-48649905
35    chr17 48619224 48619270    47      + chr17:48584861-48649905
36    chr17 48625316 48625427   112      + chr17:48584861-48649905
37    chr17 48625916 48625976    61      + chr17:48584861-48649905
38    chr17 48626377 48626411    35      + chr17:48584861-48649905
39    chr17 48626805 48626842    38      + chr17:48584861-48649905
40    chr17 48627335 48627366    32      + chr17:48584861-48649905
41    chr17 48627793 48627835    43      + chr17:48584861-48649905
42    chr17 48633058 48633110    53      + chr17:48584861-48649905
43    chr17 48633497 48633575    79      + chr17:48584861-48649905
44    chr17 48634188 48634487   300      - chr17:48584861-48649905
45    chr17 48634496 48634548    53      + chr17:48584861-48649905
46    chr17 48635094 48635148    55      + chr17:48584861-48649905
47    chr17 48635771 48635920   150      - chr17:48584861-48649905
48    chr17 48636528 48636553    26      + chr17:48584861-48649905
49    chr17 48636898 48637099   202      - chr17:48584861-48649905
50    chr17 48638058 48638085    28      + chr17:48584861-48649905
51    chr17 48638401 48638418    18      + chr17:48584861-48649905
52    chr17 48638418 48638446    29      + chr17:48584861-48649905
53    chr17 48638466 48638613   148      + chr17:48584861-48649905
54    chr17 48639192 48639269    78      - chr17:48584861-48649905
55    chr17 48640842 48640983   142      + chr17:48584861-48649905
56    chr17 48641720 48641756    37      + chr17:48584861-48649905
57    chr17 48647022 48647169   148      + chr17:48584861-48649905
58    chr17 48647332 48647388    57      + chr17:48584861-48649905
59    chr17 48647999 48648087    89      + chr17:48584861-48649905
60    chr17 48648093 48648256   164      - chr17:48584861-48649905
61    chr17 48648287 48648578   292      - chr17:48584861-48649905
62    chr17 48648759 48649063   305      - chr17:48584861-48649905
63    chr17 48649065 48649207   143      - chr17:48584861-48649905
64    chr17 48649242 48649905   664      - chr17:48584861-48649905
         annotation geneChr geneStart  geneEnd geneLength geneStrand    geneId
1  Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
2  Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
3  Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
4  Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
5  Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
6  Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
7  Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
8  Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
9  Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
10 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
11 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
12 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
13 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
14 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
15 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
16 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
17 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
18 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
19 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
20 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
21 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
22 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
23 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
24 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
25 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
26 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
27 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
28 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
29 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
30 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
31 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
32 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
33 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
34 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
35 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
36 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
37 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
38 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
39 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
40 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
41 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
42 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
43 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
44 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
45 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
46 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
47 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
48 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
49 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
50 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
51 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
52 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
53 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
54 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
55 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
56 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
57 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
58 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
59 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
60 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
61 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
62 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
63 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
64 Promoter (<=1kb)      17  48628675 48634932       6258          1 100874351
        transcriptId distanceToTSS  hit overlap_bp roi_len pct_roi_overlap
1  ENST00000480386.1             0 TRUE         55   65045      0.08455685
2  ENST00000480386.1             0 TRUE         65   65045      0.09993082
3  ENST00000480386.1             0 TRUE         34   65045      0.05227150
4  ENST00000480386.1             0 TRUE         50   65045      0.07686986
5  ENST00000480386.1             0 TRUE         20   65045      0.03074794
6  ENST00000480386.1             0 TRUE         24   65045      0.03689753
7  ENST00000480386.1             0 TRUE         36   65045      0.05534630
8  ENST00000480386.1             0 TRUE         29   65045      0.04458452
9  ENST00000480386.1             0 TRUE         38   65045      0.05842109
10 ENST00000480386.1             0 TRUE         27   65045      0.04150972
11 ENST00000480386.1             0 TRUE         48   65045      0.07379506
12 ENST00000480386.1             0 TRUE         33   65045      0.05073411
13 ENST00000480386.1             0 TRUE         43   65045      0.06610808
14 ENST00000480386.1             0 TRUE         77   65045      0.11837958
15 ENST00000480386.1             0 TRUE         28   65045      0.04304712
16 ENST00000480386.1             0 TRUE         42   65045      0.06457068
17 ENST00000480386.1             0 TRUE         44   65045      0.06764548
18 ENST00000480386.1             0 TRUE         39   65045      0.05995849
19 ENST00000480386.1             0 TRUE         67   65045      0.10300561
20 ENST00000480386.1             0 TRUE        312   65045      0.47966792
21 ENST00000480386.1             0 TRUE         39   65045      0.05995849
22 ENST00000480386.1             0 TRUE         19   65045      0.02921055
23 ENST00000480386.1             0 TRUE         35   65045      0.05380890
24 ENST00000480386.1             0 TRUE        142   65045      0.21831040
25 ENST00000480386.1             0 TRUE         26   65045      0.03997233
26 ENST00000480386.1             0 TRUE         36   65045      0.05534630
27 ENST00000480386.1             0 TRUE         21   65045      0.03228534
28 ENST00000480386.1             0 TRUE         40   65045      0.06149589
29 ENST00000480386.1             0 TRUE         36   65045      0.05534630
30 ENST00000480386.1             0 TRUE         24   65045      0.03689753
31 ENST00000480386.1             0 TRUE        172   65045      0.26443232
32 ENST00000480386.1             0 TRUE         37   65045      0.05688370
33 ENST00000480386.1             0 TRUE         18   65045      0.02767315
34 ENST00000480386.1             0 TRUE         24   65045      0.03689753
35 ENST00000480386.1             0 TRUE         47   65045      0.07225767
36 ENST00000480386.1             0 TRUE        112   65045      0.17218848
37 ENST00000480386.1             0 TRUE         61   65045      0.09378123
38 ENST00000480386.1             0 TRUE         35   65045      0.05380890
39 ENST00000480386.1             0 TRUE         38   65045      0.05842109
40 ENST00000480386.1             0 TRUE         32   65045      0.04919671
41 ENST00000480386.1             0 TRUE         43   65045      0.06610808
42 ENST00000480386.1             0 TRUE         53   65045      0.08148205
43 ENST00000480386.1             0 TRUE         79   65045      0.12145438
44 ENST00000480386.1             0 TRUE        300   65045      0.46121916
45 ENST00000480386.1             0 TRUE         53   65045      0.08148205
46 ENST00000480386.1             0 TRUE         55   65045      0.08455685
47 ENST00000480386.1             0 TRUE        150   65045      0.23060958
48 ENST00000480386.1             0 TRUE         26   65045      0.03997233
49 ENST00000480386.1             0 TRUE        202   65045      0.31055423
50 ENST00000480386.1             0 TRUE         28   65045      0.04304712
51 ENST00000480386.1             0 TRUE         18   65045      0.02767315
52 ENST00000480386.1             0 TRUE         29   65045      0.04458452
53 ENST00000480386.1             0 TRUE        148   65045      0.22753478
54 ENST00000480386.1             0 TRUE         78   65045      0.11991698
55 ENST00000480386.1             0 TRUE        142   65045      0.21831040
56 ENST00000480386.1             0 TRUE         37   65045      0.05688370
57 ENST00000480386.1             0 TRUE        148   65045      0.22753478
58 ENST00000480386.1             0 TRUE         57   65045      0.08763164
59 ENST00000480386.1             0 TRUE         89   65045      0.13682835
60 ENST00000480386.1             0 TRUE        164   65045      0.25213314
61 ENST00000480386.1             0 TRUE        292   65045      0.44891998
62 ENST00000480386.1             0 TRUE        305   65045      0.46890614
63 ENST00000480386.1             0 TRUE        143   65045      0.21984780
64 ENST00000480386.1             0 TRUE        664   65045      1.02083173
   te_len pct_te_overlap      repName       repClass      repFamily milliDiv
1      55      100.00000         (C)n  Simple_repeat  Simple_repeat      209
2      65      100.00000      GA-rich Low_complexity Low_complexity      299
3      34      100.00000       (GGC)n  Simple_repeat  Simple_repeat      241
4      50      100.00000     (AAACA)n  Simple_repeat  Simple_repeat       92
5      20      100.00000         (T)n  Simple_repeat  Simple_repeat        0
6      24      100.00000       (GGA)n  Simple_repeat  Simple_repeat        0
7      36      100.00000       (AAT)n  Simple_repeat  Simple_repeat      128
8      29      100.00000       G-rich Low_complexity Low_complexity      196
9      38      100.00000       (TTA)n  Simple_repeat  Simple_repeat      153
10     27      100.00000       (ATC)n  Simple_repeat  Simple_repeat        0
11     48      100.00000        (GT)n  Simple_repeat  Simple_repeat        0
12     33      100.00000    (GGTGGG)n  Simple_repeat  Simple_repeat       63
13     43      100.00000     (TTTTC)n  Simple_repeat  Simple_repeat      133
14     77      100.00000        (TC)n  Simple_repeat  Simple_repeat      181
15     28      100.00000       (TCT)n  Simple_repeat  Simple_repeat       83
16     42      100.00000       A-rich Low_complexity Low_complexity      221
17     44      100.00000        (TG)n  Simple_repeat  Simple_repeat      271
18     39      100.00000         (T)n  Simple_repeat  Simple_repeat      112
19     67      100.00000    (CTCTCC)n  Simple_repeat  Simple_repeat      262
20    312      100.00000       AluYm1           SINE            Alu       48
21     39      100.00000         (A)n  Simple_repeat  Simple_repeat      148
22     19      100.00000       A-rich Low_complexity Low_complexity      142
23     35      100.00000     (GCCTG)n  Simple_repeat  Simple_repeat      161
24    142      100.00000      GA-rich Low_complexity Low_complexity      237
25     26      100.00000     (GAGGG)n  Simple_repeat  Simple_repeat       81
26     36      100.00000   (AAGAAAG)n  Simple_repeat  Simple_repeat       89
27     21      100.00000       (GGC)n  Simple_repeat  Simple_repeat       49
28     40      100.00000       G-rich Low_complexity Low_complexity      174
29     36      100.00000       A-rich Low_complexity Low_complexity      173
30     24      100.00000         (C)n  Simple_repeat  Simple_repeat      137
31    172      100.00000      GA-rich Low_complexity Low_complexity      291
32     37      100.00000     (CCCTG)n  Simple_repeat  Simple_repeat      117
33     18      100.00000         (A)n  Simple_repeat  Simple_repeat        0
34     24      100.00000       (ATT)n  Simple_repeat  Simple_repeat      139
35     47      100.00000        (GT)n  Simple_repeat  Simple_repeat        0
36    112      100.00000   (CCGGCCC)n  Simple_repeat  Simple_repeat      209
37     61      100.00000    (CGCCGG)n  Simple_repeat  Simple_repeat      246
38     35      100.00000         (C)n  Simple_repeat  Simple_repeat      196
39     38      100.00000       G-rich Low_complexity Low_complexity      154
40     32      100.00000       G-rich Low_complexity Low_complexity      216
41     43      100.00000    (TCCTGC)n  Simple_repeat  Simple_repeat       90
42     53      100.00000      GA-rich Low_complexity Low_complexity      253
43     79      100.00000       (GGC)n  Simple_repeat  Simple_repeat      305
44    300      100.00000       AluSx3           SINE            Alu       60
45     53      100.00000     (TTTTC)n  Simple_repeat  Simple_repeat      146
46     55      100.00000    (GCTCAG)n  Simple_repeat  Simple_repeat      235
47    150      100.00000    Charlie4z            DNA    hAT-Charlie      324
48     26      100.00000         (T)n  Simple_repeat  Simple_repeat        0
49    202      100.00000         MIRc           SINE            MIR      316
50     28      100.00000        (TG)n  Simple_repeat  Simple_repeat      159
51     18      100.00000        (TC)n  Simple_repeat  Simple_repeat        0
52     29      100.00000        (CA)n  Simple_repeat  Simple_repeat       74
53    148      100.00000         MIRb           SINE            MIR      299
54     78      100.00000          L2a           LINE             L2      187
55    142      100.00000    Charlie4z            DNA    hAT-Charlie      202
56     37      100.00000 (CAGGGGCGG)n  Simple_repeat  Simple_repeat       28
57    148      100.00000        (GA)n  Simple_repeat  Simple_repeat       81
58     57      100.00000       G-rich Low_complexity Low_complexity      283
59     89      100.00000     MIR1_Amn           SINE            MIR      247
60    164      100.00000         L1M5           LINE             L1      181
61    292      100.00000        AluSx           SINE            Alu      123
62    305      100.00000        AluSp           SINE            Alu       79
63    143      100.00000        AluSp           SINE            Alu       56
64    749       88.65154        L1ME3           LINE             L1      236
   milliDel milliIns
1         0        0
2        46        0
3         0        0
4         0       40
5         0        0
6         0       83
7         0       28
8         0        0
9         0        0
10        0        0
11       62        0
12       30        0
13       47       22
14       52       12
15       36       35
16       24        0
17        0        0
18        0        0
19       45       14
20        0        6
21        0        0
22       52       33
23       57        0
24       42       34
25        0        0
26       28        0
27        0        0
28       27        0
29       28       27
30        0        0
31       17        0
32       27        0
33        0        0
34        0        0
35        0        0
36       11      101
37       33       32
38        0        0
39        0       26
40        0        0
41       47       89
42       38       18
43        0       51
44       13        0
45       38       73
46       18       54
47       58       33
48        0        0
49       36       59
50        0        0
51        0        0
52        0        0
53       62       74
54        0       39
55       70      162
56        0        0
57       20        7
58       18       17
59      128        0
60       70       24
61        3        0
62       29        3
63       14        0
64       75       42

H3K36me3 lengths of TE, TSS distance

H3K36me3_repclass_sum <- H3K36me3_ols$all_H3K36me3 %>%
   left_join(., H3K36me3_lookup) %>% 
     mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>% 
    mutate(special_type=case_when(repClass=="SINE"~"SINE",
                                  repClass=="LINE"~"LINE",
                                  repClass=="LTR"~"LTR",
                                  repClass=="DNA"~"DNA",
                                  repClass=="Retroposon"~"SVA",
                                  is.na(repClass)~"not_TE",
                                  TRUE~"Other"))%>% 
  dplyr::filter(cluster != "not_assigned") %>% 
  group_by(Peakid, special_type) %>% 
  summarize(width= min(width),
            min_te_len=min(te_len),
            min_roi_len=min(roi_len),
            min_overlap=min(overlap_bp),
            min_dist_TSS=min(distanceToTSS),
            max_dist_TSS=max(distanceToTSS),
            cluster=paste0(unique(cluster), collapse = ":")) 
  

H3K36me3_repclass_sum %>% 
  ggplot(aes(x=special_type,y=min_te_len))+
  geom_boxplot() +
  facet_wrap(~cluster)+
  theme_classic()+
    ggtitle("Overlapping minimum TE length per class in H3K36me3")

H3K36me3_sets_gr$all_H3K36me3 %>% 
  as.data.frame() %>% 
   left_join(., H3K36me3_lookup) %>% 
  mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>% 
  ggplot(aes(x=cluster, y=width))+
  geom_boxplot()+
  theme_classic()+
    ggtitle("Region of interest length in H3K36me3")

H3K36me3_repclass_sum %>% 
  ggplot(aes(x=special_type,y=width))+
  geom_boxplot() +
  facet_wrap(~cluster)+
  theme_classic()+
    ggtitle("Minimum overlapping TE & ROI width length per class in H3K36me3")

H3K36me3_repclass_sum %>% 
  ggplot(aes(x=special_type,y=min_dist_TSS))+
  geom_boxplot() +
  facet_wrap(~cluster)+
  theme_classic()+
    ggtitle("Distance to TSS per class in H3K36me3 of overlapping ROIs")

# H3K36me3_sets_gr$all_H3K36me3 %>% 
#   as.data.frame() %>% 
#   dplyr::filter(width>60000)
# 
# H3K36me3_ols$all_H3K36me3 %>% 
#    as.data.frame() %>% 
#   mutate(medi=median(roi_len))
#   dplyr::filter(roi_len>60000)

H3K9me3 lengths of TE, TSS distance

H3K9me3_repclass_sum <- H3K9me3_ols$all_H3K9me3 %>%
   left_join(., H3K9me3_lookup) %>% 
     mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>% 
    mutate(special_type=case_when(repClass=="SINE"~"SINE",
                                  repClass=="LINE"~"LINE",
                                  repClass=="LTR"~"LTR",
                                  repClass=="DNA"~"DNA",
                                  repClass=="Retroposon"~"SVA",
                                  is.na(repClass)~"not_TE",
                                  TRUE~"Other"))%>% 
  dplyr::filter(cluster != "not_assigned") %>% 
  group_by(Peakid, special_type) %>% 
  summarize(width= min(width),
            min_te_len=min(te_len),
            min_roi_len=min(roi_len),
            min_overlap=min(overlap_bp),
            min_dist_TSS=min(distanceToTSS),
            max_dist_TSS=max(distanceToTSS),
            cluster=paste0(unique(cluster), collapse = ":")) 
  

H3K9me3_repclass_sum %>% 
  ggplot(aes(x=special_type,y=min_te_len))+
  geom_boxplot() +
  facet_wrap(~cluster)+
  theme_classic()+
    ggtitle("Overlapping minimum TE length per class in H3K9me3")

H3K9me3_sets_gr$all_H3K9me3 %>% 
  as.data.frame() %>% 
   left_join(., H3K9me3_lookup) %>% 
  mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>% 
  ggplot(aes(x=cluster, y=width))+
  geom_boxplot()+
  theme_classic()+
    ggtitle("Region of interest length in H3K9me3")

H3K9me3_repclass_sum %>% 
  ggplot(aes(x=special_type,y=width))+
  geom_boxplot() +
  facet_wrap(~cluster)+
  theme_classic()+
    ggtitle("Minimum overlapping TE & ROI width length per class in H3K9me3")

H3K9me3_repclass_sum %>% 
  ggplot(aes(x=special_type,y=min_dist_TSS))+
  geom_boxplot() +
  facet_wrap(~cluster)+
  theme_classic()+
    ggtitle("Distance to TSS per class in H3K9me3 of overlapping ROIs")

# H3K9me3_sets_gr$all_H3K9me3 %>% 
#   as.data.frame() %>% 
#   dplyr::filter(width>60000)
# 
# H3K27ac_ols$all_H3K27ac %>% 
#    as.data.frame() %>% 
#   mutate(medi=median(roi_len))
#   dplyr::filter(roi_len>60000)

sessionInfo()

R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/Chicago
tzcode source: internal

attached base packages:
[1] stats4    grid      stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] ggpubr_0.6.1                            
 [2] ggsignif_0.6.4                          
 [3] plyranges_1.26.0                        
 [4] ChIPseeker_1.42.1                       
 [5] TxDb.Hsapiens.UCSC.hg38.knownGene_3.20.0
 [6] GenomicFeatures_1.58.0                  
 [7] AnnotationDbi_1.68.0                    
 [8] Biobase_2.66.0                          
 [9] BiocParallel_1.40.2                     
[10] GenomicRanges_1.58.0                    
[11] GenomeInfoDb_1.42.3                     
[12] IRanges_2.40.1                          
[13] S4Vectors_0.44.0                        
[14] BiocGenerics_0.52.0                     
[15] genomation_1.38.0                       
[16] DT_0.33                                 
[17] lubridate_1.9.4                         
[18] forcats_1.0.0                           
[19] stringr_1.5.1                           
[20] dplyr_1.1.4                             
[21] purrr_1.1.0                             
[22] readr_2.1.5                             
[23] tidyr_1.3.1                             
[24] tibble_3.3.0                            
[25] ggplot2_3.5.2                           
[26] tidyverse_2.0.0                         
[27] workflowr_1.7.1                         

loaded via a namespace (and not attached):
  [1] splines_4.4.2                          
  [2] later_1.4.2                            
  [3] BiocIO_1.16.0                          
  [4] bitops_1.0-9                           
  [5] ggplotify_0.1.2                        
  [6] R.oo_1.27.1                            
  [7] XML_3.99-0.18                          
  [8] lifecycle_1.0.4                        
  [9] rstatix_0.7.2                          
 [10] rprojroot_2.1.1                        
 [11] processx_3.8.6                         
 [12] lattice_0.22-7                         
 [13] vroom_1.6.5                            
 [14] crosstalk_1.2.2                        
 [15] backports_1.5.0                        
 [16] magrittr_2.0.3                         
 [17] sass_0.4.10                            
 [18] rmarkdown_2.29                         
 [19] jquerylib_0.1.4                        
 [20] yaml_2.3.10                            
 [21] plotrix_3.8-4                          
 [22] httpuv_1.6.16                          
 [23] ggtangle_0.0.7                         
 [24] cowplot_1.2.0                          
 [25] DBI_1.2.3                              
 [26] RColorBrewer_1.1-3                     
 [27] abind_1.4-8                            
 [28] zlibbioc_1.52.0                        
 [29] R.utils_2.13.0                         
 [30] RCurl_1.98-1.17                        
 [31] yulab.utils_0.2.1                      
 [32] rappdirs_0.3.3                         
 [33] git2r_0.36.2                           
 [34] GenomeInfoDbData_1.2.13                
 [35] enrichplot_1.26.6                      
 [36] ggrepel_0.9.6                          
 [37] tidytree_0.4.6                         
 [38] codetools_0.2-20                       
 [39] DelayedArray_0.32.0                    
 [40] DOSE_4.0.1                             
 [41] tidyselect_1.2.1                       
 [42] aplot_0.2.8                            
 [43] UCSC.utils_1.2.0                       
 [44] farver_2.1.2                           
 [45] matrixStats_1.5.0                      
 [46] GenomicAlignments_1.42.0               
 [47] jsonlite_2.0.0                         
 [48] Formula_1.2-5                          
 [49] tools_4.4.2                            
 [50] treeio_1.30.0                          
 [51] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [52] Rcpp_1.1.0                             
 [53] glue_1.8.0                             
 [54] SparseArray_1.6.2                      
 [55] xfun_0.52                              
 [56] qvalue_2.38.0                          
 [57] MatrixGenerics_1.18.1                  
 [58] withr_3.0.2                            
 [59] fastmap_1.2.0                          
 [60] boot_1.3-32                            
 [61] callr_3.7.6                            
 [62] caTools_1.18.3                         
 [63] digest_0.6.37                          
 [64] timechange_0.3.0                       
 [65] R6_2.6.1                               
 [66] gridGraphics_0.5-1                     
 [67] seqPattern_1.38.0                      
 [68] colorspace_2.1-1                       
 [69] GO.db_3.20.0                           
 [70] gtools_3.9.5                           
 [71] dichromat_2.0-0.1                      
 [72] RSQLite_2.4.3                          
 [73] R.methodsS3_1.8.2                      
 [74] utf8_1.2.6                             
 [75] generics_0.1.4                         
 [76] data.table_1.17.8                      
 [77] rtracklayer_1.66.0                     
 [78] httr_1.4.7                             
 [79] htmlwidgets_1.6.4                      
 [80] S4Arrays_1.6.0                         
 [81] whisker_0.4.1                          
 [82] pkgconfig_2.0.3                        
 [83] gtable_0.3.6                           
 [84] blob_1.2.4                             
 [85] impute_1.80.0                          
 [86] XVector_0.46.0                         
 [87] htmltools_0.5.8.1                      
 [88] carData_3.0-5                          
 [89] fgsea_1.32.4                           
 [90] scales_1.4.0                           
 [91] png_0.1-8                              
 [92] ggfun_0.2.0                            
 [93] knitr_1.50                             
 [94] rstudioapi_0.17.1                      
 [95] tzdb_0.5.0                             
 [96] reshape2_1.4.4                         
 [97] rjson_0.2.23                           
 [98] nlme_3.1-168                           
 [99] curl_7.0.0                             
[100] cachem_1.1.0                           
[101] KernSmooth_2.23-26                     
[102] parallel_4.4.2                         
[103] restfulr_0.0.16                        
[104] pillar_1.11.0                          
[105] vctrs_0.6.5                            
[106] gplots_3.2.0                           
[107] promises_1.3.3                         
[108] car_3.1-3                              
[109] evaluate_1.0.5                         
[110] cli_3.6.5                              
[111] compiler_4.4.2                         
[112] Rsamtools_2.22.0                       
[113] rlang_1.1.6                            
[114] crayon_1.5.3                           
[115] labeling_0.4.3                         
[116] ps_1.9.1                               
[117] getPass_0.2-4                          
[118] plyr_1.8.9                             
[119] fs_1.6.6                               
[120] stringi_1.8.7                          
[121] gridBase_0.4-7                         
[122] Biostrings_2.74.1                      
[123] lazyeval_0.2.2                         
[124] GOSemSim_2.32.0                        
[125] Matrix_1.7-3                           
[126] BSgenome_1.74.0                        
[127] hms_1.1.3                              
[128] patchwork_1.3.2                        
[129] bit64_4.6.0-1                          
[130] KEGGREST_1.46.0                        
[131] SummarizedExperiment_1.36.0            
[132] igraph_2.1.4                           
[133] broom_1.0.9                            
[134] memoise_2.0.1                          
[135] bslib_0.9.0                            
[136] ggtree_3.14.0                          
[137] fastmatch_1.1-6                        
[138] bit_4.6.0                              
[139] ape_5.8-1

repeatmasker_data

Renee Matthews

2025-09-11

H3K27ac and TEs

All regions profile

H3K27me3 and TEs

All regions profile

H3K36me3 and TEs

All regions profile

H3K9me3 and TEs

All regions profile

Counting TE types:

H3K27ac proportions

results for 1bp overlap H3K27ac

H3K27ac_results_1bp

results for 50% TE overlap H3K27ac

H3K27ac_results 50% overlap

H3K27me3 proportions

results for 1bp overlap H3K27me3

H3K27me3_results_1bp

results for 50% TE overlap H3K27me3

H3K27me3_results 50% overlap

H3K36me3 proportions

results for 1bp overlap H3K36me3

H3K36me3_results_1bp

results for 50% TE overlap H3K36me3

H3K36me3_results 50% overlap

H3K9me3 proportions

results for 1bp overlap H3K9me3

H3K9me3_results_1bp

results for 50% TE overlap H3K9me3

H3K9me3_results 50% overlap

H3K27ac divergence from consensus

H3K27me3 divergence from consensus

H3K36me3 divergence from consensus

H3K9me3 divergence from consensus

H3K27ac lengths of TE, TSS distance

H3K27me3 lengths of TE, TSS distance

H3K36me3 lengths of TE, TSS distance

H3K9me3 lengths of TE, TSS distance