Three scenarios:
Review of antisense transcription: https://www.nature.com/articles/nrm2738
Antisense transcripts can encode proteins or be non-coding.
The most prominent form of antisense transcription is a non-coding antisense transcript paired with a protein-coding sense transcript.
## Check whether transcripts of the query genes overlap
## transcripts of other genes in the database.
## Overlaps on either strand are counted.
queryTranscript <- query[which(query$type == "transcript"), ]
## Reference set: transcripts of every gene that is not a query gene
ref <- transcript[!(transcript$gene_name %in% queryTranscript$gene_name), ]
overlaps <- mergeByOverlaps(ref,
                            queryTranscript,
                            ignore.strand = TRUE # include overlaps with the opposite strand
                            )
## Keep one row per (query gene, overlapping gene) pair
pairs <- data.frame(overlaps) %>%
    select(queryTranscript.gene_name, ref.gene_name) %>%
    group_by(queryTranscript.gene_name, ref.gene_name) %>%
    filter(row_number() == 1)
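Because `ignore.strand = TRUE` keeps same-strand and opposite-strand overlaps together, only some of these pairs are antisense candidates. The following is a minimal sketch of how the two cases could be separated, assuming every range has a `+` or `-` strand. The objects `toyQuery`, `toyRef`, and `toyPairs` and all coordinates are made up for illustration and are not taken from the annotation used here.

```r
library(GenomicRanges)

# Toy ranges (hypothetical coordinates and gene names)
toyQuery <- GRanges(
    seqnames  = c("chr1", "chr1"),
    ranges    = IRanges(start = c(100, 1000), end = c(500, 1500)),
    strand    = c("+", "-"),
    gene_name = c("geneA", "geneB")
)
toyRef <- GRanges(
    seqnames  = c("chr1", "chr1", "chr1"),
    ranges    = IRanges(start = c(400, 450, 1400), end = c(800, 900, 1800)),
    strand    = c("-", "+", "-"),
    gene_name = c("refX", "refY", "refZ")
)

# Find overlaps regardless of strand, then compare strands per hit
hits <- findOverlaps(toyQuery, toyRef, ignore.strand = TRUE)
toyPairs <- data.frame(
    query_gene   = toyQuery$gene_name[queryHits(hits)],
    ref_gene     = toyRef$gene_name[subjectHits(hits)],
    query_strand = as.character(strand(toyQuery))[queryHits(hits)],
    ref_strand   = as.character(strand(toyRef))[subjectHits(hits)],
    stringsAsFactors = FALSE
)

# Opposite strands -> candidate sense/antisense pair
toyPairs$orientation <- ifelse(toyPairs$query_strand != toyPairs$ref_strand,
                               "antisense", "same strand")
toyPairs
```

In this toy set only geneA/refX overlaps on opposite strands. Ranges with an unspecified strand (`*`) would need separate handling before this comparison.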
Three pairs of cis-NATs are reported in the paper. Predicted overlaps for their genes:
| queryTranscript.gene_name | ref.gene_name |
|---|---|
| <chr> | <chr> |
| TNFRSF14 | TNFRSF14-AS1 |
| HBP1 | AC004492.1 |
| HBP1 | COG5 |
| CTSA | NEURL2 |
| CTSA | AL008726.1 |
| CTSA | PLTP |
| queryTranscript.gene_name | ref.gene_name | targets.gene_name |
|---|---|---|
| <chr> | <chr> | <chr> |
| PINK1 | PINK1-AS | naPINK1 |
| PINK1 | AL391357.1 | naPINK1 |
| MSH4 | AL445464.1 | Hspa5 |
| SIX3 | SIX3-AS1 | Six3OS |
| VAX2 | snoU13 | Vax2OS |
| VAX2 | ATP6V1B1 | Vax2OS |
| VAX2 | AC007040.2 | Vax2OS |
| VAX2 | ATP6V1B1-AS1 | Vax2OS |
| VAX2 | ELOCP21 | Vax2OS |
| ZEB2 | AC009951.5 | Zeb2 NAT |
| ZEB2 | AC009951.2 | Zeb2 NAT |
| ZEB2 | AC009951.1 | Zeb2 NAT |
| ZEB2 | AC009951.4 | Zeb2 NAT |
| ZEB2 | ZEB2-AS1 | Zeb2 NAT |
| MKRN2 | MKRN2OS | RAF1 |
| MKRN2 | RAF1 | RAF1 |
| SMAD5 | SMAD5-AS1 | DAMS |
| CDYL | AL356747.1 | CDYL-AS |
| CDYL | CDYL-AS1 | CDYL-AS |
| IGF2R | AIRN | Air |
| IGF2R | AL353625.1 | Air |
| EMX2 | EMX2OS | EMX2OS |
| BDNF | BDNF-AS | BDNFOS |
| BDNF | AC103796.1 | BDNFOS |
| PAX6 | ELP4 | Pax6OS |
| PAX6 | PAUPAR | Pax6OS |
| PAX6 | AL035078.4 | Pax6OS |
| WT1 | WT1-AS | WT1-AS |
| BACE1 | RNF214 | BACE1-AS |
| BACE1 | BACE1-AS | BACE1-AS |
| BACE1 | AP000892.3 | BACE1-AS |
| BACE1 | CEP164 | BACE1-AS |
| PMCH | PARPBP | pMCH antisense |
| PMCH | HELLPAR | pMCH antisense |
| OTX2 | OTX2-AS1 | Otx2OS |
| SIX6 | C14orf39 | Six6OS |
| CHRNA3 | CHRNA5 | CHRNA5 |
| CHRNA3 | AC067863.2 | CHRNA5 |
| COX10 | COX10-AS1 | C17ORF1 |
| COX10 | SNORA74 | C17ORF1 |
| SPHK1 | PRPSAP1 | Khps1 |
| MBP | AC093330.1 | MBP-AS |
| MBP | AC093330.2 | MBP-AS |
| MBP | AC093330.3 | MBP-AS |
| MBP | AC018529.1 | MBP-AS |
| MBP | AC018529.3 | MBP-AS |
| MBP | AC018529.2 | MBP-AS |
| APOE | AC011481.3 | APOE-AS1 |
| TOP1 | PLCG1-AS1 | TOP1-AS |
| TSIX | XIST | Xist |
| FMR1 | FMR1-IT1 | ASFMR1 |
The reported antisense-target pairs use gene names that differ from those in the database, so I don't have a good way to compare the results yet.
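One possible starting point is a rough name-normalization heuristic: uppercase both names, drop separators, and strip a trailing antisense suffix before comparing. This is only a sketch. The helper `normalizeAntisenseName` and its suffix list are my own assumptions, not an established nomenclature rule, and it will miss aliases (e.g. AIRN vs. Air) and can over-strip gene names that happen to end in "AS" or "OS".

```r
# Hypothetical helper: crude normalization of antisense gene names
normalizeAntisenseName <- function(x) {
    x <- toupper(x)
    x <- gsub("[ _-]", "", x)                          # drop spaces, underscores, hyphens
    sub("(AS|OS|NAT|ANTISENSE)[0-9]*$", "", x)         # strip a trailing antisense suffix
}

# A few rows copied from the table above
reported <- data.frame(
    ref.gene_name     = c("SIX3-AS1", "ZEB2-AS1", "AL445464.1", "AIRN"),
    targets.gene_name = c("Six3OS", "Zeb2 NAT", "Hspa5", "Air"),
    stringsAsFactors  = FALSE
)

# TRUE where the predicted and reported names agree after normalization
reported$name_match <- normalizeAntisenseName(reported$ref.gene_name) ==
    normalizeAntisenseName(reported$targets.gene_name)
reported
```

Rows that fail to match would still need a manual check or a proper alias table.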
## Number of query genes that overlap other genes in the GENCODE database
length(unique(pairs$queryTranscript.gene_name))
## Number of query genes with no predicted paired target
## (this count also includes query genes missing from the database)
length(setdiff(toupper(queryGenes), pairs$queryTranscript.gene_name))
## Number of query genes not found in the GENCODE database
length(setdiff(toupper(queryGenes), gene$gene_name))
## Query genes in the database that have no overlapping target to form dsRNA
setdiff(intersect(toupper(queryGenes), gene$gene_name),
        pairs$queryTranscript.gene_name)
head(pairs, 20)
| queryTranscript.gene_name | ref.gene_name |
|---|---|
| <chr> | <chr> |
| THAP3 | DNAJC11 |
| TRIT1 | Y_RNA |
| PRPF38A | TUT4 |
| PRPF38A | snoU13 |
| AP4B1 | AP4B1-AS1 |
| FDPS | RUSC1-AS1 |
| GON4L | MSTO2P |
| GON4L | AL139128.1 |
| TRAF3IP3 | C1orf74 |
| ZFP36L2 | AC010883.3 |
| TRIP12 | RNU6-613P |
| TRIP12 | FBXO36 |
| DIS3L2 | AC019130.1 |
| DIS3L2 | MIR562 |
| DIS3L2 | NRBF2P6 |
| VGLL4 | ATG7 |
| VGLL4 | AC022001.2 |
| VGLL4 | AC022001.3 |
| GLB1 | SEC13P1 |
| GLB1 | SUMO2P10 |
[1] "146 out of 511(30%) TWAS prioritized genes are predicted to form dsRNAs based on sequence overlapping."
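As a sanity check on the counts, the three categories computed earlier (paired, not in the database, in the database but unpaired) should add up to the number of unique query genes. Here is a small sketch on made-up data. `toyQueryGenes`, `toyDbGenes`, and `toyPairedGenes` are hypothetical stand-ins for `queryGenes`, `gene$gene_name`, and `pairs$queryTranscript.gene_name`.

```r
# Hypothetical stand-ins for the real objects
toyQueryGenes  <- c("gene1", "Gene2", "GENE3", "gene4")
toyDbGenes     <- c("GENE1", "GENE2", "GENE3", "GENE9")
toyPairedGenes <- c("GENE1", "GENE1", "GENE3")   # one entry per overlap row

q <- unique(toupper(toyQueryGenes))

paired       <- intersect(q, toyPairedGenes)                       # predicted to pair
notInDb      <- setdiff(q, toyDbGenes)                             # missing from the database
inDbUnpaired <- setdiff(intersect(q, toyDbGenes), toyPairedGenes)  # present, no overlap

# The three groups should partition the query genes
stopifnot(length(paired) + length(notInDb) + length(inDbUnpaired) == length(q))

sprintf("%d paired, %d not in database, %d in database without a target",
        length(paired), length(notInDb), length(inDbUnpaired))
```

The partition only holds if every paired gene is also present in the database gene list, which I assume is true here since the pairs are derived from the same annotation.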