Three scenarios:
Review of antisense transcription: https://www.nature.com/articles/nrm2738
Antisense transcripts can encode proteins or be non-protein-coding.
The most prominent form of antisense transcription is a non-coding antisense transcript paired with a protein-coding sense transcript.
## Check for overlaps between transcripts of the query genes and
## transcripts of other genes in the database.
## Both strand directions are checked for overlaps.
queryTranscript <- query[which(query$type == "transcript"), ]
## Reference set: transcripts of every gene that is not a query gene
ref <- transcript[!(transcript$gene_name %in% queryTranscript$gene_name), ]
overlaps <- mergeByOverlaps(ref,
                            queryTranscript,
                            ignore.strand = TRUE # include overlaps with the opposite strand
)
## Keep one row per (query gene, overlapping gene) pair
pairs <- data.frame(overlaps) %>%
  select(queryTranscript.gene_name, ref.gene_name) %>%
  distinct()
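Because `ignore.strand = TRUE` keeps same-strand overlaps as well as opposite-strand ones, a further filter may be useful when only antisense partners are of interest. The following is a minimal base-R sketch on a toy data frame. The gene names are placeholders, and the `queryTranscript.strand` and `ref.strand` column names are an assumption about how the flattened overlap table is laid out; they should be checked against the real `data.frame(overlaps)` before use.

```r
# Toy stand-in for data.frame(overlaps). Gene names are hypothetical and the
# *.strand column names are assumed, not confirmed by the analysis above.
overlaps_df <- data.frame(
  queryTranscript.gene_name = c("GENE_A", "GENE_A", "GENE_B"),
  queryTranscript.strand    = c("+", "+", "-"),
  ref.gene_name             = c("GENE_A-AS1", "GENE_C", "GENE_D"),
  ref.strand                = c("-", "+", "+"),
  stringsAsFactors = FALSE
)

# as.character() guards against strand columns stored as factors
q_strand <- as.character(overlaps_df$queryTranscript.strand)
r_strand <- as.character(overlaps_df$ref.strand)

# Antisense means both strands are known ("+" or "-") and they differ
is_antisense <- q_strand %in% c("+", "-") &
  r_strand %in% c("+", "-") &
  q_strand != r_strand

# One row per (query gene, opposite-strand partner) pair
antisense_pairs <- unique(
  overlaps_df[is_antisense, c("queryTranscript.gene_name", "ref.gene_name")]
)
print(antisense_pairs)
```

In this toy input, the GENE_A/GENE_C overlap is dropped because both transcripts sit on the same strand.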
Three pairs of cis-NATs in the paper:
queryTranscript.gene_name | ref.gene_name |
---|---|
<chr> | <chr> |
TNFRSF14 | TNFRSF14-AS1 |
HBP1 | AC004492.1 |
HBP1 | COG5 |
CTSA | NEURL2 |
CTSA | AL008726.1 |
CTSA | PLTP |
queryTranscript.gene_name | ref.gene_name | targets.gene_name |
---|---|---|
<chr> | <chr> | <chr> |
PINK1 | PINK1-AS | naPINK1 |
PINK1 | AL391357.1 | naPINK1 |
MSH4 | AL445464.1 | Hspa5 |
SIX3 | SIX3-AS1 | Six3OS |
VAX2 | snoU13 | Vax2OS |
VAX2 | ATP6V1B1 | Vax2OS |
VAX2 | AC007040.2 | Vax2OS |
VAX2 | ATP6V1B1-AS1 | Vax2OS |
VAX2 | ELOCP21 | Vax2OS |
ZEB2 | AC009951.5 | Zeb2 NAT |
ZEB2 | AC009951.2 | Zeb2 NAT |
ZEB2 | AC009951.1 | Zeb2 NAT |
ZEB2 | AC009951.4 | Zeb2 NAT |
ZEB2 | ZEB2-AS1 | Zeb2 NAT |
MKRN2 | MKRN2OS | RAF1 |
MKRN2 | RAF1 | RAF1 |
SMAD5 | SMAD5-AS1 | DAMS |
CDYL | AL356747.1 | CDYL-AS |
CDYL | CDYL-AS1 | CDYL-AS |
IGF2R | AIRN | Air |
IGF2R | AL353625.1 | Air |
EMX2 | EMX2OS | EMX2OS |
BDNF | BDNF-AS | BDNFOS |
BDNF | AC103796.1 | BDNFOS |
PAX6 | ELP4 | Pax6OS |
PAX6 | PAUPAR | Pax6OS |
PAX6 | AL035078.4 | Pax6OS |
WT1 | WT1-AS | WT1-AS |
BACE1 | RNF214 | BACE1-AS |
BACE1 | BACE1-AS | BACE1-AS |
BACE1 | AP000892.3 | BACE1-AS |
BACE1 | CEP164 | BACE1-AS |
PMCH | PARPBP | pMCH antisense |
PMCH | HELLPAR | pMCH antisense |
OTX2 | OTX2-AS1 | Otx2OS |
SIX6 | C14orf39 | Six6OS |
CHRNA3 | CHRNA5 | CHRNA5 |
CHRNA3 | AC067863.2 | CHRNA5 |
COX10 | COX10-AS1 | C17ORF1 |
COX10 | SNORA74 | C17ORF1 |
SPHK1 | PRPSAP1 | Khps1 |
MBP | AC093330.1 | MBP-AS |
MBP | AC093330.2 | MBP-AS |
MBP | AC093330.3 | MBP-AS |
MBP | AC018529.1 | MBP-AS |
MBP | AC018529.3 | MBP-AS |
MBP | AC018529.2 | MBP-AS |
APOE | AC011481.3 | APOE-AS1 |
TOP1 | PLCG1-AS1 | TOP1-AS |
TSIX | XIST | Xist |
FMR1 | FMR1-IT1 | ASFMR1 |
The reported antisense-target pairs use gene names that differ from the Gencode names (e.g. Six3OS vs. SIX3-AS1), so I don't have a good way to compare the results yet.
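One possible starting point for the comparison is to normalize the antisense suffix before matching names. The sketch below is a heuristic, not a validated mapping: `normalize_nat_name` is a hypothetical helper, and it would also rewrite unrelated symbols that merely end in "AS" or "OS". Pairs named by alias (AIRN vs. Air, for example) still need a proper synonym table.

```r
# Hypothetical helper: collapse common antisense suffix spellings to "-AS".
# Heuristic only; it would also rewrite symbols that merely end in AS/OS.
normalize_nat_name <- function(x) {
  x <- toupper(trimws(x))
  sub("[- ]?(AS[0-9]*|OS|NAT|ANTISENSE)$", "-AS", x)
}

# A few rows copied from the table above
reported <- data.frame(
  query  = c("SIX3", "ZEB2", "BDNF", "IGF2R"),
  ref    = c("SIX3-AS1", "ZEB2-AS1", "BDNF-AS", "AIRN"),
  target = c("Six3OS", "Zeb2 NAT", "BDNFOS", "Air"),
  stringsAsFactors = FALSE
)

# TRUE where the Gencode name and the reported name agree after normalization
reported$name_match <- normalize_nat_name(reported$ref) ==
  normalize_nat_name(reported$target)
print(reported)
```

Rows that stay unmatched, such as IGF2R/AIRN here, are the ones that would need manual curation or an alias lookup.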
## Number of query genes that overlap with other genes in the Gencode database
length(unique(pairs$queryTranscript.gene_name))
## Number of query genes not predicted to have a paired target
length(setdiff(toupper(queryGenes), pairs$queryTranscript.gene_name))
## Number of query genes not found in the Gencode database
length(setdiff(toupper(queryGenes), gene$gene_name))
## Query genes that are in the database but have no overlapping target to form dsRNA
setdiff(intersect(toupper(queryGenes), gene$gene_name),
        pairs$queryTranscript.gene_name)
head(pairs, 20)
queryTranscript.gene_name | ref.gene_name |
---|---|
<chr> | <chr> |
THAP3 | DNAJC11 |
TRIT1 | Y_RNA |
PRPF38A | TUT4 |
PRPF38A | snoU13 |
AP4B1 | AP4B1-AS1 |
FDPS | RUSC1-AS1 |
GON4L | MSTO2P |
GON4L | AL139128.1 |
TRAF3IP3 | C1orf74 |
ZFP36L2 | AC010883.3 |
TRIP12 | RNU6-613P |
TRIP12 | FBXO36 |
DIS3L2 | AC019130.1 |
DIS3L2 | MIR562 |
DIS3L2 | NRBF2P6 |
VGLL4 | ATG7 |
VGLL4 | AC022001.2 |
VGLL4 | AC022001.3 |
GLB1 | SEC13P1 |
GLB1 | SUMO2P10 |
[1] "146 out of 511(30%) TWAS prioritized genes are predicted to form dsRNAs based on sequence overlapping."