Identifying the formation of dsRNAs

Three scenarios:

  • Antisense transcripts paired with the exons of a gene on the opposite strand
  • Paired genes with overlapping exons
  • An antisense gene paired with the exons of nearby genes

Review of antisense transcription: https://www.nature.com/articles/nrm2738

Antisense transcripts can encode proteins or be non-protein-coding.

  • AS-CDS|AS-noncoding

The most prominent form of antisense transcription is an AS-noncoding transcript paired with a protein-coding transcript.
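As a minimal sketch of the AS-CDS versus AS-noncoding distinction, the partner of each pair can be labeled from its biotype. The data frame, its column names, and the gene names below are hypothetical toy values, and the sketch assumes a GENCODE-style `gene_type` annotation in which `protein_coding` marks coding genes; every other biotype is treated as noncoding.

```r
# Hypothetical toy pairs (not taken from the notebook's data).
# ref_gene_type is assumed to hold GENCODE-style biotypes.
toy_pairs <- data.frame(
  query_gene    = c("GENE_A", "GENE_A", "GENE_B"),
  ref_gene      = c("GENE_A-AS1", "GENE_C", "GENE_D"),
  ref_gene_type = c("antisense", "protein_coding", "lncRNA"),
  stringsAsFactors = FALSE
)

# Label the partner as coding (AS-CDS) or noncoding (AS-noncoding).
toy_pairs$as_class <- ifelse(toy_pairs$ref_gene_type == "protein_coding",
                             "AS-CDS", "AS-noncoding")

# Count how many partners fall into each class.
table(toy_pairs$as_class)
```

The label only describes the partner's biotype; whether the partner is actually antisense to the query still has to be checked from the strands.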

In [696]:
## Find overlaps between the transcripts of the query genes and
## the transcripts of other genes in the database.
## Overlaps on either strand are reported.
queryTranscript <- query[which(query$type == "transcript"), ]

## Reference set: transcripts of every gene that is not a query gene
ref <- transcript[!(transcript$gene_name %in% queryTranscript$gene_name), ]

overlaps <- mergeByOverlaps(ref,
                            queryTranscript,
                            ignore.strand = TRUE # also report overlaps with the opposite strand
                           )

## Keep one row per (query gene, reference gene) pair
pairs <- data.frame(overlaps) %>%
        select(queryTranscript.gene_name, ref.gene_name) %>%
        group_by(queryTranscript.gene_name, ref.gene_name) %>%
        filter(row_number() == 1)
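Note that `ignore.strand = TRUE` reports same-strand overlaps as well as opposite-strand ones. If only antisense pairs are wanted, one option is to compare the strands of each hit afterwards. The sketch below illustrates this on toy coordinates; the `GRanges` objects and gene names are hypothetical, and it assumes every feature has a strand of `+` or `-` (unstranded `*` features would need separate handling).

```r
library(GenomicRanges)

# Toy coordinates (hypothetical, not real annotation).
query_gr <- GRanges(
  seqnames  = c("chr1", "chr1"),
  ranges    = IRanges(start = c(100, 1000), end = c(500, 1500)),
  strand    = c("+", "-"),
  gene_name = c("GENE_A", "GENE_B")
)
ref_gr <- GRanges(
  seqnames  = c("chr1", "chr1", "chr1"),
  ranges    = IRanges(start = c(400, 450, 1400), end = c(800, 900, 2000)),
  strand    = c("-", "+", "-"),
  gene_name = c("REF_1", "REF_2", "REF_3")
)

# All positional overlaps, regardless of strand.
hits <- findOverlaps(query_gr, ref_gr, ignore.strand = TRUE)

# Strand of each side of every hit.
q_strand <- as.character(strand(query_gr))[queryHits(hits)]
r_strand <- as.character(strand(ref_gr))[subjectHits(hits)]

# Keep only hits where the two features lie on opposite strands.
opposite <- q_strand != r_strand
antisense_pairs <- data.frame(
  query = query_gr$gene_name[queryHits(hits)][opposite],
  ref   = ref_gr$gene_name[subjectHits(hits)][opposite],
  stringsAsFactors = FALSE
)
antisense_pairs
```

In this toy set GENE_A overlaps both REF_1 and REF_2, but only REF_1 is on the opposite strand, so only that pair is retained.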

Validate with the three pairs of cis-NATs tested in Qin Li's Nature paper

Three pairs of cis-NATs in the paper:

  • CTSA:PLTP
  • TNFRSF14:TNFRSF14-AS1
  • HBP1:COG5
A grouped_df: 6 × 2
queryTranscript.gene_name  ref.gene_name
<chr>                      <chr>
TNFRSF14                   TNFRSF14-AS1
HBP1                       AC004492.1
HBP1                       COG5
CTSA                       NEURL2
CTSA                       AL008726.1
CTSA                       PLTP

Validate with reported antisense transcripts and their targets in the review paper

A grouped_df: 51 × 3
queryTranscript.gene_name  ref.gene_name  targets.gene_name
<chr>                      <chr>          <chr>
PINK1                      PINK1-AS       naPINK1
PINK1                      AL391357.1     naPINK1
MSH4                       AL445464.1     Hspa5
SIX3                       SIX3-AS1       Six3OS
VAX2                       snoU13         Vax2OS
VAX2                       ATP6V1B1       Vax2OS
VAX2                       AC007040.2     Vax2OS
VAX2                       ATP6V1B1-AS1   Vax2OS
VAX2                       ELOCP21        Vax2OS
ZEB2                       AC009951.5     Zeb2 NAT
ZEB2                       AC009951.2     Zeb2 NAT
ZEB2                       AC009951.1     Zeb2 NAT
ZEB2                       AC009951.4     Zeb2 NAT
ZEB2                       ZEB2-AS1       Zeb2 NAT
MKRN2                      MKRN2OS        RAF1
MKRN2                      RAF1           RAF1
SMAD5                      SMAD5-AS1      DAMS
CDYL                       AL356747.1     CDYL-AS
CDYL                       CDYL-AS1       CDYL-AS
IGF2R                      AIRN           Air
IGF2R                      AL353625.1     Air
EMX2                       EMX2OS         EMX2OS
BDNF                       BDNF-AS        BDNFOS
BDNF                       AC103796.1     BDNFOS
PAX6                       ELP4           Pax6OS
PAX6                       PAUPAR         Pax6OS
PAX6                       AL035078.4     Pax6OS
WT1                        WT1-AS         WT1-AS
BACE1                      RNF214         BACE1-AS
BACE1                      BACE1-AS       BACE1-AS
BACE1                      AP000892.3     BACE1-AS
BACE1                      CEP164         BACE1-AS
PMCH                       PARPBP         pMCH antisense
PMCH                       HELLPAR        pMCH antisense
OTX2                       OTX2-AS1       Otx2OS
SIX6                       C14orf39       Six6OS
CHRNA3                     CHRNA5         CHRNA5
CHRNA3                     AC067863.2     CHRNA5
COX10                      COX10-AS1      C17ORF1
COX10                      SNORA74        C17ORF1
SPHK1                      PRPSAP1        Khps1
MBP                        AC093330.1     MBP-AS
MBP                        AC093330.2     MBP-AS
MBP                        AC093330.3     MBP-AS
MBP                        AC018529.1     MBP-AS
MBP                        AC018529.3     MBP-AS
MBP                        AC018529.2     MBP-AS
APOE                       AC011481.3     APOE-AS1
TOP1                       PLCG1-AS1      TOP1-AS
TSIX                       XIST           Xist
FMR1                       FMR1-IT1       ASFMR1

The reported antisense-target pairs use gene names that differ from the names in the database, so I don't have a good way to compare the results yet.
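One possible starting point, offered only as a rough heuristic and not as a validated method, is to normalize both naming schemes before comparing them: uppercase the name, drop punctuation, and strip a trailing antisense marker such as `-AS1`, `OS`, or `NAT`. The function `normalize_as_name` below is hypothetical. It will mis-handle genes whose real symbol ends in one of these suffixes (for example, a symbol ending in `AS`), and it does nothing for aliases that share no stem with the current symbol, so any match it produces would still need manual review.

```r
# Hypothetical helper: crude normalization of antisense gene names.
normalize_as_name <- function(x) {
  x <- toupper(x)
  x <- gsub("[^A-Z0-9]", "", x)  # drop spaces, hyphens, dots
  # Strip one trailing antisense marker, if present.
  sub("(ANTISENSE|NAT|OS|AS[0-9]*)$", "", x)
}

# Names taken from the table above: predicted partner vs. reported name.
predicted <- c("SIX3-AS1", "ZEB2-AS1", "BDNF-AS", "ELP4")
reported  <- c("Six3OS",   "Zeb2 NAT", "BDNFOS",  "Pax6OS")

comparison <- data.frame(
  predicted = predicted,
  reported  = reported,
  same_stem = normalize_as_name(predicted) == normalize_as_name(reported),
  stringsAsFactors = FALSE
)
comparison
```

Pairs such as ELP4 and Pax6OS do not share a stem, so they are reported as non-matching even if they refer to overlapping loci; a coordinate-based comparison would be needed for those.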

'25 out of 32 genes in GENCODE database were found to form dsRNAs.'
In [641]:
## Number of query genes that overlap with other genes in the GENCODE database
length(unique(pairs$queryTranscript.gene_name))
25
In [642]:
## Number of query genes not predicted to have a paired target
length(setdiff(toupper(queryGenes), pairs$queryTranscript.gene_name))
44
In [643]:
# Number of query genes not found in the GENCODE database
length(setdiff(toupper(queryGenes), gene$gene_name))
37
In [645]:
# Query genes that are in the database but have no overlapping partner to form a dsRNA
setdiff(intersect(toupper(queryGenes), gene$gene_name), 
          pairs$queryTranscript.gene_name)
  1. 'HFE'
  2. 'PDCD2'
  3. 'PAX2'
  4. 'CRX'
  5. 'RAX'
  6. 'MSX1'
  7. 'ABO'
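The counts above split the query genes into three disjoint groups: not in the database (37), in the database with a predicted partner (25), and in the database without one (7, so 25 + 7 = 32 genes in the database). A small sketch of that bookkeeping on toy vectors is shown below; `query_genes`, `db_genes`, and `paired` are hypothetical stand-ins for `queryGenes`, `gene$gene_name`, and `pairs$queryTranscript.gene_name`.

```r
# Hypothetical toy inputs standing in for the notebook's objects.
query_genes <- c("hfe", "bdnf", "pax6", "six3os")
db_genes    <- c("HFE", "BDNF", "PAX6", "ELP4")
paired      <- c("BDNF", "PAX6")

q <- toupper(query_genes)

not_in_db    <- setdiff(q, db_genes)        # absent from the database
in_db        <- intersect(q, db_genes)      # present in the database
with_partner <- intersect(in_db, paired)    # present, with an overlapping partner
no_partner   <- setdiff(in_db, paired)      # present, without a partner

# The three groups should account for every unique query gene.
stopifnot(length(not_in_db) + length(with_partner) + length(no_partner) ==
            length(unique(q)))

sprintf("%d out of %d genes in the database have a predicted partner.",
        length(with_partner), length(in_db))
```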

Testing on TWAS prioritized genes

In [691]:
head(pairs, 20)
A grouped_df: 20 × 2
queryTranscript.gene_name  ref.gene_name
<chr>                      <chr>
THAP3                      DNAJC11
TRIT1                      Y_RNA
PRPF38A                    TUT4
PRPF38A                    snoU13
AP4B1                      AP4B1-AS1
FDPS                       RUSC1-AS1
GON4L                      MSTO2P
GON4L                      AL139128.1
TRAF3IP3                   C1orf74
ZFP36L2                    AC010883.3
TRIP12                     RNU6-613P
TRIP12                     FBXO36
DIS3L2                     AC019130.1
DIS3L2                     MIR562
DIS3L2                     NRBF2P6
VGLL4                      ATG7
VGLL4                      AC022001.2
VGLL4                      AC022001.3
GLB1                       SEC13P1
GLB1                       SUMO2P10
[1] "146 out of 511(30%) TWAS prioritized genes are predicted to form dsRNAs based on sequence overlapping."