iCLIP.utils.TranscriptCoordInterconverter

class iCLIP.utils.TranscriptCoordInterconverter(transcript, introns=False)

Interconvert between genome domain and transcript domain coordinates.

Because many calls are expected against the same transcript, the conversions are precomputed once, at construction time, to save time. Overlapping exons are merged.
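The merging step can be illustrated with a short sketch that merges overlapping 0-based, half-open (start, end) intervals. The helper name merge_intervals is hypothetical and not part of iCLIP.utils. The sketch also assumes that intervals which merely touch (one ends exactly where the next begins) stay separate, which may differ from what the class actually does.

```python
def merge_intervals(intervals):
    """Merge overlapping 0-based, half-open (start, end) intervals.

    Hypothetical helper, not part of iCLIP.utils. Intervals that only
    touch (end == next start) are assumed to stay separate.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            # Overlaps the previous interval: extend it if this one ends later.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(merge_intervals([(112, 119), (100, 108), (104, 110), (124, 131)]))
# [(100, 110), (112, 119), (124, 131)]
```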

Parameters:
  • transcript (sequence of CGAT.GTF.Entry) – Set of GTF entries representing a transcript.
  • introns (bool, optional) – Use introns instead of exons (see below).
transcript_id

The transcript_id of the transcript in question, taken from the transcript_id field of the first entry in transcript.

Type: str
strand

Strand of the transcript. Taken from the strand field of the first entry in transcript.

Type: str
offset

Position of the start of the transcript in genome coordinates.

Type: int
genome_intervals

Coordinates of exons (or introns) in genome space, expressed relative to offset (that is, with offset subtracted from each coordinate). These are sorted in transcript order (see below).

Type: list of tuples of int
transcript_intervals

Coordinates of exons (or introns) in transcript space, that is, distances from the transcription start site after splicing.

Type: list of tuples of int
length

Total length of the intervals (exons or introns) in the transcript.

Type: int
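A minimal sketch of how these attributes could be derived for the transcript in the Notes below, assuming merged, 0-based half-open exon intervals and assuming that offset is the smallest genomic start of the exons. The function precompute and its dict return value are hypothetical; the class itself stores these values as attributes.

```python
def precompute(exons, strand, introns=False):
    """Hypothetical sketch of the precomputed attributes for one transcript.

    exons: merged, 0-based half-open (start, end) genome intervals.
    Assumes offset is the smallest genomic start of the exons.
    """
    exons = sorted(exons)
    if introns:
        # Introns are the gaps between consecutive exons.
        intervals = [(prev_end, next_start)
                     for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]
    else:
        intervals = exons
    offset = exons[0][0]
    # Transcript order runs 5' to 3', i.e. descending genome position on "-".
    ordered = intervals[::-1] if strand == "-" else intervals
    genome_intervals = [(s - offset, e - offset) for s, e in ordered]
    transcript_intervals = []
    length = 0
    for s, e in ordered:
        transcript_intervals.append((length, length + e - s))
        length += e - s
    return {"offset": offset,
            "genome_intervals": genome_intervals,
            "transcript_intervals": transcript_intervals,
            "length": length}


exons = [(100, 108), (112, 119), (124, 131)]
info = precompute(exons, "-")
print(info["genome_intervals"])      # [(24, 31), (12, 19), (0, 8)]
print(info["transcript_intervals"])  # [(0, 7), (7, 14), (14, 22)]
print(precompute(exons, "-", introns=True)["transcript_intervals"])  # [(0, 5), (5, 9)]
```

With introns=True the sketch gives intron transcript intervals (0, 5) and (5, 9), which agrees with the `introns=True` row of the diagram in the Notes.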

Notes

Imagine the following transcript:

chr1  protein_coding  exon  100  108   .  -  .  transcript_id "t1"; gene_id "g1";
chr1  protein_coding  exon  112  119   .  -  .  transcript_id "t1"; gene_id "g1";
chr1  protein_coding  exon  124  131   .  -  .  transcript_id "t1"; gene_id "g1";

We can visualise the relationship between the different coordinate domains as below:

Genome coordinates:    1         1         1         1
                       0         1         2         3
                       0123456789012345678901234567890
Transcript:            |<<<<<<|----|<<<<<|-----|<<<<<|
Transcript Coords:      2             1              0
                       10987654    3210987     6543210
  with `introns=True`:         8765       43210

Thus, as half-open intervals, the exons in the transcript domain are (0, 7), (7, 14) and (14, 22), and genome base 115 corresponds to transcript base 10.
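The arithmetic behind this mapping can be sketched as follows, assuming 0-based half-open exon intervals read off the diagram: (100, 108), (112, 119) and (124, 131). On the minus strand a position's transcript coordinate is counted from the highest genomic coordinate of its exon, because that end is the exon's 5' end. The function names mirror the documented methods, but these free functions and their signatures are hypothetical, not the class's API.

```python
def _transcript_order(exons, strand):
    # 5' to 3' order: descending genome position on the minus strand.
    return sorted(exons, reverse=(strand == "-"))


def genome2transcript(pos, exons, strand):
    """Map a 0-based genome position to a transcript position (sketch)."""
    t_start = 0
    for start, end in _transcript_order(exons, strand):
        if start <= pos < end:
            offset_in_exon = (end - 1 - pos) if strand == "-" else (pos - start)
            return t_start + offset_in_exon
        t_start += end - start
    raise ValueError(f"genome position {pos} is not exonic")


def transcript2genome(pos, exons, strand):
    """Map a transcript position back to a 0-based genome position (sketch)."""
    t_start = 0
    for start, end in _transcript_order(exons, strand):
        if t_start <= pos < t_start + (end - start):
            delta = pos - t_start
            return (end - 1 - delta) if strand == "-" else (start + delta)
        t_start += end - start
    raise ValueError(f"transcript position {pos} is out of range")


EXONS = [(100, 108), (112, 119), (124, 131)]
print(genome2transcript(115, EXONS, "-"))  # 10
print(transcript2genome(10, EXONS, "-"))   # 115
```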

TranscriptCoordInterconverter.genome2transcript should be the inverse of TranscriptCoordInterconverter.transcript2genome.

That is, if:

myConverter = TranscriptCoordInterconverter(transcript)

then:

myConverter.genome2transcript(myConverter.transcript2genome(x)) == x

and:

myConverter.transcript2genome(myConverter.genome2transcript(x)) == x
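One way to check this property is sketched below, under the same assumptions as the Notes example (0-based half-open exons on the minus strand). It builds an explicit per-base table of exonic positions in transcript order and confirms that the two lookups invert each other. build_tables is a hypothetical test helper; a per-base table is only a brute-force oracle for testing, not how the class stores its precomputed intervals.

```python
def build_tables(exons, strand):
    """Brute-force genome<->transcript lookup tables (hypothetical test oracle)."""
    # Every exonic genome position, in ascending genome order.
    positions = [p for start, end in sorted(exons) for p in range(start, end)]
    if strand == "-":
        # The minus strand is read from the highest coordinate downwards.
        positions.reverse()
    transcript2genome = dict(enumerate(positions))
    genome2transcript = {g: t for t, g in transcript2genome.items()}
    return genome2transcript, transcript2genome


g2t, t2g = build_tables([(100, 108), (112, 119), (124, 131)], "-")
assert all(g2t[t2g[x]] == x for x in t2g)  # genome2transcript(transcript2genome(x)) == x
assert all(t2g[g2t[x]] == x for x in g2t)  # transcript2genome(genome2transcript(x)) == x
print(g2t[115], len(t2g))  # 10 22
```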
__init__(transcript, introns=False)

Precompute the conversions for each exon.

Methods

genome2transcript(pos) Convert a genome coordinate into a transcript coordinate.
genome_interval2transcript(interval) Convert an interval in genome coordinates into an interval in transcript coordinates.
transcript2genome(pos) Convert a transcript coordinate into a genome coordinate.
transcript_interval2genome_intervals(interval) Convert an interval in transcript coordinates into a list of intervals in genome coordinates.
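Because a transcript interval can span exon junctions, converting it to genome space can yield several genome intervals. A minimal sketch of that splitting follows, assuming 0-based half-open intervals and returning the pieces in transcript order (the real method's output order is not documented here). The free function below is hypothetical and only mirrors the method's name.

```python
def transcript_interval2genome_intervals(interval, exons, strand):
    """Split a half-open transcript interval into genome intervals (sketch)."""
    lo, hi = interval
    result = []
    t_start = 0
    # Walk exons 5' to 3': descending genome position on the minus strand.
    for start, end in sorted(exons, reverse=(strand == "-")):
        t_end = t_start + (end - start)
        a, b = max(lo, t_start), min(hi, t_end)  # overlap with this exon
        if a < b:
            if strand == "-":
                # Transcript position t sits at genome position end - 1 - (t - t_start).
                result.append((end - (b - t_start), end - (a - t_start)))
            else:
                result.append((start + (a - t_start), start + (b - t_start)))
        t_start = t_end
    return result


EXONS = [(100, 108), (112, 119), (124, 131)]
print(transcript_interval2genome_intervals((5, 16), EXONS, "-"))
# [(124, 126), (112, 119), (106, 108)]
```

Transcript bases 5 and 6 fall in the first (5'-most) exon, bases 7 to 13 cover the middle exon entirely, and bases 14 and 15 fall in the last exon, so the interval is split at both exon junctions.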