iCLIP.counting.count_transcript

iCLIP.counting.count_transcript(transcript, bam, flanks=0)

Count CLIP cross-link sites across a transcript.

Given a complete transcript, count_transcript counts all cross-links that fall within the transcript and returns the counts in transcript-domain coordinates.

Parameters:
  • transcript (iter of CGAT.GTF.Entry objects) – The transcript to count over, usually as returned by one of the CGAT.GTF iterators such as CGAT.GTF.transcript_iterator or CGAT.GTF.flat_gene_iterator.
  • bam (*_getter function) – getter function from which to retrieve cross-links, as returned by make_getter().
  • flanks (int) – Length in bp of the 5’ and 3’ flanks to count on either end of the transcript. Note: if flanks > 0, the index type of the returned Series changes.
Returns:

Series of counts. The innermost level of the index gives the coordinates in the transcript domain (see Notes). If flanks > 0, the index is a MultiIndex (see Notes).

Return type:

pandas.Series

See also

make_getter()
Access to cross-link data for a range of file types.
getCrosslink()
Find the cross-link base from a pysam.AlignedSegment.
TranscriptCoordInterconverter()
Conversion between genome and transcript coordinates.

Notes

Coordinates are returned in the transcript domain. That is, the base corresponding to the TSS of the transcript is base 0, and bases are numbered from 0 to sum(length of exons) - 1. Introns are excluded both from the counting and from the coordinates. Strand is also accounted for, so for a minus-strand transcript base 0 is the exonic base with the highest genomic coordinate.
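The genome-to-transcript mapping described above can be sketched in plain Python. This is a minimal illustration, not the iCLIP.counting implementation: it assumes a transcript is represented as a list of 0-based, half-open (start, end) exon intervals plus a strand, and genome_to_transcript and count_in_transcript are hypothetical helper names introduced only for this example.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical representation (assumption): 0-based, half-open exon intervals.
Exon = Tuple[int, int]


def genome_to_transcript(pos: int, exons: List[Exon], strand: str) -> Optional[int]:
    """Map a genomic position to a transcript-domain coordinate (TSS = 0).

    Returns None if the position is intronic or outside the transcript.
    """
    exons = sorted(exons)
    total = sum(end - start for start, end in exons)  # total exonic length
    offset = 0  # exonic bases lying to the left of the current exon
    for start, end in exons:
        if start <= pos < end:
            coord = offset + (pos - start)
            # On the minus strand the TSS is the right-most exonic base,
            # so count from the other end of the spliced transcript.
            return coord if strand == "+" else total - 1 - coord
        offset += end - start
    return None


def count_in_transcript(positions: Iterable[int], exons: List[Exon],
                        strand: str) -> Dict[int, int]:
    """Count cross-links per transcript coordinate, skipping intronic ones."""
    counts: Counter = Counter()
    for pos in positions:
        coord = genome_to_transcript(pos, exons, strand)
        if coord is not None:
            counts[coord] += 1
    return dict(sorted(counts.items()))
```

For exons [(100, 110), (200, 205)] on the plus strand, position 200 maps to transcript base 10 and the intronic position 150 is dropped; on the minus strand, position 204 maps to base 0 and position 100 to base 14.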

This function can also return counts for flanking regions upstream and downstream of the transcript, with the flank length set by flanks. If flanks > 0, the returned Series has a pandas.MultiIndex: the inner level gives the base, and the outer level indicates whether that base lies in the 5’ flank (‘flank5’), the 3’ flank (‘flank3’) or the transcript itself (‘exon’).
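To show how the three outer-level labels relate to genomic positions, the sketch below assigns a genomic base to a (region, base) pair. It assumes exons are given as 0-based, half-open (start, end) intervals, and it numbers each flank from 0 to flanks - 1 in the transcript's 5’ to 3’ direction. That flank numbering is an assumption of this sketch; the inner coordinates count_transcript actually uses for flank bases may differ. classify_position is a hypothetical helper, not part of iCLIP.counting.

```python
from typing import List, Optional, Tuple

# Hypothetical representation (assumption): 0-based, half-open exon intervals.
Exon = Tuple[int, int]


def classify_position(pos: int, exons: List[Exon], strand: str,
                      flanks: int) -> Optional[Tuple[str, int]]:
    """Return ('flank5' | 'exon' | 'flank3', base) for a genomic position, or None."""
    exons = sorted(exons)
    tx_start, tx_end = exons[0][0], exons[-1][1]
    total = sum(end - start for start, end in exons)

    # Exonic bases: strand-aware transcript-domain coordinate (TSS = 0).
    offset = 0
    for start, end in exons:
        if start <= pos < end:
            coord = offset + (pos - start)
            return ("exon", coord if strand == "+" else total - 1 - coord)
        offset += end - start

    # Left-hand flank window [tx_start - flanks, tx_start).
    if tx_start - flanks <= pos < tx_start:
        left = pos - (tx_start - flanks)  # 0 .. flanks-1, left to right
        # Upstream (5') on the plus strand, downstream (3') on the minus strand.
        return ("flank5", left) if strand == "+" else ("flank3", flanks - 1 - left)

    # Right-hand flank window [tx_end, tx_end + flanks).
    if tx_end <= pos < tx_end + flanks:
        right = pos - tx_end  # 0 .. flanks-1, left to right
        return ("flank3", right) if strand == "+" else ("flank5", flanks - 1 - right)

    return None  # intronic, or outside the transcript plus its flanks
```

Pairs like these are the kind of tuples that pandas.MultiIndex.from_tuples accepts, which gives an index with the two-level shape described above.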