iCLIP.kmers.pentamer_frequency

iCLIP.kmers.pentamer_frequency(profile, length, regex_matches, nSpread=15)

Calculate the frequency of overlaps between a list of cross-link sites (possibly extended) and a collection of sites.

The second collection of sites would usually be a Series of arrays of positions, as returned by find_all_matches().

Parameters:
  • profile (pandas.Series) – A profile of the number of cross-links at each base.
  • length (int) – Length of the sequence represented by profile.
  • regex_matches (pandas.Series of nd.array of int) – Each array entry represents a single match on the sequence. Each array represents a different thing matched (e.g. a different regex). This structure would usually be returned by find_all_matches().
  • nSpread (int, optional) – How far either side of a cross-link location to consider when calculating overlaps (defaults to 15).
Returns:

Each entry is the number of overlaps between cross-link sites in profile and a single entry in regex_matches. The result has the same index as regex_matches.

Return type:

pandas.Series of int
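
Example:

The following is a minimal sketch of the kind of overlap count described above, not the library's implementation. The helper name count_overlaps_sketch is hypothetical. The sketch assumes 0-based positions and that a match overlaps when its position lies within nSpread bases of any cross-linked base. It does not weight by cross-link count or normalise to a frequency, both of which the real function may do.

```python
import numpy as np
import pandas as pd


def count_overlaps_sketch(profile, length, regex_matches, nSpread=15):
    """Hypothetical helper: count matches within nSpread of a cross-linked base."""
    # Mark every base within nSpread of a cross-linked position.
    # Assumes the index of `profile` holds 0-based positions in [0, length).
    covered = np.zeros(length, dtype=bool)
    for pos in np.asarray(profile[profile > 0].index, dtype=int):
        start = max(pos - nSpread, 0)
        end = min(pos + nSpread + 1, length)
        covered[start:end] = True

    # For each array of match positions, count how many fall on a marked base.
    def n_hits(matches):
        matches = np.asarray(matches, dtype=int)
        matches = matches[(matches >= 0) & (matches < length)]
        return int(covered[matches].sum())

    # Series.apply keeps the index of regex_matches on the result.
    return regex_matches.apply(n_hits)


# Toy data: two cross-linked bases and two sets of match positions.
profile = pd.Series({10: 2, 50: 1})
regex_matches = pd.Series({
    "TTTTT": np.array([5, 30, 60]),
    "GCATG": np.array([90]),
})
result = count_overlaps_sketch(profile, 100, regex_matches, nSpread=15)
print(result)
```

With this toy input, positions 5 and 60 lie within 15 bases of a cross-link and position 30 does not, so the "TTTTT" entry is 2 and the "GCATG" entry is 0.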