sglearn package

Submodules

sglearn.featurization module

sglearn.featurization.build_feature_dict(context, guide_start, guide_length, features, nts, context_order, pam_interaction, guide_sections)

sglearn.featurization.featurize_guides(kmers, features=None, pam_start=24, pam_length=3, guide_start=4, guide_length=20, pam_interaction=(24, 27), guide_sections=(10, 20), n_jobs=1)

Featurize a list of guide sequences

Parameters:
  • kmers (list of str) – Context sequences
  • features (list of str, optional) – List of features. Defaults to the Rule Set 2 features
  • guide_start (int) – Position of guide start, zero-indexed
  • guide_length (int) – Length of guide
  • pam_start (int) – Position of PAM start, zero-indexed
  • pam_length (int) – Length of PAM
  • pam_interaction (tuple) – Positions on either side of the PAM, zero-indexed
Returns:

Nucleotide features

Return type:

DataFrame
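
The default positions are consistent with a 30-nucleotide context: 4 nt upstream, a 20-nt guide, a 3-nt PAM, and 3 nt downstream. The sketch below is not part of sglearn. It only slices a string with the documented zero-indexed defaults. The helper name split_context, the 30-nt layout, and the example sequence are assumptions made for illustration.

```python
# Illustrative only: slices a context with the documented default positions.
# The helper name and the example sequence are hypothetical.

def split_context(context, guide_start=4, guide_length=20,
                  pam_start=24, pam_length=3):
    """Return (upstream, guide, pam, downstream) using zero-indexed starts."""
    guide_end = guide_start + guide_length
    pam_end = pam_start + pam_length
    return (context[:guide_start],
            context[guide_start:guide_end],
            context[pam_start:pam_end],
            context[pam_end:])

# Assumed layout: 4 nt + 20-nt guide + 3-nt PAM + 3 nt = 30 nt
context = "ACGT" + "GATTACAGATTACAGATTAC" + "TGG" + "CAT"
upstream, guide, pam, downstream = split_context(context)
print(len(context), guide, pam)
```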

sglearn.featurization.get_context_order(k, pam_start, pam_length, guide_start, guide_length)

Get named order of context sequence

Parameters:
  • k (int) – Length of k-mer
  • pam_start (int) – Start of PAM, one-indexed
  • pam_length (int) – PAM length
  • guide_start (int) – Start of guide, one-indexed
  • guide_length (int) – Length of guide
Returns:

Ordering for context sequence

Return type:

list

sglearn.featurization.get_frac_g_or_c(feature_dict, guide_sequence)

Get GC content

Parameters:
  • feature_dict (dict) – Feature dictionary
  • guide_sequence (str) – Guide sequence
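
As a rough illustration of the quantity described, GC content is the number of G and C bases divided by the guide length. The sketch below is a stand-in, not the sglearn source. The function name, the in-place update of the dictionary, and the key "GC content" are assumptions.

```python
# Minimal sketch, assuming the feature is written into feature_dict in place.
# The key name "GC content" is hypothetical.

def frac_g_or_c(feature_dict, guide_sequence):
    """Store the fraction of G or C bases in the guide."""
    gc_count = guide_sequence.count("G") + guide_sequence.count("C")
    feature_dict["GC content"] = gc_count / len(guide_sequence)

features = {}
frac_g_or_c(features, "GATTACAGATTACAGATTAC")
print(features)
```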

sglearn.featurization.get_guide_sequence(context, guide_start, guide_length)

sglearn.featurization.get_one_nt_frac(feature_dict, guide, nts)

Get the fraction of each single nucleotide

Parameters:
  • feature_dict (dict) – Feature dictionary
  • guide (str) – Guide sequence
  • nts (list) – List of nucleotides
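
One way to picture this feature: for each nucleotide in nts, divide its count in the guide by the guide length. The sketch below is a stand-in under that reading. The function name and the "A frac"-style key names are hypothetical.

```python
# Minimal sketch of per-nucleotide fractions; key names are hypothetical.

def one_nt_frac(feature_dict, guide, nts):
    """Store the fraction of the guide made up of each nucleotide."""
    for nt in nts:
        feature_dict[nt + " frac"] = guide.count(nt) / len(guide)

features = {}
one_nt_frac(features, "GATTACAGATTACAGATTAC", ["A", "C", "G", "T"])
print(features)
```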

sglearn.featurization.get_one_nt_pos(feature_dict, context_sequence, nts, context_order)

One-hot encode single nucleotides

Parameters:
  • feature_dict (dict) – Feature dictionary
  • context_sequence (str) – Context sequence
  • nts (list) – List of nucleotides
  • context_order (list) – Position of context
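
Positional one-hot encoding can be sketched as one binary feature per (position, nucleotide) pair, set to 1 only for the nucleotide observed at that position. The code below is illustrative, not the library's implementation. The function name, the position labels, and the label-plus-nucleotide key format are assumptions.

```python
# Minimal sketch of positional one-hot encoding.
# Position labels and the key format (label + nucleotide) are hypothetical.

def one_nt_pos(feature_dict, context_sequence, nts, context_order):
    """Add a 0/1 feature for every nucleotide at every labeled position."""
    for label, observed in zip(context_order, context_sequence):
        for nt in nts:
            feature_dict[label + nt] = int(observed == nt)

labels = [str(i + 1) for i in range(4)]  # "1", "2", "3", "4"
features = {}
one_nt_pos(features, "ACGT", ["A", "C", "G", "T"], labels)
print(features["1A"], features["1C"])
```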

sglearn.featurization.get_pam_interaction(feature_dict, context_sequence, nts, context_order, pam_ends)

One-hot encode interactions on either side of the PAM sequence

Parameters:
  • feature_dict (dict) – Feature dictionary
  • context_sequence (str) – Context sequence
  • nts (list) – List of nucleotides
  • context_order (list) – Position of context
  • pam_ends (tuple) – Positions on either side of the PAM, zero-indexed
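
A plausible reading of this feature is a one-hot encoding of the nucleotide pair found at the two pam_ends positions, giving 16 binary features for a four-letter alphabet. The sketch below follows that reading and is not taken from the sglearn source. The function name, the numeric position labels, and the key format are assumptions.

```python
from itertools import product

# Minimal sketch: one-hot encode the pair of nucleotides at two positions.
# Labels and key format are hypothetical.

def pam_interaction(feature_dict, context_sequence, nts, context_order, pam_ends):
    """Add a 0/1 feature for each possible nucleotide pair at pam_ends."""
    left, right = pam_ends
    observed = context_sequence[left] + context_sequence[right]
    for a, b in product(nts, repeat=2):
        key = context_order[left] + a + context_order[right] + b
        feature_dict[key] = int(a + b == observed)

# Assumed 30-nt context: 4 nt + 20-nt guide + 3-nt PAM + 3 nt
context = "ACGT" + "GATTACAGATTACAGATTAC" + "TGG" + "CAT"
labels = [str(i) for i in range(len(context))]  # zero-indexed labels
features = {}
pam_interaction(features, context, ["A", "C", "G", "T"], labels, (24, 27))
print([key for key, value in features.items() if value == 1])
```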

sglearn.featurization.get_polyn(feature_dict, guide_sequence, nts)

Get the maximum run length of each nucleotide

Parameters:
  • feature_dict (dict) – Feature dictionary
  • guide_sequence (str) – Guide sequence
  • nts (list) – List of nucleotides
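
The maximum run for a nucleotide is the length of its longest uninterrupted stretch in the guide. The sketch below computes it with itertools.groupby. It is a stand-in rather than the library's code, and the function name and "polyA"-style key names are hypothetical.

```python
from itertools import groupby

# Minimal sketch of longest homopolymer runs; key names are hypothetical.

def polyn(feature_dict, guide_sequence, nts):
    """Store the longest consecutive run of each nucleotide."""
    longest = {nt: 0 for nt in nts}
    for nt, run in groupby(guide_sequence):
        if nt in longest:
            longest[nt] = max(longest[nt], len(list(run)))
    for nt in nts:
        feature_dict["poly" + nt] = longest[nt]

features = {}
polyn(features, "GGGATTTTCAAGCGGTTTAC", ["A", "C", "G", "T"])
print(features)
```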

sglearn.featurization.get_thermo(feature_dict, guide_sequence, sections)

Use Biopython to get thermodynamic information from context and guide sequences

Parameters:
  • feature_dict (dict) – Feature dictionary
  • guide_sequence (str) – Guide sequence
  • sections (iterable of int) – Section of guide sequence
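
Which Biopython routine is called, and how sections is interpreted, is not stated here. Biopython's Bio.SeqUtils.MeltingTemp module provides melting temperature functions that could serve this purpose. To stay dependency-free, the sketch below swaps in the Wallace rule (2 °C per A/T, 4 °C per G/C) as a simple stand-in, and assumes that sections lists the end points of consecutive guide segments. The function names and key names are hypothetical.

```python
# Minimal sketch using the Wallace rule as a stand-in for a Biopython call.
# Assumption: `sections` holds end points of consecutive guide segments.

def wallace_tm(seq):
    """Wallace rule estimate: 2 per A/T plus 4 per G/C, in degrees Celsius."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

def thermo(feature_dict, guide_sequence, sections):
    """Store an estimated Tm for the whole guide and for each segment."""
    feature_dict["Tm guide"] = wallace_tm(guide_sequence)
    start = 0
    for end in sections:
        segment = guide_sequence[start:end]
        feature_dict["Tm {}:{}".format(start, end)] = wallace_tm(segment)
        start = end

features = {}
thermo(features, "GATTACAGATTACAGATTAC", (10, 20))
print(features)
```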

sglearn.featurization.get_three_nt_counts(feature_dict, guide, nts)

Get the fraction of each three-nucleotide sequence

Parameters:
  • feature_dict (dict) – Feature dictionary
  • guide (str) – Guide sequence
  • nts (list) – List of nucleotides
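
One detail worth illustrating is overlap: str.count skips overlapping matches, so "AAAAC".count("AAA") is 1 even though "AAA" occurs at two offsets. The sketch below counts every sliding window and divides by the number of windows. Whether sglearn counts overlapping occurrences is not stated here, so treat this as one possible definition. The function name and key names are hypothetical.

```python
from itertools import product

# Minimal sketch of overlapping k-mer fractions (k=3 for three nucleotides).
# Assumption: occurrences are counted over all sliding windows.

def kmer_frac(feature_dict, guide, nts, k=3):
    """Store the fraction of sliding windows equal to each possible k-mer."""
    n_windows = len(guide) - k + 1
    counts = {}
    for i in range(n_windows):
        window = guide[i:i + k]
        counts[window] = counts.get(window, 0) + 1
    for kmer in map("".join, product(nts, repeat=k)):
        feature_dict[kmer + " frac"] = counts.get(kmer, 0) / n_windows

features = {}
kmer_frac(features, "AAAAC", ["A", "C", "G", "T"])
print(features["AAA frac"], features["AAC frac"])
```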

sglearn.featurization.get_three_nt_pos(feature_dict, context_sequence, nts, context_order)

One-hot encode three-nucleotide sequences

Parameters:
  • feature_dict (dict) – Feature dictionary
  • context_sequence (str) – Context sequence
  • nts (list) – List of nucleotides
  • context_order (list) – Position of context
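
Positional encoding of three-nucleotide sequences can be sketched as a sliding window: each start position contributes one binary feature per possible 3-mer. The code below is illustrative only. The function name, the position labels, and the key format (label of the window start plus the 3-mer) are assumptions.

```python
from itertools import product

# Minimal sketch of positional k-mer one-hot encoding (k=3 here).
# Labels and key format are hypothetical.

def kmer_pos(feature_dict, context_sequence, nts, context_order, k=3):
    """Add a 0/1 feature for every k-mer at every window start position."""
    kmers = ["".join(p) for p in product(nts, repeat=k)]
    for i in range(len(context_sequence) - k + 1):
        observed = context_sequence[i:i + k]
        for kmer in kmers:
            feature_dict[context_order[i] + kmer] = int(kmer == observed)

labels = [str(i + 1) for i in range(5)]
features = {}
kmer_pos(features, "ACGTA", ["A", "C", "G", "T"], labels)
print([key for key, value in features.items() if value == 1])
```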

sglearn.featurization.get_two_nt_frac(feature_dict, guide, nts)

Get the fraction of each two-nucleotide sequence

Parameters:
  • feature_dict (dict) – Feature dictionary
  • guide (str) – Guide sequence
  • nts (list) – List of nucleotides

sglearn.featurization.get_two_nt_pos(feature_dict, context_sequence, nts, context_order)

One-hot encode two-nucleotide sequences

Parameters:
  • feature_dict (dict) – Feature dictionary
  • context_sequence (str) – Context sequence
  • nts (list) – List of nucleotides
  • context_order (list) – Position of context

Module contents

Top-level package for sglearn.