sglearn package

Submodules

sglearn.featurization module

sglearn.featurization.build_feature_dict(context, guide_start, guide_length, features, nts, context_order, pam_interaction, guide_sections)

sglearn.featurization.featurize_guides(kmers, features=None, pam_start=24, pam_length=3, guide_start=4, guide_length=20, pam_interaction=(24, 27), guide_sections=(10, 20), n_jobs=1)

Featurize a list of guide sequences

Parameters:
    kmers (list of str) – Context sequences
    features (list of str, optional) – List of features. Defaults to rule set 2 features
    guide_start (int) – Position of guide start, zero-indexed
    guide_length (int) – Length of guide
    pam_start (int) – Position of PAM start, zero-indexed
    pam_length (int) – Length of PAM
    pam_interaction (tuple) – Location on either side of the PAM, zero-indexed
Returns:	Nucleotide features
Return type:	DataFrame
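
The positional defaults above can be read as slices of each context sequence. The following is a minimal sketch of that partitioning, assuming a 30-nt context with the guide immediately followed by the PAM. `split_context` is a hypothetical helper written for this example, not part of sglearn, and the sequence is made up.

```python
def split_context(context, guide_start=4, guide_length=20, pam_start=24, pam_length=3):
    """Split a context sequence into upstream, guide, PAM, and downstream parts.

    Hypothetical helper for illustration only. Positions are zero-indexed,
    mirroring the defaults documented for featurize_guides.
    """
    guide_end = guide_start + guide_length
    pam_end = pam_start + pam_length
    return {
        "upstream": context[:guide_start],
        "guide": context[guide_start:guide_end],
        "pam": context[pam_start:pam_end],
        "downstream": context[pam_end:],
    }


# Made-up 30-nt context: 4 nt upstream + 20 nt guide + 3 nt PAM + 3 nt downstream
example_context = "ACGT" + "GATTACAGATTACAGATTAC" + "TGG" + "CAT"
parts = split_context(example_context)
print(parts)
```

With these defaults the guide occupies indices 4 to 23 and the PAM indices 24 to 26, so a different context layout would need all four positional arguments adjusted together.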

sglearn.featurization.get_context_order(k, pam_start, pam_length, guide_start, guide_length)

Get named order of context sequence

Parameters:
    k (int) – Length of kmer
    pam_start (int) – Start of PAM, one-indexed
    pam_length (int) – PAM length
    guide_start (int) – Start of guide, one-indexed
    guide_length (int) – Length of guide
Returns:	Ordering for context sequence
Return type:	list
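
To illustrate what a named ordering could look like, here is a sketch that assigns one label to each position of the context. The labelling scheme (negative offsets upstream, 1 to guide_length for the guide, a `PAM` prefix, `+` offsets downstream) and the function name `label_context_positions` are assumptions made for this example; the labels sglearn actually produces may differ. Start positions are treated as one-indexed, as stated in the parameter descriptions above.

```python
def label_context_positions(k, pam_start, pam_length, guide_start, guide_length):
    """Return one label per position of a k-nt context (hypothetical scheme).

    Assumes one-indexed start positions and a guide that is immediately
    followed by the PAM.
    """
    # Positions before the guide, counted backwards from the guide start
    upstream = ["-" + str(i) for i in range(guide_start - 1, 0, -1)]
    # Guide positions numbered 1..guide_length
    guide = [str(i + 1) for i in range(guide_length)]
    # PAM positions numbered PAM1..PAMn
    pam = ["PAM" + str(i + 1) for i in range(pam_length)]
    # Positions after the PAM, counted forwards
    downstream = ["+" + str(i + 1) for i in range(k - (pam_start + pam_length - 1))]
    return upstream + guide + pam + downstream


# One-indexed equivalents of a 4 + 20 + 3 + 3 layout
labels = label_context_positions(k=30, pam_start=25, pam_length=3, guide_start=5, guide_length=20)
print(labels)
```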

sglearn.featurization.get_frac_g_or_c(feature_dict, guide_sequence)

Get GC content

Parameters:
    feature_dict (dict) – Feature dictionary
    guide_sequence (str) – Guide sequence
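
As a rough illustration, GC content can be computed as the fraction of guide positions that are G or C. The sketch below is a standalone version of that calculation. The helper name `frac_g_or_c` and the dictionary key are illustrative, and storing the result in the feature dictionary is an assumption about how the documented function uses its `feature_dict` argument.

```python
def frac_g_or_c(guide_sequence):
    """Fraction of positions in the guide that are G or C."""
    gc_count = guide_sequence.count("G") + guide_sequence.count("C")
    return gc_count / len(guide_sequence)


feature_dict = {}
# 'GC content' is an illustrative key name, not necessarily the library's
feature_dict["GC content"] = frac_g_or_c("GATTACAGATTACAGATTAC")
print(feature_dict)
```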

sglearn.featurization.get_guide_sequence(context, guide_start, guide_length)

sglearn.featurization.get_one_nt_frac(feature_dict, guide, nts)

Get fraction of single nt

Parameters:
    feature_dict (dict) – Feature dictionary
    guide (str) – Guide sequence
    nts (list) – List of nucleotides

sglearn.featurization.get_one_nt_pos(feature_dict, context_sequence, nts, context_order)

One-hot encode single nucleotides

Parameters:
    feature_dict (dict) – Feature dictionary
    context_sequence (str) – Context sequence
    nts (list) – List of nucleotides
    context_order (list) – Position labels for the context sequence
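
Position-specific one-hot encoding can be sketched as follows: every combination of position label and nucleotide becomes a feature that is 1 if that nucleotide occurs at that position and 0 otherwise. The function name `one_hot_positions` and the key format are assumptions for illustration, not the library's actual output.

```python
def one_hot_positions(context_sequence, nts, context_order):
    """Return {label + nt: 0 or 1} for every position label and nucleotide."""
    features = {}
    for label, observed in zip(context_order, context_sequence):
        for nt in nts:
            # Key format (position label followed by nucleotide) is illustrative only
            features[label + nt] = int(observed == nt)
    return features


encoded = one_hot_positions("ACG", ["A", "C", "G", "T"], ["1", "2", "3"])
print(encoded)
```

Exactly one feature per position is set, so a context of length n yields n ones among n × len(nts) features.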

sglearn.featurization.get_pam_interaction(feature_dict, context_sequence, nts, context_order, pam_ends)

One-hot encode interactions on either side of the PAM sequence

Parameters:
    feature_dict (dict) – Feature dictionary
    context_sequence (str) – Context sequence
    nts (list) – List of nucleotides
    context_order (list) – Position labels for the context sequence
    pam_ends (tuple) – Location on either side of the PAM, zero-indexed
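
One plausible reading of this feature is a one-hot encoding of the nucleotide pair found at the two `pam_ends` positions. The sketch below implements that reading. The function name `pam_interaction_features`, the key format, and the interpretation itself are assumptions, and the context sequence is made up.

```python
from itertools import product


def pam_interaction_features(context_sequence, nts, pam_ends):
    """One-hot encode the pair of nucleotides at the two pam_ends positions."""
    first, second = pam_ends  # zero-indexed positions, as documented
    observed = context_sequence[first] + context_sequence[second]
    # Key format is illustrative; one feature per ordered nucleotide pair
    return {
        "PAM interaction " + a + b: int(a + b == observed)
        for a, b in product(nts, repeat=2)
    }


# Made-up 30-nt context: 4 nt upstream + 20 nt guide + 3 nt PAM + 3 nt downstream
context = "ACGT" + "GATTACAGATTACAGATTAC" + "TGG" + "CAT"
pam_features = pam_interaction_features(context, ["A", "C", "G", "T"], (24, 27))
print(pam_features)
```

With the default `pam_interaction=(24, 27)` and this layout, index 24 is the first PAM nucleotide and index 27 is the first nucleotide after the PAM.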

sglearn.featurization.get_polyn(feature_dict, guide_sequence, nts)

Get max run for each nucleotide

Parameters:
    feature_dict (dict) – Feature dictionary
    guide_sequence (str) – Guide sequence
    nts (list) – List of nucleotides
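
The maximum run of a nucleotide is the length of its longest stretch of consecutive occurrences. A minimal standalone sketch using `itertools.groupby` follows; `max_runs` is a hypothetical helper, and how the library names or stores these values is not shown here.

```python
from itertools import groupby


def max_runs(guide_sequence, nts):
    """Length of the longest consecutive run of each nucleotide in the guide."""
    longest = {nt: 0 for nt in nts}
    # groupby yields one group per stretch of identical consecutive characters
    for nt, run in groupby(guide_sequence):
        if nt in longest:
            longest[nt] = max(longest[nt], len(list(run)))
    return longest


runs = max_runs("AAATTGCCCCA", ["A", "C", "G", "T"])
print(runs)  # {'A': 3, 'C': 4, 'G': 1, 'T': 2}
```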

sglearn.featurization.get_thermo(feature_dict, guide_sequence, sections)

Use Biopython to get thermodynamic information from context and guides

Parameters:
    feature_dict (dict) – Feature dictionary
    guide_sequence (str) – Guide sequence
    sections (iterable of int) – Section of guide sequence

sglearn.featurization.get_three_nt_counts(feature_dict, guide, nts)

Get fraction of three nts

Parameters:
    feature_dict (dict) – Feature dictionary
    guide (str) – Guide sequence
    nts (list) – List of nucleotides

sglearn.featurization.get_three_nt_pos(feature_dict, context_sequence, nts, context_order)

One-hot encode three nucleotides

Parameters:
    feature_dict (dict) – Feature dictionary
    context_sequence (str) – Context sequence
    nts (list) – List of nucleotides
    context_order (list) – Position labels for the context sequence

sglearn.featurization.get_two_nt_frac(feature_dict, guide, nts)

Get fraction of two nts

Parameters:
    feature_dict (dict) – Feature dictionary
    guide (str) – Guide sequence
    nts (list) – List of nucleotides
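
A dinucleotide fraction can be sketched by sliding a two-nucleotide window along the guide and dividing each pair's count by the number of windows. Whether the library uses overlapping windows and this denominator is an assumption, and `two_nt_fractions` is a hypothetical name used only for this example.

```python
from itertools import product


def two_nt_fractions(guide, nts):
    """Fraction of overlapping dinucleotide windows matching each possible pair."""
    # A guide of length n has n - 1 overlapping two-nucleotide windows
    windows = [guide[i:i + 2] for i in range(len(guide) - 1)]
    return {
        a + b: windows.count(a + b) / len(windows)
        for a, b in product(nts, repeat=2)
    }


fractions = two_nt_fractions("ACGTAC", ["A", "C", "G", "T"])
print(fractions["AC"], fractions["CG"], fractions["AA"])
```

Here "ACGTAC" gives the windows AC, CG, GT, TA, AC, so the fractions over all 16 pairs sum to 1.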

sglearn.featurization.get_two_nt_pos(feature_dict, context_sequence, nts, context_order)

One-hot encode two nucleotides

Parameters:
    feature_dict (dict) – Feature dictionary
    context_sequence (str) – Context sequence
    nts (list) – List of nucleotides
    context_order (list) – Position labels for the context sequence

Module contents

Top-level package for sglearn.