sglearn package
Submodules
sglearn.featurization module
-
sglearn.featurization.build_feature_dict(context, guide_start, guide_length, features, nts, context_order, pam_interaction, guide_sections)
-
sglearn.featurization.featurize_guides(kmers, features=None, pam_start=24, pam_length=3, guide_start=4, guide_length=20, pam_interaction=(24, 27), guide_sections=(10, 20), n_jobs=1)
Featurize a list of guide sequences
Parameters: - kmers (list of str) – Context sequences
- features (list of str, optional) – List of features. Will default to rule set 2 features
- guide_start (int) – Position of guide start, zero-indexed
- guide_length (int) – Length of guide
- pam_start (int) – Position of PAM start, zero-indexed
- pam_length (int) – Length of PAM
- pam_interaction (tuple) – Location on either side of the PAM, zero-indexed
Returns: Nucleotide features
Return type: DataFrame
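The default positions above can be read as slices into a 30-nucleotide context sequence (4 nt upstream, a 20 nt guide, a 3 nt PAM, 3 nt downstream). The following is a minimal sketch of that layout only; `split_context` is a hypothetical helper written for illustration and is not part of sglearn.

```python
def split_context(context, guide_start=4, guide_length=20,
                  pam_start=24, pam_length=3):
    """Slice the guide and PAM out of a context sequence.

    Hypothetical helper: positions are zero-indexed, mirroring the
    defaults documented for featurize_guides.
    """
    guide = context[guide_start:guide_start + guide_length]
    pam = context[pam_start:pam_start + pam_length]
    return guide, pam


# Made-up 30-mer: 4 nt upstream + 20 nt guide + NGG PAM + 3 nt downstream
context = "CAGA" + "AAGTCGATCGGATCCTAGCA" + "TGG" + "ACT"
guide, pam = split_context(context)
```

With the defaults, the guide occupies indices 4 to 23 and the PAM indices 24 to 26.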
-
sglearn.featurization.get_context_order(k, pam_start, pam_length, guide_start, guide_length)
Get named order of context sequence
Parameters: - k (int) – length of kmer
- pam_start (int) – Start of PAM, one-indexed
- pam_length (int) – PAM length
- guide_start (int) – Start of guide, one-indexed
- guide_length (int) – Length of guide
Returns: Ordering for context sequence
Return type: list
-
sglearn.featurization.get_frac_g_or_c(feature_dict, guide_sequence)
Get GC content
Parameters: - feature_dict (dict) – Feature dictionary
- guide_sequence (str) – Guide sequence
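As an illustration of what a GC-content feature looks like, here is a minimal sketch that writes the fraction of G or C bases into a feature dictionary. The function name `add_gc_fraction` and the key `"GC content"` are assumptions made for this example; the key that sglearn actually writes may differ.

```python
def add_gc_fraction(feature_dict, guide_sequence):
    """Store the fraction of G/C bases under a hypothetical key."""
    gc = sum(nt in "GC" for nt in guide_sequence)
    feature_dict["GC content"] = gc / len(guide_sequence)


gc_features = {}
add_gc_fraction(gc_features, "GGCCAATT")  # 4 of 8 bases are G or C
```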
-
sglearn.featurization.get_one_nt_frac(feature_dict, guide, nts)
Get fraction of single nt
Parameters: - feature_dict (dict) – Feature dictionary
- guide (str) – Guide sequence
- nts (list) – List of nucleotides
-
sglearn.featurization.get_one_nt_pos(feature_dict, context_sequence, nts, context_order)
One hot encode single nucleotide
Parameters: - feature_dict (dict) – Feature dictionary
- context_sequence (str) – Context sequence
- nts (list) – List of nucleotides
- context_order (list) – Position of context
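Position-wise one-hot encoding can be sketched as follows: each (position, nucleotide) pair becomes a 0/1 feature. The function name `add_one_nt_pos` and the key format (position label followed by the nucleotide) are assumptions for illustration, not the naming sglearn uses.

```python
def add_one_nt_pos(feature_dict, context_sequence, nts, context_order):
    """One-hot encode each position of the context sequence.

    Hypothetical sketch: context_order supplies one label per position.
    """
    for pos_name, observed in zip(context_order, context_sequence):
        for nt in nts:
            feature_dict[f"{pos_name}{nt}"] = int(observed == nt)


pos_features = {}
add_one_nt_pos(pos_features, "ACG", ["A", "C", "G", "T"], ["1", "2", "3"])
```

Each position contributes exactly one feature set to 1, so a 3-nucleotide sequence over 4 nucleotides yields 12 features, 3 of them non-zero.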
-
sglearn.featurization.get_pam_interaction(feature_dict, context_sequence, nts, context_order, pam_ends)
One hot encode interactions on either side of the PAM sequence
Parameters: - feature_dict (dict) – Feature dictionary
- context_sequence (str) – Context sequence
- nts (list) – List of nucleotides
- context_order (list) – Position of context
- pam_ends (tuple) – Location on either side of the PAM, zero-indexed
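One plausible reading of this feature, assuming `pam_ends` holds the two zero-indexed positions whose nucleotides are paired (for example 24 and 27, the `pam_interaction` default of `featurize_guides`), is a one-hot encoding over all ordered nucleotide pairs. The sketch below follows that assumption; the function name, the omission of `context_order`, and the `"PAM XY"` key format are all hypothetical.

```python
from itertools import product


def add_pam_interaction(feature_dict, context_sequence, nts, pam_ends=(24, 27)):
    """One-hot encode the pair of nucleotides found at the two pam_ends indices.

    Hypothetical sketch: assumes pam_ends are zero-indexed positions.
    """
    observed = context_sequence[pam_ends[0]] + context_sequence[pam_ends[1]]
    for left, right in product(nts, repeat=2):
        pair = left + right
        feature_dict[f"PAM {pair}"] = int(observed == pair)


# Made-up 30-mer: 4 nt upstream + 20 nt guide + TGG PAM + 3 nt downstream
pam_context = "CAGA" + "AAGTCGATCGGATCCTAGCA" + "TGG" + "ACT"
pam_features = {}
add_pam_interaction(pam_features, pam_context, ["A", "C", "G", "T"])
```

In this made-up sequence, index 24 is `T` and index 27 is `A`, so only the `TA` pair is set to 1 among the 16 features.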
-
sglearn.featurization.get_polyn(feature_dict, guide_sequence, nts)
Get max run for each nucleotide
Parameters: - feature_dict (dict) – Feature dictionary
- guide_sequence (str) – Guide sequence
- nts (list) – List of nucleotides
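The "max run" of a nucleotide is the length of its longest uninterrupted stretch in the guide. A minimal sketch using `itertools.groupby` is shown below; the function name `add_polyn` and the `"Poly X"` key format are assumptions for illustration only.

```python
from itertools import groupby


def add_polyn(feature_dict, guide_sequence, nts):
    """Record the longest consecutive run of each nucleotide (hypothetical keys)."""
    longest = {nt: 0 for nt in nts}
    # groupby yields one group per stretch of identical adjacent characters
    for nt, run in groupby(guide_sequence):
        if nt in longest:
            longest[nt] = max(longest[nt], len(list(run)))
    for nt in nts:
        feature_dict[f"Poly {nt}"] = longest[nt]


polyn_features = {}
add_polyn(polyn_features, "AAGTTTTCA", ["A", "C", "G", "T"])
```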
-
sglearn.featurization.get_thermo(feature_dict, guide_sequence, sections)
Use Biopython to get thermodynamic information from context and guides
Parameters: - feature_dict (dict) – Feature dictionary
- guide_sequence (str) – Guide sequence
- sections (iterable of int) – Section of guide sequence
-
sglearn.featurization.get_three_nt_counts(feature_dict, guide, nts)
Get fraction of three nts
Parameters: - feature_dict (dict) – Feature dictionary
- guide (str) – Guide sequence
- nts (list) – List of nucleotides
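A three-nucleotide fraction can be sketched by sliding a window of width 3 across the guide and dividing each count by the number of windows. Both the normalisation by window count and the function name `three_nt_fractions` are assumptions for this example; the source does not state how sglearn normalises. The sketch expects a guide of at least 3 nucleotides.

```python
from itertools import product


def three_nt_fractions(guide, nts):
    """Fraction of overlapping 3-nt windows matching each possible 3-mer.

    Hypothetical sketch: the denominator is the number of windows.
    """
    n_windows = len(guide) - 2
    counts = {"".join(kmer): 0 for kmer in product(nts, repeat=3)}
    for i in range(n_windows):
        kmer = guide[i:i + 3]
        if kmer in counts:
            counts[kmer] += 1
    return {kmer: count / n_windows for kmer, count in counts.items()}


# Windows of "AAAAC" are AAA, AAA, AAC
fractions = three_nt_fractions("AAAAC", ["A", "C", "G", "T"])
```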
-
sglearn.featurization.get_three_nt_pos(feature_dict, context_sequence, nts, context_order)
One hot encode three nucleotides
Parameters: - feature_dict (dict) – Feature dictionary
- context_sequence (str) – Context sequence
- nts (list) – List of nucleotides
- context_order (list) – Position of context
Module contents
Top-level package for sglearn.