neurosnap.nucleotide module

Provides functions and classes related to processing nucleotide data.

neurosnap.nucleotide.get_reverse_complement(seq)

Return the reverse complement of a DNA or RNA sequence: the complementary strand, read in reverse order.

Parameters:

seq (str) – A string representing the nucleotide sequence. Valid characters are ‘A’, ‘T’, ‘C’, ‘G’ for DNA and ‘A’, ‘U’, ‘C’, ‘G’ for RNA.

Returns:

A string representing the reverse complementary strand of the input sequence.

Return type:

str

Raises:

KeyError – If the input sequence contains invalid nucleotide characters.
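
The behaviour described above can be illustrated with a minimal standalone sketch. It is not the neurosnap implementation: the function name `reverse_complement_sketch`, the two lookup tables, and the rule for choosing between DNA and RNA (treat the input as RNA when it contains ‘U’) are assumptions made for this example. It does show how a dictionary lookup naturally produces a KeyError on invalid characters.

```python
# Illustrative sketch only; names and the DNA/RNA selection rule are assumptions,
# not taken from the neurosnap source.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
RNA_COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def reverse_complement_sketch(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA or RNA string."""
    # Assumption: a sequence containing 'U' is treated as RNA, otherwise as DNA.
    table = RNA_COMPLEMENT if "U" in seq else DNA_COMPLEMENT
    # Walk the sequence backwards and complement each base.
    # Indexing the dict raises KeyError for any character not in the table.
    return "".join(table[base] for base in reversed(seq))


print(reverse_complement_sketch("ATCG"))  # CGAT
print(reverse_complement_sketch("AUCG"))  # CGAU
```

In this sketch a mixed input such as "ATU" raises KeyError, because ‘T’ is absent from the RNA table; whether the library behaves the same way for mixed or lowercase input is not stated in this page.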

neurosnap.nucleotide.split_interleaved_fastq(fn_in, output_dir, *, preserve_identifier_names=False)

Split an interleaved FASTQ into left/right FASTQ files.

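Assumes mates are adjacent (each left read is followed immediately by its right read). By default, headers are rewritten as “@<index>/1” and “@<index>/2”; set preserve_identifier_names=True to keep the original read identifiers.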

Supports gzip-compressed inputs whose filenames end in “.fastq.gz” or “.fq.gz”. Compression is detected from the filename, and the input is decompressed as a stream while it is read.

Parameters:
  • fn_in (Union[str, Path]) – Path to the interleaved FASTQ file.

  • output_dir (Union[str, Path]) – Directory to write outputs into.

  • preserve_identifier_names (bool) – If True, preserve the input read identifiers (normalizing mate suffix to “/1” or “/2”). If False, rewrite identifiers as “@<index>/1” and “@<index>/2”.

Returns:

Paths to the left and right FASTQ output files.

Return type:

Tuple[Path, Path]
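
The splitting logic described above can be sketched in a few lines. This is a hedged illustration of the default mode (preserve_identifier_names=False), not the library's code. The function name `split_interleaved_sketch`, the output file names `left.fastq` and `right.fastq`, the zero-based index, and the ValueError on incomplete input are all assumptions for this example.

```python
import gzip
from itertools import islice
from pathlib import Path
from typing import Tuple, Union


def split_interleaved_sketch(
    fn_in: Union[str, Path], output_dir: Union[str, Path]
) -> Tuple[Path, Path]:
    """Split an interleaved FASTQ into two files, renumbering the headers."""
    fn_in = Path(fn_in)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: output file names are hypothetical, chosen for this sketch.
    left_path = output_dir / "left.fastq"
    right_path = output_dir / "right.fastq"

    # Detect gzip compression from the filename only, as the text describes.
    is_gzip = fn_in.name.endswith((".fastq.gz", ".fq.gz"))
    opener = gzip.open if is_gzip else open

    with opener(fn_in, "rt") as fin, \
            open(left_path, "w") as left, \
            open(right_path, "w") as right:
        index = 0  # Assumption: numbering starts at 0.
        while True:
            # Each FASTQ record is four lines; read one pair (eight lines) at a time.
            rec1 = [line.rstrip("\n") for line in islice(fin, 4)]
            if not rec1:
                break  # Clean end of file.
            rec2 = [line.rstrip("\n") for line in islice(fin, 4)]
            if len(rec1) < 4 or len(rec2) < 4:
                raise ValueError("incomplete record or unpaired read")
            # Replace the header line and keep sequence, separator, and quality.
            left.write("\n".join([f"@{index}/1"] + rec1[1:]) + "\n")
            right.write("\n".join([f"@{index}/2"] + rec2[1:]) + "\n")
            index += 1

    return left_path, right_path
```

Reading with `islice` keeps only one pair of records in memory at a time, so the sketch streams both plain and gzip-compressed inputs regardless of file size.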