neurosnap.database package

Database and remote-search helpers.

class neurosnap.database.CCD(code, name, smiles)

Bases: object

Minimal Chemical Component Dictionary entry.

code

CCD identifier, typically 1-5 characters.

name

Human-readable component name.

smiles

SMILES string for the component. The wwPDB value is nominally canonical, but wwPDB's canonicalization algorithm differs from RDKit's, so the string may not match RDKit's canonical form.

code: str
name: str
smiles: str
smiles_canonical()

Return the RDKit-canonicalized SMILES string for this CCD entry.

Return type:

str

to_mol()

Return an RDKit molecule parsed from the canonical SMILES string.

Return type:

Mol

Returns:

RDKit molecule for the CCD entry.

Raises:

ValueError – If the stored canonical SMILES cannot be parsed.

neurosnap.database.fetch_accessions(accessions, batch_size=150)

Fetch sequences corresponding to a list of UniProt accession numbers.

This function queries UniParc first and then UniProtKB for any missing accessions. Accessions are processed in batches to handle large lists efficiently.

Parameters:
  • accessions (Iterable[str]) – An iterable of UniProt accession numbers. Duplicate accessions are removed automatically.

  • batch_size (int) – Number of accessions to query per request.

Return type:

Dict[str, Optional[str]]

Returns:

Dictionary mapping accession numbers to protein sequences. Missing accessions are assigned None.

Raises:

requests.exceptions.HTTPError – If an API request fails.
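
The deduplication, batching, and fallback behaviour described above can be sketched offline as follows. This is a minimal illustration, not the library's implementation: fetch_in_batches, batched, and the lookup callables are hypothetical names, the callables stand in for the two remote services, and the accessions and sequences are made-up placeholders.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Sequence

Lookup = Callable[[List[str]], Dict[str, str]]


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fetch_in_batches(
    accessions: Iterable[str],
    lookups: Sequence[Lookup],
    batch_size: int = 150,
) -> Dict[str, Optional[str]]:
    """Deduplicate, query each source in batches, and leave misses as None.

    Each entry in `lookups` is a hypothetical stand-in for one remote
    source; later sources are only asked about still-missing accessions.
    """
    unique = list(dict.fromkeys(accessions))  # drop duplicates, keep order
    results: Dict[str, Optional[str]] = {acc: None for acc in unique}
    for lookup in lookups:
        missing = [acc for acc in unique if results[acc] is None]
        for batch in batched(missing, batch_size):
            results.update(lookup(batch))
    return results


# Offline stand-ins for the two sources (placeholder data only).
PRIMARY = {"A0001": "MKT"}
FALLBACK = {"A0002": "MSG"}


def primary_lookup(batch: List[str]) -> Dict[str, str]:
    return {acc: PRIMARY[acc] for acc in batch if acc in PRIMARY}


def fallback_lookup(batch: List[str]) -> Dict[str, str]:
    return {acc: FALLBACK[acc] for acc in batch if acc in FALLBACK}


found = fetch_in_batches(
    ["A0001", "A0002", "A0001", "A0003"],
    [primary_lookup, fallback_lookup],
    batch_size=2,
)
print(found)
```

Passing the sources as callables keeps the batching logic testable without network access; a real caller would replace them with functions that issue the HTTP requests.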

neurosnap.database.fetch_uniprot(uniprot_id, head=False)

Fetch a UniProtKB or UniParc FASTA entry by identifier.

Parameters:
  • uniprot_id (str) – UniProtKB or UniParc accession ID.

  • head (bool) – If True, perform a HEAD request and return whether the entry exists.

Return type:

Union[str, bool]

Returns:

True when head is enabled and the accession exists, otherwise the fetched protein sequence.

Raises:
  • Exception – If the accession is not found in UniProtKB or UniParc.

  • ValueError – If the returned FASTA does not contain exactly one sequence.
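
The "exactly one sequence" check behind the ValueError above can be illustrated with a small standalone parser. This is a sketch under stated assumptions: parse_fasta and single_sequence are hypothetical helpers, not functions from this package, and the FASTA text is a made-up example.

```python
from typing import List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def single_sequence(text: str) -> str:
    """Return the only sequence in the text, or raise ValueError."""
    records = parse_fasta(text)
    if len(records) != 1:
        raise ValueError(f"expected exactly one FASTA record, found {len(records)}")
    return records[0][1]


# Made-up entry with the sequence wrapped over two lines.
sequence = single_sequence(">example_entry demo\nMKTAY\nIAKQR\n")
print(sequence)
```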

Perform a protein structure search using the Foldseek API.

Parameters:
  • structure (Union[Structure, StructureEnsemble, StructureStack, str, Path]) – A Neurosnap structure container or a path to a PDB file.

  • mode (str) – Search mode. Must be one of "3diaa" or "tm-align".

  • databases (Optional[List[str]]) – Databases to search. Defaults to Foldseek’s common public structure databases.

  • max_retries (int) – Maximum number of retries when polling job status.

  • retry_interval (int) – Seconds between job-status polls.

  • output_format (str) – Output format, either "json" or "dataframe".

Return type:

Union[str, DataFrame]

Returns:

Search results in the requested format.

Raises:
  • RuntimeError – If job submission or retrieval fails.

  • TimeoutError – If the job does not complete before retries are exhausted.

  • ValueError – If output_format is invalid.
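
How max_retries and retry_interval might interact with the RuntimeError and TimeoutError cases can be sketched as a generic polling loop. Everything here is an assumption for illustration: wait_for_job is a hypothetical helper, get_status stands in for the remote status request, and the status strings "COMPLETE" and "ERROR" are placeholders rather than values confirmed by the Foldseek API.

```python
import time
from typing import Callable, List


def wait_for_job(
    get_status: Callable[[], str],
    max_retries: int,
    retry_interval: float,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll get_status until the job completes, fails, or retries run out.

    The status strings are assumed placeholders, not documented values.
    """
    for _ in range(max_retries):
        status = get_status()
        if status == "COMPLETE":
            return
        if status == "ERROR":
            raise RuntimeError("job failed on the server")
        sleep(retry_interval)  # wait before the next poll
    raise TimeoutError(f"job not complete after {max_retries} polls")


# Offline demo: a scripted status sequence and a recording fake sleep.
statuses = iter(["PENDING", "RUNNING", "COMPLETE"])
waits: List[float] = []
wait_for_job(lambda: next(statuses), max_retries=5, retry_interval=2, sleep=waits.append)
print(waits)
```

Injecting the sleep function lets the loop be exercised without real delays; with the defaults it simply calls time.sleep.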

neurosnap.database.get_ccd(code, *, cache_path='~/.cache/neurosnap/ccd_entries.json', overwrite=False, max_age_days=7, timeout=30)

Return a CCD entry by its component code.

Return type:

Optional[CCD]

neurosnap.database.get_ccd_entries(*, cache_path='~/.cache/neurosnap/ccd_entries.json', overwrite=False, max_age_days=7, timeout=30)

Fetch and cache CCD metadata entries.

The CCD payload is cached locally and refreshed when the cached payload exceeds max_age_days based on its embedded created_at timestamp.

Parameters:
  • cache_path (str) – Local cache file path for the raw JSON payload.

  • overwrite (bool) – If True, force a fresh download.

  • max_age_days (int) – Maximum accepted payload age in days.

  • timeout (int) – HTTP timeout in seconds for the download request.

Return type:

Dict[str, CCD]

Returns:

Dictionary mapping CCD code to CCD.
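
The refresh rule based on the embedded created_at timestamp could look roughly like the check below. This is a sketch, not the package's code: cache_is_fresh is a hypothetical helper, and it assumes that created_at is a top-level ISO 8601 string in the JSON payload, which the text above does not specify. The "entries" key in the demo payload is likewise a placeholder.

```python
import json
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional


def cache_is_fresh(cache_path: Path, max_age_days: int, now: Optional[datetime] = None) -> bool:
    """Return True if the cached payload exists and is not older than max_age_days.

    Assumes a top-level ISO 8601 "created_at" field (an assumption).
    """
    if not cache_path.is_file():
        return False
    try:
        payload = json.loads(cache_path.read_text())
        created_at = datetime.fromisoformat(payload["created_at"])
    except (ValueError, KeyError, TypeError):
        return False  # unreadable or malformed cache counts as stale
    if created_at.tzinfo is None:
        created_at = created_at.replace(tzinfo=timezone.utc)  # treat naive times as UTC
    now = now or datetime.now(timezone.utc)
    return now - created_at <= timedelta(days=max_age_days)


# Offline demo with a temporary cache file and a fixed "current" time.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "ccd_entries.json"
    path.write_text(json.dumps({"created_at": "2024-01-01T00:00:00+00:00", "entries": {}}))
    now = datetime(2024, 1, 5, tzinfo=timezone.utc)
    within_week = cache_is_fresh(path, max_age_days=7, now=now)
    within_3_days = cache_is_fresh(path, max_age_days=3, now=now)
    missing_file = cache_is_fresh(Path(tmp) / "absent.json", max_age_days=7, now=now)

print(within_week, within_3_days, missing_file)
```

A caller would download a fresh payload whenever this check returns False or when overwrite is set.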

neurosnap.database.run_blast(sequence, email, matrix='BLOSUM62', alignments=250, scores=250, evalue=10.0, filter=False, gapalign=True, database='uniprotkb_refprotswissprot', output_format=None, output_path=None, return_df=True)

Submit a BLASTP job to the EBI service and optionally return hits as a dataframe.

Return type:

Optional[DataFrame]

Submodules