neurosnap.database package
Database and remote-search helpers.
- class neurosnap.database.CCD(code, name, smiles)
Bases: object
Minimal Chemical Component Dictionary entry.
- code
CCD identifier, typically 1-5 characters.
- name
Human-readable component name.
- smiles
SMILES string for the component. wwPDB canonicalizes it, but its canonicalization algorithm is inconsistent with RDKit's.
- smiles_canonical()
Return the RDKit-canonicalized SMILES string for this CCD entry.
- Return type:
- to_mol()
Return an RDKit molecule parsed from the canonical SMILES string.
- Return type:
- Returns:
RDKit molecule for the CCD entry.
- Raises:
ValueError – If the stored canonical SMILES cannot be parsed.
- neurosnap.database.fetch_accessions(accessions, batch_size=150)
Fetch sequences corresponding to a list of UniProt accession numbers.
This function queries UniParc first and then UniProtKB for any missing accessions. Accessions are processed in batches to handle large lists efficiently.
- Parameters:
- Return type:
- Returns:
Dictionary mapping accession numbers to protein sequences. Missing accessions are assigned None.
- Raises:
requests.exceptions.HTTPError – If an API request fails.
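The batching and two-stage lookup described for fetch_accessions can be sketched as below. This is a minimal illustration under stated assumptions, not the library's implementation: `chunked`, `fetch_in_batches`, `primary`, and `fallback` are hypothetical names, and the two callables stand in for the UniParc and UniProtKB requests.

```python
from typing import Callable, Dict, Iterator, List, Optional

# Hypothetical type: takes a batch of accessions, returns the ones it found.
Lookup = Callable[[List[str]], Dict[str, str]]


def chunked(items: List[str], batch_size: int = 150) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fetch_in_batches(
    accessions: List[str],
    primary: Lookup,
    fallback: Lookup,
    batch_size: int = 150,
) -> Dict[str, Optional[str]]:
    """Query `primary` per batch, then `fallback` for whatever is still missing."""
    # Every accession starts as None, so unresolved ones stay None.
    results: Dict[str, Optional[str]] = {acc: None for acc in accessions}
    for batch in chunked(list(accessions), batch_size):
        found = dict(primary(batch))
        missing = [acc for acc in batch if acc not in found]
        if missing:
            found.update(fallback(missing))
        results.update(found)
    return results
```

Passing the lookups in as callables keeps the sketch free of network calls; in practice each would wrap one HTTP request per batch.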
- neurosnap.database.fetch_uniprot(uniprot_id, head=False)
Fetch a UniProtKB or UniParc FASTA entry by identifier.
- Parameters:
- Return type:
- Returns:
True when head is enabled and the accession exists, otherwise the fetched protein sequence.
- Raises:
Exception – If the accession is not found in UniProtKB or UniParc.
ValueError – If the returned FASTA does not contain exactly one sequence.
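The "exactly one sequence" check behind the ValueError above can be illustrated with a small FASTA parser. This is a sketch only: `parse_fasta` and `single_sequence` are hypothetical helpers written for this example, not functions from neurosnap.

```python
from typing import List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def single_sequence(text: str) -> str:
    """Return the only sequence in `text`, or raise ValueError otherwise."""
    records = parse_fasta(text)
    if len(records) != 1:
        raise ValueError(f"expected exactly one sequence, found {len(records)}")
    return records[0][1]
```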
- neurosnap.database.foldseek_search(structure, mode='3diaa', databases=None, max_retries=10, retry_interval=5, output_format='json')
Perform a protein structure search using the Foldseek API.
- Parameters:
structure (Union[Structure, StructureEnsemble, StructureStack, str, Path]) – A Neurosnap structure container or a path to a PDB file.
mode (str) – Search mode. Must be one of "3diaa" or "tm-align".
databases (Optional[List[str]]) – Databases to search. Defaults to Foldseek's common public structure databases.
max_retries (int) – Maximum number of retries when polling job status.
retry_interval (int) – Seconds between job-status polls.
output_format (str) – Output format, either "json" or "dataframe".
- Return type:
Union[str, DataFrame]
- Returns:
Search results in the requested format.
- Raises:
RuntimeError – If job submission or retrieval fails.
TimeoutError – If the job does not complete before retries are exhausted.
ValueError – If output_format is invalid.
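How max_retries and retry_interval interact with the TimeoutError and RuntimeError above can be sketched with a generic polling loop. This is an illustration under assumptions, not the library's code: `poll_until_complete` and `check_status` are hypothetical, and the status strings "COMPLETE" and "ERROR" are assumed for the example.

```python
import time
from typing import Callable


def poll_until_complete(
    check_status: Callable[[], str],
    max_retries: int = 10,
    retry_interval: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll `check_status` until it reports completion.

    Returns the number of polls that were needed. Raises RuntimeError if the
    job reports an error and TimeoutError once `max_retries` polls are used up.
    """
    for attempt in range(1, max_retries + 1):
        status = check_status()
        if status == "COMPLETE":  # assumed status string
            return attempt
        if status == "ERROR":  # assumed status string
            raise RuntimeError("job reported an error")
        # Any other status is treated as "still running": wait, then poll again.
        sleep(retry_interval)
    raise TimeoutError(f"job did not complete after {max_retries} polls")
```

With the defaults shown in the signature (10 retries, 5 seconds apart), a loop of this shape waits at most about 50 seconds before giving up. The `sleep` parameter exists only so the loop can be exercised without real delays.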
- neurosnap.database.get_ccd(code, *, cache_path='~/.cache/neurosnap/ccd_entries.json', overwrite=False, max_age_days=7, timeout=30)
Return a CCD entry by its component code.
- neurosnap.database.get_ccd_entries(*, cache_path='~/.cache/neurosnap/ccd_entries.json', overwrite=False, max_age_days=7, timeout=30)
Fetch and cache CCD metadata entries.
The CCD payload is cached locally and refreshed when the cached copy is older than max_age_days, as determined by its embedded created_at timestamp.
- Parameters:
- Return type:
- Returns:
Dictionary mapping each CCD code to its CCD entry.
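The refresh rule for the cached CCD payload can be sketched as a staleness check. This is a minimal example under assumptions: `cache_is_stale` is a hypothetical helper, and the created_at value is assumed to be an ISO 8601 string, which the documentation does not confirm.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def cache_is_stale(
    payload: dict,
    max_age_days: int = 7,
    now: Optional[datetime] = None,
) -> bool:
    """Return True when the payload should be re-downloaded."""
    created_raw = payload.get("created_at")
    if created_raw is None:
        # No timestamp to judge by, so treat the cache as stale.
        return True
    # Assumption: created_at is ISO 8601, e.g. "2024-01-01T00:00:00+00:00".
    created = datetime.fromisoformat(created_raw)
    if created.tzinfo is None:
        # Assumption: naive timestamps are interpreted as UTC.
        created = created.replace(tzinfo=timezone.utc)
    current = now if now is not None else datetime.now(timezone.utc)
    return current - created > timedelta(days=max_age_days)
```

An overwrite flag like the one in the signature would simply bypass this check and force a fresh download.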
- neurosnap.database.run_blast(sequence, email, matrix='BLOSUM62', alignments=250, scores=250, evalue=10.0, filter=False, gapalign=True, database='uniprotkb_refprotswissprot', output_format=None, output_path=None, return_df=True)
Submit a BLASTP job to the EBI service and optionally return hits as a dataframe.
- Return type:
Optional[DataFrame]