neurosnap.database.uniprot module

UniProt and UniParc sequence retrieval helpers.

neurosnap.database.uniprot.fetch_accessions(accessions, batch_size=150)

Fetch sequences corresponding to a list of UniProt accession numbers.

This function queries UniParc first and then UniProtKB for any missing accessions. Accessions are processed in batches to handle large lists efficiently.

Parameters:
  • accessions (Iterable[str]) – An iterable of UniProt accession numbers. Duplicate accessions are removed automatically.

  • batch_size (int) – Number of accessions to query per request.

Return type:

Dict[str, Optional[str]]

Returns:

Dictionary mapping accession numbers to protein sequences. Missing accessions are assigned None.

Raises:

requests.exceptions.HTTPError – If an API request fails.
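
The de-duplication and batching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: iter_batches and fetch_in_batches are hypothetical helpers, and fetch_batch is a placeholder for whatever performs the actual HTTP request and returns the accessions it found.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional


def iter_batches(accessions: Iterable[str], batch_size: int = 150) -> Iterator[List[str]]:
    """Yield de-duplicated accessions in batches of at most batch_size (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # dict.fromkeys drops duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(accessions))
    for start in range(0, len(unique), batch_size):
        yield unique[start:start + batch_size]


def fetch_in_batches(
    accessions: Iterable[str],
    fetch_batch: Callable[[List[str]], Dict[str, str]],
    batch_size: int = 150,
) -> Dict[str, Optional[str]]:
    """Map every accession to its sequence, or to None when it was not found.

    fetch_batch is an assumed callable that takes one batch and returns
    only the accessions it could resolve.
    """
    accessions = list(accessions)  # allow generators to be consumed once
    results: Dict[str, Optional[str]] = {acc: None for acc in accessions}
    for batch in iter_batches(accessions, batch_size):
        results.update(fetch_batch(batch))
    return results
```

Pre-filling the result with None means that accessions no request resolves still appear in the returned dictionary, which matches the documented return value.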

neurosnap.database.uniprot.fetch_uniprot(uniprot_id, head=False)

Fetch a UniProtKB or UniParc FASTA entry by identifier.

Parameters:
  • uniprot_id (str) – UniProtKB or UniParc accession ID.

  • head (bool) – If True, perform a HEAD request and return whether the entry exists.

Return type:

Union[str, bool]

Returns:

True when head is enabled and the accession exists, otherwise the fetched protein sequence.

Raises:
  • Exception – If the accession is not found in UniProtKB or UniParc.

  • ValueError – If the returned FASTA does not contain exactly one sequence.
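
The single-sequence check behind the ValueError can be illustrated with a small FASTA parser. This is a sketch only: parse_single_fasta is a hypothetical name, and the library's own parsing may differ.

```python
from typing import List


def parse_single_fasta(fasta_text: str) -> str:
    """Return the sequence from FASTA text holding exactly one record (hypothetical helper)."""
    sequences: List[str] = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            sequences.append("")  # a header line starts a new record
        elif sequences:
            sequences[-1] += line  # sequence data may span several lines
        else:
            raise ValueError("sequence data found before any FASTA header")
    if len(sequences) != 1:
        raise ValueError(f"expected exactly one sequence, found {len(sequences)}")
    return sequences[0]
```

An empty response or a response containing two or more records both fail the check, so a caller either receives one sequence string or an exception.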