neurosnap.structure.structure module#

Data structures for representing molecular coordinates and annotations.

This module provides a single-model Structure, immutable hierarchy views (Chain, Residue, and Atom), an ordered multi-model container (StructureEnsemble), and a shared-annotation multi-model fast path (StructureStack).

All coordinates and distances are expressed in ångströms (Å).

class neurosnap.structure.structure.Atom(x, y, z, chain_id, res_id, ins_code, res_name, hetero, atom_name, element, annotations=<factory>)[source]#

Bases: object

Immutable atom-level hierarchy view.

annotations: Mapping[str, Any]#
atom_name: str#
chain_id: str#
property coord: ndarray#

Return the atom coordinates as a length-3 NumPy array.

element: str#
hetero: bool#
ins_code: str#
res_id: int#
res_name: str#
x: float#
y: float#
z: float#
class neurosnap.structure.structure.Chain(chain_id, _residues)[source]#

Bases: object

Immutable chain-level hierarchy view.

A Chain is a read-only hierarchy view over the residues associated with one chain identifier in a single Structure. It provides chain-level traversal plus convenience helpers for sequence extraction and simple residue-number gap detection.

chain_id#

Chain identifier represented by this view.

__getitem__(res_id)[source]#

Return a residue view by residue ID, not by positional index.

Parameters:

res_id (int) – Residue sequence number to retrieve.

Return type:

Residue

Returns:

The first Residue in this chain with the requested residue ID.

Raises:
  • TypeError – If res_id is not an integer residue ID.

  • KeyError – If no residue with the requested ID is present in the chain.

Notes

This method looks up residues by their residue ID rather than by list position. If multiple residues share the same residue ID, such as inserted residues distinguished by insertion codes, the first matching residue is returned and a warning is emitted.
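The lookup rule in the note above can be illustrated with a standalone sketch. `ResidueRecord` and `get_by_res_id` are hypothetical stand-ins written for this example; they are not part of the library and only mirror the documented behavior (ID-based lookup, first match wins, warning on duplicates).

```python
import warnings
from typing import List, NamedTuple


class ResidueRecord(NamedTuple):
    """Hypothetical stand-in for a residue view (not the library's Residue)."""
    res_id: int
    ins_code: str
    res_name: str


def get_by_res_id(residues: List[ResidueRecord], res_id: int) -> ResidueRecord:
    """Return the first residue whose res_id matches, warning on duplicates."""
    if not isinstance(res_id, int):
        raise TypeError("res_id must be an integer residue ID")
    matches = [r for r in residues if r.res_id == res_id]
    if not matches:
        raise KeyError(res_id)
    if len(matches) > 1:
        warnings.warn(f"{len(matches)} residues share res_id {res_id}; returning the first")
    return matches[0]


# Residues 10 and 10A share the same residue number.
residues = [
    ResidueRecord(10, "", "GLY"),
    ResidueRecord(10, "A", "SER"),
    ResidueRecord(12, "", "ALA"),
]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    hit = get_by_res_id(residues, 10)

print(hit.res_name, len(caught))  # GLY 1
```

Note that `residues[0]` and `get_by_res_id(residues, 10)` happen to agree here only because residue 10 is stored first; positional index and residue ID are otherwise unrelated.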

__iter__()[source]#

Iterate over residues in residue order.

Return type:

Iterator[Residue]

chain_id: str#
missing_residue_ids()[source]#

Return missing residue numbers inferred from gaps in the chain.

Hetero residues are ignored so ligand or solvent numbering does not create artificial gaps in the polymer residue sequence.

Return type:

List[int]

Returns:

Sorted list of integer residue IDs that are absent between observed non-hetero residue numbers.
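A minimal sketch of this kind of gap detection, assuming each residue is reduced to a `(res_id, hetero)` pair. The function name `missing_ids` is hypothetical and the logic is an illustration of the documented behavior, not the library's implementation.

```python
from typing import Iterable, List, Tuple


def missing_ids(residues: Iterable[Tuple[int, bool]]) -> List[int]:
    """Return residue numbers absent between the lowest and highest polymer residue.

    Each item is (res_id, hetero); hetero residues are ignored.
    """
    observed = sorted({res_id for res_id, hetero in residues if not hetero})
    if not observed:
        return []
    present = set(observed)
    return [i for i in range(observed[0], observed[-1] + 1) if i not in present]


# Polymer residues 1, 2, 5, 6 plus a ligand numbered 301.
chain = [(1, False), (2, False), (5, False), (6, False), (301, True)]
gaps = missing_ids(chain)
print(gaps)  # [3, 4]
```

Had the ligand at 301 been counted, every number from 7 to 300 would have been reported as missing, which is the artificial gap the hetero filter avoids.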

residues()[source]#

Return the residues that belong to this chain.

Return type:

List[Residue]

Returns:

List of immutable Residue views in residue order.

sequence(polymer_type='auto', include_modifications=False, modification_mode='inline', on_unknown_modified='raise')[source]#

Return the polymer sequence for this chain.

Protein, DNA, and RNA sequences are supported. Small molecules and other non-polymer residues in the chain are ignored. Modified residues can either be skipped, emitted inline as (CCD), or mapped to their parent sequence code when available.

Parameters:
  • polymer_type (Literal['auto', 'protein', 'dna', 'rna', 'nucleotide']) – Polymer family to extract. "auto" infers the family from the chain contents. "nucleotide" accepts either DNA or RNA, but raises if both are present.

  • include_modifications (bool) – Whether modified residues should contribute to the sequence. If False, modified residues are skipped entirely.

  • modification_mode (Literal['inline', 'parent']) – How included modifications are emitted. "inline" inserts (CCD) tokens, while "parent" uses the inferred parent residue code.

  • on_unknown_modified (Literal['raise', 'unknown']) – Behavior when modification_mode="parent" is requested but no parent code can be inferred. "raise" raises a ValueError; "unknown" inserts "X".

Return type:

str

Returns:

Sequence string for the selected polymer family. Returns an empty string if the chain contains no residues from the requested polymer family.

Raises:

ValueError – If the chain mixes polymer families in a way that conflicts with polymer_type or if an unknown modified residue cannot be mapped in "parent" mode.
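The interaction between `include_modifications`, `modification_mode`, and `on_unknown_modified` can be sketched as follows. This is a simplified illustration under several assumptions: the input contains only polymer residues of a single protein chain, the `(CCD)` token is taken to be the residue's CCD code in parentheses, the lookup tables are tiny illustrative subsets, and `ZZZ` is a placeholder for a modified residue with no known parent. `build_sequence` is a hypothetical helper, not the library's code.

```python
from typing import Dict, Iterable, List

# Illustrative subsets; a real table would cover the full residue vocabulary.
STANDARD: Dict[str, str] = {"ALA": "A", "GLY": "G", "SER": "S", "MET": "M"}
PARENT: Dict[str, str] = {"MSE": "MET", "SEP": "SER"}  # selenomethionine, phosphoserine


def build_sequence(
    res_names: Iterable[str],
    include_modifications: bool = False,
    modification_mode: str = "inline",
    on_unknown_modified: str = "raise",
) -> str:
    """Build a one-letter sequence, handling modified residues per the options."""
    out: List[str] = []
    for name in res_names:
        if name in STANDARD:
            out.append(STANDARD[name])
            continue
        if not include_modifications:
            continue  # modified residues are skipped entirely
        if modification_mode == "inline":
            out.append(f"({name})")
            continue
        parent = PARENT.get(name)  # "parent" mode
        if parent is not None:
            out.append(STANDARD[parent])
        elif on_unknown_modified == "unknown":
            out.append("X")
        else:
            raise ValueError(f"no parent residue known for {name}")
    return "".join(out)


names = ["ALA", "MSE", "GLY", "SEP", "ZZZ"]
print(build_sequence(names))                                   # AG
print(build_sequence(names, include_modifications=True))       # A(MSE)G(SEP)(ZZZ)
print(build_sequence(names, include_modifications=True,
                     modification_mode="parent",
                     on_unknown_modified="unknown"))           # AMGSX
```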

class neurosnap.structure.structure.Residue(chain_id, res_id, ins_code, res_name, hetero, _atoms, _atom_indices)[source]#

Bases: object

Immutable residue-level hierarchy view.

A Residue groups atoms that share the same chain identifier, residue number, insertion code, residue name, and hetero flag. The object is a lightweight read-only view over the parsed atom table, intended for traversal and analysis rather than in-place editing.

chain_id#

Chain identifier containing the residue.

res_id#

Residue sequence number.

ins_code#

PDB insertion code for the residue.

res_name#

Residue name / CCD code.

hetero#

True for heterogens (HETATM records) and False for polymer ATOM records.

__eq__(other)[source]#

Compare two residue views by stable identity.

__hash__()[source]#

Return a hash derived from key().

atom_indices()[source]#

Return atom-table indices for the atoms in this residue.

Return type:

List[int]

Returns:

List of integer atom indices in atom-table order.

atoms()[source]#

Return the atoms that belong to this residue.

Return type:

List[Atom]

Returns:

List of immutable Atom views in atom-table order.

chain_id: str#
hetero: bool#
ins_code: str#
key()[source]#

Return a stable residue identity tuple.

The returned key is suitable for dictionary/set membership when residue identity needs to be tracked outside the object itself.

Return type:

Tuple[str, int, str, str, bool]

Returns:

(chain_id, res_id, ins_code, res_name, hetero)
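One common way to tie `__eq__`, `__hash__`, and `key()` together is shown below. `ResidueView` is a hypothetical class written for this example; it only assumes that equality and hashing are both derived from the identity tuple described above.

```python
from typing import Tuple


class ResidueView:
    """Hypothetical residue-like object whose identity is its key() tuple."""

    def __init__(self, chain_id: str, res_id: int, ins_code: str,
                 res_name: str, hetero: bool) -> None:
        self.chain_id = chain_id
        self.res_id = res_id
        self.ins_code = ins_code
        self.res_name = res_name
        self.hetero = hetero

    def key(self) -> Tuple[str, int, str, str, bool]:
        return (self.chain_id, self.res_id, self.ins_code, self.res_name, self.hetero)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, ResidueView):
            return NotImplemented
        return self.key() == other.key()

    def __hash__(self) -> int:
        return hash(self.key())


a = ResidueView("A", 10, "", "GLY", False)
b = ResidueView("A", 10, "", "GLY", False)   # same identity, different object
c = ResidueView("A", 10, "A", "SER", False)  # inserted residue 10A

unique = {a, b, c}
print(len(unique))  # 2

# The key can also be stored on its own, without keeping the view alive.
b_factor_by_residue = {a.key(): 23.5}
print(b_factor_by_residue[b.key()])  # 23.5
```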

res_id: int#
res_name: str#
class neurosnap.structure.structure.Structure(remove_annotations=True)[source]#

Bases: object

Single-model molecular structure container.

Coordinates are stored separately from per-atom annotations so geometry-heavy operations can work on compact numeric arrays while annotation schemas remain flexible.

Parameters:

remove_annotations (bool) – If True, optional annotation columns that only contain default values are removed after initialization.

__getitem__(chain_id)[source]#

Return a chain view by chain ID.

Parameters:

chain_id (str) – Chain identifier to retrieve.

Return type:

Chain

Returns:

The matching Chain view.

Raises:
  • TypeError – If chain_id is not a string.

  • KeyError – If the requested chain is not present in the structure.

__init__(remove_annotations=True)[source]#

Initialize an empty single-model structure.

__iter__()[source]#

Iterate over chains in atom-table order.

Return type:

Iterator[Chain]

__len__()[source]#

Return the number of atoms in the structure.

Return type:

int

__repr__()[source]#

Return a compact string summary of the structure.

Return type:

str

add_annotation(name, dtype, values=None, *, fill_value=None, overwrite=False)[source]#

Add a new per-atom annotation column.

Parameters:
  • name (str) – Annotation name to add.

  • dtype (Any) – NumPy-compatible scalar dtype for the annotation values.

  • values (Any) – Optional per-atom values for the annotation.

  • fill_value (Any) – Optional default value used when values is not supplied.

  • overwrite (bool) – Whether to replace an existing optional annotation of the same name.

Raises:
  • ValueError – If the name is invalid, reserved, already present, or the supplied values do not match the atom count.

  • TypeError – If the supplied dtype is not a scalar per-atom dtype.
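The validation rules listed above can be pictured with a small standalone store. Everything here is an assumption made for illustration: `AnnotationTable` is hypothetical, a plain Python type stands in for the NumPy dtype, the reserved names are guessed from the `Atom` fields, and `b_factor` and `occupancy` are example annotation names only.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence

# Assumed reserved names, mirroring the Atom fields documented on this page.
RESERVED = {"chain_id", "res_id", "ins_code", "res_name", "hetero", "atom_name", "element"}


class AnnotationTable:
    """Hypothetical per-atom annotation store (not the library's Structure)."""

    def __init__(self, n_atoms: int) -> None:
        self.n_atoms = n_atoms
        self.columns: Dict[str, List[Any]] = {}

    def add_annotation(
        self,
        name: str,
        dtype: Callable[..., Any],
        values: Optional[Sequence[Any]] = None,
        *,
        fill_value: Any = None,
        overwrite: bool = False,
    ) -> None:
        if not isinstance(name, str) or not name:
            raise ValueError("annotation name must be a non-empty string")
        if name in RESERVED:
            raise ValueError(f"{name!r} is a reserved annotation name")
        if name in self.columns and not overwrite:
            raise ValueError(f"annotation {name!r} already exists")
        if values is None:
            # dtype() yields the type's default (0.0, 0, "", False) when no fill is given.
            default = dtype() if fill_value is None else dtype(fill_value)
            column = [default] * self.n_atoms
        else:
            if len(values) != self.n_atoms:
                raise ValueError("values must contain one entry per atom")
            column = [dtype(v) for v in values]
        self.columns[name] = column


table = AnnotationTable(n_atoms=3)
table.add_annotation("b_factor", float, [10, 20.5, 30])
table.add_annotation("occupancy", float, fill_value=1)
print(table.columns["b_factor"])   # [10.0, 20.5, 30.0]
print(table.columns["occupancy"])  # [1.0, 1.0, 1.0]
```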

calculate_center_of_mass(chains=None)[source]#

Calculate the center of mass for the selected atoms.

Parameters:

chains (Optional[List[str]]) – Optional chain IDs to include. If None, all atoms are used.

Return type:

ndarray

Returns:

A length-3 NumPy array containing the center of mass in Å.

Raises:

ValueError – If no atoms are found in the selected structure or if any selected atom has an unknown element mass.

calculate_geometric_center(chains=None)[source]#

Calculate the geometric center for the selected atoms.

Parameters:

chains (Optional[List[str]]) – Optional chain IDs to include. If None, all atoms are used.

Return type:

ndarray

Returns:

A length-3 NumPy array containing the arithmetic mean of the selected atom coordinates in Å.

Raises:

ValueError – If no atoms are found in the selected structure.
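The difference between the two centers described above is easiest to see side by side. The following pure-Python sketch uses a small table of standard atomic weights; the function names and the table are assumptions for illustration and do not reflect how the library stores masses.

```python
from typing import Dict, Sequence, Tuple

# Standard atomic weights for a few common elements (illustrative subset).
ATOMIC_MASS: Dict[str, float] = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

Point = Tuple[float, float, float]


def geometric_center(coords: Sequence[Point]) -> Point:
    """Arithmetic mean of the coordinates."""
    if not coords:
        raise ValueError("no atoms selected")
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def center_of_mass(coords: Sequence[Point], elements: Sequence[str]) -> Point:
    """Mass-weighted mean of the coordinates."""
    if not coords:
        raise ValueError("no atoms selected")
    unknown = [e for e in elements if e not in ATOMIC_MASS]
    if unknown:
        raise ValueError(f"unknown element mass: {unknown[0]}")
    masses = [ATOMIC_MASS[e] for e in elements]
    total = sum(masses)
    return tuple(sum(m * c[i] for m, c in zip(masses, coords)) / total for i in range(3))


# A hydrogen at the origin and a carbon 1 Å away along x.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
elements = ["H", "C"]

print(geometric_center(coords))          # (0.5, 0.0, 0.0)
print(center_of_mass(coords, elements))  # x is pulled toward the heavier carbon
```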

calculate_rog(chains=None, center=None)[source]#

Calculate the radius of gyration for the selected atoms.

Parameters:
  • chains (Optional[List[str]]) – Optional chain IDs to include. If None, all atoms are used.

  • center (Optional[ndarray]) – Optional reference point. If None, the center of mass is used.

Return type:

float

Returns:

Radius of gyration in Å.
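As a worked illustration, the radius of gyration about a reference point can be computed as the root-mean-square distance of the atoms from that point. The documentation above does not state whether the library weights atoms by mass, so this sketch assumes equal masses throughout; under that assumption the default center reduces to the geometric center. `radius_of_gyration` is a hypothetical helper.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Point], center: Optional[Point] = None) -> float:
    """Root-mean-square distance of the points from `center` (equal masses assumed)."""
    if not coords:
        raise ValueError("no atoms selected")
    if center is None:
        n = len(coords)
        center = tuple(sum(c[i] for c in coords) / n for i in range(3))
    mean_sq = sum(math.dist(c, center) ** 2 for c in coords) / len(coords)
    return math.sqrt(mean_sq)


# Four points on the unit circle in the xy-plane.
square = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]

print(radius_of_gyration(square))  # 1.0
rg_shifted = radius_of_gyration(square, center=(1.0, 0.0, 0.0))
print(rg_shifted > 1.0)            # True
```

Passing an explicit `center` that is not the centroid always gives a value at least as large as the default, which is why the second call reports a larger radius.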

center_at(x=0.0, y=0.0, z=0.0, chains=None)[source]#

Translate selected atoms so their center of mass matches a target point.

Parameters:
  • x (float) – Target x-coordinate for the center of mass.

  • y (float) – Target y-coordinate for the center of mass.

  • z (float) – Target z-coordinate for the center of mass.

  • chains (Optional[List[str]]) – Optional chain IDs to center. If None, all atoms are used.
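Conceptually, centering amounts to one translation by the vector from the current center of mass to the target. The sketch below makes that explicit; `center_at` here is a hypothetical free function that takes masses directly and returns new coordinates, whereas the method documented above operates on the structure itself.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def center_at(coords: Sequence[Point], masses: Sequence[float],
              target: Point = (0.0, 0.0, 0.0)) -> List[Point]:
    """Return coordinates shifted so their center of mass lands on `target`."""
    total = sum(masses)
    com = [sum(m * c[i] for m, c in zip(masses, coords)) / total for i in range(3)]
    shift = [target[i] - com[i] for i in range(3)]
    # Every atom receives the same shift, so internal geometry is unchanged.
    return [tuple(c[i] + shift[i] for i in range(3)) for c in coords]


coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
masses = [1.0, 1.0]

moved = center_at(coords, masses, target=(5.0, 0.0, 0.0))
print(moved)  # [(4.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
```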

chain_ids()[source]#

Return all chain IDs found in the structure.

Return type:

List[str]

Returns:

List of chain ID strings, one per chain.

chains()[source]#

Return all chains in the structure as immutable hierarchy views.

Return type:

List[Chain]

Returns:

List of Chain objects in atom-table order.

distances_from(point, chains=None)[source]#

Calculate distances from a point for the selected atoms.

Parameters:
  • point (ndarray) – Reference point as an array-like object with shape (3,).

  • chains (Optional[List[str]]) – Optional chain IDs to include. If None, all atoms are used.

Return type:

ndarray

Returns:

A 1D NumPy array containing Euclidean distances in atom-table order.
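For reference, the quantity returned is the ordinary Euclidean distance from each atom to the reference point. A minimal pure-Python equivalent, using the hypothetical name `distances_from_point` and plain lists rather than NumPy arrays, looks like this:

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def distances_from_point(point: Point, coords: Sequence[Point]) -> List[float]:
    """Euclidean distance from `point` to each coordinate, in input order."""
    if len(point) != 3:
        raise ValueError("point must have exactly three components")
    return [math.dist(point, c) for c in coords]


coords = [(3.0, 4.0, 0.0), (0.0, 0.0, 2.0)]
dists = distances_from_point((0.0, 0.0, 0.0), coords)
print(dists)  # [5.0, 2.0]
```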

remove_annotation(name)[source]#

Remove a non-mandatory annotation column and return its values.

Parameters:

name (str) – Annotation name to remove.

Returns:

Copy of the removed annotation values.

Raises:
  • KeyError – If the annotation does not exist.

  • ValueError – If the name is invalid or refers to a mandatory annotation.

renumber(chain=None, start=1)[source]#

Renumber residues in-place.

Parameters:
  • chain (Optional[str]) – Chain ID to renumber. If None, all chains are renumbered in chain order using one continuous counter.

  • start (int) – Starting residue number.

Notes

Renumbering treats inserted residues as ordinary sequential residues and clears their insertion codes. For example, residues 10, 10A, and 10B become 1, 2, and 3 (with empty insertion codes) when renumbered with start=1.
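The effect on insertion codes can be reproduced with a short sketch over atom rows. The row layout `(chain_id, res_id, ins_code)` and the function below are assumptions for illustration; the sketch also returns a new list instead of modifying data in place, and it treats atom-table order as chain order.

```python
from typing import List, Optional, Tuple

AtomRow = Tuple[str, int, str]  # (chain_id, res_id, ins_code)


def renumber(rows: List[AtomRow], chain: Optional[str] = None, start: int = 1) -> List[AtomRow]:
    """Assign sequential residue numbers and clear insertion codes."""
    out: List[AtomRow] = []
    counter = start - 1
    previous: Optional[AtomRow] = None
    for row in rows:
        chain_id = row[0]
        if chain is not None and chain_id != chain:
            out.append(row)  # other chains are left untouched
            continue
        if row != previous:  # a new residue starts whenever the identity changes
            counter += 1
            previous = row
        out.append((chain_id, counter, ""))
    return out


rows = [
    ("A", 10, ""), ("A", 10, ""),  # two atoms of residue 10
    ("A", 10, "A"),                # residue 10A
    ("A", 10, "B"),                # residue 10B
    ("B", 1, ""),                  # first residue of chain B
]

renumbered = renumber(rows)
print(renumbered)
# [('A', 1, ''), ('A', 1, ''), ('A', 2, ''), ('A', 3, ''), ('B', 4, '')]
```

With `chain=None` the counter continues across chains, so chain B starts at 4 rather than restarting at 1.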

to_dataframe()[source]#

Export the structure as a pandas DataFrame.

This dataframe is derived on demand from the current atom table and is never cached on the structure.

Return type:

DataFrame

translate(x=0.0, y=0.0, z=0.0, chains=None)[source]#

Translate selected atoms in-place by a fixed vector.

Parameters:
  • x (float) – Translation along the x-axis.

  • y (float) – Translation along the y-axis.

  • z (float) – Translation along the z-axis.

  • chains (Optional[List[str]]) – Optional chain IDs to translate. If None, all atoms are translated.

class neurosnap.structure.structure.StructureEnsemble(models=None, *, model_ids=None, metadata=None)[source]#

Bases: object

Ordered collection of independent Structure models.

Unlike StructureStack, models in an ensemble do not need to have the same atoms, annotations, or bonds.

__getitem__(index)[source]#

Return a model by model ID or a sliced sub-ensemble by position.

Integer access uses model_id lookup rather than positional indexing, so ensemble[5] returns the model whose ID is 5. Slice access keeps normal positional semantics to preserve standard Python iteration and slicing behavior.

Raises:

KeyError – If an integer model ID is requested but not present.

__init__(models=None, *, model_ids=None, metadata=None)[source]#

Initialize an ordered collection of independent models.

__iter__()[source]#

Iterate over the stored models in order.

Return type:

Iterator[Structure]

__len__()[source]#

Return the number of models in the ensemble.

Return type:

int

__repr__()[source]#

Return a compact string summary of the ensemble.

Return type:

str

append(model, *, model_id=None)[source]#

Append a validated model to the ensemble.

Parameters:
  • model (Structure) – Model to append.

  • model_id (Optional[int]) – Optional model identifier. Defaults to the next sequential model ID starting at 1.
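The split between ID-based integer access and positional slicing, together with default ID assignment on append, can be sketched with a small container. `ModelCollection` is hypothetical, plain strings stand in for models, and "next sequential ID" is interpreted here as one more than the largest existing ID, which is an assumption rather than documented behavior.

```python
from typing import Any, List, Optional, Union


class ModelCollection:
    """Hypothetical ordered container keyed by model ID (not the library's class)."""

    def __init__(self) -> None:
        self._models: List[Any] = []
        self._ids: List[int] = []

    def append(self, model: Any, *, model_id: Optional[int] = None) -> None:
        if model_id is None:
            model_id = max(self._ids, default=0) + 1  # first default ID is 1
        if model_id in self._ids:
            raise ValueError(f"model ID {model_id} already present")
        self._models.append(model)
        self._ids.append(model_id)

    def model_ids(self) -> List[int]:
        return list(self._ids)

    def __len__(self) -> int:
        return len(self._models)

    def __getitem__(self, index: Union[int, slice]) -> Any:
        if isinstance(index, slice):
            # Slices are positional and return a new collection.
            sub = ModelCollection()
            for model, model_id in zip(self._models[index], self._ids[index]):
                sub.append(model, model_id=model_id)
            return sub
        if index not in self._ids:
            raise KeyError(index)
        return self._models[self._ids.index(index)]


models = ModelCollection()
models.append("first", model_id=5)
models.append("second")  # receives ID 6 under the assumption above
models.append("third")   # receives ID 7

print(models[5])                  # first
print(models[0:2].model_ids())    # [5, 6]
```

In this sketch `models[0]` raises `KeyError`, because no model has ID 0, even though a model exists at position 0.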

first()[source]#

Return the first model in the ensemble.

Return type:

Structure

Returns:

The first Structure in stored order.

Raises:

IndexError – If the ensemble is empty.

models()[source]#

Return a shallow copy of the list of models.

Return type:

List[Structure]

remove_model(model_id)[source]#

Remove and return a model by model ID.

Parameters:

model_id (int) – Model identifier to remove.

Return type:

Structure

Returns:

The removed Structure.

Raises:

KeyError – If the requested model ID is not present.

renumber(start=1)[source]#

Renumber model identifiers in-place.

Parameters:

start (int) – Starting model ID. Defaults to 1.

to_dataframe()[source]#

Export the ensemble as a pandas DataFrame with a model column.

This dataframe is derived on demand from the current models and is never cached on the ensemble.

Return type:

DataFrame

to_stack()[source]#

Convert the ensemble into a StructureStack.

Raises:

ValueError – If the models are not stack-compatible.

Return type:

StructureStack

class neurosnap.structure.structure.StructureStack(models=None, *, model_ids=None, metadata=None)[source]#

Bases: object

Shared-annotation, shared-bond multi-model fast path.

All models in a stack must share the same atom ordering, per-atom annotations, and bonds. Only the coordinates vary between models.

__getitem__(index)[source]#

Return a materialized model by model ID or a sliced sub-stack by position.

Integer access uses model_id lookup rather than positional indexing, so stack[5] returns the model whose ID is 5. Slice access keeps normal positional semantics to preserve standard Python slicing behavior.

Raises:

KeyError – If an integer model ID is requested but not present.

__init__(models=None, *, model_ids=None, metadata=None)[source]#

Initialize an empty or pre-populated stack of compatible models.

__iter__()[source]#

Iterate over the stack as materialized Structure models.

Return type:

Iterator[Structure]

__len__()[source]#

Return the number of models in the stack.

Return type:

int

__repr__()[source]#

Return a compact string summary of the stack.

Return type:

str

append(model, *, model_id=None)[source]#

Append a stack-compatible model.

Parameters:
  • model (Structure) – Model to append.

  • model_id (Optional[int]) – Optional model identifier. Defaults to the next sequential model ID starting at 1.

Raises:

ValueError – If the candidate model is not compatible with the existing stack.
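The compatibility requirement can be pictured as one shared atom table plus a list of per-model coordinate sets. The following sketch is a simplified model of that idea: `CoordinateStack` is hypothetical, atom names stand in for the full set of shared annotations, and bonds are omitted.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


class CoordinateStack:
    """Hypothetical stack: one shared atom table, one coordinate set per model."""

    def __init__(self, atom_names: Sequence[str]) -> None:
        self.atom_names = tuple(atom_names)   # shared by every model
        self.frames: List[List[Point]] = []   # only the coordinates vary

    @property
    def atom_count(self) -> int:
        return len(self.atom_names)

    def append(self, atom_names: Sequence[str], coords: Sequence[Point]) -> None:
        if tuple(atom_names) != self.atom_names:
            raise ValueError("atom ordering or annotations differ from the stack")
        if len(coords) != self.atom_count:
            raise ValueError("coordinate count does not match the shared atom count")
        self.frames.append([tuple(c) for c in coords])


stack = CoordinateStack(["N", "CA", "C"])
stack.append(["N", "CA", "C"], [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0)])
stack.append(["N", "CA", "C"], [(0.1, 0.0, 0.0), (1.6, 0.0, 0.0), (2.1, 1.4, 0.0)])
print(len(stack.frames), stack.atom_count)  # 2 3

try:
    # Same atoms in a different order are rejected.
    stack.append(["CA", "N", "C"], [(0.0, 0.0, 0.0)] * 3)
except ValueError as exc:
    rejected = str(exc)
    print(rejected)
```

Storing the annotations once is what makes the stack a fast path: adding a model costs only one more coordinate set.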

property atom_count: int#

Return the number of shared atoms per model.

first()[source]#

Return the first model in the stack.

Return type:

Structure

Returns:

The first Structure in stored order.

Raises:

IndexError – If the stack is empty.

classmethod from_ensemble(ensemble)[source]#

Build a stack from an ensemble of compatible models.

Return type:

StructureStack

models()[source]#

Materialize and return all models in the stack.

Return type:

List[Structure]

remove_model(model_id)[source]#

Remove and return a model by model ID.

Parameters:

model_id (int) – Model identifier to remove.

Return type:

Structure

Returns:

The removed Structure.

Raises:

KeyError – If the requested model ID is not present.

renumber(start=1)[source]#

Renumber model identifiers in-place.

Parameters:

start (int) – Starting model ID. Defaults to 1.

to_dataframe()[source]#

Export the stack as a pandas DataFrame with a model column.

This dataframe is derived on demand from the current stack contents and is never cached on the stack.

Return type:

DataFrame

to_ensemble()[source]#

Convert the stack into an independent StructureEnsemble.

Return type:

StructureEnsemble