neurosnap.algos.kluster module

Implementation of the Kluster algorithm by Danial Gharaie.

This clustering algorithm is adapted from: Amani, K., Shivnauth, V., & Castroverde, C. D. M. (2023). CBP60‐DB: An AlphaFold‐predicted plant kingdom‐wide database of the CALMODULIN‐BINDING PROTEIN 60 protein family with a novel structural clustering algorithm. Plant Direct, 7(7). https://doi.org/10.1002/pld3.509

neurosnap.algos.kluster.check_alignment_tool(tool_name)

Check if the specified alignment tool exists and is executable.

Return type:

str
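A check of this kind can be approximated with the standard library. The following is a minimal sketch, not the module's implementation: the helper name `find_alignment_tool` is hypothetical, and returning the resolved path is an assumption suggested only by the documented `str` return type.

```python
import shutil
import sys


def find_alignment_tool(tool_name: str) -> str:
    """Return the resolved path of tool_name, or raise if it is not runnable.

    Hypothetical helper: shutil.which searches PATH (or checks an explicit
    path) and only returns entries that exist and are executable.
    """
    path = shutil.which(tool_name)
    if path is None:
        raise FileNotFoundError(
            f"Alignment tool {tool_name!r} not found or not executable"
        )
    return path


# The running Python interpreter stands in for a real aligner binary here.
resolved = find_alignment_tool(sys.executable)
```

Failing early with a clear error avoids launching many alignment subprocesses only to discover that the binary is missing.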

neurosnap.algos.kluster.cluster_projection(proj, eps=None, min_samples=None, scaling_factor=0.05, eps_floor=0.0001)

Cluster the projection using DBSCAN, raising an error if the input is too small.

Expects proj to be the output of reduce_dimensions: a two-dimensional array of shape (n_samples, n_dims).

Parameters:
  • proj (ndarray) – Array with shape (n_samples, n_dims)

  • eps (Optional[float]) – If provided, used directly; otherwise estimated as scaling_factor times the range of the projection

  • min_samples (Optional[int]) – If provided, used directly; otherwise estimated as max(1, int(log(n_samples)) + 1)

  • scaling_factor (float) – Fraction of the projection range for eps

  • eps_floor (float) – Minimum eps to avoid zero when all points are the same

Returns:

Cluster labels, with -1 marking noise points

Return type:

labels

Raises:

ValueError – If proj is not a two-dimensional array or has fewer than two samples
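The default heuristics for eps and min_samples can be sketched in a few lines. This is an illustration under stated assumptions rather than the module's code: the helper name `estimate_dbscan_params` is hypothetical, and "the range" is taken here to mean the largest per-axis extent of the projection, which the documentation does not specify.

```python
import math


def estimate_dbscan_params(points, scaling_factor=0.05, eps_floor=1e-4):
    """Estimate (eps, min_samples) from a sequence of equal-length coordinate rows."""
    n_samples = len(points)
    if n_samples < 2:
        raise ValueError("need at least two samples")
    # Assumption: "range" is the largest per-axis extent (max - min) of the data.
    extents = [max(col) - min(col) for col in zip(*points)]
    # eps_floor keeps eps positive when every point is identical.
    eps = max(scaling_factor * max(extents), eps_floor)
    min_samples = max(1, int(math.log(n_samples)) + 1)
    return eps, min_samples


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 2.0)]
eps, min_samples = estimate_dbscan_params(points)
```

Values estimated this way could then be passed to scikit-learn's `DBSCAN(eps=eps, min_samples=min_samples)`, whose `fit_predict` output labels noise points as -1.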

neurosnap.algos.kluster.compute_distance_matrix(proteins, alignment_tool, use_tmscore, use_rmsd, num_processes)

Compute pairwise distance matrix using multiprocessing.

Returns:

A tuple containing:
  • np.ndarray: A flattened feature matrix of shape (n, n*m), where n is the number of proteins and m is the number of features (TM-score and/or RMSD)

  • List[str]: Sorted list of protein IDs corresponding to matrix rows/columns

Return type:

tuple
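To make the (n, n*m) shape concrete, the sketch below flattens a small pairwise feature tensor. The feature values are made up, and the column ordering (all features of pair (i, 0), then pair (i, 1), and so on) is an assumption; the actual layout used by compute_distance_matrix is not documented here.

```python
import numpy as np

protein_ids = sorted(["P3", "P1", "P2"])  # rows/columns follow sorted ID order
n, m = len(protein_ids), 2  # m = 2 features per pair: TM-score and RMSD

# Hypothetical pairwise features: features[i, j] = (TM-score, RMSD) for pair (i, j).
features = np.zeros((n, n, m))
features[:, :, 0] = np.eye(n)  # a structure compared with itself: TM-score 1, RMSD 0
features[0, 1] = features[1, 0] = (0.8, 1.5)  # made-up values for the P1/P2 pair

# Each protein's row concatenates its features against every other protein.
flat = features.reshape(n, n * m)
print(flat.shape)  # (3, 6)
```

Each row of `flat` is then a feature vector describing one protein relative to all others, which is the form a dimensionality reduction step expects.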

neurosnap.algos.kluster.reduce_dimensions(matrix, method, dimensions, scale, perplexity=30.0, n_neighbors=15, min_dist=0.1)

Perform dimensionality reduction on the flattened feature matrix.

Parameters:
  • matrix (ndarray) – Flattened feature matrix of shape (n, n*m)

  • method (str) – Dimensionality reduction method (UMAP, TSNE, or PCA)

  • dimensions (int) – Output dimensions (2 or 3)

  • scale (bool) – Whether to scale features before reduction

  • perplexity (float) – t-SNE perplexity parameter

  • n_neighbors (int) – UMAP n_neighbors parameter

  • min_dist (float) – UMAP min_dist parameter

Returns:

Reduced dimensional representation of shape (n, dimensions)

Return type:

np.ndarray
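As an illustration of the PCA option only, the following NumPy sketch standardises the features and projects onto the leading principal components. The function name `pca_reduce` is hypothetical, and the module may well delegate to library implementations instead; UMAP and t-SNE are not covered.

```python
import numpy as np


def pca_reduce(matrix, dimensions=2, scale=True):
    """Project rows of matrix onto the top principal components (NumPy-only sketch)."""
    x = np.asarray(matrix, dtype=float)
    x = x - x.mean(axis=0)  # centre each feature
    if scale:
        std = x.std(axis=0)
        std[std == 0] = 1.0  # leave constant features unscaled
        x = x / std
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:dimensions].T


rng = np.random.default_rng(0)
matrix = rng.normal(size=(6, 12))  # stand-in for an (n, n*m) matrix with n=6, m=2
proj = pca_reduce(matrix, dimensions=2)
print(proj.shape)  # (6, 2)
```

Scaling before reduction matters when features such as TM-score (bounded by 1) and RMSD (unbounded, in ångströms) sit on very different numeric ranges.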

neurosnap.algos.kluster.run_alignment(pdb_f1, pdb_f2, alignment_tool, use_tmscore, use_rmsd)

Run structural alignment and extract features.

Return type:

Dict[str, Any]

neurosnap.algos.kluster.visualize_projection(proj, protein_ids, output_file, method, dimensions, cluster_labels)

Generate 2D/3D visualization of the projection with cluster coloring.

Return type:

None

neurosnap.algos.kluster.visualize_projection_interactive(proj, protein_ids, method, cluster_labels, width=900, height=600)

Generate interactive visualization of the projection using Plotly.

This function is designed for use in Jupyter notebooks. It returns a Plotly figure that supports interactive features such as zooming, panning, and hover information.

Parameters:
  • proj (ndarray) – Projection matrix of shape (n, dimensions)

  • protein_ids (List[str]) – List of protein identifiers

  • method (str) – Name of dimensionality reduction method used

  • cluster_labels (ndarray) – Cluster assignments from DBSCAN

  • width (int) – Width of the plot in pixels

  • height (int) – Height of the plot in pixels

Returns:

Interactive Plotly figure

Return type:

go.Figure

Example

>>> fig = visualize_projection_interactive(proj, protein_ids, 'UMAP', cluster_labels)
>>> fig.show()  # Display in notebook