neurosnap.algos.kluster module
Implementation of the Kluster algorithm by Danial Gharaie.
This clustering algorithm is adapted from: Amani, K., Shivnauth, V., & Castroverde, C. D. M. (2023). CBP60‐DB: An AlphaFold‐predicted plant kingdom‐wide database of the CALMODULIN‐BINDING PROTEIN 60 protein family with a novel structural clustering algorithm. Plant Direct, 7(7). https://doi.org/10.1002/pld3.509
- neurosnap.algos.kluster.check_alignment_tool(tool_name)
Check if the specified alignment tool exists and is executable.
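A check of this kind can be sketched with the standard library alone. The helper name `tool_is_available` is hypothetical and is not part of neurosnap; it only illustrates testing that a tool name resolves to an executable file on the PATH.

```python
import shutil


def tool_is_available(tool_name: str) -> bool:
    """Return True if tool_name resolves to an executable file.

    Hypothetical helper for illustration, not the neurosnap implementation.
    shutil.which searches PATH (or checks an explicit path) and returns a
    result only when the file exists and is executable.
    """
    return shutil.which(tool_name) is not None
```

Calling `tool_is_available("some-aligner")` before launching any subprocess lets a pipeline fail early with a clear message instead of partway through a batch of alignments.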
- neurosnap.algos.kluster.cluster_projection(proj, eps=None, min_samples=None, scaling_factor=0.05, eps_floor=0.0001)
Cluster the projection using DBSCAN, raising an error if the input is too small.
Expects proj to be the two-dimensional array returned by reduce_dimensions, with shape (n_samples, n_dims).
- Parameters:
proj (ndarray) – Array with shape (n_samples, n_dims)
eps (Optional[float]) – Used directly if provided; otherwise estimated as scaling_factor times the range of the projection
min_samples (Optional[int]) – Used directly if provided; otherwise estimated as max(1, int(log(n_samples)) + 1)
scaling_factor (float) – Fraction of the projection range used for eps
eps_floor (float) – Minimum eps, which avoids a zero value when all points are identical
- Returns:
Cluster labels, with -1 marking noise points
- Return type:
labels
- Raises:
ValueError – If proj is not a two dimensional array or has fewer than two samples
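The default estimates described above can be sketched in plain Python. This is a minimal illustration, not the neurosnap source: the helper name `estimate_dbscan_params` is hypothetical, "range" is assumed to mean the largest per-axis spread (max minus min), and `log` is assumed to be the natural logarithm.

```python
import math
from typing import Optional, Sequence, Tuple


def estimate_dbscan_params(
    proj: Sequence[Sequence[float]],
    eps: Optional[float] = None,
    min_samples: Optional[int] = None,
    scaling_factor: float = 0.05,
    eps_floor: float = 1e-4,
) -> Tuple[float, int]:
    """Return (eps, min_samples), estimating whichever one is None."""
    n_samples = len(proj)
    if n_samples < 2:
        raise ValueError("proj must contain at least two samples")

    if eps is None:
        # Assumption: "range" is the largest per-axis spread of the points.
        n_dims = len(proj[0])
        spread = max(
            max(p[d] for p in proj) - min(p[d] for p in proj)
            for d in range(n_dims)
        )
        # The floor keeps eps positive when all points coincide.
        eps = max(scaling_factor * spread, eps_floor)

    if min_samples is None:
        # Assumption: natural logarithm, truncated toward zero.
        min_samples = max(1, int(math.log(n_samples)) + 1)

    return eps, min_samples
```

The resulting pair could then be passed to a DBSCAN implementation, for example `sklearn.cluster.DBSCAN(eps=eps, min_samples=min_samples)`. Values supplied by the caller are returned unchanged, matching the "used directly if provided" behaviour in the parameter list.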
- neurosnap.algos.kluster.compute_distance_matrix(proteins, alignment_tool, use_tmscore, use_rmsd, num_processes)
Compute pairwise distance matrix using multiprocessing.
- Returns:
A tuple containing:
np.ndarray: A flattened feature matrix of shape (n, n*m), where n is the number of proteins and m is the number of features (TM-score and/or RMSD)
List[str]: Sorted list of protein IDs corresponding to the matrix rows/columns
- Return type:
Tuple[np.ndarray, List[str]]
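To make the (n, n*m) shape concrete, here is a small sketch of one possible flattening. The layout is an assumption, not taken from the neurosnap source: row i is taken to be the concatenation of the m features of protein i against every protein j, in sorted-ID order. The helper name, the symmetric lookup, and the self-comparison values are likewise hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def flatten_pairwise_features(
    pair_features: Dict[Pair, Tuple[float, ...]],
    self_features: Tuple[float, ...],
) -> Tuple[List[List[float]], List[str]]:
    """Build an (n, n*m) matrix with one row per protein.

    Hypothetical layout for illustration only.
    """
    # Collect every protein ID mentioned in any pair and sort them.
    ids = sorted({pid for pair in pair_features for pid in pair})
    matrix: List[List[float]] = []
    for a in ids:
        row: List[float] = []
        for b in ids:
            if a == b:
                # Assumption: caller supplies the self-comparison features.
                feats = self_features
            elif (a, b) in pair_features:
                feats = pair_features[(a, b)]
            else:
                # Assumption: features are symmetric in the pair order.
                feats = pair_features[(b, a)]
            row.extend(feats)
        matrix.append(row)
    return matrix, ids


# Three proteins, m = 2 features per pair (TM-score, RMSD).
pairs = {("b", "a"): (0.8, 1.5), ("a", "c"): (0.4, 3.0), ("b", "c"): (0.5, 2.5)}
matrix, ids = flatten_pairwise_features(pairs, self_features=(1.0, 0.0))
```

With n = 3 proteins and m = 2 features, each row has n*m = 6 entries, and `ids` gives the order of both the rows and the per-row blocks.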
- neurosnap.algos.kluster.reduce_dimensions(matrix, method, dimensions, scale, perplexity=30.0, n_neighbors=15, min_dist=0.1)
Perform dimensionality reduction on the flattened feature matrix.
- Parameters:
matrix (ndarray) – Flattened feature matrix of shape (n, n*m)
method (str) – Dimensionality reduction method (UMAP, TSNE, or PCA)
dimensions (int) – Output dimensions (2 or 3)
scale (bool) – Whether to scale features before reduction
perplexity (float) – t-SNE perplexity parameter
n_neighbors (int) – UMAP n_neighbors parameter
min_dist (float) – UMAP min_dist parameter
- Returns:
Reduced dimensional representation of shape (n, dimensions)
- Return type:
np.ndarray
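The documented constraints on method and dimensions can be checked before an expensive reduction is started. The sketch below is a hypothetical pre-flight helper, not part of neurosnap. The accepted values come from the parameter list above; the case-insensitive handling of method is an assumption.

```python
SUPPORTED_METHODS = ("UMAP", "TSNE", "PCA")
SUPPORTED_DIMENSIONS = (2, 3)


def validate_reduction_args(method: str, dimensions: int) -> str:
    """Check method and dimensions, returning the upper-cased method name.

    Hypothetical helper for illustration only.
    """
    # Assumption: treat the method name case-insensitively.
    normalised = method.upper()
    if normalised not in SUPPORTED_METHODS:
        raise ValueError(
            f"method must be one of {SUPPORTED_METHODS}, got {method!r}"
        )
    if dimensions not in SUPPORTED_DIMENSIONS:
        raise ValueError(
            f"dimensions must be one of {SUPPORTED_DIMENSIONS}, got {dimensions}"
        )
    return normalised
```

Of the remaining keyword arguments, perplexity applies only to t-SNE, and n_neighbors and min_dist apply only to UMAP, per the parameter descriptions.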
- neurosnap.algos.kluster.run_alignment(pdb_f1, pdb_f2, alignment_tool, use_tmscore, use_rmsd)
Run structural alignment and extract features.
- neurosnap.algos.kluster.visualize_projection(proj, protein_ids, output_file, method, dimensions, cluster_labels)
Generate 2D/3D visualization of the projection with cluster coloring.
- neurosnap.algos.kluster.visualize_projection_interactive(proj, protein_ids, method, cluster_labels, width=900, height=600)
Generate interactive visualization of the projection using Plotly.
This function is designed for use in Jupyter notebooks. It returns a Plotly figure that supports interactive zooming, panning, and hover information.
- Returns:
Interactive Plotly figure
- Return type:
go.Figure
Example
>>> fig = visualize_projection_interactive(proj, protein_ids, 'UMAP', cluster_labels)
>>> fig.show()  # Display in notebook