Navigating the Chemical Landscape: Estimating Toxicity and Synthetic Accessibility with eToxPred on the Neurosnap Platform

Written by Danial Gharaie Amirabadi

Published 2024-9-10


Drug development is becoming increasingly costly and time-consuming, and the rate at which new pharmaceuticals reach the market is declining. To address this challenge, we offer a service powered by eToxPred, a machine learning tool designed to predict the toxicity and synthetic accessibility of small organic compounds. Leveraging high-throughput experimental techniques and computational modeling, eToxPred provides assessments that help streamline drug discovery efforts. Our service uses eToxPred to evaluate drug candidates so that you can filter out potentially toxic or hard-to-synthesize compounds early in the development process. With 72% accuracy in toxicity prediction and a 4% mean square error for synthetic accessibility, our service aims to reduce the time and costs involved in bringing new drugs to market.

Toxicity Prediction

The goal of toxicity prediction methods is to detect harmful effects that specific chemicals may have on humans, animals, plants, or the environment. Traditional methods for assessing toxicity profiles rely on animal testing and are limited by time, cost, and ethical concerns. Consequently, rapid and cost-effective computational techniques are often applied first, to weed out potentially toxic substances and minimize the number of experiments required.

About eToxPred

eToxPred is a method designed to predict both the synthetic accessibility and the toxicity of molecules in a general way. Unlike traditional approaches that rely on manually crafted descriptors, eToxPred uses a generic model to estimate toxicity directly from the molecular fingerprints of chemical compounds. This allows for more effective analysis across diverse and heterogeneous datasets.

Key Features of eToxPred:

eToxPred serves primarily to identify drug candidates that could be potentially harmful or difficult to synthesize. This is especially beneficial in Computer-Aided Drug Design (CADD), as removing unsuitable candidates early on can conserve both time and resources.

Implementation

eToxPred employs two distinct machine learning models, each matched to the complexity of its task: a Deep Belief Network (DBN) for predicting synthetic accessibility (SAscore) and Extremely Randomized Trees (ET) for toxicity prediction. The DBN is composed of a visible layer based on a 1024-bit Daylight fingerprint and three hidden layers (512, 128, and 32 nodes). It uses layer-wise training and contrastive divergence for efficient unsupervised learning, with L2 regularization to reduce overfitting. The DBN was chosen for its ability to capture deep hierarchical representations, which suits the complex task of predicting synthetic accessibility, where subtle features must be extracted from chemical data.
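To make the layer-wise structure concrete, here is a minimal sketch of how the stated layer widths pair up into a stack of restricted Boltzmann machines (RBMs), the usual building block of a DBN. The helper names `rbm_stack` and `weight_count` are hypothetical and not part of eToxPred; the sketch only does the bookkeeping, performs no training, and ignores biases and any output layer.

```python
# Layer widths taken from the text: a 1024-bit fingerprint input
# followed by three hidden layers.
LAYER_SIZES = [1024, 512, 128, 32]


def rbm_stack(layer_sizes):
    """Return the (visible, hidden) size pair of each RBM in the stack.

    In greedy layer-wise pretraining, each RBM is trained on the hidden
    activations of the one below it, so every hidden layer doubles as
    the next RBM's visible layer.
    """
    return list(zip(layer_sizes[:-1], layer_sizes[1:]))


def weight_count(pairs):
    """Total number of connection weights across all RBMs (biases excluded)."""
    return sum(visible * hidden for visible, hidden in pairs)


pairs = rbm_stack(LAYER_SIZES)
print(pairs)                # [(1024, 512), (512, 128), (128, 32)]
print(weight_count(pairs))  # 593920
```

Most of the weights sit in the first RBM (1024 x 512), which is where the raw fingerprint bits are compressed into the first learned representation.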

In contrast, the ET algorithm is used for toxicity prediction, a simpler classification task. ET constructs an ensemble of 500 decision trees using a randomized top-down splitting approach. This model is faster and more interpretable than the DBN, as classification typically requires less model complexity. The randomized splitting reduces overfitting, making the model robust even for small datasets. The maximum depth of the trees is set to 70, with the parameters fine-tuned for the dataset.
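The randomized splitting step can be sketched in a few lines of plain Python. This is an illustration of the general Extra-Trees idea under simplifying assumptions (binary labels, Gini impurity as the split criterion), not eToxPred's own code; `gini` and `random_split` are hypothetical names, and the four-row dataset is a toy made up for the demonstration.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def random_split(X, y, n_candidates, rng):
    """Pick a split the Extra-Trees way.

    For each of n_candidates randomly chosen features, draw ONE cut-point
    uniformly between that feature's min and max (no search for the best
    threshold), then keep the candidate with the lowest weighted Gini
    impurity. Returns (feature_index, threshold), or None if no candidate
    produced a usable split.
    """
    n_features = len(X[0])
    best, best_score = None, float("inf")
    for feature in rng.sample(range(n_features), min(n_candidates, n_features)):
        values = [row[feature] for row in X]
        lo, hi = min(values), max(values)
        if lo == hi:
            continue  # constant feature: nothing to split on
        threshold = rng.uniform(lo, hi)
        left = [label for row, label in zip(X, y) if row[feature] < threshold]
        right = [label for row, label in zip(X, y) if row[feature] >= threshold]
        if not left or not right:
            continue  # degenerate cut-point: one side is empty
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best, best_score = (feature, threshold), score
    return best


# Toy data: two fingerprint-like bits per sample; bit 0 determines the label.
X = [[0, 1], [0, 0], [1, 1], [1, 0]]
y = [0, 0, 1, 1]
split = random_split(X, y, n_candidates=2, rng=random.Random(0))
print(split[0])  # index of the chosen feature
```

A full ensemble would apply this step recursively to grow each tree and average the votes of all trees. In practice a library implementation is the more likely choice; scikit-learn's `ExtraTreesClassifier`, for example, exposes `n_estimators` and `max_depth` parameters that correspond to the 500 trees and depth of 70 described above.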

Using eToxPred on Neurosnap

To use eToxPred on Neurosnap, navigate to the Services page, where you can find eTox Drug Toxicity Prediction either by using the search function or by selecting the "Toxicity Prediction" filter. Once on the eTox page, you'll be prompted to enter, in CSV format, the small molecules whose toxicity you want to predict. Each compound should be entered as a unique ID followed by its corresponding SMILES string, with each entry on a separate line.

For example, let’s start with the following set of CIDs and SMILES strings to demonstrate how to run the eToxPred service:

7237,CC1=CC=CC=C1C
999,C1=CC=C(C=C1)CC(=O)O
1140,CC1=CC=CC=C1
637511,C1=CC=C(C=C1)C=CC=O
4775,C1=CC=C(C=C1)CCCC(=O)O
244,C1=CC=C(C=C1)CO
7500,CCC1=CC=CC=C1
7501,C=CC1=CC=CC=C1
7929,CC1=CC(=CC=C1)C
12053,CC(C)(C1=CC=CC=C1)O

After entering this data in CSV format, you can proceed to run the service.
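Before pasting a longer list, it can help to check the input locally. The sketch below uses only Python's standard `csv` module to confirm that every line has exactly two fields and that IDs are unique, as the format above requires. The function name `parse_compounds` is hypothetical, and the check does not validate the SMILES strings chemically or reproduce whatever validation Neurosnap itself performs.

```python
import csv
import io


def parse_compounds(text):
    """Parse 'ID,SMILES' lines into a list of (id, smiles) tuples.

    Raises ValueError on rows without exactly two fields, on empty
    fields, and on duplicate IDs. Blank lines are skipped.
    """
    compounds = []
    seen_ids = set()
    for row_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row or all(not cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"row {row_no}: expected 2 fields, got {len(row)}")
        compound_id, smiles = row[0].strip(), row[1].strip()
        if not compound_id or not smiles:
            raise ValueError(f"row {row_no}: empty ID or SMILES")
        if compound_id in seen_ids:
            raise ValueError(f"row {row_no}: duplicate ID {compound_id!r}")
        seen_ids.add(compound_id)
        compounds.append((compound_id, smiles))
    return compounds


# First three entries of the example input above.
EXAMPLE = """\
7237,CC1=CC=CC=C1C
999,C1=CC=C(C=C1)CC(=O)O
1140,CC1=CC=CC=C1
"""

compounds = parse_compounds(EXAMPLE)
print(len(compounds), compounds[0])
```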

Interpreting the Results

Once you have run your analysis using Neurosnap, you will be presented with a table containing two key scores for each compound:

  1. Toxicity Prediction (0 to 1): lower values indicate safer compounds; higher values suggest higher toxicity risk.
  2. Synthetic Accessibility (1 to 10): lower scores mean easier synthesis; higher scores indicate more complex synthesis challenges.

In our example, the output is as follows:

[Image: results table for the example compounds]

Alongside these scores, you receive a scatter plot of synthetic accessibility against toxicity probability. This plot offers a quick visual summary of compounds, helping you identify those that are both easy to synthesize and have low toxicity.

Here is the plot for our example: [Image: scatter plot of synthetic accessibility against toxicity probability]
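The selection that the plot supports visually, low toxicity together with easy synthesis, can also be done programmatically once you have the two scores per compound. The following is a minimal sketch: the `Prediction` type, the `shortlist` function, and the cut-offs of 0.5 and 5.0 are all assumptions chosen for illustration, and the four rows are placeholder values, not real eToxPred output.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    compound_id: str
    toxicity: float  # 0 to 1, lower is safer
    sa_score: float  # 1 to 10, lower is easier to synthesize


def shortlist(predictions, max_toxicity=0.5, max_sa_score=5.0):
    """Keep compounds under both cut-offs, least toxic first.

    The default cut-offs are arbitrary illustrative values; choose
    thresholds that fit your own project.
    """
    kept = [
        p for p in predictions
        if p.toxicity <= max_toxicity and p.sa_score <= max_sa_score
    ]
    return sorted(kept, key=lambda p: (p.toxicity, p.sa_score))


# Placeholder values for illustration only; NOT real eToxPred output.
demo = [
    Prediction("cmpd_A", 0.20, 2.1),
    Prediction("cmpd_B", 0.80, 1.9),
    Prediction("cmpd_C", 0.10, 7.5),
    Prediction("cmpd_D", 0.35, 3.0),
]

for p in shortlist(demo):
    print(p.compound_id, p.toxicity, p.sa_score)
```

With these placeholder rows, `cmpd_B` is dropped for its toxicity score and `cmpd_C` for its synthetic accessibility score, leaving `cmpd_A` and `cmpd_D`.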

Conclusion

eToxPred is a valuable tool for researchers working in drug discovery. By predicting toxicity and synthetic accessibility, it can help streamline the development process, reduce costs, and make it more efficient to identify promising drug candidates. With its integration into the Neurosnap platform, eToxPred is easily accessible to researchers seeking to enhance their drug discovery efforts.

You can check out our example run at: https://neurosnap.ai/job/66df54fa3febf92a5522c0a4?share=66df59d63febf92a5522c0a9

Want to get started with eTox Drug Toxicity Prediction? Register here and run your own jobs!
