The Sound of Silence (S.O.S.) campaign, her brainchild, was launched by actress Gul Panag at Techfest 2017, held at IIT Mumbai, amidst thousands of IITians from across India, to bring to light the unheard cries and the silent suffering and death of millions of beagles within the closed, cold walls of testing laboratories in India and across the world. Project D.O.N.T. was born out of this campaign, with the aim of finding an AI tool/model to replace dogs in testing.
This project was prepared and presented on July 11th, 2018 to Ms. Harriet Green OBE, CEO, IBM Asia Pacific, by Shiranee Pereira, founder of Project D.O.N.T. Input received from Dr. Gargi Dasgupta, Senior Manager, IBM Research, Bangalore, is gratefully acknowledged. The project was taken up in 2019 by the 'Science for Social Good' group at the IBM Research Laboratory, New York. A team of five research scientists headed by Dr. Payel Das worked on the project and produced a machine learning model that can predict human toxicity with far more precision than animals. The findings have been published in the paper 'Accurate clinical toxicity prediction using multi-task deep neural nets and contrastive molecular explanations' in Scientific Reports (Nature): https://www.nature.com/articles/s41598-023-31169-8
Project D.O.N.T. envisions a world safer and kinder to animals and humankind.
The project seeks to augment the precision of drug and chemical toxicity prediction by replacing the dog in testing with the power and potential of machine learning and artificial/augmented intelligence. In the multi-billion-dollar drug discovery industry, testing on animals, importantly the dog, is a mandated part of the process. Replacing the dog with cognitive computing and Read-Across strategies built on big data would provide a more predictive, economical and humane alternative to the use of dogs in toxicity testing. Once tried, tested and validated by regulatory agencies, such a computational toxicology approach, launched on a powerful AI platform, would be an ingenious high-throughput model for predicting toxicity with far more precision than animals.
Project D.O.N.T. was the result of two round table conferences held in India to discuss the use of dogs in testing, in the light of recent scientific findings on the functional utility of dogs in toxicity testing and neurological evidence of canine sentience. The participants came from the pharmaceutical industry, testing laboratories, regulatory agencies and animal welfare groups. The first conference was held at the National Academy of Agricultural Research Management (ICAR), Hyderabad, on 12th October 2017, and the second at The Westin, Chennai, on 17th May 2018.
Both round table conferences received overwhelming support from the pharmaceutical industry, Indian regulatory bodies including the DCGI, IPC, CIB and CPCSEA, representatives from the ICMR and ICAR, and animal welfare organizations.
At the first round table conference, participants resolved that 'the dog is a deeply sentient being' and recognized 'an urgent need to chart out a roadmap to replace the use of dogs in animal testing'. Following this, the second round table conference was held, at which participants deliberated on a gamut of non-animal methods and resolved that they 'see a promising future with computational toxicology as a stand-alone or in combination with multiple-organ-on-chip as alternative(s) that could bridge the gap between rodent and clinical trials, thereby replacing the dog as the non-rodent species in regulatory testing'.
Imagine a world without chocolate! That is exactly what it would be if we relied on testing chocolate on dogs. Chocolate can kill your dog, but for humans it is not only a relished treat but also good for the brain, and acts almost like opium when you are depressed. In dogs the 'toxic' component of chocolate is theobromine: while humans easily metabolize theobromine, dogs metabolize it slowly, allowing it to build up to toxic levels in their system. Simply put, this is how unreliable and unpredictive the dog is when used in testing. Yet the dog remains the trusted yardstick and the last and final checkpoint for measuring the toxicity of a drug before clinical trials. Archaic regulations, promulgated more than half a century ago with no scientific rationale, still demand that every drug, pesticide, chemical and medical device be tested for safety on rodents and a non-rodent species, most often the dog (otherwise the primate).
Not surprisingly, drug attrition today hovers around 94-95%, and it takes over US$2.5 billion and more than ten years to bring a new drug to market. Only one in ten drugs that enter Phase I clinical trials reaches the patient. Nine out of ten of these drugs will be recalled at some point, as and when an unpredicted toxicity not revealed by the animals on which the drug was tested harms, maims or kills a patient.
Drug recalls stem from an obvious and inherent weakness in extrapolating toxicity data from animals to human beings: the inability of animal models to predict human toxicity. The FDA has determined that a ten percent improvement in compound attrition would save US$100 million per drug. This staggering cost of failures percolates down to the end user, the patient, concealed in the price of the drug, making essential and life-saving health care unaffordable to many, especially in Third World countries.
This inability of dogs to predict human toxicity is mainly due to the significant differences between humans and dogs in their cytochrome P450 enzymes (CYPs), the major enzymes involved in drug metabolism. Besides this, an analysis of a comprehensive quantitative database of publicly available animal toxicity studies of 2,366 compounds indicated that dogs are highly inconsistent predictors of toxic responses in humans, and that the predictions they provide are little better than those that could be obtained by chance, or by tossing a coin, when considering whether or not a compound should proceed to testing in humans. Notably, the absence of toxicity in dogs provides essentially no insight into the likelihood of a similar lack of toxicity in humans: the former contributes no, or almost no, evidential weight to the latter. Quantitatively: if, for example, a new drug has (based on prior information) a 70% chance of not being toxic in humans, then a negative test in dogs increases this probability to just 72%. The dog tests therefore provide essentially no additional confidence in the outcome for humans, but at great ethical and financial cost.
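To see why a negative dog test carries so little evidential weight, the quoted figures can be recast as a simple Bayesian odds update (the likelihood ratio below is not taken from the underlying study; it is merely the value implied by the 70% and 72% figures quoted above). Posterior odds equal prior odds multiplied by the likelihood ratio of the test: the prior odds of non-toxicity are 0.70/0.30 ≈ 2.33, the posterior odds are 0.72/0.28 ≈ 2.57, so the implied likelihood ratio is about 2.57/2.33 ≈ 1.1. A likelihood ratio that close to 1 means the test shifts the odds hardly at all.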
On the ethical front, with more than 200,000 dogs being used annually, there has been public outcry opposing the use of 'Man's best friend' in toxicity testing. A campaign launched on Change.org, the Sound of Silence (S.O.S.) campaign, has 130,000 supporters to date. In recent years, neuroscientists have been using functional magnetic resonance imaging (fMRI) on fully awake and unrestrained dog 'volunteers' as an effective tool to understand the neural circuitry and functioning of the canine brain, which has fortified ethological findings on canine cognition and sentience.
The first and most important neurological finding that fMRI studies have proven is that there exists a striking similarity between dogs and humans in the functioning of the caudate nucleus (the part of the brain associated with pleasure and emotion), and dogs experience positive emotions, empathic-like responses and demonstrate human bonding which, scientists claim, may be at least comparable with a human child aged 2-2.5 years.
Secondly, fMRI studies have shown that there exists an area analogous to the 'voice area' of primates in the canine brain, enabling dogs to comprehend and respond to emotional cues and valence in human voices. Dogs and humans have dedicated voice areas in similar locations, and there are striking similarities in the way dog and human brains process emotionally loaded sounds. In both species, an area near the primary auditory cortex lit up more with happy speech than with unhappy speech. Thus dog brains, like those of people, are sensitive to acoustic cues of emotion.
In this scenario of high drug recalls and attrition, recent years, and indeed recent months, have shown an encouraging trend: a shift in focus from animal to non-animal models, both in drug discovery and in toxicity testing, for pharmaceuticals and chemicals alike.
The biggest game changer has been in the context of REACH (Registration, Evaluation, Authorisation and Restriction of Chemicals), with ECHA (European Chemicals Agency) accepting Read-Across as a method of toxicity prediction for chemical registration under REACH.
ECHA has also explicitly documented and published the Read-Across Assessment Framework (RAAF, 2017). Other milestones include the FDA's Predictive Toxicology Roadmap and the similar roadmap published by the EPA, "A Strategic Plan to Reduce, Refine and Replace Vertebrate Animal Testing under the Amended TSCA". A perusal of the literature on alternatives to animals in research and testing indicates no dearth of available NAMs (New Alternative Methods). However, by virtue of practice, and paradoxically, animal methods continue to be the gold standard, even though in reality only one in ten drugs that enter Phase I clinical trials reaches patients, owing to the poor predictivity of animal testing models. Seemingly, a lack of will to accept change, fear of the new and untrodden path, and a lethargy about moving away from historic models of testing come across as the biggest hurdles to adopting more precise, predictive, safer and humane models in regulatory toxicology.
With regard to drug discovery, it is encouraging to note that big pharma does see the potential of AI and is poised for a big change. In the article "Artificial intelligence: will it change the way drugs are discovered?" (Pharmaceutical Journal, December 2017), the author describes how pharmaceutical giants are beginning to invest in artificial intelligence in order to develop better diagnostics or biomarkers, to identify drug targets and to design new drugs. GSK is one of the first big pharmaceutical companies to create its own in-house AI unit. "Harnessing the power of modern supercomputers and machine learning will enable us to develop medicines more quickly, and at a reduced cost," says the head of the new unit, John Baldoni.
Among many examples, a fine illustration of how AI reduces effort, time and resources in drug discovery is the outcome of a project on psychiatric therapeutics by Exscientia (UK). Begun in 2015 in collaboration with a Japanese pharmaceutical company, the project met with great success: within 12 months the team had discovered and optimized a drug candidate while synthesizing fewer than 400 compounds.
In a recent publication from Elsevier and Bayer AG, "A big data approach to the concordance of the toxicity of pharmaceuticals in animals and humans" (Regulatory Toxicology and Pharmacology, Vol. 96, July 2018), the authors Clark and Steger-Hartmann found that the negative predictive value of animal studies was very low; that is, the absence of findings in animals does not mean that there will be no adverse events in humans. With Elsevier keen on taking the big data approach forward, first author Dr. Matthew Clark believes that the continued application of technology, as with big data, will help the industry make even safer and more humane breakthroughs in the future.
In the review article by Chen et al., "IBM Watson: How cognitive computing can be applied to big data challenges in life sciences research" (Clinical Therapeutics, 38(4), 2016), the authors detail how Watson can leverage big data, integrate data from different stages of drug development, combine the discoveries made across the drug development cycle (in vitro toxicity data, in vivo toxicity data and human data from clinical trials), and use them to create a holistic visualization of the data in simple formats, such as network maps depicting relationships, thereby presenting a fuller view of the information available. In doing this, Watson uses quantitative predictive analytics to infer relationships for which there may not yet be any explicit evidence. Watson is also capable of concept recognition, i.e. it can recognize terms and their synonyms; it can extract entities with the help of annotators, with an ability to preclude human bias; and it is capable of cross-domain discovery.
In a research challenge to predict the kinases that might phosphorylate the p53 protein, Watson identified nine potential kinases in a matter of weeks. The same exercise had previously taken the company ten years, and going a step further, Watson not only produced a bigger and better dataset of potential kinases but also ranked them by their probability of phosphorylating p53.
In another example, the challenge given to Watson was to identify compounds to treat malaria from a large biopharma's existing therapeutics portfolio. The study involved exploring the MEDLINE literature and identifying those molecules in the existing portfolio with similarity to known malaria treatments, looking for similarity in both structure and mechanism of action. In this proof-of-concept study, what had taken the company ten research scientists and 14 months took Watson less than a month, and it provided a better and richer dataset of candidate molecules.
In a study to identify the genes associated with multiple sclerosis (MS), Watson was able to draw up, in less than a minute, a network map depicting the relationships between genes and MS from more than 24 million MEDLINE abstracts.
The speed, intelligence and power of Watson need no review here, but the above examples have been cited to bring out its potential in toxicity testing. Watson, which is reported to have a performance of 80 teraflops and an ability to process 500 gigabytes, the equivalent of a million books, per second, can obviously accelerate any life science research process, including drug toxicity testing using methods from computational toxicology.
The biggest challenge here is that data about drugs are stored in several repositories: separate datasets for chemical toxicity, in vivo toxicity, in vitro toxicity, human data from clinical trials (Phases I, II and III), adverse drug reaction (ADR) reports and so on. Again, Watson, with its ability to leverage large volumes of information from big datasets, perform cross-domain linkages, and observe, interpret, evaluate and finally present its findings as holistic relationship maps, would be a fitting tool for predicting the toxicity of chemicals and new chemical entities (NCEs) in drug development.
"Making Big Sense from Big Data in Toxicology by Read-Across" (ALTEX 33(2), 2016) by Thomas Hartung reveals the potential of big data in chemical toxicity prediction. In recent times, software tools such as REACH-Across have been developed that predict toxicity with sensitivities of around 80%, which could be augmented further if launched on a powerful AI platform. A minimal illustration of the read-across idea follows.
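The sketch below shows the basic principle behind read-across (it is not the REACH-Across tool itself): a query chemical is assigned a label from its most similar neighbours in a labelled reference set, using Tanimoto similarity on Morgan fingerprints. The SMILES strings and labels are placeholders, not real toxicity data.

```python
# Minimal read-across sketch: predict toxicity of a query molecule from its
# nearest neighbours in a labelled reference set. Illustrative only; the
# reference data below are placeholders, not real toxicity labels.
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit import DataStructs

# Placeholder reference set: (SMILES, toxic?) pairs
reference = [
    ("CCO", 0),                      # ethanol
    ("c1ccccc1", 1),                 # benzene
    ("CC(=O)Oc1ccccc1C(=O)O", 0),    # aspirin
]

def fingerprint(smiles, radius=2, n_bits=2048):
    """Morgan (circular) fingerprint of a molecule given as SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    return AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)

def read_across(query_smiles, k=3):
    """Label the query by similarity-weighted vote of its k nearest neighbours."""
    query_fp = fingerprint(query_smiles)
    scored = []
    for smiles, label in reference:
        sim = DataStructs.TanimotoSimilarity(query_fp, fingerprint(smiles))
        scored.append((sim, label))
    scored.sort(reverse=True)
    top = scored[:k]
    weighted = sum(sim * label for sim, label in top) / sum(sim for sim, _ in top)
    return weighted  # closer to 1 -> more likely toxic by analogy

print(read_across("Cc1ccccc1"))  # toluene, compared against the toy reference set
```

Real read-across systems use far richer similarity notions (structural, mechanistic, metabolic), but the nearest-neighbour analogy above is the core of the approach.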
Putting it in a nutshell, the answer to replacing the dog in testing lies in harnessing the power of big data using cognitive computing and Read-Across strategies on a powerful AI/machine learning platform.
The project was a brilliant success and was completed in 2022. My grateful thanks to the brilliant team who made this project a success: Payel Das* (Head of Project), Bhanushee Sharma** (Research Scholar), Enara Vijil*, Amit Dhurandhar*, Jim Hendler** and Jonathan Dordick**. (*IBM Watson Research Laboratory, New York; **Rensselaer Polytechnic Institute, New York.)
1. Toxicities predicted in vitro or in vivo are not necessarily in concordance with each other, nor with toxicity in humans, which reduces their ability to predict clinical human toxicity. Toxicity therefore remains a major driver of drug candidate failure in drug development, contributing to the high cost of the drugs that do make it to market.
The granularity of toxicity tests varies across the in vitro (in cells), in vivo (in animals) and clinical (in humans) platforms. In vitro testing is the most granular and captures the ability of a chemical to disrupt biological pathways at the cellular level. In contrast, in vivo and clinical testing are coarse-grained and capture the interactions of chemicals at multiple levels in animals and humans, but at the organ and tissue levels. Thus, ML models trained on in vitro and in vivo data alone might not reliably capture clinical toxicity.
Despite toxicity being a multi-task problem, machine learning (ML) models have so far predicted toxicity on each platform separately, with single-task models. A single molecule can simultaneously produce a multitude of responses in different assays and different living organisms. Various solutions for modeling multiple toxic endpoints have been reported, either by creating separate binary classification models for each endpoint or by using multiple classification models that define classes. Yet, thus far, the multiplicity problem has been modeled by predicting multiple endpoints within the same testing platform: in vitro, in vivo or human, separately.
Hence, we realized the need to develop an explainable machine learning (XML) model that simultaneously models in vivo, in vitro and clinical toxicity data. A multi-task deep neural network (MTDNN) model was therefore developed with the ability to predict clinical human toxicity accurately.
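As a rough illustration of the multi-task idea (a simplified sketch, not the published architecture or its hyperparameters), the model below shares a trunk across platforms and attaches a separate output head for the in vitro, in vivo and clinical tasks; a label mask ensures that a molecule contributes loss only for the platforms in which it actually has data.

```python
# Simplified multi-task DNN sketch: one shared trunk, one head per platform.
# Illustrative only; layer sizes, losses and training details are assumptions,
# not the published model.
import torch
import torch.nn as nn

class MultiTaskToxNet(nn.Module):
    def __init__(self, in_dim=2048, hidden=512, n_tasks=3):
        super().__init__()
        # Shared representation learned from all platforms
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One binary (toxic / nontoxic) head per platform:
        # 0 = in vitro, 1 = in vivo, 2 = clinical
        self.heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(n_tasks)])

    def forward(self, x):
        h = self.trunk(x)
        return torch.cat([head(h) for head in self.heads], dim=1)  # logits, shape (B, 3)

def masked_loss(logits, labels, mask):
    """Binary cross-entropy computed only where a platform label exists (mask = 1)."""
    loss = nn.functional.binary_cross_entropy_with_logits(
        logits, labels, reduction="none")
    return (loss * mask).sum() / mask.sum()

# Toy usage with random tensors standing in for molecular fingerprints/embeddings
model = MultiTaskToxNet()
x = torch.rand(8, 2048)                      # 8 molecules
labels = torch.randint(0, 2, (8, 3)).float()
mask = torch.randint(0, 2, (8, 3)).float()   # which platform labels are known
loss = masked_loss(model(x), labels, mask)
loss.backward()
```

The design point is simply that the shared trunk lets the scarcer clinical labels benefit from representations learned on the larger in vitro and in vivo datasets.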
2. Morgan fingerprints and pretrained SMILES embeddings (SE) were used as inputs; SMILES embeddings performed better. The SE were created in-house using a neural network-based translation model that translates non-canonical SMILES into canonical SMILES, thereby encoding the relationships between chemicals. To obtain a base model of these relationships, the translation model was trained on 103 million chemicals from PubChem and 35 million chemicals from ZINC-12. The trained translation model was then applied to the chemicals present in the in vitro, in vivo and clinical datasets to obtain their SE.
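Both input notions can be explored with standard cheminformatics tooling. The snippet below (a sketch, not the in-house embedding model) shows how one molecule has many non-canonical SMILES spellings that all map to a single canonical string, which is the relationship the translation model learns, and how the Morgan fingerprint baseline is computed.

```python
# Sketch of the two input representations discussed above (not the in-house
# translation model): canonical vs. non-canonical SMILES, and Morgan fingerprints.
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin

# Several randomized (non-canonical) spellings of the same molecule...
variants = [Chem.MolToSmiles(mol, canonical=False, doRandom=True) for _ in range(3)]
# ...all map back to one canonical SMILES string.
canonical = Chem.MolToSmiles(mol)
print(variants, "->", canonical)

# Morgan fingerprint (radius 2, 2048 bits): the baseline input representation.
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
print(fp.GetNumOnBits(), "bits set out of", fp.GetNumBits())
```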
3. The datasets used were Tox21 (in vitro; 8,014 compounds), RTECS (in vivo; 42,639 chemicals, acute oral toxicity, LD50) and ClinTox (human clinical toxicity; 1,491 drugs).
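Two of these three datasets, Tox21 and ClinTox, are distributed through MoleculeNet and can be loaded with DeepChem for readers who want to experiment; RTECS is a licensed database with no public loader. This is a convenience sketch only, not the data preparation pipeline used in the study.

```python
# Loading the two publicly available datasets (Tox21, ClinTox) via DeepChem's
# MoleculeNet loaders. RTECS is a licensed database and has no public loader.
# Convenience sketch only; not the study's data preparation pipeline.
import deepchem as dc

tox21_tasks, tox21_splits, _ = dc.molnet.load_tox21(featurizer="ECFP")
clintox_tasks, clintox_splits, _ = dc.molnet.load_clintox(featurizer="ECFP")

for name, (tasks, splits) in {"Tox21": (tox21_tasks, tox21_splits),
                              "ClinTox": (clintox_tasks, clintox_splits)}.items():
    train, valid, test = splits
    print(name, "tasks:", tasks)
    print(name, "train/valid/test sizes:", len(train), len(valid), len(test))
```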
4. We found that this MTDNN model can accurately predict toxicity for all endpoints, in vitro, in vivo and clinical, as different tasks within one model, when compared with its single-task and transfer learning counterparts.
5. We evaluated the performance of our framework with the metrics AUC-ROC (area under the receiver operating characteristic curve) and balanced accuracy. We compared the performance of the single-task DNN (STDNN) with the MTDNN, using either SMILES embeddings or Morgan fingerprints as input. The performance of different platform combinations in the MTDNN was contrasted by combining all three platforms (in vivo, in vitro and clinical), clinical and in vitro, clinical and in vivo, and in vitro and in vivo.
6. Skewed datasets are a prevalent problem in predictive toxicology: regardless of the platform, the distribution of toxic and nontoxic examples is often imbalanced. Within our datasets, the imbalance is biased towards the "nontoxic" class in ClinTox and Tox21, and towards the "toxic" class in RTECS. This biases the AUC-ROC values towards a small fraction of true toxic or true nontoxic predictions. Hence another metric, balanced accuracy, which takes this imbalance into account, has been used as a more representative metric for predictive toxicology models. Balanced accuracy is the average of the sensitivity and the specificity.
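Both metrics are standard and can be computed directly with scikit-learn; the labels and scores below are placeholders, used only to show the calculation and to verify that balanced accuracy is indeed the mean of sensitivity and specificity.

```python
# Computing AUC-ROC and balanced accuracy for an imbalanced toxicity dataset.
# The labels and scores are placeholders for illustration only.
import numpy as np
from sklearn.metrics import roc_auc_score, balanced_accuracy_score

y_true = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])   # mostly "nontoxic", like ClinTox/Tox21
y_score = np.array([0.1, 0.2, 0.3, 0.1, 0.4, 0.2, 0.6, 0.8, 0.4, 0.3])
y_pred = (y_score >= 0.5).astype(int)

print("AUC-ROC:", roc_auc_score(y_true, y_score))
print("Balanced accuracy:", balanced_accuracy_score(y_true, y_pred))

# Balanced accuracy = (sensitivity + specificity) / 2, computed by hand:
tp = np.sum((y_pred == 1) & (y_true == 1))
tn = np.sum((y_pred == 0) & (y_true == 0))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print("By hand:", (sensitivity + specificity) / 2)
```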
7. To increase trustworthiness and provide confidence in the model's predictions, a post hoc Contrastive Explanations Method (CEM) was adapted, which returns pertinent positives (PP) and pertinent negatives (PN) that in turn correspond to known toxicophores.
The CEM explains DNN predictions more comprehensively by identifying present (PP) and absent (PN) substructures within a molecule that correlate with its prediction.
For example, for a molecule predicted to be toxic, the PP substructures are the minimal and necessary substructures within the molecule that correlate with the 'toxic' prediction. Conversely, the PN substructures, missing from the molecule, are those that, when added, convert the 'toxic' prediction to 'nontoxic'.
We obtained the PP and PN substructures for all the molecules from RTECS, ClinTox and Tox21. For each molecule, ten PP and ten PN substructures were collected, totaling 94,240 PP and PN substructures.
8. To discern whether the CEM correctly pinpoints PP and PN substructures correlating with toxicity, we further matched all collected PPs and PNs to known toxicophores. We expected to see more matches between known toxicophores and the substructures correlating with toxicity than with the converse (nontoxic substructures). Indeed, for ClinTox and Tox21, but not for RTECS, there are more matches to known toxicophores among CEM-derived toxicophores than among CEM-derived nontoxic substructures.
The literature contains a vast and diverse array of known toxicophores. Mutagenic toxicophores, in particular, have been widely used to verify computationally predicted toxicophores. Here, we matched the toxicophores obtained from the CEM to known mutagenic toxicophores collected experimentally in vitro or computationally, and to known reactive substructures commonly used to filter molecules. The CEM was able to identify toxicophores across all these categories, both from the PP substructures of correctly predicted toxic molecules and from the PN substructures of correctly predicted nontoxic molecules.
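The matching step itself amounts to substructure searching. A minimal sketch of how CEM-derived fragments could be screened against a catalogue of known toxicophores is shown below; the SMARTS patterns are illustrative examples of common structural alerts, not the catalogue used in the study.

```python
# Sketch of matching candidate substructures against known toxicophore alerts
# via SMARTS substructure search. The alert patterns below are common examples
# (aromatic nitro, aromatic amine, epoxide), not the catalogue used in the study.
from rdkit import Chem

known_toxicophores = {
    "aromatic nitro": Chem.MolFromSmarts("a[N+](=O)[O-]"),
    "aromatic amine": Chem.MolFromSmarts("a[NH2]"),
    "epoxide": Chem.MolFromSmarts("C1OC1"),
}

def match_alerts(fragment_smiles):
    """Return the names of known toxicophore alerts contained in a fragment."""
    frag = Chem.MolFromSmiles(fragment_smiles)
    if frag is None:
        return []
    return [name for name, patt in known_toxicophores.items()
            if frag.HasSubstructMatch(patt)]

# Example: a CEM-derived fragment (here nitrobenzene as a stand-in)
print(match_alerts("O=[N+]([O-])c1ccccc1"))  # -> ['aromatic nitro']
```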
9. Interestingly and importantly (and not surprisingly), using our multi-task model and its transfer learning counterpart, we demonstrated the minimal relative importance of in vivo data for making accurate predictions of clinical toxicity. The addition of in vivo data to the MTDNN or its transfer learning counterpart did not improve clinical toxicity predictions. Instead, the addition of in vitro data to clinical data was sufficient to improve the predictions of clinical toxicity by AUC-ROC.
10. The relative importance of in vivo data for predicting clinical toxicity was assessed by training different combinations of platforms in the multi-task model, with or without in vivo data, and in its transfer learning counterpart. To investigate whether the kinds of chemicals within the in vitro, in vivo and clinical datasets affected the ability of in vivo data to predict clinical toxicity, we visualized the relationships between the chemicals using t-distributed stochastic neighbor embedding (t-SNE). t-SNE is a method that maps high-dimensional data to lower dimensions while preserving local similarities (i.e. distances between data points). We applied t-SNE to the SMILES embeddings of the chemicals in the Tox21, RTECS and ClinTox datasets, with each dot representing a chemical and distance representing similarity. The map (Fig. 1 below) is dominated by RTECS chemicals (green), owing to the larger number of chemicals in RTECS than in either Tox21 or ClinTox. However, when examining the overlap of chemicals, most of the overlap is between ClinTox (purple) and Tox21 (red) chemicals, with some overlap between ClinTox (purple) and RTECS (green) chemicals. Thus, the chemicals in the clinical dataset (ClinTox) are more closely related to those in the in vitro dataset (Tox21) than to those in the in vivo dataset (RTECS).
Fig. 1. t-SNE of SMILES embeddings of the chemicals in the Tox21, RTECS and ClinTox datasets. Distances correlate with the similarities of the chemicals across these datasets: the shorter the distance, the more similar the chemicals. ClinTox chemicals overlap more with Tox21 chemicals than with RTECS chemicals.
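A map of this kind can be reproduced with standard tooling; the sketch below applies scikit-learn's t-SNE to placeholder embedding matrices standing in for the SMILES embeddings of the three datasets.

```python
# Sketch of the t-SNE visualization of chemical embeddings from the three
# platforms. Random matrices stand in for the real SMILES embeddings.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
embeddings = {
    "Tox21 (in vitro)": rng.normal(0.0, 1.0, (300, 256)),
    "RTECS (in vivo)": rng.normal(0.5, 1.0, (800, 256)),
    "ClinTox (clinical)": rng.normal(0.1, 1.0, (150, 256)),
}

# Stack all embeddings and remember which dataset each row came from
X = np.vstack(list(embeddings.values()))
labels = np.concatenate([[name] * len(e) for name, e in embeddings.items()])

# Project to 2D; perplexity is a typical default-range choice
coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

for name in embeddings:
    pts = coords[labels == name]
    plt.scatter(pts[:, 0], pts[:, 1], s=5, label=name)
plt.legend()
plt.title("t-SNE of chemical embeddings by platform")
plt.show()
```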
11. Hence these results strongly suggest that in vivo data are of minimal relative importance for predicting clinical toxicity, in particular when unsupervised pretrained SMILES embeddings are used as input to multi-task models, thus providing possible guidance on what aspects of animal data cannot be relied upon in predicting clinical toxicity. We further provided a more complete and consistent molecular explanation of the predicted toxicities across different platforms by analyzing the contrastive substructures present within a molecule.