A computational pipeline to predict Drug Induced Liver Injury (DILI).

Category: T2 - General Drug Discovery, Including COVID
PS ID: DDT2-14

Adverse drug reactions (ADRs) are a major threat to the development of novel drugs and to their therapeutic use. Drug-induced liver injury (DILI) is the class of ADRs that cause liver damage. More than 700 drugs have been found to be associated with liver injury. DILI has been the single most frequent cause of safety-related drug marketing withdrawals for the past 50 years (e.g., iproniazid), continuing to the present (e.g., ticrynafen, benoxaprofen, bromfenac, troglitazone, nefazodone).

Aim: Build a pipeline or model, for example with a scripting language such as Python, that predicts toxic effects from the chemical structure of known DILI drugs. The predictive model should be a binary classifier that predicts the active/inactive class together with a confidence score. Sensitivity and specificity should each be above 70%, and the AUC-ROC should also be above 70%. Datasets are often imbalanced, so it is important how you handle the bias between majority and minority classes while building your model. Models will be evaluated on their performance on both the training set and an external dataset, and compared with published models (see dataset).

i) Dataset: use the DILI training set to develop a machine learning model that predicts activity.
ii) Calculate different fingerprints, molecular descriptors, and metabolically labile groups for your training set.
iii) Calculate different properties/descriptors for your training set molecules.
iv) Build a cross-validation pipeline, e.g. 10-fold cross-validation.
v) Build a model using any classification algorithm of your choice.
vi) Evaluate the predictive model on an independent test set and obtain a ROC curve/AUC.

What are the AUC, sensitivity, specificity, balanced accuracy, and F-measure of your best model?
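The evaluation metrics requested above can be computed directly from true labels and predicted confidence scores. The following is a minimal sketch in plain Python, not a prescribed implementation. It assumes labels are encoded as 1 (active, DILI-positive) and 0 (inactive), that scores are probabilities of the active class, and that 0.5 is the decision threshold. The function names `confusion_counts`, `roc_auc`, and `evaluate`, and the toy labels and scores, are hypothetical and do not come from the DILI dataset.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), assuming 1 = active and 0 = inactive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the fraction of (active, inactive) pairs ranked correctly.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for a sketch but slow for very large sets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one active and one inactive compound")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Compute the metrics named in the problem statement."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on actives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on inactives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "auc": roc_auc(y_true, scores),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f_measure": f_measure,
    }


# Toy, made-up labels and scores for illustration only (not real DILI results).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
report = evaluate(toy_labels, toy_scores)

# Check the report against the 70% targets stated in the problem.
meets_targets = all(report[k] > 0.70 for k in ("auc", "sensitivity", "specificity"))
print(report, meets_targets)
```

Balanced accuracy is reported alongside sensitivity and specificity because, on an imbalanced set, plain accuracy can look high for a model that mostly predicts the majority class. The threshold can also be tuned on the cross-validation folds rather than fixed at 0.5.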

Dataset Supporting Data Guidelines