Developing a linear discriminant analysis model for screening pharmaceutical compounds with hERG inhibitory activity (cardiotoxicity) and using the model to screen the CAS antiviral database to identify compounds with cardiotoxicity potential
In silico prediction of cardiotoxicity with high sensitivity and specificity would be of immense value when evaluating potential drug molecules. Building a linear discriminant analysis (LDA) model, or alternatively a classification-based machine learning model, that predicts cardiotoxicity efficiently is therefore critical.

A data set of diverse pharmaceutical compounds with hERG channel inhibitory activity (blocker/non-blocker) is provided, with SMILES notations given for all compounds. The activity of a few compounds is blinded. The remaining compounds should be divided into a training set and a test set in a 70:30 ratio. Simple, reproducible and easily transferable LDA models should be developed from the training set compounds using only 2D descriptors, and validated on the test set compounds. The models should preferably meet the following quality criteria: Wilks' lambda 0.6, CCR > 0.6, AUC_ROC > 0.6.

Using the best model, the activity of the blinded set should be predicted. The applicability domain (AD) should be taken into account when predicting the external/blinded compounds. The best model should also be used to classify the CAS antiviral database compounds for hERG channel inhibitory activity, and a list of compounds with cardiotoxicity potential should be generated (along with AD information).

Additional comments:
i) Participants may explore new methods of feature/descriptor selection, of selecting appropriate training set compounds, and of defining the applicability domain.
ii) Participants may collect new data with experimental activity values from the literature and show the applicability of the developed model.
iii) Participants may explore the response activity, or any other related measure computed by another method, for the prioritized compounds and show consistency with the values predicted by the developed model. A consensus will further support the reliability of the developed model.
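
The split, the quality measures and the applicability domain check named in the task can be illustrated with a small sketch. The Python below is one possible reading, not a reference implementation of the challenge. It assumes labels coded 1 (blocker) and 0 (non-blocker), and it assumes the discriminant scores come from an LDA model fitted elsewhere on 2D descriptors. CCR is taken here as the mean of sensitivity and specificity (some authors use overall accuracy instead). Wilks' lambda is computed for the two-group case as the ratio of within-group to total sum of squares of the discriminant scores, so smaller values mean better separation. The descriptor-range (bounding-box) check is only a simple stand-in for an applicability domain method. All function names and the toy numbers are hypothetical.

```python
import random
from statistics import mean


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), keeping the class ratio in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return sorted(train_idx), sorted(test_idx)


def ccr(y_true, y_pred):
    """Assumed CCR definition: mean of sensitivity and specificity."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def auc_roc(y_true, scores):
    """AUC as the fraction of (blocker, non-blocker) pairs ranked correctly.

    Ties between a blocker and a non-blocker count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def wilks_lambda(y_true, scores):
    """Two-group Wilks' lambda of one discriminant function: SS_within / SS_total.

    Valid when `scores` are the values of the fitted discriminant function.
    """
    grand = mean(scores)
    ss_total = sum((s - grand) ** 2 for s in scores)
    ss_within = 0.0
    for cls in (0, 1):
        grp = [s for t, s in zip(y_true, scores) if t == cls]
        m = mean(grp)
        ss_within += sum((s - m) ** 2 for s in grp)
    return ss_within / ss_total


def in_domain(train_X, x):
    """Bounding-box AD: each descriptor of x lies within the training min-max range."""
    return all(min(col) <= v <= max(col) for col, v in zip(zip(*train_X), x))


# Toy usage with made-up discriminant scores (not real hERG data).
y_true = [1, 1, 1, 0, 0, 0]
scores = [2.0, 1.0, -0.5, 0.5, -1.0, -2.0]
y_pred = [1 if s > 0 else 0 for s in scores]
print(round(ccr(y_true, y_pred), 3))         # 0.667
print(round(auc_roc(y_true, scores), 3))     # 0.889
print(round(wilks_lambda(y_true, scores), 3))  # 0.603
```

In practice the scores would come from the fitted LDA model, and each predicted blinded or CAS antiviral compound would be reported together with its `in_domain` flag so that out-of-domain predictions can be treated with caution.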