To develop a regression-based QSPR model for Caco-2 cell permeability.
A data set of pharmaceutical compounds with Caco-2 cell permeability (logPapp) values is provided. The SMILES notations of all compounds are given. The permeability values of a few compounds are blinded. The remaining compounds should be divided into a training set and a test set in a 70:30 ratio. Simple, reproducible and easily transferable regression-based QSPR models should be developed from the training set compounds using only 2D descriptors, following the OECD guidelines. The models should be validated using the test set compounds. The models should meet at least the following quality criteria: R2 > 0.6, LOO-Q2 > 0.6, Q2ext_F1 > 0.6, Tropsha’s criteria: Pass, MAE-based criteria: Moderate or Good. Using the best model, the permeability of the blinded compounds should be predicted. The applicability domain (AD) should be taken into account when predicting the external/blinded compounds. The best model should also be used to rank the CAS antiviral database compounds for Caco-2 cell permeability, and the 100 top-ranking compounds should be listed along with their AD information.

Additional Comments:
i) The participants may explore new methods of feature/descriptor selection, new methods of selecting appropriate training set compounds, and new methods of defining the applicability domain.
ii) The participants may collect new data with experimental activity/property values from the literature and show the applicability of the developed model to them.
iii) The participants may explore the response property, or any other related measure computed by another method, for the prioritized compounds and show consistency with the values predicted by the developed model. Such a consensus would further support the reliability of the developed model.
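As an illustration of the validation workflow described above, the following is a minimal sketch in Python using only NumPy. It is not a prescribed method. Several things in it are assumptions made for illustration: the descriptor matrix and logPapp values are synthetic placeholders (real 2D descriptors would have to be computed from the SMILES with a descriptor tool), the 70:30 split is a simple random split, the model is ordinary least squares, LOO-Q2 is scaled by the full training-set mean, and the applicability domain is the leverage approach with the commonly used warning threshold h* = 3(p + 1)/n. Tropsha's criteria and the MAE-based classification are not implemented; only the test-set MAE is reported.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Predictions of an intercept-plus-descriptors linear model."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def r2(y, y_hat):
    """Coefficient of determination on the fitted (training) data."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def loo_q2(X, y):
    """Leave-one-out Q2: each compound is predicted by a model fitted without it."""
    n = len(y)
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        coef = fit_ols(X[mask], y[mask])
        press += (y[i] - predict(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def q2_f1(y_test, y_test_hat, y_train_mean):
    """External Q2_F1: test residuals scaled by deviations from the training mean."""
    return 1.0 - np.sum((y_test - y_test_hat) ** 2) / np.sum((y_test - y_train_mean) ** 2)

def leverages(X_train, X_query):
    """Leverage h = x (A^T A)^-1 x^T, with A the intercept-augmented training matrix."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    inv = np.linalg.pinv(A.T @ A)
    return np.einsum("ij,jk,ik->i", Q, inv, Q)

# Synthetic placeholder data (assumption): 100 compounds, 3 descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([0.8, -0.5, 0.3]) - 5.0 + rng.normal(scale=0.2, size=100)

# Random 70:30 split into training and test sets.
idx = rng.permutation(len(y))
train, test = idx[:70], idx[70:]

coef = fit_ols(X[train], y[train])
y_test_hat = predict(coef, X[test])

metrics = {
    "R2": r2(y[train], predict(coef, X[train])),
    "LOO-Q2": loo_q2(X[train], y[train]),
    "Q2ext_F1": q2_f1(y[test], y_test_hat, y[train].mean()),
    "MAE_test": float(np.mean(np.abs(y[test] - y_test_hat))),
}

# Leverage-based AD: a query compound is flagged as outside the domain if h > h*.
h_star = 3 * (X.shape[1] + 1) / len(train)
in_domain = leverages(X[train], X[test]) <= h_star

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"test compounds inside AD: {in_domain.sum()} of {len(test)}")
```

The same `leverages` function could be applied to the blinded compounds and to the screening database once their descriptors are available, so that every prediction is reported together with an inside/outside AD flag. Other AD definitions and other regression methods can be substituted without changing the structure of the sketch.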