Category: T2 - General Drug Discovery, Including COVID
PS ID: DDT2-07

ML model to predict small molecule clinical trial success probability by phase

Clinical trials are the most expensive phase of bringing a drug to market and have an extremely high rate of attrition, so anticipating success or failure is key. The challenge is to develop software that will (a) construct an automated pipeline in Python or a similar language to download and update data from clinicaltrials.gov, (b) make the data available for searching in a MySQL database, and (c) build a Bayesian/ML model that assigns a classification and a probability of passing each of Phases I, II and III. Before the predictive models are built, clinical trials that failed for any reason other than toxicity or inadequate efficacy will be removed, in particular those that failed to recruit patients or whose reason for failure is 'unknown'. The features used to build the model will include all relevant columns in the database and features derived from them, including inclusion and exclusion criteria, study metadata, sites, investigators and molecular properties. The model is expected to learn the relationship between success in earlier phases and success in later ones. Machine learning models that are simple and interpretable should be used. The success criterion is a valid model that demonstrates classification accuracy or probabilities, for small molecule antiviral drugs, on data from ongoing trials and from trials whose reason for termination is listed as unknown at clinicaltrials.gov. An automated pipeline for data download, model update and database storage is highly desirable.
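As an illustration of the filtering step and of a simple, interpretable Bayesian baseline, here is a minimal sketch in Python. It is not a prescribed solution: the record fields (`phase`, `success`, `stop_reason`), the reason labels and the toy data are all hypothetical, and the Beta-Binomial posterior mean with a uniform prior is only one possible choice of model. In practice the free-text termination reasons from clinicaltrials.gov would first have to be mapped to labels like these.

```python
from collections import defaultdict

# Hypothetical labels for why a trial stopped. These names are assumptions
# made for this sketch, not fields defined by clinicaltrials.gov.
EXCLUDED_REASONS = {"recruitment", "unknown"}


def filter_trials(trials):
    """Drop trials that stopped for reasons other than toxicity or efficacy."""
    return [t for t in trials if t["stop_reason"] not in EXCLUDED_REASONS]


def phase_success_probability(trials, prior_a=1.0, prior_b=1.0):
    """Posterior mean success probability per phase (Beta-Binomial model).

    prior_a and prior_b are the Beta prior parameters; 1.0 and 1.0 give a
    uniform prior, which is an assumption of this sketch.
    """
    counts = defaultdict(lambda: [0, 0])  # phase -> [successes, failures]
    for t in trials:
        counts[t["phase"]][0 if t["success"] else 1] += 1
    return {
        phase: (prior_a + s) / (prior_a + prior_b + s + f)
        for phase, (s, f) in counts.items()
    }


# Toy records, invented purely for illustration.
trials = [
    {"phase": "I", "success": True, "stop_reason": None},
    {"phase": "I", "success": True, "stop_reason": None},
    {"phase": "I", "success": False, "stop_reason": "toxicity"},
    {"phase": "I", "success": False, "stop_reason": "recruitment"},
    {"phase": "II", "success": True, "stop_reason": None},
    {"phase": "II", "success": False, "stop_reason": "efficacy"},
    {"phase": "II", "success": False, "stop_reason": "unknown"},
]

kept = filter_trials(trials)
probs = phase_success_probability(kept)
for phase in sorted(probs):
    print(f"Phase {phase}: estimated success probability {probs[phase]:.2f}")
```

A per-phase baseline like this ignores trial-level features. A feature-based interpretable model, for example logistic regression, could replace it once the database columns are available.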
