Machine intelligence-driven design and development of main protease inhibitor drugs.
The main protease of SARS-CoV-2 (the virus that causes COVID-19) shares high sequence identity with that of SARS-CoV, so known SARS-CoV protease inhibitors can serve as the training set for machine learning methods that generate novel drug-like candidates. Use a variational autoencoder trained on SMILES representations to generate novel molecules by sampling its continuous latent space. Assess the generated molecules by SMARTS pattern matching to rule out unlikely molecular structures, e.g. structures with multiple 3-, 4-, 7-, or 8-membered rings, purely acyclic compounds, and highly branched compounds. Also assess diversity using PubChem / MACCS fingerprints, requiring Tanimoto coefficients <= 0.60 against the training set so that the generated structures are significantly dissimilar to it. Screen the generated compounds against multiple SARS-CoV-2 protease conformations using an ensemble molecular docking protocol at both the allosteric and orthosteric sites. Retain the compounds with the best docking scores and binding energies as per the guidelines. Shortlist the top 100 or top 25% (max 100) hits and refine the best 10 using MD simulation without any restraints.
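The diversity filter and the shortlisting rule can be illustrated with a minimal Python sketch. It rests on several assumptions that are not part of the protocol itself: fingerprints are represented as plain sets of on-bit indices (in practice they would come from a cheminformatics toolkit), more negative docking scores are taken to be better, and "top 25% (max 100)" is read as the best quarter of the ranked hits, capped at 100. All function names, molecule names, fingerprints, and scores below are hypothetical.

```python
from typing import Dict, FrozenSet, List

# Assumption: a fingerprint is the set of indices of its on-bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |a & b| / |a | b|; 0.0 if both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def is_diverse(candidate: Fingerprint,
               training: List[Fingerprint],
               threshold: float = 0.60) -> bool:
    """True if the candidate is at most `threshold` similar to every
    training-set fingerprint."""
    return all(tanimoto(candidate, ref) <= threshold for ref in training)


def shortlist(scores: Dict[str, float],
              fraction: float = 0.25,
              cap: int = 100) -> List[str]:
    """Rank by docking score (assumed: more negative is better) and keep
    the top `fraction` of hits, at least one and never more than `cap`."""
    ranked = sorted(scores, key=scores.get)
    n_keep = min(cap, max(1, int(len(ranked) * fraction)))
    return ranked[:n_keep]


# Toy data, invented purely for illustration.
training = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 5, 8})]
generated = {
    "mol_a": frozenset({1, 2, 3, 4, 9}),   # close to the first training entry
    "mol_b": frozenset({1, 6, 7, 10}),
    "mol_c": frozenset({5, 8, 11, 12}),
}

# Keep only molecules that pass the Tanimoto <= 0.60 diversity check.
diverse = [name for name, fp in generated.items()
           if is_diverse(fp, training)]

# Hypothetical docking scores for the molecules that passed the filter.
docking_scores = {"mol_b": -7.2, "mol_c": -8.1}
hits = shortlist(docking_scores)
```

Here `mol_a` is rejected because its Tanimoto coefficient to the first training fingerprint is 0.8, while `mol_b` and `mol_c` pass. With only two scored molecules the 25% rule would round down to zero, so the sketch keeps at least one hit; that floor is a design choice of this example, not something stated in the protocol.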