How did you get interested in science?
I developed an early interest in science and mathematics thanks to some excellent teachers throughout my early education, and to a strong affinity for the idea that you could definitively prove or disprove something. That naive view survived only until I started doing my own research, when it was quickly replaced by an appreciation for all of the wonderful strangeness of chemistry. Chemistry, the central science, seems determined to defy representation, and wrestling it into something that can give you meaningful insights is as rewarding and essential as it is challenging.
During my Ph.D., one of the central questions I was asking of a system was, ‘Is this a hydrogen bond?’ This simple question masks more complicated ones, like what a hydrogen bond is anyway, or what any sort of bond is, for that matter. The paper we ended up publishing on the topic reads more like, ‘These are observations about this system that are consistent with other systems people describe as hydrogen bonds.’ Describing systems in terms like these is essential for science communication, because you can’t just dump the wavefunctions from your calculations into somebody else’s brain and have them fully understand the system.
In sum, my interest in chemistry is driven by the aphorism ‘All models are wrong, but some are useful.’ Although I came to science hoping that rigorous work would yield definitive answers, I now embrace the challenge of finding the least wrong, most useful way of representing chemical systems to push science forward.
Tell us about the lab where you did this work.
Our lab, led by Dr. Stefano Forli, is at Scripps Research in beautiful La Jolla, CA, and our building overlooks the Torrey Pines Golf Course and the Pacific Ocean (my office does not have a window). We are the current maintainers of the AutoDock software suite, the most widely used open-source docking software. Docking is the task of predicting the binding modes and affinities of small molecules for proteins, and it is an essential step in computer-aided drug design. Our lab seeks to extend docking to more challenging protein targets and ligand chemistries, and to improve the efficiency and accuracy of the underlying tools.
What were the biggest challenges with this study?
Describing a new computational model involves a lot of discretion in choosing which features to include. Increasing model complexity raises both the difficulty of optimizing model performance and the likelihood of overfitting the data, which produces a model that will not generalize well to future observations. However, overly simple models can fail to capture the important features of a system. This study was particularly challenging on this front because data on covalent inhibitors are relatively sparse: there are many possible reactive warheads, but few observations for most classes. The model we settled on has few parameters and generalizes well across these classes, so we are happy with the result, but striking this balance is the greatest challenge at the outset of building any new model.
What are you working on now?
A large outstanding challenge in virtual screening is to better leverage the structure of the chemical library being screened to identify inhibitors efficiently, rather than needing to screen the entire library. This is increasingly important as library sizes grow, as more drug-like combinatorial chemistries are described, and as there is a greater focus on the environmental cost of large-scale computations.