About me
I am a software developer at BenevolentAI, where I work alongside the computer-aided drug design (CADD) team, developing tools that they use in their day-to-day work. Previously, I was a senior research software engineer based within the Research Computing Service of Imperial College London. I’m originally from Bournemouth, UK, and studied chemistry at the University of Bath. During my PhD I developed a passion for software engineering and data science, and I went on to work on a range of projects enabling research across the college.
Software
I joined the central research software engineering team at Imperial College London in August 2021 and worked with researchers from all departments to produce performant and reliable software to tackle cutting-edge research challenges. I have worked on a range of projects making use of various technologies to design GUIs, webapps, APIs and data pipelines. A number of projects included a large infrastructure element, enabling me to build research tools on college virtual machines and in the cloud.
In March 2023 I joined BenevolentAI’s CADD team, where my focus has been the development of sustainable, performant cheminformatics and computational chemistry software, and its deployment in the cloud.
Research Background
My background is in computational chemistry. I completed an MChem at the University of Bath and stayed on to carry out an MRes, followed by a PhD on “Materials discovery using chemical heuristics and high-throughput calculations” under the supervision of Prof. Aron Walsh. I was awarded a Doctoral Prize Fellowship at Imperial College London before moving to the chemistry department of UCL within the group of Prof. David Scanlon.
Searching the materials hyperspace
My research focused on the discovery of new inorganic materials for clean energy. Application areas included solar absorbers, transparent conductors, thermoelectrics, photocatalysts and battery cathodes.
A typical computational materials screening process is:
1. Generate a sensible, and often very large, search space of hypothetical compositions or compounds (see the SMACT code).
2. Use chemical heuristics, machine learning, or other data-driven filters to significantly narrow down the search space.
3. Subject top candidates to accurate first-principles methods to predict their properties.
I developed models and tools that help with all three steps.
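To give a flavour of the first two steps, here is a minimal sketch in plain Python. It does not use the SMACT API. The tiny oxidation-state table, the stoichiometry limit and all function names are assumptions made up for illustration, and the only heuristic applied is charge neutrality.

```python
from itertools import combinations, product
from math import gcd

# Illustrative subset of common oxidation states (an assumption for this
# sketch; a real screen would use a curated table covering many more elements).
OXIDATION_STATES = {
    "Li": [1],
    "Mg": [2],
    "Al": [3],
    "Ti": [2, 3, 4],
    "O": [-2],
    "S": [-2],
    "Cl": [-1],
}

MAX_STOICH = 3  # largest integer coefficient allowed per element (assumed)


def enumerate_binaries(elements, max_stoich=MAX_STOICH):
    """Step 1: generate every binary formula A_x B_y with 1 <= x, y <= max_stoich."""
    for a, b in combinations(elements, 2):
        for x, y in product(range(1, max_stoich + 1), repeat=2):
            yield (a, x), (b, y)


def is_reduced(candidate):
    """Discard duplicates such as Mg2O2, which is just MgO written twice."""
    (_, x), (_, y) = candidate
    return gcd(x, y) == 1


def is_charge_neutral(candidate):
    """Step 2: keep a formula if some choice of oxidation states sums to zero."""
    (a, x), (b, y) = candidate
    return any(
        qa * x + qb * y == 0
        for qa in OXIDATION_STATES[a]
        for qb in OXIDATION_STATES[b]
    )


def formula(candidate):
    """Render a candidate as a string, e.g. (("Al", 2), ("O", 3)) -> "Al2O3"."""
    return "".join(f"{el}{n if n > 1 else ''}" for el, n in candidate)


search_space = list(enumerate_binaries(OXIDATION_STATES))
shortlist = [c for c in search_space if is_reduced(c) and is_charge_neutral(c)]
formulas = [formula(c) for c in shortlist]

print(f"{len(search_space)} hypothetical compositions, {len(shortlist)} pass the filters")
print(formulas)
```

Even with seven elements, the filters cut the search space by roughly an order of magnitude. The shortlisted formulas are what would be passed on to step 3, the first-principles calculations, which are far too expensive to run on everything.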
Materials data and machine learning
Trying to predict stable compounds with target properties using existing materials data is an enjoyable challenge. Traditionally, this has been done using chemical heuristics, which certainly come in useful, but more recently I have relied more on supervised machine learning models.
A combination of Python packages (pandas, scikit-learn, pymatgen, smact…) and a selection of queryable databases of materials properties (The Materials Project, The Computational Materials Repository…) usually gets the job done. I have also been fortunate to collaborate with experimental groups who can make our imagined compounds with awesome properties a reality.
For a wider overview, you can read our quick-start guide on Machine Learning for Molecular and Materials Science, which appeared in Nature in 2018.