Over the past few years, the development of machine learning interatomic potentials (MLIPs) has been dominated by a brute-force philosophy: ever-larger models trained on hundreds of millions of density functional theory (DFT) calculations. Yet a large portion of the foundational datasets represents idealized, pristine crystal facets at 0 K and perfect equilibrium. Real chemistry does not happen in a pristine vacuum. Catalysts degrade, reconstruct, and rely on defects, vacancies, and promoters to function. Similarly, solid-state battery electrolytes experience extreme strain, thermal fluctuations, and soft lattice dynamics that routinely break standard universal models. When general-purpose models are exposed to highly distorted, elevated-temperature regimes, they confidently fail.
This talk presents a departure from brute-force sampling by demonstrating how to combine highly diverse, lower-fidelity data with deliberately curated, higher-fidelity data. By applying multifidelity co-training alongside explicit feature-wise conditioning, we can map extreme physical regimes without inducing catastrophic forgetting.
I will highlight this methodology through two recent developments. First, I will outline the transition from the AQCat25 dataset to AQCat26, which expands the domain of applicability by orders of magnitude to inject realistic physical complexity into catalytic training distributions. Second, I will introduce AQVolt26, a dataset engineered to capture the highly anharmonic configurational landscape of solid-state electrolytes. Ultimately, this work illustrates that accelerating the discovery of next-generation materials requires more than scaling up parameters. The discipline demands targeted data, explicitly designed to confront the dynamic reality of chemical engineering.