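Energy storage is a key technology for meeting growing energy demand with renewable sources. Lithium-ion batteries based on liquid electrolytes are widely used in portable electronics and electric vehicles, whereas alternative batteries that use solid-state electrolytes avoid the safety problems associated with organic liquid electrolytes and can deliver high energy density by enabling a lithium metal anode.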
Fig. 1 Distribution in room temperature conductivities for materials in the dataset.
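However, the biggest obstacle to adopting solid-state electrolytes is finding solid materials with the required combination of properties: sufficiently high ionic conductivity, stability against both lithium metal and the oxidizing cathode material, and suitable mechanical properties. A large body of research is devoted to discovering and developing solid-state electrolytes that meet these requirements. More recently, researchers have begun training machine learning models on previously published data to predict a material's ionic conductivity from its composition.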
Fig. 2 Distribution of room temperature conductivities across expert-curated structural families.
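This approach, however, is limited by the quality and quantity of the data available for training. Whereas natural language processing tasks can draw on billions of training examples, even large datasets in experimental materials science typically contain fewer than 10,000 entries. Because these training sets are comparatively small, only the highest-quality data should be used so that predictive models are not fed inaccurate values, yet no large database of experimental ionic conductivities has been available for machine learning studies.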
Fig. 3 An embedding of the 127,638 unique compositions (grey) from the ICSD database (2021) with respect to ElMD similarity between compounds, embedded to 2 principal axes with PCA.
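Cameron J. Hargreaves and colleagues from the Department of Chemistry at the University of Liverpool, UK, built a dataset of lithium solid-state electrolytes for machine learning models, enabling composition-based prediction of lithium ion conductivity. The dataset contains 820 entries collected from 214 sources; each entry contains a chemical composition, an expert-assigned structural label, and the ionic conductivity at a specific temperature. Of these, 403 unique compositions have an associated ionic conductivity measured near room temperature.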
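To make the entry structure concrete, the following is a minimal sketch of how one record and the near-room-temperature subset might be represented. The class name, field names, the `source` field, and the S/cm unit are assumptions made for illustration; the 15–35 °C window is the one quoted in the original abstract. The records are invented placeholders, not values from the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConductivityEntry:
    composition: str        # chemical composition as a formula string
    structural_label: str   # expert-assigned structural label
    temperature_c: float    # measurement temperature in degrees Celsius
    conductivity: float     # ionic conductivity (units assumed to be S/cm)
    source: str             # hypothetical identifier for the originating paper


def near_room_temperature(entries, low_c=15.0, high_c=35.0):
    """Keep entries measured inside the 15-35 degC window quoted in the abstract."""
    return [e for e in entries if low_c <= e.temperature_c <= high_c]


def unique_compositions(entries):
    """Return the set of distinct composition strings among the entries."""
    return {e.composition for e in entries}


# Placeholder records for illustration only; the numbers are invented, not measured.
entries = [
    ConductivityEntry("Li3PS4", "family-A", 25.0, 1e-4, "source-1"),
    ConductivityEntry("Li3PS4", "family-A", 100.0, 1e-3, "source-1"),
    ConductivityEntry("Li7La3Zr2O12", "family-B", 27.0, 1e-5, "source-2"),
    ConductivityEntry("Li3N", "family-C", 300.0, 1e-2, "source-3"),
]

room = near_room_temperature(entries)
print(len(entries), len(room), sorted(unique_compositions(room)))
```

Counting unique compositions after the temperature filter mirrors the distinction the article draws between the 820 entries overall and the 403 compositions with a near-room-temperature value.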
Fig. 4 Embeddings of the 403 unique room temperature solid state electrolytes compositional data.
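Using unsupervised embedding and clustering techniques, the authors partitioned the dataset into nine families by compositional similarity and thereby assessed its diversity. Supervised statistical (AutoSklearn) and deep learning (CrabNet) models were then applied to the dataset to predict a material's ionic conductivity from its elemental composition alone.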
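As a rough illustration of what "compositional similarity" can mean, the sketch below parses flat formulas into element fractions and computes a one-dimensional earth mover's distance between them. It is not the authors' Element Movers Distance (ElMD): the element coordinate used here is simply the atomic number, chosen as a stand-in, and the parser handles only flat formulas without brackets. The function names are hypothetical.

```python
import re

# Atomic numbers for the few elements used below. Using atomic number as the
# element coordinate is an assumption of this sketch, not the ElMD element scale.
ATOMIC_NUMBER = {"Li": 3, "N": 7, "O": 8, "P": 15, "S": 16, "Cl": 17, "Zr": 40, "La": 57}


def parse_composition(formula):
    """Parse a flat formula such as 'Li3PS4' into normalized element fractions."""
    counts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)((?:\d+\.?\d*)?)", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    total = sum(counts.values())
    return {element: n / total for element, n in counts.items()}


def earth_movers_distance_1d(comp_a, comp_b, scale=ATOMIC_NUMBER):
    """1D earth mover's distance between two fractional compositions.

    On a line, the distance equals the accumulated imbalance between the two
    distributions multiplied by the gap to the next occupied position.
    """
    elements = sorted(set(comp_a) | set(comp_b), key=lambda el: scale[el])
    distance = 0.0
    carried = 0.0  # net fraction that still has to move past the current element
    for left, right in zip(elements, elements[1:]):
        carried += comp_a.get(left, 0.0) - comp_b.get(left, 0.0)
        distance += abs(carried) * (scale[right] - scale[left])
    return distance


li2o = parse_composition("Li2O")
li2s = parse_composition("Li2S")
print(earth_movers_distance_1d(li2o, li2s))
```

A matrix of such pairwise distances is the kind of input that embedding and clustering methods operate on; the embedding and the nine-family partition themselves are not reproduced here.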
Fig. 5 Parity plots and error distribution for two control studies.
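The authors evaluated regression and classification models with standard statistical metrics under different cross-validation regimes, focusing on how well the models predict the ionic conductivity of novel materials. They found that CrabNet with transfer learning performed best across the cross-validation regimes. The CrabNet-based classifier is a practical tool that helps experimentalists prioritize candidate lithium ion conductors for further investigation. The article was recently published in npj Computational Materials 9: 9 (2023).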
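The editorial summary names two cross-validation regimes, k-folds and LOCO. The sketch below contrasts how the two produce train/test splits. LOCO is read here as leave-one-cluster-out, which is an assumption because the text does not expand the abbreviation; the function names and the toy cluster labels are hypothetical, and no model is trained.

```python
def k_fold_splits(n_items, k):
    """Yield (train, test) index lists for k contiguous folds.

    Indices are not shuffled here; in practice they would usually be
    shuffled once before splitting.
    """
    indices = list(range(n_items))
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def leave_one_cluster_out_splits(cluster_labels):
    """Yield (held_out_label, train, test), holding out one whole cluster at a time."""
    for held_out in sorted(set(cluster_labels)):
        test = [i for i, label in enumerate(cluster_labels) if label == held_out]
        train = [i for i, label in enumerate(cluster_labels) if label != held_out]
        yield held_out, train, test


# Toy cluster assignment for five materials (hypothetical labels).
labels = ["a", "a", "b", "c", "b"]

for train, test in k_fold_splits(len(labels), 2):
    print("k-fold test:", test)

for held_out, train, test in leave_one_cluster_out_splits(labels):
    print("held-out cluster", held_out, "test:", test)
```

With k-fold splitting, materials from the same cluster can appear on both sides of a split, whereas holding out a whole cluster forces the model to predict for a family it has not seen, which is closer to the "novel materials" setting the article emphasizes.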
Fig. 6 Parity plots and error distributions for three regression models.
Editorial Summary
Lithium solid electrolyte conductivities: A database for machine learning
Energy storage is a key technology to meet growing energy demand by harnessing renewable sources. Liquid electrolyte-based lithium ion batteries have been extensively deployed in the portable electronic and electric vehicle markets. Alternative batteries that utilize solid state electrolytes (SSEs) avoid the safety issues associated with organic liquid electrolytes and offer high energy density by enabling the use of a lithium metal anode. The most significant obstacle to the adoption of SSEs is the realization of solid-state materials with the full suite of required properties, including sufficiently high ionic conductivity, stability against both lithium metal and the oxidizing cathode material together with appropriate mechanical properties. As such, considerable research has been devoted to the discovery and development of SSEs that meet these requirements. Recent works have used previously published data to train machine learning models and predict the ionic conductivity performance of materials using only their composition. However, this approach is limited by the quality and quantity of the data available to train models. While natural language processing tasks have access to billions of training examples, in experimental materials science even large datasets typically contain fewer than 10,000 entries. Due to these comparatively small training sets, it is imperative that the highest quality data are used to avoid providing inaccurate data to predictive models. But there are no large repositories of experimental ionic conductivities currently available for solid lithium ion conductors to perform a machine learning investigation.
Cameron J. Hargreaves et al. from the Department of Chemistry, University of Liverpool, built a dataset of lithium SSEs for machine learning models to predict lithium ion conductivity based on composition. This dataset has 820 entries collected from 214 sources; entries contain a chemical composition, an expert-assigned structural label, and ionic conductivity at a specific temperature, and 403 unique compositions have an associated ionic conductivity near room temperature. Unsupervised embedding and clustering techniques were used to partition this dataset into nine families by compositional similarity, thus assessing the diversity of the dataset. Supervised statistical (AutoSklearn) and deep learning (CrabNet) models were applied to this dataset to predict the ionic conductivity of a material from its elemental composition alone. Regression and classification models were evaluated with standard statistical metrics under different cross-validation regimes to assess their performance at predicting the ionic conductivities of novel materials. The results showed that CrabNet with transfer learning demonstrates the best performance under both k-folds and LOCO cross-validation. The CrabNet-based classifier is a practical tool to aid experimentalists in prioritizing candidates for further investigation as lithium ion conductors. This article was recently published in npj Computational Materials 9: 9 (2023).
Original Abstract
A database of experimentally measured lithium solid electrolyte conductivities evaluated with machine learning
Cameron J. Hargreaves, Michael W. Gaultois, Luke M. Daniels, Emma J. Watts, Vitaliy A. Kurlin, Michael Moran, Yun Dang, Rhun Morris, Alexandra Morscher, Kate Thompson, Matthew A. Wright, Beluvalli-Eshwarappa Prasad, Frédéric Blanc, Chris M. Collins, Catriona A. Crawford, Benjamin B. Duff, Jae Evans, Jacinthe Gamon, Guopeng Han, Bernhard T. Leube, Hongjun Niu, Arnaud J. Perez, Aris Robinson, Oliver Rogan, Paul M. Sharp, Elvis Shoko, Manel Sonni, William J. Thomas, Andrij Vasylenko, Lu Wang, Matthew J. Rosseinsky & Matthew S. Dyer.
Abstract
The application of machine learning models to predict material properties is determined by the availability of high-quality data. We present an expert-curated dataset of lithium ion conductors and associated lithium ion conductivities measured by a.c. impedance spectroscopy. This dataset has 820 entries collected from 214 sources; entries contain a chemical composition, an expert-assigned structural label, and ionic conductivity at a specific temperature (from 5 to 873 °C). There are 403 unique chemical compositions with an associated ionic conductivity near room temperature (15–35 °C). The materials contained in this dataset are placed in the context of compounds reported in the Inorganic Crystal Structure Database with unsupervised machine learning and the Element Movers Distance. This dataset is used to train a CrabNet-based classifier to estimate whether a chemical composition has high or low ionic conductivity. This classifier is a practical tool to aid experimentalists in prioritizing candidates for further investigation as lithium ion conductors.