Application of statistical learning to predict material properties
A demonstration project on constructing a gradient boosting machine based on local regression to predict material properties.

This is a programming project to demonstrate that machine learning approaches can be used to exploit patterns present in data to aid materials research. Prima facie, properties such as the elastic constants, i.e., the bulk and shear moduli, could be correlated with a wide range of materials and their characteristic properties. Theoretical models, by contrast, limit themselves to specific subsets. This study attempts to construct a gradient boosting machine based on locally weighted regression to predict material properties. The idea is to construct a statistical learning model that uses the gradient boosting machine framework with a multivariate local polynomial regression base-learner to predict material properties such as the bulk (K) and shear (G) moduli. The objectives of this programming project were therefore established:

The Materials Genome Initiative (MGI), founded in 2011 in the United States of America (USA) and touted as an approach to promote collaboration between various disciplines, has focused on developing frameworks to accelerate the discovery and deployment of advanced materials. One such project undertaken by the MGI is The Materials Project, which serves an open dataset that is accessible through multiple channels. The Materials API, a component of The Materials Project, serves the required data over a REST API.

The bulk modulus of a substance is the ratio of an infinitesimal increase in pressure to the resulting relative decrease in volume, whereas the shear modulus, also known as the modulus of rigidity, measures the shear stiffness of a substance as the ratio of shear stress to shear strain. This project focuses on the Voigt-Reuss-Hill (VRH) averages of the bulk and shear moduli, an empirical average that has been deemed to provide a better representation of the bulk and shear moduli of polycrystalline materials than either the Voigt or the Reuss bound alone.
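The Voigt ($K_V$, $G_V$) and Reuss ($K_R$, $G_R$) values that enter the VRH average can be computed from a single-crystal elastic stiffness matrix. The sketch below is illustrative only: it assumes a 6×6 stiffness matrix `C` in Voigt notation (for example in GPa) and applies the standard Voigt and Reuss formulas. The function name `voigt_reuss_hill` is hypothetical, and the project itself may simply read precomputed values from The Materials Project rather than derive them.

```python
import numpy as np


def voigt_reuss_hill(C):
    """Voigt, Reuss and Hill bulk/shear moduli from a 6x6 stiffness matrix in Voigt notation.

    Hypothetical helper. Returns (K_V, K_R, K_VRH, G_V, G_R, G_VRH) in the units of C.
    """
    C = np.asarray(C, dtype=float)
    S = np.linalg.inv(C)  # compliance matrix S = C^-1

    # Voigt bounds (uniform strain), built from stiffness components C_ij.
    c_diag = C[0, 0] + C[1, 1] + C[2, 2]
    c_off = C[0, 1] + C[1, 2] + C[2, 0]
    c_shear = C[3, 3] + C[4, 4] + C[5, 5]
    K_V = (c_diag + 2.0 * c_off) / 9.0
    G_V = (c_diag - c_off + 3.0 * c_shear) / 15.0

    # Reuss bounds (uniform stress), built from compliance components S_ij.
    s_diag = S[0, 0] + S[1, 1] + S[2, 2]
    s_off = S[0, 1] + S[1, 2] + S[2, 0]
    s_shear = S[3, 3] + S[4, 4] + S[5, 5]
    K_R = 1.0 / (s_diag + 2.0 * s_off)
    G_R = 15.0 / (4.0 * s_diag - 4.0 * s_off + 3.0 * s_shear)

    # Hill estimate: arithmetic mean of the two bounds.
    return K_V, K_R, 0.5 * (K_V + K_R), G_V, G_R, 0.5 * (G_V + G_R)


# Usage: an isotropic solid with Lame constants lam = 2, mu = 1 (arbitrary units),
# for which both bounds coincide with K = lam + 2*mu/3 and G = mu.
lam, mu = 2.0, 1.0
C_iso = np.zeros((6, 6))
C_iso[:3, :3] = lam
C_iso[np.arange(3), np.arange(3)] = lam + 2.0 * mu
C_iso[np.arange(3, 6), np.arange(3, 6)] = mu
K_V, K_R, K_VRH, G_V, G_R, G_VRH = voigt_reuss_hill(C_iso)
print(round(K_VRH, 4), round(G_VRH, 4))  # 2.6667 1.0
```

For an anisotropic crystal the Voigt value is an upper bound and the Reuss value a lower bound, so the Hill average sits between them.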
The Voigt-Reuss-Hill averages are the arithmetic means of the Voigt and Reuss values:
$$ K_{VRH} = \frac{K_V + K_R}{2} $$
$$ G_{VRH} = \frac{G_V + G_R}{2} $$

This project could be broadly divided into two areas. The workflow in this programming project could be summarized as follows:

This programming project succeeds in learning the VRH averages of the elastic bulk and shear moduli. The implementation learns $\log(K)$ and $\log(G)$ within a gradient boosting machine framework, using a multivariate local polynomial regression as the weak learner.

Fig. 1: Comparison of predictions of the shear modulus and bulk modulus for compounds, including quaternary systems.

Overview
Solution
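A gradient boosting machine with a local polynomial regression base-learner can be sketched in a few dozen lines of NumPy. This is a minimal sketch, not the project's Cython implementation: it assumes a squared-error loss, a Gaussian kernel, a degree-1 (local linear) polynomial and a single fixed bandwidth. The class names `LocalLinearRegressor` and `LocalRegressionGBM`, the descriptors and the synthetic target are all hypothetical.

```python
import numpy as np


class LocalLinearRegressor:
    """Multivariate local linear regression (degree-1 local polynomial), Gaussian kernel.

    Hypothetical stand-in for the project's local regression kernel.
    """

    def __init__(self, bandwidth=1.0, ridge=1e-8):
        self.bandwidth = bandwidth
        self.ridge = ridge  # tiny ridge term keeps each local solve well conditioned

    def fit(self, X, y):
        # Local regression is "lazy": fitting just stores the training data.
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        p = self.X_.shape[1] + 1
        out = np.empty(len(X))
        for j, x0 in enumerate(X):
            diff = self.X_ - x0  # centre the design at the query point
            w = np.exp(-0.5 * np.sum(diff**2, axis=1) / self.bandwidth**2)
            A = np.hstack([np.ones((len(diff), 1)), diff])  # local design [1, x_i - x0]
            AtW = A.T * w  # A^T W without forming diag(w)
            beta = np.linalg.solve(AtW @ A + self.ridge * np.eye(p), AtW @ self.y_)
            out[j] = beta[0]  # intercept = local fit at x0
        return out


class LocalRegressionGBM:
    """Least-squares gradient boosting with a local linear base learner (hypothetical)."""

    def __init__(self, n_estimators=50, learning_rate=0.1, bandwidth=1.0):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.bandwidth = bandwidth

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        self.f0_ = y.mean()  # initial constant model
        F = np.full(len(y), self.f0_)
        self.learners_ = []
        for _ in range(self.n_estimators):
            residual = y - F  # negative gradient of 0.5 * (y - F)**2
            h = LocalLinearRegressor(self.bandwidth).fit(X, residual)
            F += self.learning_rate * h.predict(X)  # shrunken additive update
            self.learners_.append(h)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        F = np.full(len(X), self.f0_)
        for h in self.learners_:
            F += self.learning_rate * h.predict(X)
        return F


# Usage on synthetic data (hypothetical descriptors, not Materials Project data).
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(200, 2))
X_test = rng.uniform(-1.0, 1.0, size=(100, 2))


def synthetic_modulus(X):
    # Hypothetical positive, modulus-like target in arbitrary units.
    return np.exp(1.0 + 0.5 * np.sin(2.0 * X[:, 0]) + X[:, 1] ** 2)


# Learn log(modulus), in the spirit of the log(K) and log(G) targets.
y_train = np.log(synthetic_modulus(X_train)) + rng.normal(0.0, 0.05, size=200)
y_test = np.log(synthetic_modulus(X_test))

model = LocalRegressionGBM(n_estimators=50, learning_rate=0.1, bandwidth=0.3).fit(X_train, y_train)
train_rmse = np.sqrt(np.mean((model.predict(X_train) - y_train) ** 2))
test_rmse = np.sqrt(np.mean((model.predict(X_test) - y_test) ** 2))
modulus_pred = np.exp(model.predict(X_test))  # back-transform to the modulus scale
```

Fitting in log space keeps the back-transformed predictions positive, which suits moduli. The per-query Python loop in `predict` is the obvious hotspot, which is the kind of kernel a compiled extension would accelerate.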
Implementation
Python and NumPy, including an accelerated local regression kernel written in Cython, with an emphasis on object-oriented programming (OOP).
Results

