Data Processing
Import and parse data from the Materials Project database.
Queries the REST API and parses the JSON objects returned.
This file is part of the Personal Programming Project (PPP) coursework in the Computational Materials Science (CMS) M.Sc. course at Technische Universität Bergakademie Freiberg.
It belongs to the project titled "Application of statistical learning to predict material properties".
- class dataProcessing.DataProcessing(verbose=False, check_api=True)
Bases: object
To query and handle data obtained from the Materials Project database.
- :meth:`get_api_validity` : Checks the validity of the API key provided.
- :meth:`read_element_aiab_energy` : Parses the atom-in-a-box energies of the different elements provided in the database.
- :meth:`get_element_aiab_energy` : Getter that uses the data read to return the atom-in-a-box energy of an element.
- :meth:`collect_data_from_source` : Wrapper for all the other methods in this class. One specifies the criteria and properties used to query the API, and the data obtained is written to disk.
- :meth:`read_from_file` : Setter for DATA when the data is already accessible on disk; prevents unnecessary calls to the API.
- :meth:`query_property` : Uses the properties and criteria specified to POST a query to the REST API of Materials Project.
- :meth:`get_pymatgen_data` : Uses the specimens obtained after querying to add certain elemental properties that are easily accessible locally through the pymatgen wrapper.
- :meth:`eval_per_atom_data` : Computes the cohesive energy per atom as well as the log volume per atom.
- :meth:`get_elasticity_data` : Isolates the Voigt-Reuss-Hill averages of bulk modulus (K) and shear modulus (G) from the data available on disk.
- :meth:`write_query_to_file` : Helper that writes the JSON object received from the server to disk as JSON files.
- collect_data_from_source(properties, criteria, file_name, max_components, log_file=None) → None
Method that calls query and write methods with properties and criteria provided.
- Parameters:
self (DataProcessing object)
properties (list) – List of properties being queried.
criteria (dict) – Dictionary consisting of criteria with which Materials Project database is queried.
file_name (str) – Prefix to file that stores data post response from query.
max_components (int) – Maximum number of unique specimen in an item.
log_file (str) – Name for the file which stores the logs of this method.
- Return type:
None
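The following is a minimal sketch of how the properties and criteria arguments might be assembled before they are handed to this method. It is not taken from the project source: the helper build_criteria is hypothetical, and the field names "nelements" and "elasticity", the operators "$lte" and "$exists", and the mapping of max_components onto "nelements" are all assumptions based on Mongo-style query conventions.

```python
import copy


def build_criteria(base_criteria, max_components):
    """Return a copy of base_criteria capped at max_components elements.

    ASSUMPTION: the field name "nelements" and the "$lte" operator follow
    Mongo-style conventions; confirm both against the MAPI documentation.
    """
    criteria = copy.deepcopy(base_criteria)  # leave the caller's dict untouched
    criteria["nelements"] = {"$lte": max_components}
    return criteria


# Illustrative inputs only; the property names are assumptions.
properties = ["material_id", "pretty_formula", "energy_per_atom"]
base = {"elasticity": {"$exists": True}}
criteria = build_criteria(base, max_components=2)
print(properties)
print(criteria)
```

Copying the base dictionary keeps the caller's criteria reusable across several queries with different component limits.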
- eval_per_atom_data(file_name) → int
Method to evaluate cohesive energy and volume per atom for a specimen.
- Parameters:
self (DataProcessing object)
file_name (str) – Name of the file to save the data to.
- Returns:
Integer status indicator.
- Return type:
int
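The per-atom quantities named above can be illustrated with a short sketch. The sign convention (bulk energy per atom minus the mean atom-in-a-box energy of the constituent atoms), the use of the natural logarithm, and the function name per_atom_quantities are assumptions made for illustration; the project may define these differently. All numbers are made up.

```python
import math


def per_atom_quantities(energy_per_atom, volume, composition, aiab_energies):
    """Return (cohesive energy per atom, log volume per atom).

    composition maps an element symbol to its atom count in the cell.
    ASSUMPTION: cohesive energy = bulk energy per atom minus the mean
    isolated-atom (atom-in-a-box) energy; the logarithm is natural.
    """
    n_atoms = sum(composition.values())
    # Mean energy of the isolated atoms that make up the cell.
    isolated = sum(n * aiab_energies[el] for el, n in composition.items()) / n_atoms
    cohesive = energy_per_atom - isolated
    log_volume_per_atom = math.log(volume / n_atoms)
    return cohesive, log_volume_per_atom


# Made-up values for a hypothetical two-atom cell "AB".
aiab = {"A": -1.0, "B": -2.0}
cohesive, log_vpa = per_atom_quantities(-4.0, 40.0, {"A": 1, "B": 1}, aiab)
print(cohesive, log_vpa)
```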
- get_api_validity(MAPI_REQUEST='api_check') → bool
Method that queries the Materials Project API to verify the validity of the API key configured in this project.
- Parameters:
self (Object of DataProcessing)
MAPI_REQUEST (str) – The request being implemented in this method, obtained from Materials Project API documentation. Refer: https://docs.materialsproject.org/open-apis/the-materials-api/
- Returns:
Indicates the validity of response.
- Return type:
bool
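As a rough sketch of how a response to such a check could be reduced to a boolean, the snippet below parses a JSON body and looks for a validity flag. The function name is_api_key_valid and the flag name "api_key_valid" are assumptions; the actual response layout should be taken from the Materials Project API documentation. No network call is made here.

```python
import json


def is_api_key_valid(response_text, flag="api_key_valid"):
    """Return True only if the body is a JSON object reporting a valid key.

    ASSUMPTION: the server answers with a JSON object whose `flag` entry
    is the boolean true for a valid key.
    """
    try:
        payload = json.loads(response_text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return isinstance(payload, dict) and payload.get(flag) is True


# Hand-written bodies standing in for real server responses.
good = is_api_key_valid('{"valid_response": true, "api_key_valid": true}')
bad = is_api_key_valid("<html>error page</html>")
print(good, bad)
```

Treating an unparsable body as invalid keeps the caller's logic to a single boolean check.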
- get_elasticity_data(max_components=None, source_file_stub=None, dest_file_stub=None) → int
Method to filter out elasticity specific data and to write it to a file.
- Parameters:
max_components (int) – Maximum number of components in a specimen.
source_file_stub (str) – Prefix of the file from which data is being sourced.
dest_file_stub (str) – Prefix of the file to which data is to be written to.
- Returns:
A status indicator.
- Return type:
int
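The filtering step can be sketched as follows. This is an illustration under assumptions, not the project's implementation: the record layout (a nested "elasticity" dictionary with "K_VRH" and "G_VRH" keys, plus an "nelements" count) and the helper name extract_elasticity are hypothetical, and the records are made up.

```python
def extract_elasticity(records, max_components=None):
    """Keep only records with elasticity data and return their VRH moduli.

    ASSUMPTION: each record is a dict with an optional nested "elasticity"
    dict holding "K_VRH" and "G_VRH", and an "nelements" count.
    """
    rows = []
    for rec in records:
        elasticity = rec.get("elasticity")
        if not elasticity:
            continue  # no elastic data stored for this specimen
        if max_components is not None and rec.get("nelements", 0) > max_components:
            continue  # too many components
        rows.append(
            {
                "material_id": rec.get("material_id"),
                "K_VRH": elasticity.get("K_VRH"),
                "G_VRH": elasticity.get("G_VRH"),
            }
        )
    return rows


# Made-up records for illustration.
records = [
    {"material_id": "demo-1", "nelements": 2, "elasticity": {"K_VRH": 100.0, "G_VRH": 50.0}},
    {"material_id": "demo-2", "nelements": 3, "elasticity": {"K_VRH": 80.0, "G_VRH": 30.0}},
    {"material_id": "demo-3", "nelements": 1, "elasticity": None},
]
rows = extract_elasticity(records, max_components=2)
print(rows)
```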
- get_element_aiab_energy(entity)
Fetches atom-in-a-box energy for specified entity.
- Parameters:
entity (str) – Formula or the id of the entity being queried.
- Returns:
energy – Atom-in-a-box energy.
- Return type:
double
- get_pymatgen_data(file_name) → int
Method to gather data available only through pymatgen.
IMPORTANT: The file must exist, as this method adds data to an existing JSON file.
- Parameters:
self (DataProcessing object)
file_name (str) – Name of the file.
- Returns:
A status code for reference.
- Return type:
int
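The requirement that the file already exists can be illustrated with a sketch of the read-extend-rewrite pattern. To keep it runnable without pymatgen, the pymatgen lookups are replaced by a plain dictionary; the helper name add_fields_to_json, the status values (0 for success, 1 for a missing file), the record layout, and the "mean_atomic_mass" field are all assumptions.

```python
import json
import os
import tempfile


def add_fields_to_json(path, extra_by_id):
    """Merge extra fields into the records of an existing JSON file.

    ASSUMPTION: the file holds a list of dicts keyed by "material_id";
    returns 0 on success and 1 if the file is missing.
    """
    if not os.path.isfile(path):
        return 1  # nothing to extend
    with open(path, encoding="utf-8") as fh:
        records = json.load(fh)
    for rec in records:
        rec.update(extra_by_id.get(rec["material_id"], {}))
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2)
    return 0


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.json")
    missing_status = add_fields_to_json(path, {})  # file not written yet
    with open(path, "w", encoding="utf-8") as fh:
        json.dump([{"material_id": "demo-1"}], fh)
    ok_status = add_fields_to_json(path, {"demo-1": {"mean_atomic_mass": 12.0}})
    with open(path, encoding="utf-8") as fh:
        updated = json.load(fh)
print(missing_status, ok_status, updated)
```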
- query_property(properties, criteria, file_name) → int
Method to query using the Materials API’s functionality for flexible queries. Refer to https://docs.materialsproject.org/open-apis/the-materials-api/ for details. Calls write_query_to_file().
- Parameters:
properties (list) – List of properties to be queried. Refer to the MAPI documentation for supported properties.
criteria (dict) – Criteria specified for the query. Uses Mongo-like flexible queries. Refer to the MAPI documentation for the supported syntax and available criteria.
file_name (str) – Name of the file that stores the queried data. This value is passed on to write_query_to_file().
- Returns:
code – Status code to indicate different outcomes of the method.
- Return type:
int
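A sketch of how the properties and criteria could be serialised into a POST body is given below. The form layout (two JSON-encoded fields named "criteria" and "properties") and the helper name build_query_body are assumptions and should be checked against the MAPI documentation; the example only builds the body and sends nothing.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_query_body(properties, criteria):
    """Serialise a flexible query into a form-encoded POST body.

    ASSUMPTION: the endpoint expects two form fields, "criteria" and
    "properties", each carrying a JSON-encoded value.
    """
    payload = {
        "criteria": json.dumps(criteria),
        "properties": json.dumps(properties),
    }
    return urlencode(payload).encode("ascii")


# Illustrative query; field names are assumptions.
body = build_query_body(["material_id"], {"nelements": {"$lte": 2}})
decoded = parse_qs(body.decode("ascii"))  # round-trip to inspect the fields
print(decoded["criteria"][0])
print(decoded["properties"][0])
```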
- read_element_aiab_energy() → None
Reads the atom-in-a-box energies from a pre-sourced file in the data directory and saves them to the specified dictionary. Source: https://github.com/materialsproject/gbml
- Parameters:
self (Object of Class DataProcessing)
- Return type:
None. Sets the energies attribute.
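The read-then-look-up pattern described here might look like the sketch below. The file layout (a JSON object mapping element symbols to energies) is an assumption and may not match the gbml data file; the helper names and the sample values are made up for illustration.

```python
import json


def parse_aiab_energies(text):
    """Parse atom-in-a-box energies into a {symbol: float} dictionary.

    ASSUMPTION: the source is a JSON object mapping element symbols to
    energies; the real data file may use a different layout.
    """
    raw = json.loads(text)
    return {symbol: float(value) for symbol, value in raw.items()}


def get_energy(energies, symbol):
    """Return the stored energy for symbol, with a readable error if absent."""
    try:
        return energies[symbol]
    except KeyError:
        raise KeyError(f"no atom-in-a-box energy for {symbol!r}") from None


# Placeholder symbols and values, not real data.
sample = '{"X": -0.5, "Y": "-1.25"}'
energies = parse_aiab_energies(sample)
print(get_energy(energies, "Y"))
```

Reading the file once and serving later requests from the dictionary avoids repeated disk access for every specimen.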
- read_from_file(file) → int
Reads data from provided file and sets the attribute.
- Parameters:
file (str) – Name of the file.
- Returns:
A status indicator.
- Return type:
int
- write_query_to_file(data, file_name) → None
Method to write the result of a query from query_property() to a JSON file.
- Parameters:
data (dict) – The resultant data dictionary of the query_property(), get_pymatgen_data(), or eval_per_atom_data() methods.
file_name (str) – Name of the file to be created. The file is placed in the directory specified by the DATA_PATH attribute.
- Return type:
None
The file containing the queried data is created by this method if it does not already exist.
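Writing a result dictionary to a JSON file under a data directory can be sketched as below. Here DATA_PATH is a temporary directory standing in for the class attribute, the helper name write_json is hypothetical, and overwriting an existing file is an assumption of this sketch rather than documented behaviour.

```python
import json
import os
import tempfile

# Stand-in for the DATA_PATH attribute; a temporary directory keeps the demo tidy.
DATA_PATH = tempfile.mkdtemp()


def write_json(data, file_name, data_path=DATA_PATH):
    """Write data as JSON to data_path/file_name and return the full path.

    ASSUMPTION: an existing file of the same name is overwritten.
    """
    os.makedirs(data_path, exist_ok=True)
    target = os.path.join(data_path, file_name)
    with open(target, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)
    return target


# Made-up query result.
result = {"demo-1": {"pretty_formula": "AB", "nsites": 2}}
written = write_json(result, "demo_query.json")
with open(written, encoding="utf-8") as fh:
    restored = json.load(fh)
print(written)
```

Reading the file back, as in the last lines, is a cheap way to confirm that the data survives the round trip through JSON.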