Data Processing

Import and parse data from the Materials Project database.

  • Queries the REST API and parses the JSON objects returned.

  • This file is part of the Personal Programming Project (PPP) coursework in the Computational Materials Science (CMS) M.Sc. course at Technische Universität Bergakademie Freiberg.

  • This file is part of the project titled "Application of statistical learning to predict material properties".

class dataProcessing.DataProcessing(verbose=False, check_api=True)

Bases: object

Queries and handles data obtained from the Materials Project database.

:meth:`get_api_validity` : Checks the validity of the API key provided.

:meth:`read_element_aiab_energy` : Method to parse the atom-in-a-box energies for different elements provided in the database.

:meth:`get_element_aiab_energy` : Method that uses the data read to act as a getter for the atom-in-a-box energy of an element.

:meth:`collect_data_from_source` : Method that acts as a wrapper for all the other methods present in this class. One specifies the criteria and properties used to query the API, and the data obtained is written to disk.

:meth:`read_from_file` : Method that acts as a setter for DATA when the data is already accessible on disk, which prevents unnecessary calls to the API.

:meth:`query_property` : Method that uses the properties and criteria specified to POST a query to the REST API of Materials Project.

:meth:`get_pymatgen_data` : Method that takes the specimens obtained from a query and adds certain elemental properties that are readily available locally through the pymatgen wrapper.

:meth:`eval_per_atom_data` : Method that computes the cohesive energy per atom as well as the log volume per atom.

:meth:`get_elasticity_data` : Method that specifically isolates the Voigt-Reuss-Hill averages of bulk modulus (K) and shear modulus (G) from the data that is available on disk.

:meth:`write_query_to_file` : Helper method to parse the JSON object received from the server and write it to disk as JSON files.

collect_data_from_source(properties, criteria, file_name, max_components, log_file=None) → None

Method that calls query and write methods with properties and criteria provided.

Parameters:
  • self (DataProcessing object)

  • properties (list) – List of properties being queried.

  • criteria (dict) – Dictionary consisting of criteria with which Materials Project database is queried.

  • file_name (str) – Prefix to file that stores data post response from query.

  • max_components (int) – Maximum number of unique components in a specimen.

  • log_file (str) – Name for the file which stores the logs of this method.

Return type:

None

eval_per_atom_data(file_name) → int

Method to evaluate cohesive energy and volume per atom for a specimen.

Parameters:
  • self (DataProcessing object)

  • file_name (str) – Name of the file to save the data to.

Returns:

Integer status indicator.

Return type:

int
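As a rough illustration of the two quantities this method evaluates, the sketch below computes a cohesive energy per atom and a log volume per atom for a single record. The function name, the argument layout, the sign convention (total energy minus the summed atom-in-a-box energies, divided by the number of atoms), the use of the natural logarithm, and all numeric values are assumptions made for illustration; the source does not state the exact formula used by DataProcessing.

```python
import math


def per_atom_quantities(total_energy, volume, composition, aiab_energy):
    """Return (cohesive energy per atom, log volume per atom).

    composition maps an element symbol to its atom count in the cell.
    aiab_energy maps an element symbol to its atom-in-a-box energy.
    """
    n_atoms = sum(composition.values())
    # Energy of the isolated atoms that make up the cell.
    isolated = sum(n * aiab_energy[el] for el, n in composition.items())
    # Assumed convention: a negative value means the compound is bound.
    cohesive = (total_energy - isolated) / n_atoms
    # Assumed: natural logarithm of the volume per atom.
    log_volume = math.log(volume / n_atoms)
    return cohesive, log_volume


# Placeholder numbers and element labels, not values from the database.
cohesive, log_volume = per_atom_quantities(
    total_energy=-10.0,
    volume=40.0,
    composition={"A": 1, "B": 1},
    aiab_energy={"A": -1.0, "B": -2.0},
)
```

Working on the logarithm of the volume per atom compresses the range of the feature, which is a common choice when the values feed a statistical model.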

get_api_validity(MAPI_REQUEST='api_check') → bool

Method that queries the Materials Project API to verify the validity of the API key configured in this project.

Parameters:

MAPI_REQUEST (str) – Defaults to 'api_check'.

Returns:

Indicates the validity of response.

Return type:

bool

get_elasticity_data(max_components=None, source_file_stub=None, dest_file_stub=None) → int

Method to filter out elasticity specific data and to write it to a file.

Parameters:
  • max_components (int) – Maximum number of components in a specimen.

  • source_file_stub (str) – Prefix of the file from which data is being sourced.

  • dest_file_stub (str) – Prefix of the file to which data is to be written to.

Returns:

A status indicator.

Return type:

int
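The Voigt-Reuss-Hill average of a modulus is the arithmetic mean of its Voigt (upper) and Reuss (lower) bounds. The sketch below shows that average and one possible way to isolate K and G from a list of records. The function names and the field names ("material_id", "nelements", "elasticity", "K_VRH", "G_VRH") are assumptions modelled on typical Materials Project records, and the records themselves are placeholders, so this is not necessarily how get_elasticity_data() is implemented.

```python
def hill_average(voigt, reuss):
    """Voigt-Reuss-Hill average: arithmetic mean of the two bounds."""
    return 0.5 * (voigt + reuss)


def extract_elasticity(records, max_components=None):
    """Keep the id, K_VRH and G_VRH of records that carry elasticity data."""
    selected = []
    for record in records:
        elasticity = record.get("elasticity")
        if not elasticity:
            continue  # no elasticity data for this specimen
        if max_components is not None and record.get("nelements", 0) > max_components:
            continue  # too many components
        selected.append({
            "material_id": record["material_id"],
            "K_VRH": elasticity["K_VRH"],
            "G_VRH": elasticity["G_VRH"],
        })
    return selected


# Placeholder records, not data from the database.
records = [
    {"material_id": "x-1", "nelements": 2, "elasticity": {"K_VRH": 100.0, "G_VRH": 50.0}},
    {"material_id": "x-2", "nelements": 3, "elasticity": {"K_VRH": 80.0, "G_VRH": 30.0}},
    {"material_id": "x-3", "nelements": 1, "elasticity": None},
]
filtered = extract_elasticity(records, max_components=2)
```

With max_components=2, only the first placeholder record survives: the second has three components and the third has no elasticity data.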

get_element_aiab_energy(entity)

Fetches atom-in-a-box energy for specified entity.

Parameters:

entity (str) – Formula or the id of the entity being queried.

Returns:

energy – Atom-in-a-box energy.

Return type:

double
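A getter of this kind usually reduces to a dictionary lookup on the data loaded by read_element_aiab_energy(). The following is a minimal sketch under that assumption; the class name AiabEnergies, the error message, and the stored values are hypothetical and not taken from the project.

```python
class AiabEnergies:
    """Hypothetical stand-in for the energy lookup held by DataProcessing."""

    def __init__(self, energies):
        # Copy so later changes to the caller's dict do not leak in.
        self._energies = dict(energies)

    def get(self, entity):
        """Return the stored atom-in-a-box energy for entity."""
        try:
            return self._energies[entity]
        except KeyError:
            raise KeyError(f"no atom-in-a-box energy stored for {entity!r}") from None


# Placeholder labels and values, not energies from the source file.
lookup = AiabEnergies({"A": -1.0, "B": -2.5})
energy = lookup.get("B")
```

Raising a descriptive KeyError for an unknown entity makes a missing entry visible immediately instead of letting a silent default propagate into later per-atom calculations.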

get_pymatgen_data(file_name) → int

Method to gather data available only through pymatgen.

  • IMPORTANT: The file must already exist, as this method adds data to an existing JSON file.

Parameters:
  • self (DataProcessing object)

  • file_name (str) – Name of the file.

Returns:

A status code for reference.

Return type:

int

query_property(properties, criteria, file_name) → int

Method to query using the Materials API’s functionality for flexible queries. Refer to https://docs.materialsproject.org/open-apis/the-materials-api/ for details. Calls write_query_to_file().

Parameters:
  • properties (list) – List of properties to be queried. Refer to the MAPI documentation for supported properties.

  • criteria (dict) – Criteria specified for the query. Uses Mongo-like flexible queries. Refer to the MAPI documentation for the supported syntax and available criteria.

  • file_name (str) – Name of the file that stores the queried data. This value is passed on to write_query_to_file().

Returns:

code – Status code to indicate different outcomes of the method.

Return type:

int
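To make the Mongo-like criteria concrete, the sketch below pairs an example properties list and criteria dictionary with a small local matcher covering a few common operators ($lte, $gte, $in, $exists). The matcher only illustrates the semantics of such criteria; it is not how the Materials Project server evaluates them. The property names ("material_id", "pretty_formula", "nelements", "elasticity") and the records are assumptions for illustration.

```python
def matches(record, criteria):
    """Evaluate a small subset of Mongo-style criteria against one record."""
    for field, condition in criteria.items():
        present = field in record
        value = record.get(field)
        if not isinstance(condition, dict):
            if value != condition:  # plain equality test
                return False
            continue
        for op, operand in condition.items():
            if op == "$exists":
                ok = present == operand
            elif op == "$lte":
                ok = present and value <= operand
            elif op == "$gte":
                ok = present and value >= operand
            elif op == "$in":
                ok = present and value in operand
            else:
                raise ValueError(f"operator {op} is not covered by this sketch")
            if not ok:
                return False
    return True


# Example arguments in the shape query_property() expects (names assumed).
properties = ["material_id", "pretty_formula", "nelements"]
criteria = {"nelements": {"$lte": 2}, "elasticity": {"$exists": True}}

# Placeholder records, not data from the database.
records = [
    {"material_id": "x-1", "nelements": 2, "elasticity": {}},
    {"material_id": "x-2", "nelements": 3, "elasticity": {}},
    {"material_id": "x-3", "nelements": 1},
]
hits = [r["material_id"] for r in records if matches(r, criteria)]
```

All conditions in the criteria dictionary must hold at once, so a record is kept only if it has at most two components and also carries an elasticity field.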

read_element_aiab_energy() → None

Reads the atom-in-a-box energies from a pre-sourced file in the data directory and saves them to the dictionary specified. Source: https://github.com/materialsproject/gbml

Parameters:

self (Object of Class DataProcessing)

Return type:

None. Sets the energies attribute.

read_from_file(file) → int

Reads data from provided file and sets the attribute.

Parameters:

file (str) – Name of the file.

Returns:

A status indicator.

Return type:

int

write_query_to_file(data, file_name) → None

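Method to write the result of a query from query_property() to a JSON file. If the file does not exist, this method creates it and writes the queried data to it.

Parameters:
  • data – Result of the query received from query_property().

  • file_name (str) – Name of the file to write to.

Return type:

None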
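The write-then-read cycle that lets later runs skip the API can be sketched with the standard json module. The helper names write_query and read_query, the status values (0 for success, 1 for a missing file), and the payload are assumptions for illustration; the source does not document the status codes used by the class.

```python
import json
import os
import tempfile


def write_query(data, file_name):
    """Write queried records to file_name as JSON, creating the file if needed."""
    with open(file_name, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2)


def read_query(file_name):
    """Return (status, data); status is 0 on success and 1 if the file is missing."""
    if not os.path.exists(file_name):
        return 1, None
    with open(file_name, encoding="utf-8") as handle:
        return 0, json.load(handle)


# Round trip in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "query.json")
    payload = [{"material_id": "x-1", "nelements": 2}]  # placeholder record
    write_query(payload, path)
    status, restored = read_query(path)
    missing_status, _ = read_query(os.path.join(tmp, "absent.json"))
```

Returning a status alongside the data lets a caller decide whether to fall back to a fresh query when the cached file is absent.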