This notebook demonstrates how to predict structures using the built-in structure_prediction package in pymatgen. We first find the highest-probability specie substitutions for our original species, then gather all structures (via the Materials API) from the chemical systems of those substituted species. We then substitute the original species back into these structures, filter out duplicates as well as structures that already exist on the Materials Project, and output the newly predicted structures.

Written using:

Author: Matthew McDermott (09/25/18)

Here we define two variables: threshold, the probability threshold used when making substitution/structure predictions, and num_subs, the number of substitutions you wish to explore:

Finding highest probability specie substitutions

In this section, we use the SubstitutionPredictor to predict likely specie substitutions using a data-mined approach from ICSD data. This does not yet calculate probable structures -- only which species are likely to substitute for the original species you input. The substitution prediction methodology is presented in: Hautier, G., Fischer, C., Ehrlacher, V., Jain, A., and Ceder, G. (2011) Data Mined Ionic Substitutions for the Discovery of New Compounds. Inorganic Chemistry, 50(2), 656-663. doi:10.1021/ic102031h

Predict most common specie substitutions, sort by highest probability, and take the number of substitutions specified by num_subs:
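As a minimal sketch of the sort-and-truncate step, the block below works on hand-written mock predictions rather than real predictor output. It assumes each prediction is a dictionary with a "substitutions" mapping and a "probability" value. The species strings, the direction of the mapping, the probabilities, and the names mock_predictions, ranked, and top_subs are all illustrative assumptions, not values from this notebook.

```python
# Mock predictions: plain strings stand in for pymatgen species objects,
# and the probabilities are made up purely for illustration.
mock_predictions = [
    {"substitutions": {"Na+": "Li+", "Cl-": "Cl-"}, "probability": 0.012},
    {"substitutions": {"Na+": "K+", "Cl-": "Br-"}, "probability": 0.030},
    {"substitutions": {"Na+": "Ag+", "Cl-": "I-"}, "probability": 0.004},
]

num_subs = 2  # how many substitutions to keep (illustrative value)

# Sort from most to least probable, then keep the first num_subs entries.
ranked = sorted(mock_predictions, key=lambda p: p["probability"], reverse=True)
top_subs = ranked[:num_subs]

for pred in top_subs:
    print(pred["probability"], sorted(pred["substitutions"].values()))
```

Slicing after sorting means that asking for more substitutions than exist simply returns all of them, so num_subs acts as an upper bound rather than a strict count.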

Create a new list of just the substituted specie combinations:

Create a set of strings of each unique chemical system (elements separated by dashes):
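The two steps above (collecting the substituted species, then building dash-separated chemical system strings) can be sketched with the standard library alone. This is only a sketch under stated assumptions: species are represented as plain strings such as "Li+" or "Fe3+" instead of pymatgen objects, elements are sorted alphabetically so that the same system always yields the same string, and the helper names element_symbol and chemical_system and the sample list subs_species are hypothetical.

```python
import re


def element_symbol(species: str) -> str:
    """Return the element symbol of a species string, e.g. 'Fe3+' -> 'Fe'."""
    match = re.match(r"[A-Z][a-z]?", species)
    if match is None:
        raise ValueError(f"Cannot parse species string: {species!r}")
    return match.group(0)


def chemical_system(species_list) -> str:
    """Join the unique element symbols, sorted alphabetically, with dashes."""
    return "-".join(sorted({element_symbol(s) for s in species_list}))


# Hypothetical substituted specie combinations (strings stand in for species).
subs_species = [["Li+", "Cl-"], ["K+", "Br-"], ["Cl-", "Li+"]]

# A set collapses combinations that map to the same chemical system.
chemical_systems = {chemical_system(combo) for combo in subs_species}
print(sorted(chemical_systems))
```

Sorting the element symbols before joining is what lets the first and third combinations collapse into a single entry, even though their species are listed in a different order.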

Finding all structures for new chemical systems via Materials API

Create a new dictionary and populate it with all structures for each chemical system:

Now create a new dictionary of all structures (with oxidation states) for each chemical system:

Substitute original species into new structures

Now create a new dictionary trans_structures populated with predicted structures made up of the original species. Note: these new predicted structures are TransformedStructure objects:

Filter duplicate structures using StructureMatcher:

NOTE: The chemical systems to which the filtered structures are assigned might change when re-running the program. Since we are filtering for duplicates across chemical systems, a structure that appears in two systems may be reported under either one in the filtered dictionary. Which of the two is reported simply depends on the order in which the filter algorithm visits the structures (and it is reading from a dictionary whose ordering is not fixed between runs!)
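The order dependence described in this note can be illustrated with a small greedy filter. This is a sketch, not the notebook's code: plain string labels stand in for structures, the same_label function stands in for a real structure comparison such as the one StructureMatcher provides, and the names filter_duplicates, groups_a, and groups_b are hypothetical.

```python
from typing import Callable, Dict, List


def filter_duplicates(
    groups: Dict[str, List[str]],
    is_match: Callable[[str, str], bool],
) -> Dict[str, List[str]]:
    """Keep the first occurrence of each item across all groups."""
    kept: List[str] = []
    filtered: Dict[str, List[str]] = {}
    for system, items in groups.items():
        for item in items:
            # Skip anything that matches an item kept earlier, in any system.
            if any(is_match(item, seen) for seen in kept):
                continue
            kept.append(item)
            filtered.setdefault(system, []).append(item)
    return filtered


def same_label(a: str, b: str) -> bool:
    """Stand-in comparison: two 'structures' match if their labels are equal."""
    return a == b


# The same content, visited in two different orders.
groups_a = {"Cl-Li": ["rocksalt", "CsCl-type"], "Br-K": ["rocksalt"]}
groups_b = {"Br-K": ["rocksalt"], "Cl-Li": ["rocksalt", "CsCl-type"]}

result_a = filter_duplicates(groups_a, same_label)
result_b = filter_duplicates(groups_b, same_label)
print(result_a)
print(result_b)
```

In the first run the shared "rocksalt" entry is credited to Cl-Li and Br-K disappears from the result. In the second run it is credited to Br-K instead. The set of unique entries is the same either way; only the system they are filed under changes.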

Now we wish to run one more filter to remove all duplicate structures already accessible on the Materials Project.

Create final structure dictionary with StructureNL objects for each transformed structure (Note: this requires installation of pybtex):