This notebook demonstrates how to predict structures using the built-in structure_prediction package in pymatgen. We will be gathering all possible structures (via the Materials API) of the chemical systems containing the highest probability specie substitutions for our original species. We will then resubstitute the original species back into these structures, filter out duplicates as well as preexisting structures already on the Materials Project, and output the newly predicted structures.

Written using:

  • pymatgen==2018.9.19

Author: Matthew McDermott (09/25/18)

In [1]:
# Imports we need for running structure prediction

from pymatgen.analysis.structure_prediction.substitutor import Substitutor 
from pymatgen.analysis.structure_prediction.substitution_probability import SubstitutionPredictor
from pymatgen.analysis.structure_matcher import StructureMatcher, ElementComparator
from pymatgen.transformations.standard_transformations import AutoOxiStateDecorationTransformation
from pymatgen import Specie, Element
from pymatgen import MPRester
from pprint import pprint
In [2]:
# Establish rester for accessing Materials API
mpr = MPRester(api_key='#######') # INSERT YOUR OWN API KEY

Here we define two variables -- threshold for the threshold probability in making substitution/structure predictions, and num_subs for how many substitutions you wish to explore:

In [3]:
threshold = 0.001 #threshold for substitution/structure predictions
num_subs = 10 # number of highest probability substitutions you wish to see

Finding highest probability specie substitutions

In this section, we use the SubstitutionPredictor to predict likely specie substitutions using a data-mined approach from ICSD data. This does not yet calculate probable structures -- only which species are likely to substitute for the original species you input. The substitution prediction methodology is presented in: Hautier, G., Fischer, C., Ehrlacher, V., Jain, A., and Ceder, G. (2011) Data Mined Ionic Substitutions for the Discovery of New Compounds. Inorganic Chemistry, 50(2), 656-663. doi:10.1021/ic102031h

In [4]:
original_species = [Specie('Y',3), Specie('Mn',3), Specie('O',-2)] # List of original species for substituting into

Predict most common specie substitutions, sort by highest probability, and take the number of substitutions specified by num_subs:

In [5]:
subs = SubstitutionPredictor(threshold=threshold).list_prediction(original_species)
subs.sort(key = lambda x: x['probability'], reverse = True)
subs = subs[0:num_subs]
[{'probability': 0.013371287006021255,
  'substitutions': {Specie Y3+: Specie Y3+,
                    Specie Mn3+: Specie Mn3+,
                    Specie O2-: Specie O2-}},
 {'probability': 0.006955965572918707,
  'substitutions': {Specie Na+: Specie Y3+,
                    Specie Mn3+: Specie Mn3+,
                    Specie O2-: Specie O2-}},
 {'probability': 0.005315460487213551,
  'substitutions': {Specie Mn3+: Specie Mn3+,
                    Specie Re5+: Specie Y3+,
                    Specie O2-: Specie O2-}},
 {'probability': 0.004872512156006512,
  'substitutions': {Specie Y3+: Specie Y3+,
                    Specie Fe3+: Specie Mn3+,
                    Specie O2-: Specie O2-}},
 {'probability': 0.004742382503118596,
  'substitutions': {Specie Y3+: Specie Y3+,
                    Specie Sc3+: Specie Mn3+,
                    Specie O2-: Specie O2-}},
 {'probability': 0.004621732557387764,
  'substitutions': {Specie Yb3+: Specie Y3+,
                    Specie Mn3+: Specie Mn3+,
                    Specie O2-: Specie O2-}},
 {'probability': 0.004484870617262718,
  'substitutions': {Specie K+: Specie Y3+,
                    Specie Mn3+: Specie Mn3+,
                    Specie O2-: Specie O2-}},
 {'probability': 0.0043959763085879905,
  'substitutions': {Specie Ho3+: Specie Y3+,
                    Specie Mn3+: Specie Mn3+,
                    Specie O2-: Specie O2-}},
 {'probability': 0.004293789374843302,
  'substitutions': {Specie Nd3+: Specie Y3+,
                    Specie Mn3+: Specie Mn3+,
                    Specie O2-: Specie O2-}},
 {'probability': 0.004225888976078515,
  'substitutions': {Specie Er3+: Specie Y3+,
                    Specie Mn3+: Specie Mn3+,
                    Specie O2-: Specie O2-}}]

Create a new list of just the substituted specie combinations:

In [6]:
trial_subs = [list(sub['substitutions'].keys()) for sub in subs]
[[Specie Y3+, Specie Mn3+, Specie O2-],
 [Specie Na+, Specie Mn3+, Specie O2-],
 [Specie Re5+, Specie Mn3+, Specie O2-],
 [Specie Y3+, Specie Fe3+, Specie O2-],
 [Specie Y3+, Specie Sc3+, Specie O2-],
 [Specie Yb3+, Specie Mn3+, Specie O2-],
 [Specie K+, Specie Mn3+, Specie O2-],
 [Specie Ho3+, Specie Mn3+, Specie O2-],
 [Specie Nd3+, Specie Mn3+, Specie O2-],
 [Specie Er3+, Specie Mn3+, Specie O2-]]

Create a set of strings of each unique chemical system (elements separated by dashes):

In [7]:
elem_sys_list = [[specie.element for specie in sub] for sub in trial_subs]

chemsys_set = set()
for sys in elem_sys_list:

Finding all structures for new chemical systems via Materials API

Create a new dictionary and populate it with all structures for each chemical system:

In [8]:
all_structs = {}
for chemsys in chemsys_set:
    all_structs[chemsys] = mpr.get_structures(chemsys) # Getting all structures -- this can take a while!
In [9]:
auto_oxi = AutoOxiStateDecorationTransformation() # create object to determine oxidation states at each lattice site

Now create a new dictionary of all structures (with oxidation states) for each chemical system:

In [10]:
oxi_structs = {}

for chemsys in all_structs:
    oxi_structs[chemsys] = []
    for num, struct in enumerate(all_structs[chemsys]):
            oxi_structs[chemsys].append({'structure': auto_oxi.apply_transformation(struct), 
                                         'id': str(chemsys + "_" + str(num))})
            continue # if auto oxidation fails, try next structure

Substitute original species into new structures

Now create a new dictionary trans_structures populated with predicted structures made up of original species. Note: these new predicted structures are TransformedStructure objeects:

In [11]:
sbr = Substitutor(threshold = threshold) # create a Substitutor object with structure prediction threshold
In [12]:
trans_structs = {}

for chemsys in oxi_structs:
    trans_structs[chemsys] = sbr.pred_from_structures(original_species,oxi_structs[chemsys])

Filter duplicate structures using StructureMatcher:

In [13]:
sm = StructureMatcher(comparator=ElementComparator(),primitive_cell=False) # create object for structure matching
In [14]:
filtered_structs = {} # new filtered dictionary
seen_structs = [] # list of all seen structures, independent of chemical system

print("Number of entries BEFORE filtering: " + str(sum([len(sys) for sys in trans_structs.values()])))

for chemsys in trans_structs:
    filtered_structs[chemsys] = []
    for struct in trans_structs[chemsys]:
        found = False
        for struct2 in seen_structs:
            if, struct2.final_structure):
                found = True
        if not found:
print("Number of entries AFTER filtering: " + str(sum([len(sys) for sys in filtered_structs.values()])))
Number of entries BEFORE filtering: 16
Number of entries AFTER filtering: 6

NOTE: The chemical systems to which the filtered structures are assigned might change when re-running the program. Since we are filtering for duplicates across chemical systems, either of the two systems may be reported in the filtered dictionary. Which of the two systems it is simply depends on the order in that the filter algorithm follows (and it's reading from a naturally unordered dictionary!)

Now we wish to run one more filter to remove all duplicate structures already accessible on the Materials Project.

In [15]:
known_structs = mpr.get_structures("Y-Mn-O") # get all known MP structures for original system
In [16]:
final_filtered_structs = {}
print("Number of entries BEFORE filtering against MP: " + str(sum([len(sys) for sys in filtered_structs.values()])))

for chemsys in filtered_structs:
    final_filtered_structs[chemsys] = []
    for struct in filtered_structs[chemsys]:
        found = False
        for struct2 in known_structs:
            if, struct2):
                found = True
        if not found:
print("Number of entries AFTER filtering against MP: " + str(sum([len(sys) for sys in final_filtered_structs.values()])))
Number of entries BEFORE filtering against MP: 6
Number of entries AFTER filtering against MP: 1
{'Er-Mn-O': [],
 'Ho-Mn-O': [],
 'K-Mn-O': [],
 'Na-Mn-O': [],
 'Nd-Mn-O': [],
 'Re-Mn-O': [],
 'Y-Fe-O': [<pymatgen.alchemy.materials.TransformedStructure object at 0x1271dea90>],
 'Y-Mn-O': [],
 'Y-Sc-O': [],
 'Yb-Mn-O': []}

Create final structure dictionary with StructureNL objects for each transformed structure (Note: this requires installation of pybtex):

In [17]:
final_structs = {}
for chemsys in final_filtered_structs:
    final_structs[chemsys] = [struct.to_snl([{"name":"Matthew McDermott", "email":"N/A"}]) 
                              for struct in final_filtered_structs[chemsys]]

#pprint(final_structs['Y-Fe-O'][0].as_dict()) # Printing one of the StructureNL objects - this is a large dictionary!
/Users/mcdermott/miniconda3/envs/dev/lib/python3.7/site-packages/pymatgen/alchemy/ UserWarning: Data in TransformedStructure.other_parameters discarded during type conversion to SNL
  warn('Data in TransformedStructure.other_parameters discarded '