This notebook demonstrates how to predict structures using the built-in structure_prediction package in pymatgen. We will be gathering all possible structures (via the Materials API) of the chemical systems containing the highest probability specie substitutions for our original species. We will then resubstitute the original species back into these structures, filter out duplicates as well as preexisting structures already on the Materials Project, and output the newly predicted structures.
Author: Matthew McDermott (09/25/18)
# Uncomment the subsequent lines in this cell to install dependencies for Google Colab.
# !pip install pymatgen==2022.7.19
# Imports we need for running structure prediction
from pprint import pprint
from pymatgen.analysis.structure_matcher import ElementComparator, StructureMatcher
from pymatgen.analysis.structure_prediction.substitution_probability import (
SubstitutionPredictor,
)
from pymatgen.analysis.structure_prediction.substitutor import Substitutor
from pymatgen.core.periodic_table import Specie
from pymatgen.ext.matproj import MPRester
from pymatgen.transformations.standard_transformations import (
AutoOxiStateDecorationTransformation,
)
# Establish rester for accessing Materials API
mpr = MPRester(api_key="#######") # INSERT YOUR OWN API KEY
Here we define two variables -- threshold for the threshold probability in making substitution/structure predictions, and num_subs for how many substitutions you wish to explore:
threshold = 0.001 # threshold for substitution/structure predictions
num_subs = 10 # number of highest probability substitutions you wish to see
In this section, we use the SubstitutionPredictor to predict likely specie substitutions using a data-mined approach from ICSD data. This does not yet calculate probable structures -- only which species are likely to substitute for the original species you input. The substitution prediction methodology is presented in: Hautier, G., Fischer, C., Ehrlacher, V., Jain, A., and Ceder, G. (2011) Data Mined Ionic Substitutions for the Discovery of New Compounds. Inorganic Chemistry, 50(2), 656-663. doi:10.1021/ic102031h
original_species = [
Specie("Y", 3),
Specie("Mn", 3),
Specie("O", -2),
] # List of original species along with their specified oxidation states for substituting into
Predict most common specie substitutions, sort by highest probability, and take the number of substitutions specified by num_subs:
subs = SubstitutionPredictor(threshold=threshold).list_prediction(original_species)
subs.sort(key=lambda x: x["probability"], reverse=True)
subs = subs[0:num_subs]
pprint(subs)
[{'probability': 0.013371287006021255, 'substitutions': {Specie Y3+: Specie Y3+, Specie Mn3+: Specie Mn3+, Specie O2-: Specie O2-}}, {'probability': 0.006955965572918707, 'substitutions': {Specie Na+: Specie Y3+, Specie Mn3+: Specie Mn3+, Specie O2-: Specie O2-}}, {'probability': 0.005315460487213551, 'substitutions': {Specie Mn3+: Specie Mn3+, Specie Re5+: Specie Y3+, Specie O2-: Specie O2-}}, {'probability': 0.004872512156006512, 'substitutions': {Specie Y3+: Specie Y3+, Specie Fe3+: Specie Mn3+, Specie O2-: Specie O2-}}, {'probability': 0.004742382503118596, 'substitutions': {Specie Y3+: Specie Y3+, Specie Sc3+: Specie Mn3+, Specie O2-: Specie O2-}}, {'probability': 0.004621732557387764, 'substitutions': {Specie Yb3+: Specie Y3+, Specie Mn3+: Specie Mn3+, Specie O2-: Specie O2-}}, {'probability': 0.004484870617262718, 'substitutions': {Specie K+: Specie Y3+, Specie Mn3+: Specie Mn3+, Specie O2-: Specie O2-}}, {'probability': 0.0043959763085879905, 'substitutions': {Specie Ho3+: Specie Y3+, Specie Mn3+: Specie Mn3+, Specie O2-: Specie O2-}}, {'probability': 0.004293789374843302, 'substitutions': {Specie Nd3+: Specie Y3+, Specie Mn3+: Specie Mn3+, Specie O2-: Specie O2-}}, {'probability': 0.004225888976078515, 'substitutions': {Specie Er3+: Specie Y3+, Specie Mn3+: Specie Mn3+, Specie O2-: Specie O2-}}]
Create a new list of just the substituted specie combinations:
trial_subs = [list(sub["substitutions"].keys()) for sub in subs]
pprint(trial_subs)
[[Specie Y3+, Specie Mn3+, Specie O2-], [Specie Na+, Specie Mn3+, Specie O2-], [Specie Re5+, Specie Mn3+, Specie O2-], [Specie Y3+, Specie Fe3+, Specie O2-], [Specie Y3+, Specie Sc3+, Specie O2-], [Specie Yb3+, Specie Mn3+, Specie O2-], [Specie K+, Specie Mn3+, Specie O2-], [Specie Ho3+, Specie Mn3+, Specie O2-], [Specie Nd3+, Specie Mn3+, Specie O2-], [Specie Er3+, Specie Mn3+, Specie O2-]]
Create a set of strings of each unique chemical system (elements separated by dashes):
elem_sys_list = [[specie.element for specie in sub] for sub in trial_subs]
chemsys_set = set()
for sys in elem_sys_list:
chemsys_set.add("-".join(map(str, sys)))
pprint(chemsys_set)
{'Er-Mn-O', 'Ho-Mn-O', 'K-Mn-O', 'Na-Mn-O', 'Nd-Mn-O', 'Re-Mn-O', 'Y-Fe-O', 'Y-Mn-O', 'Y-Sc-O', 'Yb-Mn-O'}
Create a new dictionary and populate it with all structures for each chemical system:
all_structs = {}
for chemsys in chemsys_set:
all_structs[chemsys] = mpr.get_structures(
chemsys
) # Getting all structures -- this can take a while!
auto_oxi = (
AutoOxiStateDecorationTransformation()
) # create object to determine oxidation states at each lattice site
Now create a new dictionary of all structures (with oxidation states) for each chemical system:
oxi_structs = {}
for chemsys in all_structs:
oxi_structs[chemsys] = []
for num, struct in enumerate(all_structs[chemsys]):
try:
oxi_structs[chemsys].append(
{
"structure": auto_oxi.apply_transformation(struct),
"id": str(chemsys + "_" + str(num)),
}
)
except Exception:
continue # if auto oxidation fails, try next structure
# pprint(oxi_structs)
Now create a new dictionary trans_structures populated with predicted structures made up of original species. Note: these new predicted structures are TransformedStructure objeects:
sbr = Substitutor(
threshold=threshold
) # create a Substitutor object with structure prediction threshold
trans_structs = {}
for chemsys in oxi_structs:
trans_structs[chemsys] = sbr.pred_from_structures(
original_species, oxi_structs[chemsys]
)
Filter duplicate structures using StructureMatcher:
# create object for structure matching
sm = StructureMatcher(comparator=ElementComparator(), primitive_cell=False)
filtered_structs = {} # new filtered dictionary
seen_structs = [] # list of all seen structures, independent of chemical system
n_entries = sum(len(sys) for sys in trans_structs.values())
print(f"Number of entries BEFORE filtering: {n_entries}")
for chemsys in trans_structs:
filtered_structs[chemsys] = []
for struct in trans_structs[chemsys]:
found = False
for struct2 in seen_structs:
if sm.fit(struct.final_structure, struct2.final_structure):
found = True
break
if not found:
filtered_structs[chemsys].append(struct)
seen_structs.append(struct)
n_entries = sum(len(sys) for sys in filtered_structs.values())
print(f"Number of entries AFTER filtering: {n_entries}")
Number of entries BEFORE filtering: 16 Number of entries AFTER filtering: 6
NOTE: The chemical systems to which the filtered structures are assigned might change when re-running the program. Since we are filtering for duplicates across chemical systems, either of the two systems may be reported in the filtered dictionary. Which of the two systems it is simply depends on the order in that the filter algorithm follows (and it's reading from a naturally unordered dictionary!)
Now we wish to run one more filter to remove all duplicate structures already accessible on the Materials Project.
# get all known MP structures for original system
known_structs = mpr.get_structures("Y-Mn-O")
final_filtered_structs = {}
n_entries = sum(len(sys) for sys in filtered_structs.values())
print(f"Number of entries BEFORE filtering against MP: {n_entries}")
for chemsys in filtered_structs:
final_filtered_structs[chemsys] = []
for struct in filtered_structs[chemsys]:
found = False
for struct2 in known_structs:
if sm.fit(struct.final_structure, struct2):
found = True
break
if not found:
final_filtered_structs[chemsys].append(struct)
n_entries = sum(len(sys) for sys in final_filtered_structs.values())
print(f"Number of entries AFTER filtering against MP: {n_entries}")
pprint(final_filtered_structs)
Number of entries BEFORE filtering against MP: 6 Number of entries AFTER filtering against MP: 1 {'Er-Mn-O': [], 'Ho-Mn-O': [], 'K-Mn-O': [], 'Na-Mn-O': [], 'Nd-Mn-O': [], 'Re-Mn-O': [], 'Y-Fe-O': [<pymatgen.alchemy.materials.TransformedStructure object at 0x1271dea90>], 'Y-Mn-O': [], 'Y-Sc-O': [], 'Yb-Mn-O': []}
Create final structure dictionary with StructureNL objects for each transformed structure (Note: this requires installation of pybtex):
final_structs = {}
for chemsys in final_filtered_structs:
final_structs[chemsys] = [
struct.to_snl([{"name": "Matthew McDermott", "email": "N/A"}])
for struct in final_filtered_structs[chemsys]
]
# Printing one of the StructureNL objects - this is a large dictionary!
# pprint(final_structs['Y-Fe-O'][0].as_dict())
/Users/mcdermott/miniconda3/envs/dev/lib/python3.7/site-packages/pymatgen/alchemy/materials.py:346: UserWarning: Data in TransformedStructure.other_parameters discarded during type conversion to SNL warn('Data in TransformedStructure.other_parameters discarded '