This notebook demonstrates how you can obtain various data from the Materials Project using pymatgen's interface to the Materials API.
# Uncomment the subsequent lines in this cell to install dependencies for Google Colab.
# !pip install pymatgen==2022.7.19
import pprint
import re
from pymatgen.core import Composition
from pymatgen.ext.matproj import MPRester
# Make sure that you have a Materials API key. Put the key in the call to
# MPRester if needed, e.g., MPRester("MY_API_KEY").
mpr = MPRester()
Let's say you want to find all structures with similar stoichiometry to Fe2O3.
comp = Composition("Fe2O3")
anon_formula = comp.anonymized_formula
# We need to convert the formula to the dict form used in the database.
# An element with a count of 1 has no digit in the string, so default it to 1.
anon_formula = {
    m.group(1): int(m.group(2) or 1)
    for m in re.finditer(r"([A-Z])(\d*)", anon_formula)
}
data = mpr.query(
    {"anonymous_formula": anon_formula},
    properties=["task_id", "pretty_formula", "structure"],
)
print(len(data))  # Number of matching entries (1099 at the time of this run).
1099
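The conversion from an anonymized formula string to the dict form can be tried out without an API key. The following is a minimal sketch, assuming that an element with a count of 1 is written without a digit (e.g., "AB2"); the helper name anon_formula_to_dict is hypothetical and not part of pymatgen.

```python
import re


def anon_formula_to_dict(anon_formula):
    """Hypothetical helper: turn 'A2B3' into {'A': 2, 'B': 3}.

    A missing digit after a letter is treated as a count of 1.
    """
    return {
        m.group(1): int(m.group(2) or 1)
        for m in re.finditer(r"([A-Z])(\d*)", anon_formula)
    }


print(anon_formula_to_dict("A2B3"))  # the Fe2O3 case
print(anon_formula_to_dict("AB2"))   # a case with an implicit count of 1
```

Matching a single capital letter per group, rather than a run of letters, keeps "AB2" from being read as one element named "AB".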
# data is now a list of dicts. This shows you what each dict contains.
# Note that the MP ID is named "task_id" in the database itself.
pprint.pprint(data[0])
{'pretty_formula': 'Ca3N2',
 'structure': Structure Summary
Lattice
    abc : 4.60114 4.60114 4.60114
 angles : 90.0 90.0 90.0
 volume : 97.40838513596154
      A : 4.60114 0.0 0.0
      B : 0.0 4.60114 0.0
      C : 0.0 0.0 4.60114
PeriodicSite: Ca (0.0000, 2.3006, 2.3006) [0.0000, 0.5000, 0.5000]
PeriodicSite: Ca (2.3006, 0.0000, 2.3006) [0.5000, 0.0000, 0.5000]
PeriodicSite: Ca (2.3006, 2.3006, 0.0000) [0.5000, 0.5000, 0.0000]
PeriodicSite: N (0.0000, 0.0000, 0.0000) [0.0000, 0.0000, 0.0000]
PeriodicSite: N (2.3006, 2.3006, 2.3006) [0.5000, 0.5000, 0.5000],
 'task_id': 'mp-1013524'}
Band structures are fairly large objects. Rather than downloading large numbers of band structures in one shot, download only the ones you need.
bs = mpr.get_bandstructure_by_material_id("mp-20470")
from pymatgen.electronic_structure.plotter import BSPlotter
%matplotlib inline
plotter = BSPlotter(bs)
plotter.show()
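The Materials Project has elastic constants for more than 13,000 materials (13,172 at the time of this run), and the number keeps growing. You can get all the elastic data, together with the material IDs, as follows.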
elastic_data = mpr.query(
    {"elasticity": {"$exists": True}},
    properties=["task_id", "pretty_formula", "elasticity"],
)
print(len(elastic_data))
pprint.pprint(elastic_data[0])
13172
{'elasticity': {'G_Reuss': 97.0,
                'G_VRH': 97.0,
                'G_Voigt': 97.0,
                'G_Voigt_Reuss_Hill': 97.0,
                'K_Reuss': 191.0,
                'K_VRH': 191.0,
                'K_Voigt': 191.0,
                'K_Voigt_Reuss_Hill': 191.0,
                'compliance_tensor': [[3.8, -1.1, -1.1, 0.0, 0.0, -0.0],
                                      [-1.1, 4.4, -1.6, 0.0, -0.0, -0.0],
                                      [-1.1, -1.6, 4.4, 0.0, -0.0, 0.0],
                                      [0.0, 0.0, 0.0, 9.7, 0.0, 0.0],
                                      [0.0, -0.0, -0.0, 0.0, 10.1, -0.0],
                                      [-0.0, -0.0, 0.0, 0.0, -0.0, 10.1]],
                'elastic_anisotropy': 0.03,
                'elastic_tensor': [[332.0, 124.0, 124.0, 0.0, 0.0, 0.0],
                                   [124.0, 307.0, 140.0, 0.0, 0.0, 0.0],
                                   [124.0, 140.0, 307.0, 0.0, 0.0, 0.0],
                                   [0.0, 0.0, 0.0, 103.0, 0.0, 0.0],
                                   [0.0, 0.0, 0.0, 0.0, 99.0, 0.0],
                                   [0.0, 0.0, 0.0, 0.0, 0.0, 99.0]],
                'elastic_tensor_original': [[306.34501796434415, 140.02516106492783, 120.25373900307284, 0.0, 0.0, 0.0],
                                            [140.73523286459786, 307.24078786025365, 120.98049494894944, 0.0, 0.0, 0.0],
                                            [126.83512772730717, 126.83512772730717, 331.89783867849764, 0.0, 0.0, 0.0],
                                            [-0.021507705351341142, -0.02905753108932332, -0.021567639553376226, 98.8858018498472, 0.0, 0.0],
                                            [-0.0031093457999872534, 0.007379276530962137, -0.002431547075297005, 0.0, 98.88513276899003, 0.0],
                                            [0.0001522596284748585, 0.00018797650767891905, 0.0002879295106355058, 0.0, 0.0, 103.34758692996046]],
                'homogeneous_poisson': 0.28,
                'nsites': 12,
                'poisson_ratio': 0.28,
                'universal_anisotropy': 0.03,
                'warnings': []},
 'pretty_formula': 'Nb4CoSi',
 'task_id': 'mp-10003'}
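To see how the derived moduli relate to the tensor, here is a minimal sketch that recomputes the Voigt averages from the rounded "elastic_tensor" printed above. It assumes the tensor is a 6x6 matrix in Voigt notation with values in GPa, and it uses the standard Voigt formulas; the helper name voigt_averages is hypothetical and not part of pymatgen.

```python
# Rounded elastic tensor of Nb4CoSi (mp-10003), copied from the output above.
# Assumed to be in Voigt notation, in GPa.
elastic_tensor = [
    [332.0, 124.0, 124.0, 0.0, 0.0, 0.0],
    [124.0, 307.0, 140.0, 0.0, 0.0, 0.0],
    [124.0, 140.0, 307.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 103.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 99.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 99.0],
]


def voigt_averages(c):
    """Hypothetical helper: return (K_Voigt, G_Voigt) for a 6x6 Voigt tensor."""
    diag = c[0][0] + c[1][1] + c[2][2]   # C11 + C22 + C33
    off = c[0][1] + c[0][2] + c[1][2]    # C12 + C13 + C23
    shear = c[3][3] + c[4][4] + c[5][5]  # C44 + C55 + C66
    k_voigt = (diag + 2 * off) / 9
    g_voigt = (diag - off + 3 * shear) / 15
    return k_voigt, g_voigt


k_v, g_v = voigt_averages(elastic_tensor)
print(round(k_v), round(g_v))  # 191 97
```

The rounded results agree with the "K_Voigt" and "G_Voigt" values stored in the same entry.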
In general, almost any data can be obtained from MP using the MPRester, either via the high-level functions or the very powerful "query" method.
For more complex queries, you can refer to the documentation for the Materials API at https://github.com/materialsproject/mapidoc.
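As an illustration of a more complex query, the sketch below only builds a criteria dict and does not contact the server. It assumes that the "query" method accepts MongoDB-style operators such as "$all" and "$lte" (in the same way as the "$exists" filter used earlier) and that "elements", "nelements", and "e_above_hull" are valid field names; check both against the mapidoc documentation. The helper name build_criteria is hypothetical.

```python
def build_criteria(elements, max_e_above_hull):
    """Hypothetical helper: criteria for compounds made of exactly `elements`
    that lie within `max_e_above_hull` (eV/atom) of the convex hull."""
    return {
        "elements": {"$all": sorted(elements)},       # must contain all of these
        "nelements": len(elements),                   # and no other elements
        "e_above_hull": {"$lte": max_e_above_hull},   # stability filter
    }


criteria = build_criteria(["Fe", "O"], 0.05)
print(criteria)

# With a valid API key, the dict could then be passed on, for example:
# results = mpr.query(criteria, properties=["task_id", "pretty_formula"])
```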
Pymatgen has its own structure matching algorithm, which we have used to reduce the roughly 130,000 structures in the ICSD to about 60,000-70,000 unique structures. It is fast and accurate. Here's an example of how it works.
from pymatgen.analysis.structure_matcher import StructureMatcher
# You can customize tolerances etc., but the defaults usually work fine.
m = StructureMatcher()
s1 = data[0]["structure"]
print(s1)
s2 = s1.copy()
s2.apply_strain(0.1)
print(s2)
Full Formula (Ca3 N2)
Reduced Formula: Ca3N2
abc   :   4.601140   4.601140   4.601140
angles:  90.000000  90.000000  90.000000
Sites (5)
  #  SP      a    b    c    magmom
---  ----  ---  ---  ---  --------
  0  Ca    0    0.5  0.5     0.001
  1  Ca    0.5  0    0.5     0.001
  2  Ca    0.5  0.5  0       0.001
  3  N     0    0    0       0
  4  N     0.5  0.5  0.5    -0.009
Full Formula (Ca3 N2)
Reduced Formula: Ca3N2
abc   :   5.061254   5.061254   5.061254
angles:  90.000000  90.000000  90.000000
Sites (5)
  #  SP      a    b    c    magmom
---  ----  ---  ---  ---  --------
  0  Ca    0    0.5  0.5     0.001
  1  Ca    0.5  0    0.5     0.001
  2  Ca    0.5  0.5  0       0.001
  3  N     0    0    0       0
  4  N     0.5  0.5  0.5    -0.009
print(m.fit(s1, s2))
True
For something more challenging, let's see how many of the retrieved structures are similar to the Ca3N2 structure s1 above, regardless of which elements occupy the sites.
matches = []
for d in data:
    if m.fit_anonymous(d["structure"], s1):
        matches.append(d)
# The above fitting took a few seconds. We have 22 similar structures.
print(len(matches))
22
# Let's see a few of the matches.
pprint.pprint(matches[0])
pprint.pprint(matches[1])
pprint.pprint(matches[2])
{'pretty_formula': 'Ca3N2',
 'structure': Structure Summary
Lattice
    abc : 4.60114 4.60114 4.60114
 angles : 90.0 90.0 90.0
 volume : 97.40838513596154
      A : 4.60114 0.0 0.0
      B : 0.0 4.60114 0.0
      C : 0.0 0.0 4.60114
PeriodicSite: Ca (0.0000, 2.3006, 2.3006) [0.0000, 0.5000, 0.5000]
PeriodicSite: Ca (2.3006, 0.0000, 2.3006) [0.5000, 0.0000, 0.5000]
PeriodicSite: Ca (2.3006, 2.3006, 0.0000) [0.5000, 0.5000, 0.0000]
PeriodicSite: N (0.0000, 0.0000, 0.0000) [0.0000, 0.0000, 0.0000]
PeriodicSite: N (2.3006, 2.3006, 2.3006) [0.5000, 0.5000, 0.5000],
 'task_id': 'mp-1013524'}
{'pretty_formula': 'Ba3P2',
 'structure': Structure Summary
Lattice
    abc : 6.04929 6.04929 6.04929
 angles : 90.0 90.0 90.0
 volume : 221.36717082405707
      A : 6.04929 0.0 0.0
      B : 0.0 6.04929 0.0
      C : 0.0 0.0 6.04929
PeriodicSite: Ba (0.0000, 3.0246, 3.0246) [0.0000, 0.5000, 0.5000]
PeriodicSite: Ba (3.0246, 0.0000, 3.0246) [0.5000, 0.0000, 0.5000]
PeriodicSite: Ba (3.0246, 3.0246, 0.0000) [0.5000, 0.5000, 0.0000]
PeriodicSite: P (0.0000, 0.0000, 0.0000) [0.0000, 0.0000, 0.0000]
PeriodicSite: P (3.0246, 3.0246, 3.0246) [0.5000, 0.5000, 0.5000],
 'task_id': 'mp-1013551'}
{'pretty_formula': 'Sr3As2',
 'structure': Structure Summary
Lattice
    abc : 5.81 5.81 5.81
 angles : 90.0 90.0 90.0
 volume : 196.12294099999997
      A : 5.81 0.0 0.0
      B : 0.0 5.81 0.0
      C : 0.0 0.0 5.81
PeriodicSite: Sr (0.0000, 2.9050, 2.9050) [0.0000, 0.5000, 0.5000]
PeriodicSite: Sr (2.9050, 0.0000, 2.9050) [0.5000, 0.0000, 0.5000]
PeriodicSite: Sr (2.9050, 2.9050, 0.0000) [0.5000, 0.5000, 0.0000]
PeriodicSite: As (0.0000, 0.0000, 0.0000) [0.0000, 0.0000, 0.0000]
PeriodicSite: As (2.9050, 2.9050, 2.9050) [0.5000, 0.5000, 0.5000],
 'task_id': 'mp-1013559'}
You can see that we have successfully found isostructural materials: each match has the same cubic arrangement of sites, with different elements occupying them.