Welcome to ai4materials’s documentation!¶
The current documentation is not actively maintained and thus might not be up to date. For the most recent documentation, please visit the ai4materials GitHub repository: https://github.com/angeloziletti/ai4materials.
ai4materials makes it possible to perform complex analyses of materials science data using machine learning. It also provides functions to pre-process (on parallel processors), save, and subsequently load materials science datasets, thus easing the traceability, reproducibility, and prototyping of new models.
ai4materials can perform crystal-structure classification and analysis, as introduced in:
[1] | A. Leitherer, A. Ziletti, and L. M. Ghiringhelli, “Robust recognition and exploratory analysis of crystal structures via Bayesian deep learning”, https://arxiv.org/abs/2103.09777 (2021) |
Installation instructions can be found in the ai4materials github repository: https://github.com/angeloziletti/ai4materials.
On the left panel, you can find a few examples that showcase what ai4materials can do.
Moreover, ai4materials can also reproduce results from the following publications:
[2] | A. Ziletti, D. Kumar, M. Scheffler, and L. M. Ghiringhelli, “Insightful classification of crystal structures using deep learning,” Nature Communications, vol. 9, p. 2775, 2018. [Link to article] |
[3] | L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, and M. Scheffler, “Big Data of Materials Science: Critical Role of the Descriptor,” Physical Review Letters, vol. 114, no. 10, p. 105503, 2015. [Link to article] |
Code author: Angelo Ziletti <angelo.ziletti@gmail.com>
Installation¶
Installation instructions can be found in the ai4materials github repository: https://github.com/angeloziletti/ai4materials.
Section author: Angelo Ziletti <angelo.ziletti@gmail.com>
Representing crystal structures: descriptors¶
The first step in any machine learning and/or automated analysis of materials science data is to represent the material under consideration in a way that a computer can process. This representation, termed a descriptor, should contain all the information on the system that is relevant for the desired learning task.
Starting from a crystal structure, provided as an ASE (Atomistic Simulation Environment) Atoms object [link], the code can calculate different representations. Currently, the following descriptors (i.e. functions that represent crystal structures) are implemented:
ai4materials.descriptors.atomic_features
returns the atomic features corresponding to the chemical species of the system [1]
ai4materials.descriptors.diffraction2d
calculates the two-dimensional diffraction fingerprint [2]
ai4materials.descriptors.diffraction3d
calculates the three-dimensional diffraction fingerprint [3]
ai4materials.descriptors.prdf
calculates the partial radial distribution function [4]
ai4materials.descriptors.SOAP
calculates the SOAP descriptor [5]
For examples of descriptor usage and the corresponding references, see below.
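To make the idea of a descriptor concrete, here is a minimal sketch that is independent of ai4materials: a toy descriptor that maps a list of Cartesian positions to the sorted list of interatomic distances. The function name `sorted_distance_descriptor` and the coordinates are assumptions made for illustration only; the descriptors listed above are considerably more elaborate.

```python
import math
from itertools import combinations


def sorted_distance_descriptor(positions):
    """Return all pairwise distances, sorted in ascending order.

    The result does not depend on the order in which the atoms are listed,
    nor on rigid translations or rotations of the structure.
    """
    return sorted(math.dist(p, q) for p, q in combinations(positions, 2))


# three atoms at hypothetical positions (in Angstrom)
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
# the same atoms listed in a different order
shuffled = [positions[2], positions[0], positions[1]]

descriptor = sorted_distance_descriptor(positions)
descriptor_shuffled = sorted_distance_descriptor(shuffled)
print(descriptor)
print(descriptor_shuffled)
```

Both calls give the same list of distances, which is the kind of invariance one typically wants from a representation of a crystal structure.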
Example: atomic features¶
It was shown in Ref. [1] that the crystal structure of binary compounds can be predicted with a compressed-sensing technique using atomic features only.
The code below illustrates how to retrieve atomic features for one crystal structure. It performs the following steps:
- build a NaCl crystal structure using the ASE package
- calculate atomic features using the descriptor ai4materials.descriptors.atomic_features.AtomicFeatures
- retrieve the atomic features of this crystal structure as the pandas DataFrame nacl_atomic_features
- save this table to file.
import sys
import os.path
atomic_data_dir = os.path.abspath(os.path.normpath("/home/ziletti/nomad/nomad-lab-base/analysis-tools/atomic-data"))
sys.path.insert(0, atomic_data_dir)
from ase.spacegroup import crystal
from ai4materials.utils.utils_config import set_configs
from ai4materials.utils.utils_config import setup_logger
from ai4materials.descriptors.atomic_features import AtomicFeatures
from nomadcore.local_meta_info import loadJsonFile, InfoKindEl
# setup configs
configs = set_configs(main_folder='./desc_atom_features_ai4materials')
logger = setup_logger(configs, level='INFO', display_configs=False)
desc_file_name = 'atomic_features_try1'
# build atomic structure
structure = crystal(['Na', 'Cl'], [(0, 0, 0), (0.5, 0.5, 0.5)], spacegroup=225, cellpar=[5.64, 5.64, 5.64, 90, 90, 90])
selected_feature_list = ['atomic_ionization_potential', 'atomic_electron_affinity',
'atomic_rs_max', 'atomic_rp_max', 'atomic_rd_max']
# define and calculate descriptor
kwargs = {'feature_order_by': 'atomic_mulliken_electronegativity', 'energy_unit': 'eV', 'length_unit': 'angstrom'}
descriptor = AtomicFeatures(configs=configs, **kwargs)
structure_result = descriptor.calculate(structure, selected_feature_list=selected_feature_list)
nacl_atomic_features = structure_result.info['descriptor']['atomic_features_table']
# write table to file
nacl_atomic_features.to_csv('nacl_atomic_features_table.csv', float_format='%.4f')
This is the table (saved in the file nacl_atomic_features_table.csv) containing the atomic features obtained using the code above:
 | ordered_chemical_symbols | atomic_ionization_potential(A) | atomic_electron_affinity(A) | atomic_rs_max(A) | atomic_rp_max(A) | atomic_rd_max(A) | atomic_ionization_potential(B) | atomic_electron_affinity(B) | atomic_rs_max(B) | atomic_rp_max(B) | atomic_rd_max(B) |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaCl | -5.2231 | -0.7157 | 1.7100 | 2.6000 | 6.5700 | -13.9018 | -3.9708 | 0.6800 | 0.7600 | 1.6700 |
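The `(A)` and `(B)` suffixes in the column names refer to the two species of the binary compound. The sketch below shows one way such a labelling could arise, assuming that the species are sorted by increasing Mulliken electronegativity (as the `feature_order_by` option suggests) and that the electronegativity is estimated as `-(IP + EA) / 2` from the tabulated values. Both assumptions, as well as the helper names `mulliken_electronegativity` and `order_binary`, are illustrative and are not taken from the ai4materials source.

```python
# atomic features copied from the table above (energies in eV)
features = {
    'Na': {'atomic_ionization_potential': -5.2231, 'atomic_electron_affinity': -0.7157},
    'Cl': {'atomic_ionization_potential': -13.9018, 'atomic_electron_affinity': -3.9708},
}


def mulliken_electronegativity(element):
    """Assumed estimate: minus the mean of the tabulated IP and EA."""
    f = features[element]
    return -(f['atomic_ionization_potential'] + f['atomic_electron_affinity']) / 2.0


def order_binary(elements):
    """Sort the two species so that A is the less electronegative one."""
    return sorted(elements, key=mulliken_electronegativity)


species_a, species_b = order_binary(['Cl', 'Na'])

# build one table row with the (A)/(B) column-naming convention
row = {'ordered_chemical_symbols': species_a + species_b}
for label, element in (('A', species_a), ('B', species_b)):
    for name, value in features[element].items():
        row['{}({})'.format(name, label)] = value

print(row)
```

With these assumptions, Na is labelled A and Cl is labelled B, in agreement with the row shown in the table.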
Example: two-dimensional diffraction fingerprint¶
The two-dimensional diffraction fingerprint was introduced in Ref. [2].
The code below illustrates how to calculate the two-dimensional diffraction fingerprint for a supercell of face-centered-cubic aluminium containing approximately 256 atoms, performing the following steps:
- build a face-centered-cubic aluminium crystal structure using the ASE package
- create a supercell using the function ai4materials.utils.utils_crystals.create_supercell
- calculate the two-dimensional diffraction fingerprint of this crystal structure as the numpy.array intensity_rgb
- convert the two-dimensional diffraction fingerprint to an RGB image and write it to file.
from ase.spacegroup import crystal
from ase.io import write
from ai4materials.descriptors.diffraction2d import Diffraction2D
from ai4materials.utils.utils_config import set_configs
from ai4materials.utils.utils_crystals import create_supercell
import numpy as np
from PIL import Image
# setup configs
configs = set_configs(main_folder='./desc_2d_diff_ai4materials')
# create the fcc aluminium structure
fcc_al = crystal('Al', [(0, 0, 0)], spacegroup=225, cellpar=[4.05, 4.05, 4.05, 90, 90, 90])
structure = create_supercell(fcc_al, target_nb_atoms=256)
# calculate the two-dimensional diffraction fingerprint
descriptor = Diffraction2D(configs=configs)
structure_result = descriptor.calculate(structure)
intensity_rgb = structure_result.info['descriptor']['diffraction_2d_intensity']
# write the diffraction fingerprint as png image
rgb_array = np.zeros((intensity_rgb.shape[0], intensity_rgb.shape[1], intensity_rgb.shape[2]), 'uint8')
current_img = list(intensity_rgb.reshape(-1, intensity_rgb.shape[0], intensity_rgb.shape[1]))
for ix_ch in range(len(current_img)):
    rgb_array[..., ix_ch] = current_img[ix_ch] * 255
img = Image.fromarray(rgb_array)
# Image.LANCZOS is the same filter as Image.ANTIALIAS, which was removed in Pillow 10
img = img.resize([256, 256], Image.LANCZOS)
img.save('fcc_al_diffraction2d_fingerprint.png')
This is the calculated two-dimensional diffraction fingerprint for face-centered-cubic aluminium:

Implementation details of the two-dimensional diffraction fingerprint can be found at ai4materials.descriptors.diffraction2d.
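The conversion from a floating-point intensity array to an 8-bit RGB array can also be sketched without any ai4materials dependency. The example below assumes an array laid out as height × width × channel with values in [0, 1]; the layout of the actual `intensity_rgb` array may differ, and the random data is only a stand-in for a real fingerprint.

```python
import numpy as np

# synthetic stand-in for a fingerprint: 3 channels, values in [0, 1]
rng = np.random.default_rng(0)
intensity = rng.random((64, 64, 3))

# scale to [0, 255], round to the nearest integer, clip to guard against
# values outside the range, and cast to the 8-bit type used for RGB images
rgb_array = np.clip(np.rint(intensity * 255), 0, 255).astype(np.uint8)

print(rgb_array.shape, rgb_array.dtype)
```

An array of this type and shape can then be passed to `Image.fromarray`, as in the example above.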
Example: three-dimensional diffraction fingerprint¶
The three-dimensional diffraction fingerprint was introduced in Ref. [3].
The code below illustrates how to calculate the three-dimensional diffraction fingerprint for a supercell of face-centered-cubic aluminium containing approximately 256 atoms, performing the following steps:
- build a face-centered-cubic aluminium crystal structure using the ASE package
- create a supercell using the function ai4materials.utils.utils_crystals.create_supercell
- calculate the three-dimensional diffraction fingerprint of this crystal structure as the numpy.array diff3d_spectrum
- convert the three-dimensional diffraction fingerprint to a heatmap image and write it to file.
from ase.spacegroup import crystal
import matplotlib.pyplot as plt
from ai4materials.descriptors.diffraction3d import DISH
from ai4materials.utils.utils_config import set_configs
from ai4materials.utils.utils_crystals import create_supercell
from scipy import ndimage
# setup configs
configs = set_configs(main_folder='./dish_ai4materials')
# create the fcc aluminium structure
fcc_al = crystal('Al', [(0, 0, 0)], spacegroup=225, cellpar=[4.05, 4.05, 4.05, 90, 90, 90])
structure = create_supercell(fcc_al, target_nb_atoms=256, random_rotation=True, cell_type='standard', optimal_supercell=False)
# calculate the three-dimensional diffraction fingerprint
descriptor = DISH(configs=configs)
structure_result = descriptor.calculate(structure)
diff3d_spectrum = structure_result.info['descriptor']['diffraction_3d_sh_spectrum']
# plot the (enlarged) array as image (enlarging is unphysical, only for visualization purposes)
plt.imsave('fcc_al_diffraction3d_fingerprint.png', ndimage.zoom(diff3d_spectrum, (4, 4)))
This is the calculated three-dimensional diffraction fingerprint for face-centered-cubic aluminium (zoomed for visualization purposes):

Implementation details of the three-dimensional diffraction fingerprint can be found at ai4materials.descriptors.diffraction3d.
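In the example above the array is enlarged with `ndimage.zoom`, which by default uses spline interpolation and therefore produces values that are not in the original array. If only a larger picture of the same values is wanted, a nearest-neighbour enlargement is a possible alternative. The sketch below uses plain NumPy and a small made-up array in place of the real spectrum.

```python
import numpy as np

# hypothetical small spectrum (the real one comes from the DISH descriptor)
spectrum = np.arange(6, dtype=float).reshape(2, 3)

# repeat every row and every column `factor` times: each original value
# becomes a factor x factor block, and no new values are introduced
factor = 4
enlarged = np.repeat(np.repeat(spectrum, factor, axis=0), factor, axis=1)

print(enlarged.shape)
```

The enlarged array could then be written to file with `plt.imsave`, exactly as in the example above.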
[1] | (1, 2) L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, and M. Scheffler, “Big Data of Materials Science: Critical Role of the Descriptor,” Physical Review Letters, vol. 114, no. 10, p. 105503, 2015. [Link to article] |
[2] | (1, 2) A. Ziletti, D. Kumar, M. Scheffler, and L. M. Ghiringhelli, “Insightful classification of crystal structures using deep learning,” Nature Communications, vol. 9, p. 2775, 2018. [Link to article] |
[3] | (1, 2)
|
[4] | K. T. Schuett, H. Glawe, F. Brockherde, A. Sanna, K. R. Müller, and E. K. U. Gross, “How to represent crystal structures for machine learning: Towards fast prediction of electronic properties,” Physical Review B, vol. 89, p. 205118 (2014). [Link to article] |
[5] | A. P. Bartók, R. Kondor, and G. Csányi, “On representing chemical environments,” Physical Review B, vol. 87, no. 18, p. 184115 (2013). [Link to article] |
Section author: Angelo Ziletti <angelo.ziletti@gmail.com>
Submodules¶
ai4materials.descriptors.atomic_features module¶
ai4materials.descriptors.base_descriptor module¶
ai4materials.descriptors.diffraction1d module¶
ai4materials.descriptors.diffraction2d module¶
ai4materials.descriptors.diffraction3d module¶
ai4materials.descriptors.ft_soap_descriptor module¶
ai4materials.descriptors.prdf module¶
class ai4materials.descriptors.prdf.PRDF(configs=None, cutoff_radius=20, rdf_only=False)
Bases: ai4materials.descriptors.base_descriptor.Descriptor
Compute the partial radial distribution function of a given crystal structure.
Cell vectors v1, v2, v3 with values in the columns: [[v1x,v2x,v3x],[v1y,v2y,v3y],[v1z,v2z,v3z]]
Parameters:
- cutoff_radius: float, optional (default=20)
- Atoms within a sphere of cut-off radius (in Angstrom) are considered.
- rdf_only: bool, optional (default=`False`)
- If False, calculates the partial radial distribution function. If True, calculates the radial distribution function (all atom types are treated as the same).
Code author: Fawzi Mohamed <mohamed@fhi-berlin.mpg.de> and Angelo Ziletti <angelo.ziletti@gmail.com>
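The idea behind a partial radial distribution function can be sketched as follows. This is a simplified illustration, not the PRDF implementation: it treats a finite cluster without periodic images, uses linearly spaced bins, and applies no normalization. The function name `partial_pair_histograms` and the three-atom chain are assumptions made for the example.

```python
from itertools import combinations

import numpy as np


def partial_pair_histograms(symbols, positions, cutoff_radius=20.0, n_bins=10):
    """Histogram pair distances separately for each pair of species.

    Simplified sketch: finite cluster, no periodic images, no normalization.
    """
    positions = np.asarray(positions, dtype=float)
    bins = np.linspace(0.0, cutoff_radius, n_bins + 1)
    distances = {}
    for i, j in combinations(range(len(symbols)), 2):
        d = np.linalg.norm(positions[i] - positions[j])
        if d <= cutoff_radius:
            # ('Cl', 'Na') and ('Na', 'Cl') are the same pair of species
            key = tuple(sorted((symbols[i], symbols[j])))
            distances.setdefault(key, []).append(d)
    histograms = {key: np.histogram(vals, bins=bins)[0] for key, vals in distances.items()}
    return histograms, bins


# hypothetical linear Na-Cl-Na chain (positions in Angstrom)
symbols = ['Na', 'Cl', 'Na']
positions = [(0.0, 0.0, 0.0), (2.82, 0.0, 0.0), (5.64, 0.0, 0.0)]
histograms, bins = partial_pair_histograms(symbols, positions, cutoff_radius=10.0, n_bins=10)

for pair, counts in sorted(histograms.items()):
    print(pair, counts)
```

Summing the histograms over all species pairs gives the analogue of `rdf_only=True`, where all atom types are treated as the same.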
calculate(structure, **kwargs)
Calculate the descriptor for the given ASE structure.
Parameters:
- structure: ase.Atoms object
- Atomic structure.
Code author: Fawzi Mohamed <mohamed@fhi-berlin.mpg.de>
write(structure, tar, op_id=0, write_geo=True, format_geometry='aims')
Write the descriptor to file.
Parameters:
- structure: ase.Atoms object
- Atomic structure.
- tar: TarFile object
- TarFile archive where the descriptor is added. This is created internally with tarfile.open.
- op_id: int, optional (default=0)
- Number of the applied operation to the descriptor. At present always set to zero in the code.
- write_geo: bool, optional (default=`True`)
- If True, write a coordinate file of the structure for which the descriptor is calculated.
Code author: Angelo Ziletti <angelo.ziletti@gmail.com>
ai4materials.descriptors.prdf.get_design_matrix(structures, total_bins=50, max_dist=25)
Starting from atomic structures, calculate the design matrix for the partial radial distribution function.
The list of structures must contain the calculated ai4materials.descriptors.prdf.PRDF. The discretization is performed using a logarithmic grid as follows:
bins = np.logspace(0, np.log10(max_dist), num=total_bins + 1) - 1
Parameters:
- structures: ase.Atoms object or list of ase.Atoms objects
- Atomic structure or list of atomic structures.
- total_bins: int, optional (default=50)
- Total number of bins to be used in the discretization of the partial radial distribution function.
- max_dist: float, optional (default=25)
- Maximum distance to consider in the partial radial distribution function when the design matrix is calculated. Unit in Angstrom. The unit of measure is the same as ai4materials.descriptors.prdf.PRDF.
Return:
- scipy.sparse.csr.csr_matrix, shape [n_samples, largest_atomic_nb * largest_atomic_nb * total_bins]
- Returns a sparse row-compressed matrix.
Code author: Angelo Ziletti <angelo.ziletti@gmail.com>
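The logarithmic grid used for the discretization can be inspected directly with NumPy. The sketch below evaluates the bin formula quoted in the description of get_design_matrix with the default values `total_bins=50` and `max_dist=25`, and then bins a few made-up distances; the `distances` array is purely illustrative.

```python
import numpy as np

total_bins = 50
max_dist = 25

# bin edges from the formula quoted in the text: the grid starts at 0 and the
# bins become wider with increasing distance
bins = np.logspace(0, np.log10(max_dist), num=total_bins + 1) - 1

# hypothetical pair distances (in Angstrom) to be discretized
distances = np.array([0.5, 2.8, 2.8, 4.0, 10.0, 23.0])
counts, _ = np.histogram(distances, bins=bins)

print(len(bins), bins[0], bins[-1], counts.sum())
```

Because of the `- 1` shift, the first edge is exactly 0 and the last edge is `max_dist - 1`, so with these defaults the grid covers distances from 0 to 24 Angstrom.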
ai4materials.descriptors.prdf.get_unique_chemical_species(structures)
Get the set of unique chemical species from a list of atomic structures.
The list of structures must contain the calculated ai4materials.descriptors.prdf.PRDF.
Parameters:
- structures: ase.Atoms object or list of ase.Atoms objects
- Atomic structure or list of atomic structures.
Code author: Angelo Ziletti <angelo.ziletti@gmail.com>
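Collecting the unique chemical species of several structures amounts to a set union. The sketch below is not the library implementation: it works on plain lists of chemical symbols (with ASE, such lists would come from `Atoms.get_chemical_symbols()`), and the three structures are made up for the example.

```python
# chemical symbols of three hypothetical structures
structures_symbols = [
    ['Na', 'Cl'],
    ['Mg', 'O', 'Mg', 'O'],
    ['Na', 'F'],
]

# accumulate the symbols of every structure in a single set
unique_species = set()
for symbols in structures_symbols:
    unique_species.update(symbols)

print(sorted(unique_species))
```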
ai4materials.descriptors.quippy_soap_descriptor module¶
ai4materials.descriptors.soap_model module¶
Module contents¶
Creating and loading materials science datasets¶
Before performing any data analysis, pre-processing steps (e.g. descriptor calculation) are often needed to transform materials science data into a form suitable for the algorithm of choice, for example a neural network. This pre-processing is usually computationally demanding, especially if hundreds of thousands of structures need to be processed, possibly for different parameter settings.
Since hyperparameter tuning of the regression/classification algorithm typically requires running the model several times (for a given pre-processed dataset), it is highly beneficial to be able to save and re-load the pre-processed results in a consistent and traceable manner.
Here we provide functions to pre-process (in parallel), save, and subsequently load materials science datasets; this not only eases the traceability and reproducibility of data analysis on materials science data, but also speeds up the prototyping of new models.
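The save-and-reload pattern described here can be sketched with the Python standard library alone. The example below is a generic illustration and does not use the ai4materials functions: the helper names `save_preprocessed` and `load_preprocessed`, the file naming, and the toy data are all assumptions. The point is that the data is stored together with a small JSON summary recording how and when it was produced.

```python
import json
import os
import pickle
import tempfile
from datetime import datetime


def save_preprocessed(data, params, folder, name):
    """Save pre-processed data together with a JSON summary of how it was made."""
    path_to_data = os.path.join(folder, name + '_x.pkl')
    path_to_summary = os.path.join(folder, name + '_summary.json')
    with open(path_to_data, 'wb') as f:
        pickle.dump(data, f)
    summary = {'dataset_name': name, 'params': params,
               'creation_date': datetime.now().isoformat(),
               'path_to_x': path_to_data}
    with open(path_to_summary, 'w') as f:
        json.dump(summary, f, indent=2)
    return path_to_data, path_to_summary


def load_preprocessed(path_to_data, path_to_summary):
    """Reload the data and the summary written by save_preprocessed."""
    with open(path_to_data, 'rb') as f:
        data = pickle.load(f)
    with open(path_to_summary) as f:
        summary = json.load(f)
    return data, summary


folder = tempfile.mkdtemp()
data = [[0.1, 0.2], [0.3, 0.4]]  # stand-in for expensive descriptor output
paths = save_preprocessed(data, {'target_nb_atoms': 128}, folder, 'toy')
reloaded, summary = load_preprocessed(*paths)
print(summary['params'])
```

Keeping the parameters next to the data means that a model trained later can always be traced back to the exact pre-processing settings.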
Example: diffraction fingerprint calculation for multiple structures¶
The code below illustrates how to compute a descriptor for multiple crystal structures using multiple processors, save the results to file, and reload the file for later use (e.g. for classification).
As an illustrative example, we calculate the two-dimensional diffraction fingerprint [1] of pristine (i.e. perfect) and highly defective (50% of atoms missing) crystal structures. In particular, the four crystal structures considered are: body-centered cubic (bcc), face-centered cubic (fcc), diamond (diam), and hexagonal close-packed (hcp); more than 80% of elemental solids adopt one of these four crystal structures under standard conditions.
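What a defective structure with 50% of the atoms missing means can be illustrated with a short, library-independent sketch. It removes a random half of the sites from a list of positions; the function name `remove_random_atoms`, the fixed seed, and the simple cubic grid are assumptions for illustration and say nothing about how create_vacancies is implemented.

```python
import random


def remove_random_atoms(positions, vacancy_ratio, seed=0):
    """Return a copy of positions with a fraction of the atoms removed at random."""
    nb_to_remove = int(round(vacancy_ratio * len(positions)))
    rng = random.Random(seed)  # fixed seed so that the result is reproducible
    removed = set(rng.sample(range(len(positions)), nb_to_remove))
    return [p for i, p in enumerate(positions) if i not in removed]


# hypothetical pristine structure: 128 sites on a 4 x 4 x 8 simple cubic grid
pristine = [(x, y, z) for x in range(4) for y in range(4) for z in range(8)]
defective = remove_random_atoms(pristine, vacancy_ratio=0.50)

print(len(pristine), len(defective))
```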
The steps performed in the code below are the following:
- define the folders where the results are going to be saved
- build the four crystal structures (bcc, fcc, diam, hcp) using the ASE package
- create a pristine supercell using the function ai4materials.utils.utils_crystals.create_supercell
- create a defective supercell (50% of atoms missing) using the function ai4materials.utils.utils_crystals.create_vacancies
- calculate the two-dimensional diffraction fingerprint for all (eight) crystal structures using ai4materials.wrappers.calc_descriptor
- save the results to file
- reload the results from file
- generate a texture atlas with the two-dimensional diffraction fingerprints of all structures and write it to file.
Implementation details of the two-dimensional diffraction fingerprint can be found at ai4materials.descriptors.diffraction2d.
from ase.spacegroup import crystal
from ai4materials.descriptors.diffraction2d import Diffraction2D
from ai4materials.utils.utils_config import set_configs
from ai4materials.utils.utils_config import setup_logger
from ai4materials.utils.utils_crystals import create_supercell
from ai4materials.utils.utils_crystals import create_vacancies
from ai4materials.utils.utils_data_retrieval import generate_facets_input
from ai4materials.wrappers import calc_descriptor
from ai4materials.wrappers import load_descriptor
import os.path
# set configs
configs = set_configs(main_folder='./multiple_2d_diff_ai4materials/')
logger = setup_logger(configs, level='INFO', display_configs=False)
# setup folder and files
desc_file_name = 'fcc_bcc_diam_hcp_example'
# build crystal structures
fcc_al = crystal('Al', [(0, 0, 0)], spacegroup=225, cellpar=[4.05, 4.05, 4.05, 90, 90, 90])
bcc_fe = crystal('Fe', [(0, 0, 0)], spacegroup=229, cellpar=[2.87, 2.87, 2.87, 90, 90, 90])
diamond_c = crystal('C', [(0, 0, 0)], spacegroup=227, cellpar=[3.57, 3.57, 3.57, 90, 90, 90])
hcp_mg = crystal('Mg', [(1. / 3., 2. / 3., 3. / 4.)], spacegroup=194, cellpar=[3.21, 3.21, 5.21, 90, 90, 120])
# create supercells - pristine
fcc_al_supercell = create_supercell(fcc_al, target_nb_atoms=128, cell_type='standard_no_symmetries')
bcc_fe_supercell = create_supercell(bcc_fe, target_nb_atoms=128, cell_type='standard_no_symmetries')
diamond_c_supercell = create_supercell(diamond_c, target_nb_atoms=128, cell_type='standard_no_symmetries')
hcp_mg_supercell = create_supercell(hcp_mg, target_nb_atoms=128, cell_type='standard_no_symmetries')
# create supercells - vacancies
fcc_al_supercell_vac = create_vacancies(fcc_al, target_vacancy_ratio=0.50, target_nb_atoms=128, cell_type='standard_no_symmetries')
bcc_fe_supercell_vac = create_vacancies(bcc_fe, target_vacancy_ratio=0.50, target_nb_atoms=128, cell_type='standard_no_symmetries')
diamond_c_supercell_vac = create_vacancies(diamond_c, target_vacancy_ratio=0.50, target_nb_atoms=128, cell_type='standard_no_symmetries')
hcp_mg_supercell_vac = create_vacancies(hcp_mg, target_vacancy_ratio=0.50, target_nb_atoms=128, cell_type='standard_no_symmetries')
ase_atoms_list = [fcc_al_supercell, fcc_al_supercell_vac,
bcc_fe_supercell, bcc_fe_supercell_vac,
diamond_c_supercell, diamond_c_supercell_vac,
hcp_mg_supercell, hcp_mg_supercell_vac]
# calculate the descriptor for the list of structures and save it to file
descriptor = Diffraction2D(configs=configs)
desc_file_path = calc_descriptor(descriptor=descriptor, configs=configs, ase_atoms_list=ase_atoms_list,
desc_file=str(desc_file_name)+'.tar.gz', format_geometry='aims',
nb_jobs=-1)
# load the previously saved file containing the crystal structures and their corresponding descriptor
target_list, structure_list = load_descriptor(desc_files=desc_file_path, configs=configs)
# create a texture atlas with all the two-dimensional diffraction fingerprints
df, texture_atlas = generate_facets_input(structure_list=structure_list, desc_metadata='diffraction_2d_intensity',
target_list=target_list,
sprite_atlas_filename=desc_file_name,
configs=configs, normalize=True)
These are the calculated two-dimensional diffraction fingerprints for all crystal structures in the list:

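Calculating a descriptor for many structures is an independent task per structure, so it maps naturally onto a pool of workers. The sketch below shows the general pattern with `concurrent.futures` and a toy descriptor; it makes no claim about how `calc_descriptor` handles `nb_jobs` internally. Threads are used here only to keep the example portable; for CPU-bound descriptors a process pool (`ProcessPoolExecutor`) would be the usual choice. The function `toy_descriptor` and the two structures are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_descriptor(positions):
    """Stand-in for an expensive descriptor: number of atoms and centroid."""
    n = len(positions)
    centroid = tuple(sum(p[k] for p in positions) / n for k in range(3))
    return {'nb_atoms': n, 'centroid': centroid}


# hypothetical structures, each given as a list of Cartesian positions
structures = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)],
]

# executor.map returns the results in the same order as the input structures
with ThreadPoolExecutor(max_workers=2) as executor:
    results = list(executor.map(toy_descriptor, structures))

print(results)
```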
Example: atomic feature retrieval for multiple structures¶
It was shown in Ref. [2] that the crystal structure of binary compounds can be predicted with a compressed-sensing technique using atomic features only.
The code below illustrates how to retrieve atomic features, performing the following steps:
- build a list of crystal structures using the ASE package
- retrieve atomic features using the descriptor ai4materials.descriptors.atomic_features.AtomicFeatures for all crystal structures
- save the results to file
- reload the results from file
- construct a table df_atomic_features containing the atomic features using the function ai4materials.descriptors.atomic_features.get_table_atomic_features
- write the atomic feature table as csv file
- build a heatmap of the atomic feature table
import sys
import os.path
atomic_data_dir = os.path.abspath(os.path.normpath("/home/ziletti/nomad/nomad-lab-base/analysis-tools/atomic-data"))
sys.path.insert(0, atomic_data_dir)
from ase.spacegroup import crystal
import matplotlib.pyplot as plt
from ai4materials.utils.utils_config import set_configs
from ai4materials.utils.utils_config import setup_logger
from ai4materials.utils.utils_crystals import get_spacegroup_old
from ai4materials.utils.utils_binaries import get_binaries_dict_delta_e
from ai4materials.wrappers import calc_descriptor
from ai4materials.wrappers import load_descriptor
from ai4materials.descriptors.atomic_features import AtomicFeatures
from ai4materials.descriptors.atomic_features import get_table_atomic_features
import seaborn as sns
# set configs
configs = set_configs(main_folder='./dataset_atomic_features_ai4materials/')
logger = setup_logger(configs, level='INFO', display_configs=False)
desc_file_name = 'atomic_features_try1'
# build atomic structures
group_1a = ['Li', 'Na', 'K', 'Rb']
group_1b = ['F', 'Cl', 'Br', 'I']
group_2a = ['Be', 'Mg', 'Ca', 'Sr']
group_2b = ['O', 'S', 'Se', 'Te']
ase_atoms_list = []
for el_1a in group_1a:
    for el_1b in group_1b:
        ase_atoms_list.append(crystal([el_1a, el_1b], [(0, 0, 0), (0.5, 0.5, 0.5)], spacegroup=225, cellpar=[5.64, 5.64, 5.64, 90, 90, 90]))
for el_2a in group_2a:
    for el_2b in group_2b:
        ase_atoms_list.append(crystal([el_2a, el_2b], [(0, 0, 0), (0.5, 0.5, 0.5)], spacegroup=225, cellpar=[5.64, 5.64, 5.64, 90, 90, 90]))
selected_feature_list = ['atomic_ionization_potential', 'atomic_electron_affinity',
'atomic_rs_max', 'atomic_rp_max', 'atomic_rd_max']
# define and calculate descriptor
kwargs = {'feature_order_by': 'atomic_mulliken_electronegativity', 'energy_unit': 'eV', 'length_unit': 'angstrom'}
descriptor = AtomicFeatures(configs=configs, **kwargs)
desc_file_path = calc_descriptor(descriptor=descriptor, configs=configs, ase_atoms_list=ase_atoms_list,
desc_file=str(desc_file_name)+'.tar.gz', format_geometry='aims',
selected_feature_list=selected_feature_list,
nb_jobs=-1)
target_list, ase_atoms_list = load_descriptor(desc_files=desc_file_path, configs=configs)
df_atomic_features = get_table_atomic_features(ase_atoms_list)
# write table to file
df_atomic_features.to_csv('atomic_features_table.csv', float_format='%.4f')
# plot the table with seaborn
df_atomic_features = df_atomic_features.set_index('ordered_chemical_symbols')
mask = df_atomic_features.isnull()
fig = plt.figure()
sns.set(font_scale=0.5)
sns_plot = sns.heatmap(df_atomic_features, annot=True, mask=mask)
fig = sns_plot.get_figure()
fig.tight_layout()
fig.savefig('atomic_features_plot.png', dpi=200)
This is the table containing the atomic features obtained using the code above:
 | ordered_chemical_symbols | atomic_ionization_potential(A) | atomic_electron_affinity(A) | atomic_rs_max(A) | atomic_rp_max(A) | atomic_rd_max(A) | atomic_ionization_potential(B) | atomic_electron_affinity(B) | atomic_rs_max(B) | atomic_rp_max(B) | atomic_rd_max(B) |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | LiF | -5.3291 | -0.6981 | 1.6500 | 2.0000 | 6.9300 | -19.4043 | -4.2735 | 0.4100 | 0.3700 | 1.4300 |
1 | KBr | -4.4332 | -0.6213 | 2.1300 | 2.4400 | 1.7900 | -12.6496 | -3.7393 | 0.7500 | 0.8800 | 1.8700 |
2 | KI | -4.4332 | -0.6213 | 2.1300 | 2.4400 | 1.7900 | -11.2571 | -3.5135 | 0.9000 | 1.0700 | 1.7200 |
3 | RbF | -4.2889 | -0.5904 | 2.2400 | 3.2000 | 1.9600 | -19.4043 | -4.2735 | 0.4100 | 0.3700 | 1.4300 |
4 | RbCl | -4.2889 | -0.5904 | 2.2400 | 3.2000 | 1.9600 | -13.9018 | -3.9708 | 0.6800 | 0.7600 | 1.6700 |
5 | RbBr | -4.2889 | -0.5904 | 2.2400 | 3.2000 | 1.9600 | -12.6496 | -3.7393 | 0.7500 | 0.8800 | 1.8700 |
6 | RbI | -4.2889 | -0.5904 | 2.2400 | 3.2000 | 1.9600 | -11.2571 | -3.5135 | 0.9000 | 1.0700 | 1.7200 |
7 | BeO | -9.4594 | 0.6305 | 1.0800 | 1.2100 | 2.8800 | -16.4332 | -3.0059 | 0.4600 | 0.4300 | 2.2200 |
8 | BeS | -9.4594 | 0.6305 | 1.0800 | 1.2100 | 2.8800 | -11.7951 | -2.8449 | 0.7400 | 0.8500 | 2.3700 |
9 | BeSe | -9.4594 | 0.6305 | 1.0800 | 1.2100 | 2.8800 | -10.9460 | -2.7510 | 0.8000 | 0.9500 | 2.1800 |
10 | BeTe | -9.4594 | 0.6305 | 1.0800 | 1.2100 | 2.8800 | -9.8667 | -2.6660 | 0.9400 | 1.1400 | 1.8300 |
11 | LiCl | -5.3291 | -0.6981 | 1.6500 | 2.0000 | 6.9300 | -13.9018 | -3.9708 | 0.6800 | 0.7600 | 1.6700 |
12 | MgO | -8.0371 | 0.6925 | 1.3300 | 1.9000 | 3.1700 | -16.4332 | -3.0059 | 0.4600 | 0.4300 | 2.2200 |
13 | MgS | -8.0371 | 0.6925 | 1.3300 | 1.9000 | 3.1700 | -11.7951 | -2.8449 | 0.7400 | 0.8500 | 2.3700 |
14 | MgSe | -8.0371 | 0.6925 | 1.3300 | 1.9000 | 3.1700 | -10.9460 | -2.7510 | 0.8000 | 0.9500 | 2.1800 |
15 | MgTe | -8.0371 | 0.6925 | 1.3300 | 1.9000 | 3.1700 | -9.8667 | -2.6660 | 0.9400 | 1.1400 | 1.8300 |
16 | CaO | -6.4280 | 0.3039 | 1.7600 | 2.3200 | 0.6800 | -16.4332 | -3.0059 | 0.4600 | 0.4300 | 2.2200 |
17 | CaS | -6.4280 | 0.3039 | 1.7600 | 2.3200 | 0.6800 | -11.7951 | -2.8449 | 0.7400 | 0.8500 | 2.3700 |
18 | CaSe | -6.4280 | 0.3039 | 1.7600 | 2.3200 | 0.6800 | -10.9460 | -2.7510 | 0.8000 | 0.9500 | 2.1800 |
19 | CaTe | -6.4280 | 0.3039 | 1.7600 | 2.3200 | 0.6800 | -9.8667 | -2.6660 | 0.9400 | 1.1400 | 1.8300 |
20 | SrO | -6.0316 | 0.3431 | 1.9100 | 2.5500 | 1.2000 | -16.4332 | -3.0059 | 0.4600 | 0.4300 | 2.2200 |
21 | SrS | -6.0316 | 0.3431 | 1.9100 | 2.5500 | 1.2000 | -11.7951 | -2.8449 | 0.7400 | 0.8500 | 2.3700 |
22 | LiBr | -5.3291 | -0.6981 | 1.6500 | 2.0000 | 6.9300 | -12.6496 | -3.7393 | 0.7500 | 0.8800 | 1.8700 |
23 | SrSe | -6.0316 | 0.3431 | 1.9100 | 2.5500 | 1.2000 | -10.9460 | -2.7510 | 0.8000 | 0.9500 | 2.1800 |
24 | SrTe | -6.0316 | 0.3431 | 1.9100 | 2.5500 | 1.2000 | -9.8667 | -2.6660 | 0.9400 | 1.1400 | 1.8300 |
25 | LiI | -5.3291 | -0.6981 | 1.6500 | 2.0000 | 6.9300 | -11.2571 | -3.5135 | 0.9000 | 1.0700 | 1.7200 |
26 | NaF | -5.2231 | -0.7157 | 1.7100 | 2.6000 | 6.5700 | -19.4043 | -4.2735 | 0.4100 | 0.3700 | 1.4300 |
27 | NaCl | -5.2231 | -0.7157 | 1.7100 | 2.6000 | 6.5700 | -13.9018 | -3.9708 | 0.6800 | 0.7600 | 1.6700 |
28 | NaBr | -5.2231 | -0.7157 | 1.7100 | 2.6000 | 6.5700 | -12.6496 | -3.7393 | 0.7500 | 0.8800 | 1.8700 |
29 | NaI | -5.2231 | -0.7157 | 1.7100 | 2.6000 | 6.5700 | -11.2571 | -3.5135 | 0.9000 | 1.0700 | 1.7200 |
30 | KF | -4.4332 | -0.6213 | 2.1300 | 2.4400 | 1.7900 | -19.4043 | -4.2735 | 0.4100 | 0.3700 | 1.4300 |
31 | KCl | -4.4332 | -0.6213 | 2.1300 | 2.4400 | 1.7900 | -13.9018 | -3.9708 | 0.6800 | 0.7600 | 1.6700 |
and this is its corresponding heatmap:

Example: dataset creation for data analytics¶
The code below illustrates how to compute a descriptor (the two-dimensional diffraction fingerprint [1]) for multiple crystal structures, save the results to file, and reload the file for later use (e.g. for classification).
The steps performed in the code below are the following:
- define the folders where the results are going to be saved
- build the four crystal structures (bcc, fcc, diam, hcp) using the ASE package
- create a pristine supercell using the function
ai4materials.utils.utils_crystals.create_supercell
- create a defective supercell (50% of atoms missing) using the function
ai4materials.utils.utils_crystals.create_vacancies
- calculate the two-dimensional diffraction fingerprint for all (eight) crystal structures
- save the results to file
- reload the results from file
- define a user-specified target variable (i.e. the variable that one wants to predict with the classification/regression model); in this case this variable is the crystal structure type (‘fcc’, ‘bcc’, ‘diam’, ‘hcp’)
- create a dataset containing the specified desc_metadata (this needs to be compatible with the descriptor choice)
- save the dataset to file in the folder dataset_folder, including data (numpy array), target variable (numpy array), and metadata regarding the dataset (JSON format)
- re-load from file the saved dataset to be used for example in a classification task
from ase.spacegroup import crystal
from ai4materials.dataprocessing.preprocessing import load_dataset_from_file
from ai4materials.dataprocessing.preprocessing import prepare_dataset
from ai4materials.descriptors.diffraction2d import Diffraction2D
from ai4materials.utils.utils_config import set_configs
from ai4materials.utils.utils_config import setup_logger
from ai4materials.utils.utils_crystals import create_supercell
from ai4materials.utils.utils_crystals import create_vacancies
from ai4materials.wrappers import calc_descriptor
from ai4materials.wrappers import load_descriptor
import os.path
# set configs
configs = set_configs(main_folder='./dataset_2d_diff_ai4materials/')
logger = setup_logger(configs, level='INFO', display_configs=False)
# setup folder and files
dataset_folder = os.path.join(configs['io']['main_folder'], 'my_datasets')
desc_file_name = 'fcc_bcc_diam_hcp_example'
# build crystal structures
fcc_al = crystal('Al', [(0, 0, 0)], spacegroup=225, cellpar=[4.05, 4.05, 4.05, 90, 90, 90])
bcc_fe = crystal('Fe', [(0, 0, 0)], spacegroup=229, cellpar=[2.87, 2.87, 2.87, 90, 90, 90])
diamond_c = crystal('C', [(0, 0, 0)], spacegroup=227, cellpar=[3.57, 3.57, 3.57, 90, 90, 90])
hcp_mg = crystal('Mg', [(1. / 3., 2. / 3., 3. / 4.)], spacegroup=194, cellpar=[3.21, 3.21, 5.21, 90, 90, 120])
# create supercells - pristine
fcc_al_supercell = create_supercell(fcc_al, target_nb_atoms=128, cell_type='standard_no_symmetries')
bcc_fe_supercell = create_supercell(bcc_fe, target_nb_atoms=128, cell_type='standard_no_symmetries')
diamond_c_supercell = create_supercell(diamond_c, target_nb_atoms=128, cell_type='standard_no_symmetries')
hcp_mg_supercell = create_supercell(hcp_mg, target_nb_atoms=128, cell_type='standard_no_symmetries')
# create supercells - vacancies
fcc_al_supercell_vac = create_vacancies(fcc_al, target_vacancy_ratio=0.50, target_nb_atoms=128, cell_type='standard_no_symmetries')
bcc_fe_supercell_vac = create_vacancies(bcc_fe, target_vacancy_ratio=0.50, target_nb_atoms=128, cell_type='standard_no_symmetries')
diamond_c_supercell_vac = create_vacancies(diamond_c, target_vacancy_ratio=0.50, target_nb_atoms=128, cell_type='standard_no_symmetries')
hcp_mg_supercell_vac = create_vacancies(hcp_mg, target_vacancy_ratio=0.50, target_nb_atoms=128, cell_type='standard_no_symmetries')
ase_atoms_list = [fcc_al_supercell, fcc_al_supercell_vac,
bcc_fe_supercell, bcc_fe_supercell_vac,
diamond_c_supercell, diamond_c_supercell_vac,
hcp_mg_supercell, hcp_mg_supercell_vac]
# calculate the descriptor for the list of structures and save it to file
descriptor = Diffraction2D(configs=configs)
desc_file_path = calc_descriptor(descriptor=descriptor, configs=configs, ase_atoms_list=ase_atoms_list,
desc_file=str(desc_file_name)+'.tar.gz', format_geometry='aims',
nb_jobs=-1)
# load the previously saved file containing the crystal structures and their corresponding descriptor
target_list, structure_list = load_descriptor(desc_files=desc_file_path, configs=configs)
# add the structure type as target (a defective structure inherits the label of its pristine "parent" structure)
targets = ['fcc', 'fcc', 'bcc', 'bcc', 'diam', 'diam', 'hcp', 'hcp']
for idx, item in enumerate(target_list):
    item['data'][0]['target'] = targets[idx]
path_to_x, path_to_y, path_to_summary = prepare_dataset(
structure_list=structure_list,
target_list=target_list,
desc_metadata='diffraction_2d_intensity',
dataset_name='bcc-fcc-diam-hcp',
target_name='target',
target_categorical=True,
input_dims=(64, 64),
configs=configs,
dataset_folder=dataset_folder,
main_folder=configs['io']['main_folder'],
desc_folder=configs['io']['desc_folder'],
tmp_folder=configs['io']['tmp_folder'],
notes="Dataset with bcc, fcc, diam and hcp structures, pristine and with 50% of defects.")
x, y, dataset_info = load_dataset_from_file(path_to_x=path_to_x, path_to_y=path_to_y,
path_to_summary=path_to_summary)
In the code above, the numpy array x contains the specified desc_metadata, the numpy array y contains the specified targets, and dataset_info is a dictionary containing information about the dataset that was just loaded:
{
"data":[{
"target_name": "target",
"n_bins": 100,
"path_to_summary": "/home/ziletti/Documents/calc_xray/2d_nature_comm/my_datasets/bcc-fcc-diam-hcp_summary.json",
"creation_date": "2018-06-20T18:42:07.110239",
"numerical_labels": [
2,
2,
0,
0,
1,
1,
3,
3
],
"classes": [
"bcc",
"diam",
"fcc",
"hcp"
],
"nb_classes": 4,
"path_to_y": "/home/ziletti/Documents/calc_xray/2d_nature_comm/my_datasets/bcc-fcc-diam-hcp_y.pkl",
"path_to_x": "/home/ziletti/Documents/calc_xray/2d_nature_comm/my_datasets/bcc-fcc-diam-hcp_x.pkl",
"text_labels": [
"fcc",
"fcc",
"bcc",
"bcc",
"diam",
"diam",
"hcp",
"hcp"
],
"target_categorical": true,
"disc_type": null,
"notes": "Dataset with bcc, fcc, diam and hcp structures, pristine and with 50% of defects.",
"dataset_name": "bcc-fcc-diam-hcp"
}
] }
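The relation between the text_labels, classes, and numerical_labels fields in the summary above can be illustrated with a few lines of plain Python. This is only a sketch: it assumes that the classes are the alphabetically sorted unique text labels and that each numerical label is the position of the text label in that list, which is consistent with the values shown but is not taken from the ai4materials source.

```python
# Text labels as stored in the "text_labels" field of the summary above.
text_labels = ["fcc", "fcc", "bcc", "bcc", "diam", "diam", "hcp", "hcp"]

# Assumption: classes are the sorted unique labels, and each numerical
# label is the index of the text label within that sorted list.
classes = sorted(set(text_labels))
class_to_index = {name: idx for idx, name in enumerate(classes)}
numerical_labels = [class_to_index[name] for name in text_labels]

print(classes)           # ['bcc', 'diam', 'fcc', 'hcp']
print(numerical_labels)  # [2, 2, 0, 0, 1, 1, 3, 3]
```

Under this assumption, the output reproduces the "classes" and "numerical_labels" entries of the summary.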
[1] A. Ziletti, D. Kumar, M. Scheffler, and L. M. Ghiringhelli, “Insightful classification of crystal structures using deep learning,” Nature Communications, vol. 9, p. 2775, 2018.
[2] L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, and M. Scheffler, “Big Data of Materials Science: Critical Role of the Descriptor,” Physical Review Letters, vol. 114, no. 10, p. 105503, 2015.
Section author: Angelo Ziletti <angelo.ziletti@gmail.com>
Regression and classification models¶
ai4materials lets you apply state-of-the-art data-analytics models to materials science data. Below, we present a regression example based on compressed sensing and a classification example based on a convolutional neural network.
Example regression: LASSO+l0 method¶
This example shows how to find descriptive parameters (short formulas) that predict crystal structure, using the example of octet binary compounds that have either rocksalt (RS) or zincblende (ZB) structure. It is based on Ref. [1] and reproduces the results presented in Fig. 2 of that reference.
Starting from simple physical quantities (“building blocks”, here properties of the constituent free atoms such as orbital radii), thousands of candidate formulas are generated by applying arithmetic operations combining building blocks, for example forming sums and products of them. These candidate formulas constitute the so-called “feature space”. Then, a sparse regression method is used to select only a few of these formulas that explain the data.
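As a rough illustration of how such a feature space can be built, the sketch below combines a handful of building blocks with a few unary and binary operations. The building-block names and values are made up, the function build_feature_space is hypothetical, and physical units are ignored; the actual construction in ai4materials is more elaborate.

```python
import itertools
import numpy as np

# Hypothetical building blocks for three made-up compounds
# (values are illustrative, not taken from any dataset).
primary = {
    "rs(A)": np.array([1.0, 1.5, 2.0]),
    "rp(A)": np.array([1.2, 1.9, 2.6]),
    "rs(B)": np.array([0.8, 0.7, 0.9]),
}

def build_feature_space(primary):
    """Combine primary features with unary and binary operations."""
    features = dict(primary)
    # unary operations applied to each building block
    for name, values in primary.items():
        features["(%s)^2" % name] = values ** 2
        features["exp(%s)" % name] = np.exp(values)
    # binary operations applied to each unordered pair of building blocks
    for (na, va), (nb, vb) in itertools.combinations(primary.items(), 2):
        features["(%s+%s)" % (na, nb)] = va + vb
        features["(%s-%s)" % (na, nb)] = va - vb
        features["|%s-%s|" % (na, nb)] = np.abs(va - vb)
        features["(%s/%s)" % (na, nb)] = va / vb
    return features

feature_space = build_feature_space(primary)
print(len(feature_space))  # 21 candidate formulas from 3 building blocks
```

Even this single round of combinations turns 3 building blocks into 21 candidates; applying the operations repeatedly is what leads to thousands of formulas.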
The code below performs the following steps:
- read the dataset containing binary materials from file
- calculate the atomic features using the descriptor ai4materials.descriptors.atomic_features.AtomicFeatures
- calculate the descriptive parameters using the LASSO+l0 method with ai4materials.wrappers.calc_model
- plot the results.
import sys
import os.path
atomic_data_dir = os.path.normpath('/home/ziletti/nomad/nomad-lab-base/analysis-tools/atomic-data')
sys.path.insert(0, atomic_data_dir)
import matplotlib.pyplot as plt
from ai4materials.utils.utils_config import set_configs
from ai4materials.utils.utils_config import setup_logger
from ai4materials.utils.utils_data_retrieval import read_ase_db
from ai4materials.wrappers import load_descriptor
from ai4materials.wrappers import calc_model
from ai4materials.wrappers import calc_descriptor
from ai4materials.descriptors.atomic_features import AtomicFeatures
from ai4materials.descriptors.atomic_features import get_table_atomic_features
from ai4materials.utils.utils_config import get_data_filename
from ai4materials.visualization.viewer import read_control_file
import numpy as np
import pandas as pd
# modify this path if you want to save the calculation results in another location
configs = set_configs(main_folder='./l1_l0_example')
logger = setup_logger(configs, level='INFO')
# setup folder and files
lookup_file = os.path.join(configs['io']['main_folder'], 'lookup.dat')
materials_map_plot_file = os.path.join(configs['io']['main_folder'], 'binaries_l1_l0_map_prl2015.png')
# define descriptor - atomic features in this case
kwargs = {'energy_unit': 'eV', 'length_unit': 'angstrom'}
descriptor = AtomicFeatures(configs=configs, **kwargs)
# =============================================================================
# Descriptor calculation
# =============================================================================
desc_file_name = 'atomic_features_binaries'
ase_db_file = get_data_filename('data/db_ase/binaries_lowest_energy_ghiringhelli2015.json')
ase_atoms_list = read_ase_db(db_path=ase_db_file)
selected_feature_list = ['atomic_ionization_potential', 'atomic_electron_affinity', 'atomic_rs_max',
'atomic_rp_max', 'atomic_rd_max']
allowed_operations = ['+', '-', '/', '|-|', 'exp', '^2']
desc_file_path = calc_descriptor(descriptor=descriptor, configs=configs, ase_atoms_list=ase_atoms_list,
desc_file='lasso_l0_binaries_example.tar.gz',
format_geometry='aims',
selected_feature_list=selected_feature_list,
nb_jobs=-1)
# load descriptor
target_list, structure_list = load_descriptor(desc_files=desc_file_path, configs=configs)
df_atomic_features = get_table_atomic_features(structure_list)
# =============================================================================
# Model calculation
# =============================================================================
chemical_formulas = [structure.get_chemical_formula(mode='hill') for structure in structure_list]
df_atomic_features['chemical_formula'] = chemical_formulas
df_atomic_features = df_atomic_features.sort_values(by='chemical_formula').reset_index(drop=True)
# target values to predict
dict_delta_e = dict(SeZn=0.2631369195046646, BaTe=-0.37538683850924387, BN=1.7120803923951688,
CGe=0.8114429425515818, GaP=0.3487518245522925, MgS=-0.08669951164989079,
GaN=0.4334452723999156, AlAs=0.21326186549251072, BP=1.019225239514441, FK=-0.14640610974868423,
BrLi=-0.03274621540254649, BSb=0.5808491589999847, CaTe=-0.3504563060008138,
ClK=-0.16446069285018655, BrCs=-0.1558673149861294, BrCu=0.15244265149855352,
ILi=-0.021660938008450818, CuF=-0.01702227364862989, FNa=-0.14578814899027592,
C2=2.6286038411199026, AgBr=-0.030033419005850936, CuI=0.20467459898973175,
GaSb=0.15462529698986593, ClLi=-0.03838148564873346, AsIn=0.13404758548892423,
OZn=0.10196818460305757, MgO=-0.2322747421651549, InP=0.17919330099729866,
Ge2=0.20085254149716641, InN=0.15372030450150198, CSn=0.45353800899655555,
CdTe=0.11453954098812649, TeZn=0.24500131400199776, MgTe=-0.004591286999846332,
BaS=-0.3197624539995756, CaSe=-0.36079776214906895, FRb=-0.1355957874033439,
BeO=0.6918376303948839, AsB=0.8749782510022386, CaS=-0.36913322290101264,
CaO=-0.2652190617003161, BaO=-0.09299856100784433, AlSb=0.15686874600534004,
SrTe=-0.3792947550252322, BeS=0.5063277134499351, InSb=0.0780598790169251,
SZn=0.27581334679854935, OSr=-0.2203066401004525, BrRb=-0.1638205440075271,
BeSe=0.4949404808020511, ClRb=-0.16050356640655905, BrNa=-0.1264287376032476,
MgSe=-0.05530180620975655, GeSn=0.08166336650886348, GeSi=0.2632101904042582,
CsF=-0.10826332699038382, CdSe=0.08357195550137826, FLi=-0.059488321434879074,
AlN=0.07294907877519896, Si2=0.2791658430004932, SiSn=0.13510880949563495,
ClNa=-0.13299199530041886, CdO=-0.0841613645001312, SSr=-0.36843415824218,
IK=-0.16703915799644553, BaSe=-0.3434451604764059, BrK=-0.1661759769597461,
BeTe=0.4685859464949282, CdS=0.07267280149604124, CsI=-0.16238748698990838,
INa=-0.11483823100687315, AlP=0.2189583583002711, AsGa=0.27427779349540243,
SeSr=-0.3745109805057823, CSi=0.669023778644634, AgCl=-0.04279728149250233,
AgI=0.03692542249419624, AgF=-0.15375768499313544, ClCs=-0.1503461689991465,
Sn2=0.016963900503544026, ClCu=0.15625872520000064, IRb=-0.16720145498980848)
df_atomic_features['target'] = df_atomic_features['chemical_formula'].map(dict_delta_e)
target = np.asarray(df_atomic_features['target'].values.astype(float))
cols_to_drop = ['chemical_formula', 'target', 'ordered_chemical_symbols']
# use the l1-l0 method proposed in Ghiringhelli et al. (2015)
calc_model(method='l1_l0', df_features=df_atomic_features, cols_to_drop=cols_to_drop,
target=target, max_dim=2, allowed_operations=allowed_operations,
tmp_folder=configs['io']['tmp_folder'], results_folder=configs['io']['results_folder'],
lookup_file=lookup_file, control_file=configs['io']['control_file'], energy_unit='eV',
length_unit='angstrom')
# read the results for the two-dimensional descriptor
viewer_filename = 'l1_l0_dim1_for_viewer.csv'
viewer_filepath = os.path.join(configs['io']['results_folder'], viewer_filename)
df_viewer = pd.read_csv(viewer_filepath)
x_axis_label, y_axis_label = read_control_file(configs['io']['control_file'])
# plot the results for the two-dimensional descriptor
fig, ax = plt.subplots()
x = df_viewer['coord_0']
y = df_viewer['coord_1']
color = df_viewer['y_true']
chemical_formula = df_viewer['chemical_formula']
cm = plt.get_cmap('rainbow')
sc = plt.scatter(x, y, c=color, cmap=cm)
# annotate the points
for i, txt in enumerate(chemical_formula):
    ax.annotate(txt, (x[i], y[i]), size=4)
plt.xlabel(x_axis_label)
plt.ylabel(y_axis_label)
cbar = plt.colorbar(sc)
cbar.set_label('Reference E(RS)-E(ZB)', rotation=90)
plt.title("l1/l0 structure map for binary compounds\n ")
plt.subplots_adjust(bottom=0.2)
plt.figtext(0.5, 0.02, "Compare with Fig. 2 in Ghiringhelli et al., Phys. Rev. Lett 114 (10), 105503 (2015)",
horizontalalignment='center', style='italic')
plt.savefig(materials_map_plot_file, dpi=300)
This is the plot showing the calculated energy differences between rocksalt and zincblende structures of the 82 octet binary AB materials used in Ref. [1] according to the two-dimensional descriptor found via the LASSO+l0 procedure:

Implementation details of how atomic features are automatically constructed can be found at ai4materials.descriptors.atomic_features. Implementation details of the LASSO+l0 method can be found at ai4materials.wrappers.calc_model and at ai4materials.models.l1_l0.
Example classification: convolutional neural network for crystal-structure classification¶
This example shows how to load a dataset of crystal structures (represented by the diffraction fingerprint [2]), train a convolutional neural network on pristine (perfect) crystal structures, and use this neural network to predict the crystal class of highly defective crystal structures. This method, introduced in Ref. [2], can correctly classify heavily defective crystal structures. In this particular case, even with 25% of the atoms removed from each structure, the model retains an accuracy of 100%.
The code below performs the following steps:
- read the dataset for crystal-structure classification used in Ref. [2]
- train a convolutional neural network for crystal-structure classification using ai4materials.models.cnn_nature_comm_ziletti2018.train_neural_network
- predict the class for each crystal structure using the neural network trained in Ref. [2] with ai4materials.models.cnn_nature_comm_ziletti2018.predict
from functools import partial
from ai4materials.utils.utils_config import set_configs
from ai4materials.dataprocessing.preprocessing import load_dataset_from_file
from ai4materials.models.cnn_architectures import cnn_nature_comm_ziletti2018
from ai4materials.models.cnn_nature_comm_ziletti2018 import load_datasets
from ai4materials.models.cnn_nature_comm_ziletti2018 import predict
from ai4materials.models.cnn_nature_comm_ziletti2018 import train_neural_network
from ai4materials.utils.utils_config import setup_logger
import numpy as np
import os
configs = set_configs()
logger = setup_logger(configs, level='DEBUG', display_configs=False)
dataset_folder = configs['io']['main_folder']
# =============================================================================
# Download the dataset from the online repository and load it
# =============================================================================
x_pristine, y_pristine, dataset_info_pristine, x_vac25, y_vac25, dataset_info_vac25 = load_datasets(dataset_folder)
train_set_name = 'pristine_dataset'
path_to_x_pristine = os.path.join(dataset_folder, train_set_name + '_x.pkl')
path_to_y_pristine = os.path.join(dataset_folder, train_set_name + '_y.pkl')
path_to_summary_pristine = os.path.join(dataset_folder, train_set_name + '_summary.json')
test_set_name = 'vac25_dataset'
path_to_x_vac25 = os.path.join(dataset_folder, test_set_name + '_x.pkl')
path_to_y_vac25 = os.path.join(dataset_folder, test_set_name + '_y.pkl')
path_to_summary_vac25 = os.path.join(dataset_folder, test_set_name + '_summary.json')
x_pristine, y_pristine, dataset_info_pristine = load_dataset_from_file(path_to_x_pristine, path_to_y_pristine,
path_to_summary_pristine)
x_vac25, y_vac25, dataset_info_vac25 = load_dataset_from_file(path_to_x_vac25, path_to_y_vac25,
path_to_summary_vac25)
# =============================================================================
# Train the convolutional neural network
# =============================================================================
# load the convolutional neural network architecture from Ziletti et al., Nature Communications 9, pp. 2775 (2018)
partial_model_architecture = partial(cnn_nature_comm_ziletti2018, conv2d_filters=[32, 32, 16, 16, 8, 8],
kernel_sizes=[3, 3, 3, 3, 3, 3], max_pool_strides=[2, 2],
hidden_layer_size=128)
# use x_train also for validation - this is only to run the test
results = train_neural_network(x_train=x_pristine, y_train=y_pristine, x_val=x_pristine, y_val=y_pristine,
configs=configs, partial_model_architecture=partial_model_architecture,
nb_epoch=1)
text_labels = np.asarray(dataset_info_vac25["data"][0]["text_labels"])[:100]
numerical_labels = np.asarray(dataset_info_vac25["data"][0]["numerical_labels"])[:100]
# =============================================================================
# Predict the crystal class of a material using the trained neural network
# =============================================================================
# load the convolutional neural network architecture from Ziletti et al., Nature Communications 9, pp. 2775 (2018)
# you can also use your own neural network to predict, passing it to the variable 'model'
results = predict(x_vac25, y_vac25, configs=configs, numerical_labels=numerical_labels,
text_labels=text_labels, model=None)
This is the confusion matrix obtained using the convolutional neural network to predict the class of structures with 25% of missing atoms:

The model has an accuracy of 100%, even in the presence of defects (25% of the atoms missing in this case).
The neural network’s training and prediction are performed with Keras. Implementation details on the convolutional neural network used can be found at ai4materials.models.cnn_nature_comm_ziletti2018.
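For readers who want to recompute a confusion matrix and an accuracy from true and predicted numerical labels, a minimal numpy sketch follows. The labels are hypothetical and the helper confusion_matrix is written here for illustration; it is not the routine used inside ai4materials.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, nb_classes):
    """Rows index the true class, columns the predicted class."""
    matrix = np.zeros((nb_classes, nb_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix

# Hypothetical numerical labels for six structures and three classes.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 1, 2, 2])

cm = confusion_matrix(y_true, y_pred, nb_classes=3)
# Accuracy is the fraction of samples on the diagonal.
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(accuracy)
```

A perfectly diagonal matrix, as in this toy case, corresponds to an accuracy of 1.0.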
[1] L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, and M. Scheffler, “Big Data of Materials Science: Critical Role of the Descriptor,” Physical Review Letters, vol. 114, no. 10, p. 105503, 2015.
[2] A. Ziletti, D. Kumar, M. Scheffler, and L. M. Ghiringhelli, “Insightful classification of crystal structures using deep learning,” Nature Communications, vol. 9, p. 2775, 2018.
Section author: Angelo Ziletti <angelo.ziletti@gmail.com>
Submodules¶
ai4materials.models.clustering module¶
ai4materials.models.cnn_architectures module¶
ai4materials.models.cnn_architectures.cnn_architecture_polycrystals(learning_rate=0.0003, conv2d_filters=[32, 16, 8, 8, 16, 32], kernel_sizes=[3, 3, 3, 3, 3, 3], hidden_layer_size=64, n_rows=50, n_columns=32, nb_classes=5, dropout=0.125, img_channels=1)
Deep convolutional neural network model for crystal structure recognition.
This neural network architecture was used to classify crystal structures - represented by the three-dimensional diffraction fingerprint - in Ref. [1].
[1] A. Ziletti et al., “Automatic structure identification in polycrystals via Bayesian deep learning”, in preparation (2018)
Code author: Angelo Ziletti <angelo.ziletti@gmail.com>
ai4materials.models.cnn_architectures.cnn_nature_comm_ziletti2018(conv2d_filters, kernel_sizes, max_pool_strides, hidden_layer_size, n_rows, n_columns, img_channels, nb_classes)
Deep convolutional neural network model for crystal structure recognition.
This neural network architecture was used to classify crystal structures - represented by the two-dimensional diffraction fingerprint - in Ref. [2].
[2] A. Ziletti, D. Kumar, M. Scheffler, and L. M. Ghiringhelli, “Insightful classification of crystal structures using deep learning”, Nature Communications, vol. 9, p. 2775 (2018)
Code author: Angelo Ziletti <angelo.ziletti@gmail.com>
ai4materials.models.cnn_nature_comm_ziletti2018 module¶
ai4materials.models.cnn_polycrystals module¶
ai4materials.models.cnn_polycrystals.predict(x, y, configs, numerical_labels, text_labels, nb_classes=3, results_file=None, model=None, batch_size=32, conf_matrix_file=None, verbose=1, with_uncertainty=True, mc_samples=50, consider_memory=True, max_length=1000000.0)
ai4materials.models.cnn_polycrystals.predict_with_uncertainty(data, model, model_type='classification', n_iter=1000)
Calculate the uncertainty of a neural network model using dropout.
This follows Chap. 3 in Yarin Gal’s PhD thesis: http://mlg.eng.cam.ac.uk/yarin/thesis/thesis.pdf
We calculate the uncertainty of the neural network predictions in the three ways proposed in Gal’s PhD thesis, as presented on pp. 51-54:
- variation_ratio: defined in Eq. 3.19
- predictive_entropy: defined in Eq. 3.20
- mutual_information: defined on p. 53 (no equation number)
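The three measures can be sketched in numpy as follows, assuming their usual definitions and assuming that the softmax outputs of the stochastic forward passes have already been collected in one array. The function uncertainty_measures and the input values are hypothetical; the library function itself takes the data and the model as arguments.

```python
import numpy as np

def uncertainty_measures(probs, eps=1e-12):
    """probs: array [n_iter, n_samples, n_classes] of softmax outputs,
    one slice per stochastic forward pass (dropout kept active)."""
    n_iter, _, n_classes = probs.shape
    mean_probs = probs.mean(axis=0)                     # [n_samples, n_classes]

    # variation ratio: 1 - (fraction of passes voting for the modal class)
    votes = probs.argmax(axis=2)                        # [n_iter, n_samples]
    counts = np.stack([(votes == c).sum(axis=0) for c in range(n_classes)], axis=1)
    variation_ratio = 1.0 - counts.max(axis=1) / n_iter

    # predictive entropy of the averaged class distribution
    predictive_entropy = -(mean_probs * np.log(mean_probs + eps)).sum(axis=1)

    # mutual information: predictive entropy minus the mean per-pass entropy
    per_pass_entropy = -(probs * np.log(probs + eps)).sum(axis=2)
    mutual_information = predictive_entropy - per_pass_entropy.mean(axis=0)
    return variation_ratio, predictive_entropy, mutual_information

# Hypothetical softmax outputs: 4 passes, 2 samples, 2 classes.
# Sample 0: all passes agree. Sample 1: the passes are split evenly.
probs = np.array([
    [[0.9, 0.1], [0.9, 0.1]],
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.9, 0.1], [0.9, 0.1]],
    [[0.9, 0.1], [0.1, 0.9]],
])
vr, pe, mi = uncertainty_measures(probs)
print(vr, pe, mi)
```

In this toy input, the sample on which the passes disagree gets a variation ratio of 0.5 and a positive mutual information, while the sample on which they agree gets zero for both.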
ai4materials.models.cnn_polycrystals.reshape_images(images, target_shape)
Reshape images according to the target shape.
ai4materials.models.cnn_polycrystals.train_neural_network(x_train, y_train, x_val, y_val, configs, partial_model_architecture, batch_size=32, nb_epoch=5, normalize=True, checkpoint_dir=None, neural_network_name='my_neural_network', training_log_file='training.log', early_stopping=False, data_augmentation=True)
Train a neural network to classify crystal structures represented as two-dimensional diffraction fingerprints.
This model was introduced in [1].
x_train: np.array, [batch, width, height, channels]
[1] A. Ziletti, A. Leitherer, M. Scheffler, and L. M. Ghiringhelli, “Crystal-structure identification via Bayesian deep learning: towards superhuman performance”, in preparation (2018)
Code author: Angelo Ziletti <angelo.ziletti@gmail.com>
ai4materials.models.embedding module¶
ai4materials.models.l1_l0 module¶
ai4materials.models.l1_l0.choose_atomic_features(selected_feature_list=None, atomic_data_file=None, binary_data_file=None)
Choose primary features for the extended lasso procedure.
ai4materials.models.l1_l0.classify_rs_zb(structure)
Classify whether a structure is rocksalt or zincblende from a list of NoMaD structures (one json file). Supports multiple frames (TO DO: check that). Hard-coded.
rocksalt: atom_frac1 0.0 0.0 0.0, atom_frac2 0.5 0.5 0.5
zincblende: atom_frac1 0.0 0.0 0.0, atom_frac2 0.25 0.25 0.25
zincblende --> label=0, rocksalt --> label=1
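The hard-coded rule above can be pictured with the following sketch. The helper classify_from_second_atom and its tolerance are hypothetical, and the sketch assumes that the first atom sits at the origin and that only the fractional coordinates of the second atom need to be inspected; the actual function operates on NoMaD structure objects.

```python
import numpy as np

# Fractional coordinates of the second atom, as listed above.
ROCKSALT_FRAC2 = np.array([0.5, 0.5, 0.5])
ZINCBLENDE_FRAC2 = np.array([0.25, 0.25, 0.25])

def classify_from_second_atom(frac2, tol=1e-3):
    """Return 1 for rocksalt, 0 for zincblende, None otherwise.
    Hypothetical helper; assumes the first atom sits at the origin."""
    frac2 = np.asarray(frac2, dtype=float)
    if np.allclose(frac2, ROCKSALT_FRAC2, atol=tol):
        return 1
    if np.allclose(frac2, ZINCBLENDE_FRAC2, atol=tol):
        return 0
    return None

print(classify_from_second_atom([0.5, 0.5, 0.5]))     # 1
print(classify_from_second_atom([0.25, 0.25, 0.25]))  # 0
```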
ai4materials.models.l1_l0.combine_features(df=None, energy_unit=None, length_unit=None, metadata_info=None, allowed_operations=None, derived_features=None)
Generate combinations of features given a dataframe and a list of allowed operations.
For the exponentials, we introduce a characteristic energy/length converting the
Todo
Fix under/overflow errors, and introduce handling of exceptions.
ai4materials.models.l1_l0.e_sqrt_z(row)
Calculates e/sqrt(val_Z).
Es/sqrt(Zval) and Ep/sqrt(Zval) from Phys. Rev. B 85, 104104 (2012). Input: Es(A) or Ep(A), val(A) (A-->B). They need to be given in this order.
ai4materials.models.l1_l0.get_energy_diff(chemical_formula_list, energy_list, label_list)
Obtain the difference in energy (eV) between the rocksalt and zincblende structures of a given binary.
From a list of chemical formulas, energies and labels, returns a dictionary {material: delta_e}, where delta_e is the difference between the energy with label 1 and the energy with label 0, grouped by material. Each element of these lists corresponds to a json file. The delta_e is exactly what is reported in PRL 114, 105503 (2015).
Todo
Check if it works for multiple frames.
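The grouping described for get_energy_diff can be sketched in plain Python. The function energy_diff_by_material is a hypothetical re-implementation written for illustration, the energies are made up, and the sketch assumes exactly one energy per (material, label) pair.

```python
def energy_diff_by_material(chemical_formula_list, energy_list, label_list):
    """Return {material: E(label 1) - E(label 0)}.
    Hypothetical sketch; assumes one energy per (material, label)."""
    energies = {}
    for formula, energy, label in zip(chemical_formula_list, energy_list, label_list):
        energies.setdefault(formula, {})[label] = energy
    # keep only materials for which both labels are available
    return {formula: by_label[1] - by_label[0]
            for formula, by_label in energies.items()
            if 0 in by_label and 1 in by_label}

# Made-up total energies (eV); label 1 = rocksalt, label 0 = zincblende.
formulas = ["AB", "AB", "CD", "CD"]
energies = [-10.0, -10.5, -7.0, -6.5]
labels = [1, 0, 1, 0]
delta_e = energy_diff_by_material(formulas, energies, labels)
print(delta_e)  # {'AB': 0.5, 'CD': -0.5}
```

With this sign convention, a positive delta_e means that the zincblende structure is lower in energy.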
ai4materials.models.l1_l0.get_lowest_energy_structures(structure, dict_delta_e)
Get lowest energy structure for each material and label type.
Works only with two possible labels for a given material.
Todo
Check if it works for multiple frames.
ai4materials.models.l1_l0.l1_l0_minimization(y_true, D, features, energy_unit=None, print_lasso=False, lambda_grid=None, lassonumber=25, max_dim=3, lambda_grid_points=100, lambda_max_factor=1.0, lambda_min_factor=0.001)
Select an optimal descriptor using a combined l1-l0 procedure.
- step (l1): Solve the LASSO minimization problem
\[\operatorname{argmin}_c \left\{ \|P - Dc\|^2 + \lambda |c|_1 \right\}\]
for different lambdas, starting from a ‘high’ lambda. Collect all indices (features) i appearing with nonzero coefficients c_i, while decreasing lambda, until the size of the collection equals lassonumber.
- step (l0): Check the least-squares errors for all single features/pairs/triples/… of the collection from the first step. Choose the single/pair/triple/… with the lowest mean squared error (MSE) to be the best 1D/2D/3D descriptor.
Parameters:
- y_true : array, [n_samples]
- Array with the target property (ground truth)
- D : array, [n_samples, n_features]
- Matrix with the data.
- features : list of strings
- List of feature names. Needs to be in the same order as the feature vectors in D
- dimrange : list of int
- Specify for which dimensions the optimal descriptor is calculated. It is the number of feature vectors used in the linear combination
- lassonumber : int, default 25
- The number of features that will be collected in the l1 step
- lambda_grid_points : int, default 100
- Number of lambdas between lambda_max and lambda_min for which the l1 problem shall be solved. A denser grid may sometimes be needed if the lambda steps are too large. This can be checked with ‘print_lasso’. lambda_max and lambda_min are chosen as in Tibshirani’s paper “Regularization Paths for Generalized Linear Models via Coordinate Descent”. The values in between are generated on the log scale.
- lambda_min_factor : float, default 0.001
- Sets lam_min = lambda_min_factor * lam_max.
- lambda_max_factor : float, default 1.0
- Sets calculated lam_max = lam_max * lambda_max_factor.
- print_lasso: bool, default False
- Prints the indices of the columns of D with nonzero coefficients for each lambda.
- lambda_grid: array
- The list/array of lambda values for the l1 problem can be chosen by the user. The list/array should start from the highest number, and lambda_i > lambda_i+1 should hold. (?) lambda_grid_points is then ignored. (?)
Returns:
list of pandas dataframes (D’, c’, selected_features) :
A list of tuples (D’, c’, selected_features) for each dimension. selected_features is a list of strings. D’*c’ is the selected linear model/fit where the last column of D is a vector with ones.
References:
[1] Luca M. Ghiringhelli, Jan Vybiral, Sergey V. Levchenko, Claudia Draxl, and Matthias Scheffler, “Big Data of Materials Science: Critical Role of the Descriptor” Phys. Rev. Lett. 114, 105503 (2015)
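The two steps can be sketched with numpy alone. This is not the ai4materials implementation: the functions lasso_cd and l1_l0_sketch are hypothetical, the LASSO is solved with a plain coordinate-descent loop using the common 1/(2n) scaling of the squared error (so the lambda values are scaled differently from the formula above), the features are standardized for the l1 step, and the data are synthetic.

```python
import itertools
import numpy as np

def lasso_cd(D, y, lam, n_sweeps=200):
    """Coordinate-descent LASSO for (1/2n)||y - Dc||^2 + lam*|c|_1.
    Assumes the columns of D are standardized and y is centered."""
    n, m = D.shape
    c = np.zeros(m)
    residual = y.copy()
    col_norm = (D ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(m):
            rho = D[:, j] @ residual / n + col_norm[j] * c[j]
            new_cj = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_norm[j]
            residual += D[:, j] * (c[j] - new_cj)
            c[j] = new_cj
    return c

def l1_l0_sketch(D, y, lassonumber=5, max_dim=2, lambda_grid_points=30,
                 lambda_min_factor=0.001):
    n = D.shape[0]
    Ds = (D - D.mean(axis=0)) / D.std(axis=0)
    yc = y - y.mean()
    # l1 step: scan lambdas from high to low on a log scale and collect
    # the indices of features that appear with nonzero coefficients
    lam_max = np.abs(Ds.T @ yc).max() / n
    lambdas = np.geomspace(lam_max, lambda_min_factor * lam_max, lambda_grid_points)
    collected = []
    for lam in lambdas:
        for j in np.flatnonzero(lasso_cd(Ds, yc, lam)):
            if int(j) not in collected and len(collected) < lassonumber:
                collected.append(int(j))
        if len(collected) >= lassonumber:
            break
    # l0 step: exhaustive least squares over subsets of the collected features
    best = {}
    ones = np.ones((n, 1))
    for dim in range(1, max_dim + 1):
        best_mse, best_subset = np.inf, None
        for subset in itertools.combinations(collected, dim):
            A = np.hstack([D[:, list(subset)], ones])   # last column: intercept
            coef = np.linalg.lstsq(A, y, rcond=None)[0]
            mse = np.mean((y - A @ coef) ** 2)
            if mse < best_mse:
                best_mse, best_subset = mse, subset
        best[dim] = (best_subset, best_mse)
    return best

# Synthetic data: the target depends on features 3 and 7 only.
rng = np.random.default_rng(0)
D = rng.normal(size=(40, 10))
y = 2.0 * D[:, 3] - 1.5 * D[:, 7] + 0.5
best = l1_l0_sketch(D, y)
print(best[2])
```

The exhaustive l0 step is affordable only because the l1 step has first reduced the candidates to lassonumber features.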
ai4materials.models.l1_l0.r_pi(row)
Calculates r_pi.
John-Bloch’s indicator2: |rp(A) - rs(A)| + |rp(B) - rs(B)| from Phys. Rev. Lett. 33, 1095 (1974).
Input: rp(A), rs(A), rp(B), rs(B). They need to be given in this order.
ai4materials.models.l1_l0.r_sigma(row)
Calculates r_sigma.
John-Bloch’s indicator1: |rp(A) + rs(A) - rp(B) - rs(B)| from Phys. Rev. Lett. 33, 1095 (1974).
Input: rp(A), rs(A), rp(B), rs(B). They need to be given in this order.
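The two indicators are simple enough to write out directly. In the sketch below the functions take the four radii as separate arguments rather than a dataframe row, and the radii are made-up numbers, so this illustrates the formulas rather than reproducing the library code.

```python
def r_sigma(rp_a, rs_a, rp_b, rs_b):
    """|rp(A) + rs(A) - rp(B) - rs(B)|"""
    return abs(rp_a + rs_a - rp_b - rs_b)

def r_pi(rp_a, rs_a, rp_b, rs_b):
    """|rp(A) - rs(A)| + |rp(B) - rs(B)|"""
    return abs(rp_a - rs_a) + abs(rp_b - rs_b)

# Made-up orbital radii (angstrom) for a hypothetical AB compound,
# given in the order rp(A), rs(A), rp(B), rs(B).
rp_a, rs_a, rp_b, rs_b = 1.5, 1.0, 0.75, 0.5
print(r_sigma(rp_a, rs_a, rp_b, rs_b))  # 1.25
print(r_pi(rp_a, rs_a, rp_b, rs_b))     # 0.75
```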
ai4materials.models.l1_l0.write_atomic_features(structure, selected_feature_list, df, dict_delta_e=None, path=None, filename_suffix='.json', json_file=None)
Given the chemical composition, build the descriptor made of atomic features only.
Includes all the frames in the same json file.
Todo
Check if it works for multiple frames.
ai4materials.models.sis module¶
class ai4materials.models.sis.SIS(P, D, feature_list, feature_unit_classes=None, target_unit='eV', control=None, output_log_file='/home/beaker/.beaker/v1/web/tmp/output.log', rm_existing_files=False, if_print=True, check_only_control=False)
Bases: object
Python interface with the Fortran SIS+(Sure Independent Screening)+L0/L1L0 code.
SIS+(Sure Independent Screening)+L0/L1L0 is a greedy algorithm. It enhances orthogonal matching pursuit (OMP) by considering not only the feature vector closest to the residual in each step, but by collecting the closest ‘n_SIS’ feature vectors. The final model is then built after a given number of iterations by determining the (approximately) best linear combination of the collected features using the L0 (L1-L0) algorithm.
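The greedy idea can be illustrated with a small numpy sketch. It is not the Fortran code wrapped by this class: the function sis_l0_sketch is hypothetical, it skips the generation of algebraic feature combinations, it measures closeness to the residual by the absolute projection of standardized features, and it uses an exhaustive L0 search on synthetic data.

```python
import itertools
import numpy as np

def sis_l0_sketch(P, D, n_sis=3, max_dim=2):
    """Greedy SIS followed by an exhaustive L0 fit at each iteration."""
    n = len(P)
    Ds = (D - D.mean(axis=0)) / D.std(axis=0)
    ones = np.ones((n, 1))
    collected, models = [], []
    residual = P - P.mean()
    for dim in range(1, max_dim + 1):
        # SIS: rank features by |projection on the residual|, keep the n_sis closest
        scores = np.abs(Ds.T @ residual)
        if collected:
            scores[collected] = -np.inf      # skip features already collected
        collected += [int(j) for j in np.argsort(scores)[::-1][:n_sis]]
        # L0: best least-squares fit over all dim-tuples of collected features
        best_mse, best_subset, best_fit = np.inf, None, None
        for subset in itertools.combinations(collected, dim):
            A = np.hstack([D[:, list(subset)], ones])   # last column: intercept
            coef = np.linalg.lstsq(A, P, rcond=None)[0]
            fit = A @ coef
            mse = np.mean((P - fit) ** 2)
            if mse < best_mse:
                best_mse, best_subset, best_fit = mse, subset, fit
        models.append((best_subset, best_mse))
        # the next SIS round screens against what the current model misses
        residual = P - best_fit
    return models

# Synthetic data: the target depends on features 3 and 7 only.
rng = np.random.default_rng(0)
D = rng.normal(size=(40, 10))
P = 2.0 * D[:, 3] - 1.5 * D[:, 7] + 0.5
models = sis_l0_sketch(P, D)
print(models[1])
```

Collecting several features per iteration, instead of only the single closest one, is what distinguishes this scheme from plain OMP.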
To execute the code, folder paths are needed in addition to the SIS code parameters, as well as account information for a remote machine if the code is to be executed there.
- P : array, [n_sample]; list; [n_sample]
- P refers to the target (label). If ptype = ‘quali’, a list of ints is required.
- D : array, [n_sample, n_features]
- D refers to the feature matrix. The SIS code calculates algebraic combinations of the features and then applies the SIS+L0/L1L0 algorithm.
- feature_list : list of strings
- List of feature names. Needs to be in the same order as the feature vectors (columns) in D. Features must consist of strings which are in F_unit (See above).
- feature_unit_classes : None or {list integers or the string: ‘no_unit’}
- integers correspond to the unit class of the features from feature_list. ‘no_unit’ is reserved for dimensionless unit.
- output_log_file : string
- file path for the logger output.
- rm_existing_files : bool
- If SIS_input_path on local or remote machine (remote_input_path) exists, it is removed. Otherwise it is renamed to SIS_input_path_$number.
- control : dict of dicts (of dicts)
- Dict tree:
{
    'local_paths': {'local_path': str, 'SIS_input_folder_name': str},
    ('local_run', 'remote_run'): (
        {'SIS_code_path': str, 'mpi_command': str},
        {'SIS_code_path': str, 'username': str, 'hostname': str, 'remote_path': str, 'eos': bool, 'mpi_command': str, 'nodes': int, ('key_file', 'password'): (str, str)}),
    'parameters': {'n_comb': int, 'n_sis': int, 'max_dim': int, 'OP_list': list},
    'advanced_parameters': {'FC': FC_dic, 'DI': DI_dic, 'FCDI': FCDI_dic}
}
Here the tuples (.,.) mean that one and only one of the two keys has to be set. To see the forms of FC_dic, DI_dic and FCDI_dic, check FC_tuplelist, DI_tuplelist and FCDI_tuplelist above in PARAMETERS REFERENCE.
- start : -
- starts the code
- get_results : list [max_dim] of dicts {‘D’, ‘coefficients’, ‘P_pred’}
- get_results[model_dim-1][‘D’] : pandas data frame [n_sample, model_dim+1]
- Descriptor matrix with the columns being algebraic combinations of the input feature matrix. Column names are thus strings of the algebraic combinations of the strings in the input feature_list. The last column is full of ones, corresponding to the intercept.
- get_results[model_dim-1][‘coefficients’] : array [model_dim+1]
- Optimizing coefficients.
- get_results[model_dim-1][‘P_pred’] : array [m_sample]
- Fit : np.dot( np.array(D), coefficients)
For remote_run, the library nomad_sim.ssh_code is needed. If the remote machine is eos, the (key: value) pair ‘eos’: True has to be set in the dict control[‘remote_run’]. Then, for example, ‘nodes’: 1 and ‘mpi_run -np 32’ can be set in addition.
Paths (say name: path) are all set in the initialization part with self.path and used in other functions with self.path. In general, the other variables are passed directly as arguments to the functions. There are a few exceptions, such as self.ssh.
import numpy as np
from nomad_sim.SIS import SIS

# Specify where on the local machine the input files for the SIS fortran code shall be created
Local_paths = {
    'local_path': '/home/beaker/',
    'SIS_input_folder_name': 'SIS_input',
}

# Information for the ssh connection. Instead of 'password', also 'key_file'
# (path of the rsa key file) is possible.
Remote_run = {
    'mpi_command': '',
    'remote_path': '/home/username/',
    'SIS_code_path': '/home/username/SIS_code/',
    'hostname': 'hostname',
    'username': 'username',
    'password': 'XXX'
}

# Parameters for the SIS fortran code. If at each iteration a different 'OP_list'
# shall be used, set a list of max_dim lists, e.g. [['+', '-', '*'], ['/', '*']], if
# n_comb = 2
Parameters = {
    'n_comb': 2,
    'OP_list': ['+', '|-|', '-', '*', '/', 'exp', '^2'],
    'max_dim': 2,
    'n_sis': 10
}

# Final control dict for the SIS class. Instead of remote_run also local_run can be set
# (with different keys). Also advanced_parameters can be set, but this should be done only
# if the parameters of the SIS fortran code are understood.
SIS_control = {'local_paths': Local_paths, 'remote_run': Remote_run, 'parameters': Parameters}

# Target (label) vector P, feature_list, feature matrix D. The values are made up.
P = np.array([1, 2, 3, -2, -9])
feature_list = ['r_p(A)', 'r_p(B)', 'Z(A)']
D = np.array([[7, -11, 3],
              [-1, -2, 4],
              [2, 20, 3],
              [8, 1, 8],
              [-3, 4, 1]])

# Use the code
sis = SIS(P, D, feature_list, control=SIS_control, output_log_file='/home/ahmetcik/codes/beaker/output.log')
sis.start()
results = sis.get_results()

coef_1dim = results[0]['coefficients']
coef_2dim = results[1]['coefficients']
D_1dim = results[0]['D']
D_2dim = results[1]['D']
print(coef_2dim)
# [-3.1514 -5.9171 3.9697]

print(D_2dim)
#    ((rp(B)/Z(A))/(rp(A)+rp(B)))  ((Z(A)/rp(B))/(rp(B)*Z(A)))  intercept
# 0  0.916670  0.008264  1.0
# 1  0.166670  0.250000  1.0
# 2  0.303030  0.002500  1.0
# 3  0.013889  1.000000  1.0
# 4  4.000000  0.062500  1.0
ask_periodically(sc, seconds, counter, username)
Recursive function that periodically (every ‘seconds’ seconds) runs the function self.check_status.
check_FC(file_path)
Check FC.out to see whether the calculation has finished, and read the feature space sizes.
- calc_finished : bool
- If the calculation has finished, there should be a ‘Have a nice day !’.
- featurespace : integer
- Total feature space size generated, before the redundancy check.
- n_collected : integer
- The number of features collected in the current iteration. Should be n_sis.
check_OP_list(control)
Checks form and items of control[‘parameters’][‘OP_list’].
control[‘parameters’][‘OP_list’] must be a list of operation strings or a list of n_comb lists of operation strings. It is also checked whether the operation strings are items of available_OPs (see above).
control : dict
control : with manipulated control[‘parameters’][‘OP_list’]
check_arrays(P_in, D, feature_list, feature_unit_classes, ptype)
Check arrays/list P, D and feature_list
-
check_control
(par_in, par_ref, par_in_path)[source]¶ Recursive Function to check input control dict tree.
If for example check_control(control,control_ref,’control’) function goes through dcit tree control and compares with control_ref if correct keys (mandotory, not_mandotory, typos of key string) are set and if values are of correct type or of optional list. Furthermore it gives Errors with hints what is wrong, and what is needed.
- par_in : any key
- if par_in is dict, then recursion.
- par_ref: any key
- Compared with par_in to check that both are of the same type. If par_in and par_ref are dicts, their keys are also compared.
- par_in_path: string
- The dict tree path at which an error occurred, e.g. control[key_1][key_2]… When calling the function from outside, start with the name of the input dict, e.g. ‘control’.
-
check_feature_units
(feature_unit_classes)[source]¶ Check feature units
Checks which
- feature_unit_classes : list integers
- list must be sorted.
- unit_strings : list of strings
- In the form [‘(1:3)’,’(4:8)’,..], where the indices start from 1.
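A minimal sketch of how a sorted list of unit classes could be turned into index-range strings of this form. The function name unit_class_ranges is hypothetical, and it is an assumption that each range covers the consecutive features sharing the same class integer.

```python
from itertools import groupby


def unit_class_ranges(feature_unit_classes):
    """Return 1-based '(start:end)' strings, one per run of equal class integers."""
    if list(feature_unit_classes) != sorted(feature_unit_classes):
        raise ValueError("feature_unit_classes must be sorted")
    ranges = []
    start = 1  # 1-based index of the first feature in the current class
    for _, group in groupby(feature_unit_classes):
        size = len(list(group))
        ranges.append("({}:{})".format(start, start + size - 1))
        start += size
    return ranges


# Three features in class 1 followed by five features in class 2.
unit_strings = unit_class_ranges([1, 1, 1, 2, 2, 2, 2, 2])
```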
-
check_files
(iter_folder_name, dimension)[source]¶ Check which file is missing and maybe why.
If something went wrong, this function finds out where the problem occurred. Returns an error string.
-
check_keys
(par_in, par_ref, par_in_path)[source]¶ Compares the dicts par_in and par_ref.
- Collects which keys are missing (only if the keys are not in not_mandotary) and
- which keys are not expected (for example, because of a typo).
If there are missing or unexpected keys, an error message listing them is given.
par_in : dict
par_ref : dict
- par_in_path : string
- Dictionary path string for error message, e.g ‘control[key_1][key_2]’.
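The key comparison described above can be illustrated with a short sketch. The helper compare_keys and its optional_keys argument are hypothetical names; the actual function also builds the error message, which is omitted here.

```python
def compare_keys(par_in, par_ref, optional_keys=()):
    """Return (missing, unexpected) keys of par_in with respect to par_ref."""
    # A reference key is missing only if it is absent and not optional.
    missing = sorted(k for k in par_ref
                     if k not in par_in and k not in optional_keys)
    # Keys that the reference does not know about, e.g. because of a typo.
    unexpected = sorted(k for k in par_in if k not in par_ref)
    return missing, unexpected


control_ref = {"local_run": dict, "parameters": dict, "remote_run": dict}
control = {"local_run": {}, "paramters": {}}  # note the typo in 'parameters'
missing, unexpected = compare_keys(control, control_ref,
                                   optional_keys=("remote_run",))
```

Reporting both lists together makes a typo easy to spot, since the misspelled key shows up as unexpected and the intended key as missing.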
-
check_l0_steps
(max_dim, n_sis, upper_limit=10000)[source]¶ Check whether the number of l0 steps is larger than upper_limit.
-
check_status
(filename, username)[source]¶ Check if the calculation on eos is finished.
Parameters:
- filename: str
- The output of qstat is written into this file, which is then read.
- username: str
- The file is searched for this username. If it does not appear, the calculation is finished.
- status : bool
- True if the calculation is still running.
-
check_type
(par_in, par_ref, par_in_path, if_also_none=False)[source]¶ Check type of par_in and par_ref.
If par_ref is a tuple, par_in must be an item of par_ref; otherwise, they must have the same type.
-
convert_2_fortran
(parameter, parameter_value)[source]¶ Convert parameters to SIS fortran code style.
Converts e.g. True to string ‘.true.’ or a string ‘s’ to “‘s’”, and other special formats. Returns the converted parameter.
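As a rough sketch of the two conversions mentioned above (booleans and strings), under the assumption that other values are simply passed through str. The name to_fortran_literal is hypothetical, and the "other special formats" handled by the real function are not covered.

```python
def to_fortran_literal(value):
    """Convert a Python value to a Fortran-style literal string (illustrative only)."""
    # bool must be tested before any numeric handling: bool is a subclass of int.
    if isinstance(value, bool):
        return ".true." if value else ".false."
    if isinstance(value, str):
        # Strings are wrapped in single quotes.
        return "'{}'".format(value)
    return str(value)


converted = [to_fortran_literal(v) for v in (True, False, "s", 3)]
```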
-
convert_feature_strings
(feature_list)[source]¶ Convert feature strings.
Puts an ‘sr’ for reals and an ‘si’ for integers at the beginning of a string. Returns the list with the changed strings.
-
do_transfer
(ssh=None, eos=None, username=None, CPUs=None)[source]¶ Run the calculation on the remote machine
First checks whether the folder self.remote_input_path already exists on the remote machine; if so, it is deleted or renamed. Then copies the file system self.SIS_input_path with the SIS fortran code files into the folder self.remote_input_path. Finally runs the calculations on the remote machine and copies back the file system with the results. If eos is set, writes a submission script, submits it, and checks qstat to see whether the calculation has finished.
- ssh : object
- Must be from code nomad_sim.ssh_code.
- eos : bool
- If remote machine is eos. To write submission script and submit …
- username: string
- needed to check qstat on eos
- CPUs : int
- Used to reserve the right number of CPUs in the eos submission script.
-
flatten
(list_in)[source]¶ Returns the list_in collapsed into a one dimensional list
list_in : list/tuple of lists/tuples of …
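A minimal recursive sketch of such a flattening, assuming only lists and tuples count as nested containers. The name flatten_nested is hypothetical and this is not necessarily how the method is implemented.

```python
def flatten_nested(list_in):
    """Collapse arbitrarily nested lists/tuples into a one-dimensional list."""
    flat = []
    for item in list_in:
        if isinstance(item, (list, tuple)):
            # Recurse into nested containers and splice their items in order.
            flat.extend(flatten_nested(item))
        else:
            flat.append(item)
    return flat


flat_ops = flatten_nested([["+", "-"], ("/", ["^2", ("exp",)])])
```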
-
get_des
(x)[source]¶ Change the descriptor strings read from the output DI.out. Removes characters such as ‘:’, ‘si’, ‘sr’, then converts the feature strings for printing.
-
get_results
(ith_descriptor=0)[source]¶ Attribute to get results from the file system.
- ith_descriptor: int
- Return the ith best descriptor.
out : list [max_dim] of dicts {‘D’, ‘coefficients’, ‘P_pred’}
- out[model_dim-1][‘D’] : pandas data frame [n_sample, model_dim+1]
- Descriptor matrix whose columns are algebraic combinations of the input feature matrix. Column names are thus strings of the algebraic combinations of the strings of the input feature_list. The last column is full of ones, corresponding to the intercept.
- out[model_dim-1][‘coefficients’] : array [model_dim+1]
- Optimizing coefficients.
- out[model_dim-1][‘P_pred’] : array [n_sample]
- Fit : np.dot( np.array(D) , coefficients)
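To make the relation between D, coefficients and P_pred concrete, the sketch below recomputes the fit from the rounded two-dimensional values printed in the SIS class example (the numbers are copied from that example; the variable rmse is added here for illustration).

```python
import numpy as np

# Target property and rounded 2D results from the SIS class example.
P = np.array([1, 2, 3, -2, -9])
D_2dim = np.array([[0.916670, 0.008264, 1.0],
                   [0.166670, 0.250000, 1.0],
                   [0.303030, 0.002500, 1.0],
                   [0.013889, 1.000000, 1.0],
                   [4.000000, 0.062500, 1.0]])
coef_2dim = np.array([-3.1514, -5.9171, 3.9697])

# Fit: descriptor matrix (last column = intercept) times coefficients.
P_pred = np.dot(D_2dim, coef_2dim)

# Root mean square error between the fit and the target.
rmse = np.sqrt(np.mean((P_pred - P) ** 2))
```

Because the last column of D is full of ones, the last coefficient acts as the intercept and no separate offset term is needed.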
-
get_value_from_dic
(dictionary, key_tree_path)[source]¶ Returns value of the dict tree
- dictionary: dict or ‘dict tree’ as control_ref
- dict_tree is when key is tuple of keys and value is tuple of corresponding values.
- key_tree_path: list of keys
- Must be in the correct order beginning from the top of the tree/dict.
# Examples
# --------
# >>> print get_value_from_dic(control_ref, ['local_run','SIS_code_path'])
# <type 'str'>
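For a plain nested dict, following a key path from the top of the tree can be sketched as below. The name value_at_path and the example control dict are hypothetical, and the special "dict tree" case with tuple keys described above is not handled.

```python
from functools import reduce


def value_at_path(dictionary, key_tree_path):
    """Follow the keys in key_tree_path, in order, starting from the top-level dict."""
    return reduce(lambda node, key: node[key], key_tree_path, dictionary)


# Hypothetical control dict, using default values shown in set_local_run.
control = {"local_run": {"SIS_code_path": "~/codes/SIS_code/", "mpi_command": ""}}
sis_path = value_at_path(control, ["local_run", "SIS_code_path"])
```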
-
read_results
(iter_folder_name, dimension, task, tsizer)[source]¶ Read results from DI.out.
- iter_folder : string
- Name of the folder containing the outputs of the corresponding iteration of SIS+l1/l1l0, e.g. ‘iter01’, ‘iter02’.
- dimension : integer
- In iteration three, for example, DI.out provides 1- to 3-dimensional descriptors. Choose here which dimension should be returned.
- task : integer < 100
- For multi task, must be worked on.
- tsizer : integer
- Number of samples, e.g. number of rows of D or P.
- RMSE : float
- Root mean square error of the model.
- Des : list of strings
- List of the descriptors
- coef : array [model_dim+1]
- Coefficients including the intercept
- D : array [n_sample, model_dim+1]
- Matrix with columns being the selected features (descriptors) for the model. The last column is full of ones corresponding to the intercept
-
read_results_quali
()[source]¶ Read results for the 2D descriptor from calculations with a qualitative run.
- results: list of lists
- Each sublist characterizes a separate model (if multiple models have the same score/cost, all of them are returned). Each sublist contains [descriptor_strings, D, n_overlap], where D (D.shape = (n_sample, 2)) is an array with the descriptor vectors.
-
set_SIS_parameters
(desc_dim=2, subs_sis=100, rung=1, opset=['+', '-', '/', '^2', 'exp'], ptype='quanti', advanced_parameters=None)[source]¶ Set the SIS fortran code parameters
If advanced_parameters is passed, its values are used; otherwise default values are used. max_dim, n_sis, n_comb, and OP_list can also be overwritten by advanced_parameters if specified.
-
set_local_run
(SIS_code_path='~/codes/SIS_code/', mpi_command='')[source]¶ Set and check the local environment if local_run is used.
-
set_main_settings
(P, D, feature_list, feature_unit_classes, local_path='/home/beaker/', SIS_input_folder_name='input_folder')[source]¶ Set local environment and P, D and feature_list.
-
set_ssh_connection
(hostname=None, username=None, port=22, key_file=None, password=None, remote_path=None, SIS_code_path=None, eos=False, nodes=1, mpi_command='')[source]¶ Set ssh connection. Set and check the remote environment if remote_run is used.
-
string_descriptor
(RMSE, features, coefficients, target_unit)[source]¶ Make string for output in the terminal with model and its RMSE.
-
write_P_D
(P, D, feature_list)[source]¶ Writes ‘train.dat’ as SIS fortran code input with P, D and feature strings
-
ai4materials.models.sis.
converted_2_standard
= {'disA': 'd(A)', 'disAB': 'd(AB)', 'disB': 'd(B)', 'eaA': 'EA(A)', 'eaB': 'EA(B)', 'ebA': 'E_b(A)', 'ebAB': 'E_b(AB)', 'ebB': 'E_b(B)', 'hlgapA': 'HL_gap(A)', 'hlgapAB': 'HL_gap(AB)', 'hlgapB': 'HL_gap(B)', 'homoA': 'E_HOMO(A)', 'homoB': 'E_HOMO(B)', 'ipA': 'IP(A)', 'ipB': 'IP(B)', 'lumoA': 'E_LUMO(A)', 'lumoB': 'E_LUMO(B)', 'periodA': 'period(A)', 'periodB': 'period(B)', 'rdA': 'r_d(A)', 'rdB': 'r_d(B)', 'rpA': 'r_p(A)', 'rpB': 'r_p(B)', 'rpiAB': 'r_pi(AB)', 'rsA': 'r_s(A)', 'rsB': 'r_s(B)', 'rsigmaAB': 'r_sigma(AB)', 'valA': 'Z_val(A)', 'valB': 'Z_val(B)', 'zA': 'Z(A)', 'zB': 'Z(B)'}¶ Set logger for outputs as errors, warnings, infos.
ai4materials.models.strided_pattern_matching module¶
Module contents¶
Neural network interpretation¶
Understanding why a machine learning algorithm arrives at the classification decision is of paramount importance, especially in the natural sciences. For deep learning models this is particularly challenging because of their tendency to represent information in a highly distributed manner, and the presence of non-linearities in the network’s layers.
Here we provide a materials science use case of interpretable machine learning for crystal-structure classification from Ziletti et al. (2018) [1].
Example: attentive response maps in deep-learning-driven crystal recognition¶
This example shows how to identify the regions in the image that are the most important in the neural network’s classification decision. In particular, attentive response maps are calculated using the fractionally strided convolutional technique introduced by Zeiler and Fergus (2014) [2], and applied for the first time in materials science by Ziletti et al. (2018) [1].
The steps performed in the code below are the following:
- define the folders where the results are going to be saved
- build four crystal structures (bcc, fcc, diam, hcp) using the ASE package
- create a pristine supercell using the function ai4materials.utils.utils_crystals.create_supercell
- calculate the two-dimensional diffraction fingerprint (an RGB image) for all four crystal structures with ai4materials.descriptors.diffraction2d.Diffraction2D
- obtain the attentive response maps for each diffraction fingerprint with ai4materials.interpretation.deconv_resp_maps.plot_att_response_maps. These identify the parts of the image that are most important in the classification decision.
from ase.spacegroup import crystal
from ai4materials.descriptors.diffraction2d import Diffraction2D
from ai4materials.interpretation.deconv_resp_maps import plot_att_response_maps
from ai4materials.utils.utils_config import get_data_filename
from ai4materials.utils.utils_config import set_configs
from ai4materials.utils.utils_config import setup_logger
from ai4materials.utils.utils_crystals import create_supercell
import numpy as np
import os.path
# set configs
configs = set_configs(main_folder='./nn_interpretation_ai4materials/')
logger = setup_logger(configs, level='INFO', display_configs=False)
# setup folder and files
# checkpoint_folder = os.path.join(configs['io']['main_folder'], 'saved_models')
figure_folder = os.path.join(configs['io']['main_folder'], 'attentive_resp_maps')
# build crystal structures
fcc_al = crystal('Al', [(0, 0, 0)], spacegroup=225, cellpar=[4.05, 4.05, 4.05, 90, 90, 90])
bcc_fe = crystal('Fe', [(0, 0, 0)], spacegroup=229, cellpar=[2.87, 2.87, 2.87, 90, 90, 90])
diamond_c = crystal('C', [(0, 0, 0)], spacegroup=227, cellpar=[3.57, 3.57, 3.57, 90, 90, 90])
hcp_mg = crystal('Mg', [(1. / 3., 2. / 3., 3. / 4.)], spacegroup=194, cellpar=[3.21, 3.21, 5.21, 90, 90, 120])
# create supercells - pristine
fcc_al_supercell = create_supercell(fcc_al, target_nb_atoms=128, cell_type='standard_no_symmetries')
bcc_fe_supercell = create_supercell(bcc_fe, target_nb_atoms=128, cell_type='standard_no_symmetries')
diamond_c_supercell = create_supercell(diamond_c, target_nb_atoms=128, cell_type='standard_no_symmetries')
hcp_mg_supercell = create_supercell(hcp_mg, target_nb_atoms=128, cell_type='standard_no_symmetries')
ase_atoms_list = [fcc_al_supercell, bcc_fe_supercell, diamond_c_supercell, hcp_mg_supercell]
# calculate the two-dimensional diffraction fingerprint for all four structures
descriptor = Diffraction2D(configs=configs)
diffraction_fingerprints_rgb = [descriptor.calculate(ase_atoms).info['descriptor']['diffraction_2d_intensity'] for ase_atoms in ase_atoms_list]
model_weights_file = get_data_filename('data/nn_models/ziletti_et_2018_rgb.h5')
model_arch_file = get_data_filename('data/nn_models/ziletti_et_2018_rgb.json')
# convert the list of diffraction fingerprint images to a numpy array
# images needs to be a numpy array with shape (n_images, dim1, dim2, channels)
images = np.asarray(diffraction_fingerprints_rgb)
plot_att_response_maps(images, model_arch_file, model_weights_file, figure_folder, nb_conv_layers=6, nb_top_feat_maps=4,
layer_nb='all', plot_all_filters=False, plot_filter_sum=True, plot_summary=True)
In each image below we show:
- (left) original image to be classified corresponding to the two-dimensional diffraction fingerprint of a given structure
- (center) attentive response maps from the top four most activated filters (red channel) for the diffraction fingerprint. The brighter the pixel, the more important that location is for classification
- (right) sum of the last convolutional layer attentive response maps
for the case of a face-centered-cubic structure:

and a body-centered-cubic structure:

From the attentive response maps (center), we notice that the convolutional neural network filters are composed in a hierarchical fashion, increasing their complexity from one layer to another. At the third convolutional layer, the neural network discovers that the diffraction peaks, and their relative arrangement, are the most effective way to predict crystal classes (as a human expert would do). Furthermore, from the sum of the last convolutional layer attentive response maps, we observe that the neural network learned crystal templates automatically from the data.
[1] | A. Ziletti, D. Kumar, M. Scheffler, and L. M. Ghiringhelli, “Insightful classification of crystal structures using deep learning,” Nature Communications, vol. 9, p. 2775, 2018. |
[2] | M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” European Conference on Computer Vision, Springer, p. 818, 2014. |
Section author: Angelo Ziletti <angelo.ziletti@gmail.com>
Submodules¶
ai4materials.interpretation.deconv_resp_maps module¶
-
class
ai4materials.interpretation.deconv_resp_maps.
DeconvNet
(model)[source]¶ Bases:
object
DeconvNet class. Code taken from: https://github.com/tdeboissiere/DeepLearningImplementations/blob/master/DeconvNet/KerasDeconv.py
-
ai4materials.interpretation.deconv_resp_maps.
deconv_visualize
(model, target_layer, input_data, nb_top_feat_maps)[source]¶ Obtain attentive response maps back-projected to image space using transposed convolutions (sometimes referred to as deconvolutions in machine learning).
Parameters:
- model: instance of the Keras model
- The ConvNet model to be used.
- target_layer: str
- Name of the layer for which we want to obtain the attentive response maps. The names of the layers are defined in the Keras instance model.
- input_data: ndarray
- The image data to be passed through the network. Shape: (n_samples, n_channels, img_dim1, img_dim2)
- nb_top_feat_maps: int
- Number of top filters to visualize, e.g. nb_top_feat_maps = 25 will visualize the top 25 filters in the target layer.
Code author: Devinder Kumar <d22kumar@uwaterloo.ca>
-
ai4materials.interpretation.deconv_resp_maps.
get_deconv_imgs
(img_index, data, dec_layer, target_layer, feat_maps)[source]¶ Return the attentive response maps of the images specified in img_index for the target layer and feature maps specified in the arguments.
Parameters:
- img_index: list or ndarray
- Array or list of indices. These are the indices of the images (contained in data) for which we want to obtain the attentive response maps.
- data: ndarray
- The image data. Shape : (n_samples, n_channels, img_dim1, img_dim2)
- Dec: instance of class ai4materials.interpretation.deconv_resp_maps.DeconvNet
- DeconvNet model: instance of the DeconvNet class
- target_layer: str
- Name of the layer for which we want to obtain the attentive response maps. The names of the layers are defined in the Keras instance model.
- feat_map: int
- Index of the attentive response map to visualise.
Code author: Devinder Kumar <d22kumar@uwaterloo.ca>
-
ai4materials.interpretation.deconv_resp_maps.
get_max_activated_filter_for_layer
(target_layer, model, input_data, nb_top_feat_maps, img_index)[source]¶ Find the indices of the most activated filters for a given image in the specified target layer of a Keras model.
Parameters:
- target_layer: str
- Name of the layer for which we want to obtain the attentive response maps. The names of the layers are defined in the Keras instance model.
- model: instance of the Keras model
- The ConvNet model to be used.
- input_data: ndarray
- The image data to be passed through the network. Shape: (n_samples, n_channels, img_dim1, img_dim2)
- nb_top_feat_maps: int
- Number of the top attentive response maps to be calculated and plotted. It must be less than or equal to the minimum number of filters used in the neural network layers. This is not checked by the code, and respecting this criterion is up to the user.
- img_index: list or ndarray
- Array or list of indices. These are the indices of the images (contained in data) for which we want to obtain the attentive response maps.
- Returns: list of int
- List containing the indices of the filters with the highest response (activation) for the given image.
Code author: Devinder Kumar <d22kumar@uwaterloo.ca>
Code author: Angelo Ziletti <angelo.ziletti@gmail.com>
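Selecting the most activated filters can be sketched with NumPy as follows. This is an illustration on synthetic data, not the library's code: the name top_activated_filters is hypothetical, and ranking filters by their summed activation is an assumption (other criteria, such as the maximum activation, are equally plausible).

```python
import numpy as np


def top_activated_filters(activations, nb_top_feat_maps):
    """Return indices of the filters with the largest summed activation.

    activations: array of shape (n_filters, height, width) for a single image.
    """
    scores = activations.reshape(activations.shape[0], -1).sum(axis=1)
    # argsort is ascending, so reverse it and keep the first nb_top_feat_maps.
    order = np.argsort(scores)[::-1][:nb_top_feat_maps]
    return [int(i) for i in order]


rng = np.random.default_rng(0)
activations = rng.random((8, 4, 4))  # synthetic feature maps of one layer
activations[5] += 2.0                # make filter 5 the most activated
activations[2] += 1.0                # and filter 2 the second most activated
top = top_activated_filters(activations, nb_top_feat_maps=2)
```

Since the slice silently truncates, asking for more filters than a layer has would return fewer indices than requested, which is why nb_top_feat_maps must not exceed the smallest number of filters in the network.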
-
ai4materials.interpretation.deconv_resp_maps.
load_model
(model_arch_file, model_weights_file)[source]¶ Load Keras model from .json and .h5 files
-
ai4materials.interpretation.deconv_resp_maps.
plot_att_response_maps
(data, model_arch_file, model_weights_file, figure_dir, nb_conv_layers, layer_nb='all', nb_top_feat_maps=4, filename_maps='attentive_response_maps', cmap=<matplotlib.colors.LinearSegmentedColormap object>, plot_all_filters=False, plot_filter_sum=True, plot_summary=True)[source]¶ Plot attentive response maps given a Keras trained model and input images.
Parameters:
- data: ndarray, shape (n_images, dim1, dim2, channels)
- Array of input images that will be used to calculate the attentive response maps.
- model_arch_file: string
- Full path to the model architecture file (.json format) written by Keras after the neural network training. This is used by the load_model function to load the neural network architecture.
- model_weights_file: string
- Full path to the model weights file (.h5 format) written by Keras after the neural network training. This is used by the load_model function to load the neural network weights.
- figure_dir: string
- Full path of the directory where the images resulting from the transposed convolution procedure will be saved.
- nb_conv_layers: int
- Number of Convolution2D layers in the neural network architecture.
- layer_nb: list of int, or ‘all’
- List of the numbers (starting from 0) of the layers to be deconvolved. E.g. layer_nb=[0, 1, 4] will deconvolve the 1st, 2nd, and 5th convolution2d layers. Only up to 6 conv_2d layers are supported. If ‘all’ is selected, all conv_2d layers will be deconvolved, up to nb_conv_layers.
- nb_top_feat_maps: int
- Number of the top attentive response maps to be calculated and plotted. It must be less than or equal to the minimum number of filters used in the neural network layers. This is not checked by the code, and respecting this criterion is up to the user.
- filename_maps: str
- Base filename (without extension and path) of the files where the attentive response maps will be saved.
- cmap: Matplotlib cmap, optional, default=`cm.hot`
- Type of coloring for the heatmap, if images are greyscale. Possible cmaps can be found here: https://matplotlib.org/examples/color/colormaps_reference.html If images are RGB, then an RGB color map is used. The RGB colormap can be found at ai4materials.utils.utils_plotting.rgb_colormaps.
- plot_all_filters: bool
- If True, plot and save the nb_top_feat_maps for each layer. The files will be saved in different folders according to the layer: “convolution2d_1” for the 1st layer, “convolution2d_2” for the 2nd layer, etc.
- plot_filter_sum: bool
- If True, plot and save the sum of all the filters for a given layer.
- plot_summary: bool
- If True, plot and save a summary figure containing: (left) the input image, (center) the nb_top_feat_maps filters for each deconvolved layer, (right) the sum of all the filters of the last layer. If set to True, plot_filter_sum must also be set to True.
Code author: Angelo Ziletti <angelo.ziletti@gmail.com>
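The relation between the zero-based layer_nb indices and the per-layer folder names can be sketched as below. The helper resolve_layer_names is hypothetical; it only mirrors the naming convention ("convolution2d_1" for the 1st layer) stated in the parameter descriptions above.

```python
def resolve_layer_names(layer_nb, nb_conv_layers):
    """Map zero-based layer indices (or 'all') to one-based layer names."""
    if layer_nb == "all":
        layer_nb = list(range(nb_conv_layers))
    for idx in layer_nb:
        if not 0 <= idx < nb_conv_layers:
            raise ValueError("layer index {} is out of range".format(idx))
    # Index 0 corresponds to the 1st layer, hence the +1 in the name.
    return ["convolution2d_{}".format(idx + 1) for idx in layer_nb]


selected = resolve_layer_names([0, 1, 4], nb_conv_layers=6)
all_layers = resolve_layer_names("all", nb_conv_layers=6)
```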
Module contents¶
Visualization¶
The ai4materials Viewer combines Bokeh (interactive visualization of large datasets) and jsmol (3D visualization of chemical structures) to allow the interactive exploration of materials science datasets. Users can interactively visualize crystal structures and properties of a possibly large number of materials in one webpage.
Below we present an example of how to create an interactive Viewer using ai4materials. The code below generates an interactive plot of the results of Ref. [1], in particular Fig. 2 in the article.
from ai4materials.visualization.viewer import Viewer
from ai4materials.utils.utils_config import set_configs
from ai4materials.utils.utils_data_retrieval import read_ase_db
from ai4materials.visualization.viewer import read_control_file
from ai4materials.utils.utils_config import get_data_filename
import pandas as pd
import webbrowser
# read data: crystal structures, information on the plot, name of the axis
ase_db_file_binaries = get_data_filename('data/db_ase/binaries_lowest_energy_ghiringhelli2015.json')
results_binaries_lasso = get_data_filename('data/viewer_files/l1_l0_dim2_for_viewer.csv')
control_file_binaries = get_data_filename('data/viewer_files/binaries_control.json')
configs = set_configs()
ase_atoms_binaries = read_ase_db(db_path=ase_db_file_binaries)
# from the table, extract the coordinates for the plot, the true and the predicted value
df_viewer = pd.read_csv(results_binaries_lasso)
x = df_viewer['coord_0']
y = df_viewer['coord_1']
target = df_viewer['y_true']
target_pred = df_viewer['y_pred']
# define titles in the plot
legend_title = 'Reference E(RS)-E(ZB)'
target_name = 'E(RS)-E(ZB)'
plot_title = 'SISSO(L0) structure map'
# create an instance of the ai4materials Viewer
viewer = Viewer(configs=configs)
# read x and y axis labels from control file
x_axis_label, y_axis_label = read_control_file(control_file_binaries)
# generate interactive plot
file_html_link, file_html_name = viewer.plot_with_structures(x=x, y=y, target=target, target_pred=target_pred,
ase_atoms_list=ase_atoms_binaries, target_unit='eV',
target_name=target_name, legend_title=legend_title,
is_classification=False, x_axis_label=x_axis_label,
y_axis_label=y_axis_label, plot_title=plot_title,
tmp_folder=configs['io']['tmp_folder'])
# open the interactive plot in a web browser
webbrowser.open(file_html_name)
This is a screenshot of the interactive ai4materials Viewer generated with the code above:

Implementation details of the ai4materials Viewer can be found at ai4materials.visualization.viewer.
On some systems, Google Chrome will not load jmol correctly, so you will be able to load the interactive plot, but not to explore crystal structures in 3D.
To exploit all functionalities of the ai4materials Viewer, we recommend using Firefox; in particular, the Viewer was tested on Firefox Quantum 61.0.1.
[1] | L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, and M. Scheffler, “Big Data of Materials Science: Critical Role of the Descriptor,” Physical Review Letters, vol. 114, no. 10, p. 105503, 2015. |
Section author: Angelo Ziletti <angelo.ziletti@gmail.com>
Module contents¶
Utils¶
This package contains utility functions for data analytics applied to materials science data. Specifically,
- ai4materials.utils.utils_config: functions to set up useful parameters for calculations.
- ai4materials.utils.utils_crystals: functions related to crystal structures.
- ai4materials.utils.utils_data_retrieval: functions to retrieve data.
- ai4materials.utils.utils_mp: functions for parallel execution.
- ai4materials.utils.utils_plotting: functions for plotting results of modelling.
- ai4materials.utils.utils_vol_data: functions to deal with three-dimensional volumetric data.
Utils crystals¶
This package contains functions to build pristine and defective supercells, starting from an ASE (Atomistic Simulation Environment) Atoms object. It also makes it possible to obtain the space group of a given structure, or to get the standard conventional cell (using Pymatgen).
Pristine and defective supercell generation¶
The main functions available to modify crystal structures are:
- ai4materials.utils.utils_crystals.create_supercell creates a pristine supercell starting from a given atomic structure.
- ai4materials.utils.utils_crystals.random_displace_atoms creates a supercell with randomly displaced atoms.
- ai4materials.utils.utils_crystals.create_vacancies creates a supercell with vacancies.
- ai4materials.utils.utils_crystals.substitute_atoms creates a supercell with randomly substituted atoms.
For additional details on each function, see their respective descriptions below.
Example: pristine supercell creation¶
Starting from a given ASE structure, the script below uses ai4materials.utils.utils_crystals.create_supercell to generate a supercell of (approximately) 128 atoms:
from ase.io import write
from ase.build import bulk
import matplotlib.pyplot as plt
from ase.visualize.plot import plot_atoms
from ai4materials.utils.utils_crystals import create_supercell
cu_fcc = bulk('Cu', 'fcc', a=3.6, orthorhombic=True)
supercell_cu_fcc = create_supercell(cu_fcc, create_replicas_by='nb_atoms', target_nb_atoms=128)
write('cu_fcc.png', cu_fcc)
write('cu_fcc_supercell.png', supercell_cu_fcc)
This is the original structure:

and this is the supercell obtained by replicating the unit cell up to a target number of atoms (target_nb_atoms):
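To see why the supercell size is only approximately equal to target_nb_atoms, the sketch below picks an isotropic replication count from the number of atoms in the unit cell. This is a hypothetical illustration, not the algorithm used by create_supercell: the function isotropic_replicas and the assumption of equal replication along all three axes are not taken from the library.

```python
def isotropic_replicas(nb_atoms_cell, target_nb_atoms):
    """Pick n so that an n x n x n supercell is close to target_nb_atoms."""
    n = max(1, round((target_nb_atoms / nb_atoms_cell) ** (1.0 / 3.0)))
    return n, nb_atoms_cell * n ** 3


# A 2-atom cell reaches the target exactly: 4 x 4 x 4 replicas, 128 atoms.
n_small, total_small = isotropic_replicas(nb_atoms_cell=2, target_nb_atoms=128)

# A 4-atom cell cannot: 3 x 3 x 3 replicas give 108 atoms, the closest cube.
n_large, total_large = isotropic_replicas(nb_atoms_cell=4, target_nb_atoms=128)
```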

Example: defective supercell creation¶
Starting from a given ASE structure, the script below uses ai4materials.utils.utils_crystals.create_vacancies to generate a defective supercell of (approximately) 128 atoms with 25% vacancies:
from ase.io import write
from ase.build import bulk
import matplotlib.pyplot as plt
from ase.visualize.plot import plot_atoms
from ai4materials.utils.utils_crystals import create_vacancies
cu_fcc = bulk('Cu', 'fcc', a=3.6, orthorhombic=True)
supercell_vac25_cu_fcc = create_vacancies(cu_fcc, target_vacancy_ratio=0.25, create_replicas_by='nb_atoms', target_nb_atoms=128)
write('cu_fcc.png', cu_fcc)
write('cu_fcc_supercell_vac25.png', supercell_vac25_cu_fcc)
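The effect of target_vacancy_ratio can be illustrated on a plain list of sites, without ASE. This is a sketch under the assumption that vacancies are created by removing a random subset of atoms; remove_random_sites and its seed argument are hypothetical names.

```python
import random


def remove_random_sites(sites, target_vacancy_ratio, seed=None):
    """Return a copy of sites with a random fraction of them removed."""
    rng = random.Random(seed)
    nb_vacancies = int(round(target_vacancy_ratio * len(sites)))
    # Choose distinct site indices to turn into vacancies.
    removed = set(rng.sample(range(len(sites)), nb_vacancies))
    return [site for i, site in enumerate(sites) if i not in removed]


sites = list(range(128))  # stand-in for the atoms of a 128-atom supercell
defective = remove_random_sites(sites, target_vacancy_ratio=0.25, seed=0)
```

With 128 sites and a ratio of 0.25, 32 sites are removed and 96 remain.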

Similarly, it is possible to generate a supercell with randomly displaced atoms with ai4materials.utils.utils_crystals.random_displace_atoms. In the script below, we generate a defective supercell of (approximately) 256 atoms with displacements sampled from a Gaussian distribution with a standard deviation of 0.5 Angstrom:
from ase.io import write
from ase.build import bulk
import matplotlib.pyplot as plt
from ase.visualize.plot import plot_atoms
from ai4materials.utils.utils_crystals import random_displace_atoms
cu_fcc = bulk('Cu', 'fcc', a=3.6, orthorhombic=True)
supercell_rand_disp_cu_fcc = random_displace_atoms(cu_fcc, displacement=0.5, create_replicas_by='nb_atoms', noise_distribution='gaussian', target_nb_atoms=256)
write('cu_fcc.png', cu_fcc)
write('supercell_rand_disp_cu_fcc_05A.png', supercell_rand_disp_cu_fcc)
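The Gaussian displacement itself can be sketched with NumPy on an array of positions. It is an assumption that the noise is applied independently to each Cartesian component with standard deviation equal to displacement; gaussian_displace is a hypothetical name, not part of ai4materials.

```python
import numpy as np


def gaussian_displace(positions, displacement, seed=None):
    """Add zero-mean Gaussian noise (std = displacement) to every coordinate."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=0.0, scale=displacement, size=positions.shape)
    return positions + noise


positions = np.zeros((256, 3))  # stand-in for 256 atomic positions (Angstrom)
displaced = gaussian_displace(positions, displacement=0.5, seed=0)
spread = displaced.std()        # sample standard deviation of the displacements
```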

Section author: Angelo Ziletti <angelo.ziletti@gmail.com>
Submodules¶
ai4materials.utils.unit_conversion module¶
Module for unit conversion routines. Currently uses the Pint unit conversion library (https://pint.readthedocs.org) to do the conversions.
Any new units and constants can be added to the text files “units.txt” and “constants.txt”.
NOTE: this is taken from python-common in nomad-lab-base. It is copied here to remove the dependency from nomad-lab-base. For more info on python-common visit: https://gitlab.mpcdf.mpg.de/nomad-lab/python-common
The author of this code is: Dr. Fawzi Roberto Mohamed E-mail: mohamed@fhi-berlin.mpg.de
-
class
ai4materials.utils.unit_conversion.
LazyF
(unit, target_unit)[source]¶ Bases:
future.types.newobject.newobject
Helper class for lazy evaluation of the conversion function.
-
ai4materials.utils.unit_conversion.
convert_unit
(value, unit, target_unit=None)[source]¶ Converts the given value from the given units to the target units. For examples see the bottom section.
- Args:
- value: The numeric value to be converted. Accepts integers, floats, lists and numpy arrays.
- unit: The units that the value is currently given in, as a string. All units that have a corresponding declaration in the “units.txt” file and combinations like “meter*second**-2” are supported.
- target_unit: The target unit as a string. Same rules as for the unit argument. If this argument is not given, SI units are assumed.
- Returns:
- The given value in the target units, returned as the same data type as the original values.
Code author: Fawzi Mohamed <mohamed@fhi-berlin.mpg.de>
-
ai4materials.utils.unit_conversion.
convert_unit_function
(unit, target_unit=None)[source]¶ Returns a function that converts scalar floats from unit to target_unit. If any of the units is user defined (usr*), the conversion is done lazily at the first call (i.e. user-defined conversions might be undefined when calling this).
For more details see the convert_unit function. Could be optimized a bit by caching the pint quantities.
- Args:
- unit: The units that the value is currently given in, as a string. All units that have a corresponding declaration in the “units.txt” file and combinations like “meter*second**-2” are supported.
- target_unit: The target unit as a string. Same rules as for the unit argument. If this argument is not given, SI units are assumed.
- Returns:
- A function that converts a scalar float from unit to target_unit.
Code author: Fawzi Mohamed <mohamed@fhi-berlin.mpg.de>
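The lazy behaviour described for user-defined units can be sketched without Pint. The class LazyConverter and the factors table below are hypothetical stand-ins: the real module delegates the conversion to Pint, whereas this sketch only shows why deferring the lookup to the first call allows a unit to be defined after the converter is created.

```python
class LazyConverter:
    """Build the conversion function on first use rather than at construction."""

    def __init__(self, unit, target_unit, factors):
        self.unit = unit
        self.target_unit = target_unit
        self.factors = factors  # shared, mutable table: unit name -> value in SI
        self._convert = None

    def __call__(self, value):
        if self._convert is None:
            # Units are looked up only now, so they may be registered late.
            factor = self.factors[self.unit] / self.factors[self.target_unit]
            self._convert = lambda x: x * factor
        return self._convert(value)


factors = {"meter": 1.0}
to_meter = LazyConverter("usrBohr", "meter", factors)  # 'usrBohr' is not defined yet
factors["usrBohr"] = 5.29177e-11                       # hypothetical user-defined unit
length = to_meter(2.0)
```

An immediate converter would have failed at construction time here, which matches the requirement that all units already be known in the immediate variant.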
-
ai4materials.utils.unit_conversion.
convert_unit_function_immediate
(unit, target_unit=None)[source]¶ Returns a function that converts scalar floats from unit to target_unit. All units need to be already known.
For more details see the convert_unit function. Could be optimized a bit by caching the pint quantities.
- Args:
- unit: The units that the value is currently given in, as a string. All units that have a corresponding declaration in the “units.txt” file and combinations like “meter*second**-2” are supported.
- target_unit: The target unit as a string. Same rules as for the unit argument. If this argument is not given, SI units are assumed.
- Returns:
- A function that converts a scalar float from unit to target_unit.
Code author: Fawzi Mohamed <mohamed@fhi-berlin.mpg.de>
ai4materials.utils.utils_binaries module¶
-
ai4materials.utils.utils_binaries.
get_binaries_dict_delta_e
(chemical_formula_list, energy_list, label_list, equiv_spgroups)[source]¶
-
ai4materials.utils.utils_binaries.
get_energy_diff_by_spacegroup
(ase_atoms_list, target='energy_total', equiv_spgroups=None)[source]¶
-
ai4materials.utils.utils_binaries.
get_target_diff_dic
(df, sample_key=None, energy=None, spacegroup=None)[source]¶ Get a dictionary of dictionaries: samples -> space group tuples -> energy differences.
Drops all rows which do not correspond to the minimum energy per sample AND space group, then builds a new data frame with space groups as columns. Finally constructs the dictionary of dictionaries.
Parameters:
- df: pandas data frame
- with columns=[samples_title, energies_title, SG_title]
- sample_key: string
- Needs to be column title of samples of input df
- energy: string
- Needs to be column title of energies of input df
- spacegroup : string
- Needs to be column title of space groups of input df
Returns:
- dic_out: dictionary of dictionaries:
- In the form: { sample_a: { (SG_1,SG_2):E_diff_a12, (SG_1,SG_3):E_diff_a13,…}, sample_b: { (SG_1,SG_2):E_diff_b12, (SG_1,SG_3):E_diff_b13,… }, … } E_diff_a12 = energy_SG_1 - energy_SG_2 of sample a. Both (SG_1,SG_2) and (SG_2,SG_1) are considered. If SG_1 or SG_2 is NaN, energy difference to it is ignored.
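The structure of the returned dictionary can be reproduced with a small pure-Python sketch. The function energy_differences and the example energies are hypothetical; the actual function works on a pandas data frame and also handles NaN entries, which are omitted here.

```python
from itertools import permutations


def energy_differences(records):
    """Build {sample: {(SG_1, SG_2): E(SG_1) - E(SG_2)}} from (sample, energy, SG) rows."""
    lowest = {}
    for sample, energy, spacegroup in records:
        per_sample = lowest.setdefault(sample, {})
        # Keep only the minimum energy per sample and space group.
        per_sample[spacegroup] = min(energy, per_sample.get(spacegroup, energy))
    # Both (SG_1, SG_2) and (SG_2, SG_1) are generated by permutations.
    return {sample: {(sg_1, sg_2): e[sg_1] - e[sg_2]
                     for sg_1, sg_2 in permutations(e, 2)}
            for sample, e in lowest.items()}


# Made-up energies for one sample in two space groups (225 appears twice).
records = [("NaCl", -1.5, 225), ("NaCl", -1.25, 216), ("NaCl", -1.0, 225)]
diffs = energy_differences(records)
```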
-
ai4materials.utils.utils_binaries.
select_diff_from_dic
(dic, spacegroup_tuples, sample_key='Mat', drop_nan=None)[source]¶ Get data frame of selected spacegroup_tuples from dictionary of dictionaries.
Creates a pandas data frame with columns for the samples and the selected space group tuples (energy differences).
Parameters:
- dic: dict {samples -> space group tuples -> energy differences}
- spacegroup_tuples: tuple, list of tuples, or tuple of tuples
- Each tuple has to contain two space group numbers, to be looked up in the input dic.
- sample_key: string
- Will be the column title of the samples of the created data frame
- drop_nan: string, optional {‘rows’, ‘SG_tuples’}
- Drops all rows or columns (SG_tuples) containing NaN.
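As a rough sketch of the selection step, the following builds a data frame from a dictionary of dictionaries. It is not the library's code: the function name `select_diffs_sketch` and the use of string column labels are assumptions chosen to keep the example simple.

```python
import pandas as pd


def select_diffs_sketch(dic, spacegroup_tuples, sample_key="Mat", drop_nan=None):
    """Illustrative only: one row per sample, one column per space-group pair."""
    # Accept a single pair such as (225, 221) as well as a collection of pairs.
    if isinstance(spacegroup_tuples[0], int):
        spacegroup_tuples = [spacegroup_tuples]
    rows = []
    for sample, diffs in dic.items():
        row = {sample_key: sample}
        for pair in spacegroup_tuples:
            # Column labels are strings here (an assumption of this sketch).
            row[str(tuple(pair))] = diffs.get(tuple(pair), float("nan"))
        rows.append(row)
    frame = pd.DataFrame(rows)
    if drop_nan == "rows":
        frame = frame.dropna(axis=0)  # drop samples with a missing difference
    elif drop_nan == "SG_tuples":
        frame = frame.dropna(axis=1)  # drop pairs missing for any sample
    return frame


# Hypothetical input in the samples -> pairs -> differences form.
dic = {"AB": {(225, 221): -0.2, (221, 225): 0.2}, "CD": {}}
frame = select_diffs_sketch(dic, [(225, 221)])
```

With `drop_nan="rows"` only sample `AB` would remain, since `CD` has no entry for the requested pair.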
ai4materials.utils.utils_config module¶
-
class
ai4materials.utils.utils_config.
SSH
(hostname='172.17.0.3', username='tutorial', port=22, key_file='/home/beaker/docker.openmpi/ssh/id_rsa.mpi', password=None)[source]¶ Bases:
object
SSH class to connect to the cluster to perform a calculation.
Code author: Emre Ahmetcik <ahmetcik@fhi-berlin.mpg.de>
-
ai4materials.utils.utils_config.
get_data_filename
(resource, package='ai4materials')[source]¶ Rewrite of pkgutil.get_data() that returns the file path.
Taken from: https://stackoverflow.com/questions/5003755/how-to-use-pkgutils-get-data-with-csv-reader-in-python
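One possible way to resolve a resource to a file path, sketched with the standard library only. The helper name `data_filename_sketch` is hypothetical, and the use of `importlib.util.find_spec` is an assumption of this sketch rather than a description of what ai4materials does internally.

```python
import importlib.util
import os


def data_filename_sketch(resource, package):
    """Return the file-system path of `resource` inside `package`, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None  # package not found, or it has no file location
    package_dir = os.path.dirname(spec.origin)
    # Resources are given with '/' separators, as for pkgutil.get_data().
    return os.path.join(package_dir, *resource.split("/"))


# Example with a standard-library package, so no extra installation is needed.
path = data_filename_sketch("decoder.py", package="json")
```

Unlike `pkgutil.get_data()`, which returns the file contents as bytes, this returns a path that can be handed to readers expecting a filename.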
ai4materials.utils.utils_crystals module¶
ai4materials.utils.utils_data_retrieval module¶
ai4materials.utils.utils_mp module¶
ai4materials.utils.utils_parsing module¶
ai4materials.utils.utils_plotting module¶
-
ai4materials.utils.utils_plotting.
aggregate_struct_trans_data
(filename, nb_rows_to_cut=0, nb_samples=None, nb_order_param_steps=None, min_order_param=0.0, max_order_param=None, prob_idxs=None, with_uncertainty=True, uncertainty_types=('variation_ratio', 'predictive_entropy', 'mutual_information'))[source]¶ Aggregate structural transition data in order to plot it later.
Starting from the results_file of the run_cnn_model function, aggregate the data by a given order parameter and the probabilities of each class. This is used to prepare the data for the structural transition plots, as shown in Fig. 4, Ziletti et al., Nature Communications 9, 2775 (2018).
Parameters:
- filename: string,
- Full path to the results_file created by the run_cnn_model function. This is a csv file
- nb_samples: int
- Number of samples present in results_file for each order parameter step.
- nb_order_param_steps: int
- Number of order parameter steps. For example, if we are interpolating between structure_1 and structure_2 with 10 steps, nb_order_param_steps=10.
- max_order_param: float
- Maximum value that the order parameter takes in the dataset. Together with nb_order_param_steps, this is used to create the linear space which will later be used by the plotting function.
- prob_idxs: list of int
- List of integers which correspond to the classes for which the probabilities will be extracted from the results_file. prob_idxs=[0, 3] will extract only prob_predictions_0 and prob_predictions_3 from the results_file.
Returns:
- pandas dataframe
A pandas dataframe with the following columns:
- a_to_b_index_ : value of the order parameter
- 2i columns (where the i’s are the elements of the list prob_idxs)
as below:
prob_predictions_i_mean : mean of the distribution of classification probability i for the given a_to_b_index_ value of the order parameter.
prob_predictions_i_std : standard deviation of the distribution of classification probability i for the given a_to_b_index_ value of the order parameter.
- [optional]: columns containing uncertainty quantification
Code author: Angelo Ziletti <angelo.ziletti@gmail.com>
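The aggregation described above can be sketched with a pandas group-by. This is only an illustration on synthetic data: it assumes that the rows of the results file are ordered step by step with nb_samples rows per step, which the documentation does not state explicitly, and it omits the uncertainty columns.

```python
import numpy as np
import pandas as pd

nb_order_param_steps, nb_samples, max_order_param = 5, 4, 1.0
rng = np.random.default_rng(0)

# Synthetic stand-in for results_file (values are random, not real results).
p0 = rng.uniform(size=nb_order_param_steps * nb_samples)
results = pd.DataFrame({"prob_predictions_0": p0, "prob_predictions_1": 1.0 - p0})

# Linear space of order-parameter values, one value per step.
order_param = np.linspace(0.0, max_order_param, nb_order_param_steps)
results["a_to_b_index_"] = np.repeat(order_param, nb_samples)

# Mean and standard deviation of each selected probability per step.
prob_idxs = [0, 1]
cols = [f"prob_predictions_{i}" for i in prob_idxs]
agg = results.groupby("a_to_b_index_")[cols].agg(["mean", "std"])
agg.columns = [f"{col}_{stat}" for col, stat in agg.columns]
df_agg = agg.reset_index()
```

The resulting frame has one row per order-parameter step and two columns (mean and standard deviation) per entry in `prob_idxs`, matching the column layout listed under Returns.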
-
ai4materials.utils.utils_plotting.
make_crossover_plot
(df_results, filename, filename_suffix, title, labels, nb_order_param_steps, plot_type='probability', prob_idxs=None, uncertainty_type='mutual_information', linewidth=1.0, markersize=1.0, max_nb_ticks=None, palette=None, show_plot=False, style='publication', x_label='Order parameter')[source]¶ Starting from an aggregated pandas dataframe, plot classification probability distributions as a function of an order parameter.
This will produce a plot along the lines of Fig. 4, Ziletti et al.
Parameters:
- df_results: pandas dataframe,
- Pandas dataframe returned by the aggregate_struct_trans_data function.
- filename: string
- Full path to the results_file created by the run_cnn_model function. This is a csv file. Only used to name the generated plot appropriately.
- filename_suffix: string
- Suffix to be put for the plot filename. This suffix will determine the format of the output plot (e.g. ‘.png’ or ‘.svg’ will create a png or an svg file, respectively.)
- title: string
- Title of the plot
- plot_type: str (options: ‘probability’, ‘uncertainty’)
- Plot either probabilities of classification or uncertainty.
- uncertainty_type: str (options: ‘mutual_information’, ‘predictive_entropy’)
- Type of uncertainty estimation to be plotted. Used only if `plot_type`=’uncertainty’.
- prob_idxs: list of int
- List of integers which correspond to the classes for which the probabilities will be extracted from the results_file. prob_idxs=[0, 3] will extract only prob_predictions_0 and prob_predictions_3 from the results_file. They should correspond to (or be a subset of) the prob_idxs specified in aggregate_struct_trans_data.
- nb_order_param_steps: int
- Number of order parameter steps. For example, if we are interpolating between structure_1 and structure_2 with 10 steps, nb_order_param_steps=10. Must be the same as specified in aggregate_struct_trans_data. Different values might work, but could give rise to unexpected behaviour.
- show_plot: bool, optional, default: False
- If True, it opens the generated plot.
- style: string, optional, {‘publication’}
- If style==’publication’, load the default matplotlib style (white background). Otherwise, use the ‘fivethirtyeight’ matplotlib style (black background) via plt.style.use(‘fivethirtyeight’).
- x_label: string, optional, default: “Order parameter”
- Label for the x-axis (the order parameter axis)
Code author: Angelo Ziletti <angelo.ziletti@gmail.com>
-
ai4materials.utils.utils_plotting.
make_multiple_image_plot
(data, title='Figure 1', cmap=<matplotlib.colors.LinearSegmentedColormap object>, n_rows=None, n_cols=None, vmin=None, vmax=None, filename=None, save=False)[source]¶
-
ai4materials.utils.utils_plotting.
make_plot_cross_entropy_loss
(step, train_data, val_data, title=None)[source]¶
-
ai4materials.utils.utils_plotting.
plot_confusion_matrix
(conf_matrix, classes, conf_matrix_file, normalize=False, title='Confusion matrix', title_true_label='True label', title_pred_label='Predicted label', cmap='Blues')[source]¶ This function prints and plots the confusion matrix. Normalization can be applied by setting normalize=True.
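A short sketch of what normalizing a confusion matrix usually means. The documentation does not say along which axis the normalization is applied; row-wise normalization (each true-label row sums to 1) is assumed here as the common convention, and the matrix values are made up.

```python
import numpy as np

# Hypothetical 2-class confusion matrix: rows are true labels,
# columns are predicted labels.
conf_matrix = np.array([[8, 2],
                        [1, 9]])

# Row-wise normalization: each row is divided by its number of true samples.
row_sums = conf_matrix.sum(axis=1, keepdims=True)
normalized = conf_matrix / row_sums
```

After normalization the diagonal entries can be read as per-class recall.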
-
ai4materials.utils.utils_plotting.
plot_save_cnn_results
(filename, accuracy=True, cross_entropy_loss=True, show_plot=False)[source]¶ Plot and save results of a convolutional neural network calculation from the .csv file written by Keras CSVLogger.
Code author: Angelo Ziletti <angelo.ziletti@gmail.com>
-
ai4materials.utils.utils_plotting.
rgb_colormaps
(color)[source]¶ Obtain colormaps for RGB.
For a general overview: https://matplotlib.org/examples/pylab_examples/custom_cmap.html
-
ai4materials.utils.utils_plotting.
show_images
(images, filename_png, cols=1, titles=None)[source]¶ Display a list of images in a single figure with matplotlib.
Taken from https://stackoverflow.com/questions/11159436/multiple-figures-in-a-single-window
Parameters:
- images: list of np.arrays
- Images to be plotted. It must be compatible with plt.imshow.
- cols: int, optional, (default = 1)
- Number of columns in figure (number of rows is set to np.ceil(n_images/float(cols))).
- titles: list of strings
- List of titles corresponding to each image.
ai4materials.utils.utils_vol_data module¶
-
ai4materials.utils.utils_vol_data.
get_shells_from_indices
(xyz_r, vol_data)[source]¶ Obtain concentric shells from volumetric data.
The starting points are an array containing the volumetric data and a list of indices which assign points of the volume to the corresponding concentric shell. Using these indices, we perform two operations:
- extract the concentric shells in the volumetric space
- transform the concentric shells to spherical coordinates, i.e. project each sphere to a (theta, phi) plane
Point 1) gives volumetric 3d data containing a given shell. Point 2) gives 2d data in the (theta, phi) plane for a given shell; this can be interpreted as a heatmap.
Parameters:
- xyz_r: list of list of tuples
- The length of the list corresponds to the number of concentric shells considered. Each element in the list - representing a concentric shell - contains a list of 3-dimensional tuples, with the indices of the volume elements which belong to the given concentric shell. This is the list returned by ai4materials.utils.utils_vol_data.get_slice_volume_indices.
- vol_data: numpy.ndarray
- Volumetric data as numpy.ndarray.
Returns: vox_by_slices, theta_phi_by_slices
- vox_by_slices: np.ndarray, shape [n_slices, n_px, n_py, n_pz]
- 4-dimensional array containing each concentric shell obtained from ai4materials.descriptors.diffraction3d.Diffraction3D. n_px, n_py, n_pz are given by the interpolation and the region of the space considered. In our case, n_slices=52, n_px=n_py=n_pz=176.
- theta_phi_by_slices: list of tuples
- Each element in the list corresponds to a concentric shell. In each concentric shell, there is a list of tuples (theta, phi, intensity) of the non-zero points in the volume considered, as returned by ai4materials.utils.utils_vol_data.shells_to_sph. The length of the tuple list of each concentric shell is different because a different number of points is non-zero for each shell.
Code author: Angelo Ziletti <angelo.ziletti@gmail.com>
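The two operations described above (extracting each shell and projecting it to the (theta, phi) plane) can be sketched with NumPy as follows. This is not the library's implementation: the function name, the use of the geometric center of the array as origin, and the angle conventions (theta as polar angle in [0, pi], phi as azimuth in [-pi, pi]) are assumptions of this sketch.

```python
import numpy as np


def shells_from_indices_sketch(xyz_r, vol_data):
    """Illustrative only: per-shell volumes and (theta, phi, intensity) lists."""
    center = (np.array(vol_data.shape) - 1) / 2.0
    vox_by_slices = np.zeros((len(xyz_r),) + vol_data.shape, dtype=vol_data.dtype)
    theta_phi_by_slices = []
    for i, shell in enumerate(xyz_r):
        idx = np.array(shell, dtype=int).reshape(-1, 3)
        ix, iy, iz = idx[:, 0], idx[:, 1], idx[:, 2]
        intensity = vol_data[ix, iy, iz]
        # 1) copy the voxels of this shell into an otherwise empty volume
        vox_by_slices[i, ix, iy, iz] = intensity
        # 2) spherical angles of each voxel relative to the volume center
        x, y, z = (idx - center).T
        theta = np.arctan2(np.hypot(x, y), z)  # polar angle
        phi = np.arctan2(y, x)                 # azimuthal angle
        keep = intensity != 0                  # only non-zero points are kept
        theta_phi_by_slices.append(
            list(zip(theta[keep], phi[keep], intensity[keep]))
        )
    return vox_by_slices, theta_phi_by_slices


# Toy volume with two non-zero voxels at distance 2 from the center (2, 2, 2).
vol = np.zeros((5, 5, 5))
vol[2, 2, 4] = 3.0
vol[4, 2, 2] = 1.0
xyz_r = [[(2, 2, 4), (4, 2, 2), (2, 4, 2)]]
vox, tp = shells_from_indices_sketch(xyz_r, vol)
```

In this toy case the third index points at a zero voxel, so the single shell contributes only two (theta, phi, intensity) tuples.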
-
ai4materials.utils.utils_vol_data.
get_slice_volume_indices
(vol_data, min_r, max_r, dr=1.0, phi_bins=100, theta_bins=50)[source]¶ Given 3d volume return the indices of points belonging to the specified concentric shells.
The volume should be centered on its center of mass in order to obtain centered concentric shells. In our case we do not have to do that because the diffraction intensity obtained with the Fourier Transform is centered by definition. For reference, we nevertheless calculate the center of mass within the function.
Parameters:
- vol_data:
- Numpy 3D array containing the volumetric data to be sliced. In our case, this is the three-dimensional diffraction intensity.
- theta_bins: int, optional (default=50)
- Bins to be used for the theta angle of the spherical coordinates.
- phi_bins: int, optional (default=100)
- Bins to be used for the phi angle of the spherical coordinates.
- Returns: list
- List of length = (max_r - min_r)/dr; the length corresponds to the number of concentric shells considered. Each element in the list - representing a concentric shell - contains a list of 3-dimensional tuples, with the indices of the volume elements which belong to the given concentric shell. For example, let us assume the output of the function is stored in the variable xyz_r. xyz_r[0] gives a list of tuples corresponding to the points in the first concentric shell. If xyz_r[0][0] = (82, 97, 119), this means that the element of the volumetric data with index (82, 97, 119) belongs to the first shell.
Code author: Angelo Ziletti <angelo.ziletti@gmail.com>
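A minimal NumPy sketch of how voxel indices could be grouped into concentric shells. It is an illustration under stated assumptions, not the library's code: the function name is hypothetical, the geometric center of the array is used instead of the center of mass, and the theta_bins and phi_bins arguments are left out.

```python
import numpy as np


def slice_volume_indices_sketch(vol_data, min_r, max_r, dr=1.0):
    """Illustrative only: list of voxel-index tuples for each concentric shell."""
    center = (np.array(vol_data.shape) - 1) / 2.0
    # All voxel indices as an array of shape (n_voxels, 3).
    grid = np.indices(vol_data.shape).reshape(3, -1).T
    dist = np.linalg.norm(grid - center, axis=1)
    xyz_r = []
    for r in np.arange(min_r, max_r, dr):
        # A voxel belongs to the shell if r <= distance < r + dr.
        in_shell = (dist >= r) & (dist < r + dr)
        xyz_r.append([tuple(int(v) for v in p) for p in grid[in_shell]])
    return xyz_r


vol = np.ones((7, 7, 7))
xyz_r = slice_volume_indices_sketch(vol, min_r=0, max_r=3)
```

With `min_r=0`, `max_r=3` and `dr=1.0` the list has three elements, in line with the length (max_r - min_r)/dr given above; the first shell contains only the central voxel `(3, 3, 3)`.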
-
ai4materials.utils.utils_vol_data.
interp_theta_phi_surfaces
(theta_phi_by_slices_coarse, theta_bins=256, phi_bins=512)[source]¶ Interpolate the spherical shells in spherical coordinates to a finer grid.
For more information on the interpolation, please refer to: http://scipy-cookbook.readthedocs.io/items/Matplotlib_Gridding_irregularly_spaced_data.html
- theta_phi_by_slices_coarse: list of tuples
- Each element in the list corresponds to a concentric shell. In each concentric shell, there is a list of tuples (theta, phi, intensity) of the non-zero points in the volume considered, as returned by ai4materials.utils.utils_vol_data.shells_to_sph. The length of the tuple list of each concentric shell is different because a different number of points is non-zero for each shell.
- theta_bins: int, optional (default=256)
- Bins to be used for the interpolation of the theta angle of the spherical coordinates.
- phi_bins: int, optional (default=512)
- Bins to be used for the interpolation of the phi angle of the spherical coordinates.
- Return: np.ndarray, shape [n_slices, theta_bins_fine, phi_bins_fine]
- Three-dimensional array containing each concentric shell in spherical coordinates. n_slices is given by the region of the space considered.
Code author: Angelo Ziletti <angelo.ziletti@gmail.com>
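Gridding irregularly spaced (theta, phi, intensity) points onto a regular grid can be sketched with scipy.interpolate.griddata. This is not the library's implementation: the function name, the nearest-neighbour method and the angle ranges (theta in [0, pi], phi in [-pi, pi]) are assumptions chosen for this example.

```python
import numpy as np
from scipy.interpolate import griddata


def interp_shells_sketch(theta_phi_by_slices_coarse, theta_bins=256, phi_bins=512):
    """Illustrative only: resample each shell onto a regular (theta, phi) grid."""
    theta_fine = np.linspace(0.0, np.pi, theta_bins)
    phi_fine = np.linspace(-np.pi, np.pi, phi_bins)
    grid_theta, grid_phi = np.meshgrid(theta_fine, phi_fine, indexing="ij")
    shells = []
    for shell in theta_phi_by_slices_coarse:
        pts = np.array(shell, dtype=float)  # columns: theta, phi, intensity
        fine = griddata(
            pts[:, :2], pts[:, 2], (grid_theta, grid_phi), method="nearest"
        )
        shells.append(fine)
    # Stack to shape [n_slices, theta_bins, phi_bins].
    return np.stack(shells)


# Two hypothetical shells with a different number of non-zero points each.
coarse = [
    [(0.5, 0.0, 1.0), (2.5, 1.0, 3.0)],
    [(1.0, -1.0, 2.0)],
]
fine = interp_shells_sketch(coarse, theta_bins=8, phi_bins=16)
```

Nearest-neighbour gridding is used here because it also works for shells with very few points; a linear method gives smoother maps but needs at least three non-collinear points per shell.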
Module contents¶
Authors¶
Development Lead¶

Dr. Angelo Ziletti:
- Email: name.surname@gmail.com
- website: https://angeloziletti.github.io/
Angelo Ziletti wrote almost all of ai4materials (~95%). Below we list the other contributors and the parts to which they contributed.
Contributors¶
- Andreas Leitherer
- Contributions:
ai4materials.utils.utils_crystals.get_boxes_from_xyz
ai4materials.descriptors.ft_soap_descriptor
ai4materials.descriptors.quippy_soap_descriptor
- Emre Ahmetcik:
- Contributions: