Welcome to ai4materials’s documentation!

The current documentation is not actively maintained and thus might not be up to date. For the most recent documentation, please visit the ai4materials GitHub repository: https://github.com/angeloziletti/ai4materials.

ai4materials enables complex analysis of materials science data using machine learning. It also provides functions to pre-process (in parallel), save, and subsequently load materials science datasets, thus easing the traceability, reproducibility, and prototyping of new models.

ai4materials can perform crystal-structure classification and analysis, as introduced in:

[1] A. Leitherer, A. Ziletti, and L. M. Ghiringhelli, “Robust recognition and exploratory analysis of crystal structures via Bayesian deep learning”, https://arxiv.org/abs/2103.09777 (2021)

Installation instructions can be found in the ai4materials GitHub repository: https://github.com/angeloziletti/ai4materials.

On the left panel, you can find a few examples that showcase what ai4materials can do.

ai4materials can also reproduce results from the following publications:

[2] A. Ziletti, D. Kumar, M. Scheffler, and L. M. Ghiringhelli, “Insightful classification of crystal structures using deep learning,” Nature Communications, vol. 9, p. 2775, 2018. [Link to article]
[3] L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, and M. Scheffler, “Big Data of Materials Science: Critical Role of the Descriptor,” Physical Review Letters, vol. 114, no. 10, p. 105503, 2015. [Link to article]

Code author: Angelo Ziletti <angelo.ziletti@gmail.com>

Installation

Installation instructions can be found in the ai4materials GitHub repository: https://github.com/angeloziletti/ai4materials.

Section author: Angelo Ziletti <angelo.ziletti@gmail.com>

Data preprocessing

Submodules

ai4materials.dataprocessing.preprocessing module

Module contents

Representing crystal structures: descriptors

The first step in any machine learning or automated analysis of materials science data is to represent the material under consideration in a form that a computer can process. This representation, termed a descriptor, should contain all the information about the system that is relevant for the desired learning task.
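
To make the idea of a descriptor concrete, the following minimal sketch maps a set of atomic positions to a fixed-length vector: a histogram of interatomic distances. It is not one of the ai4materials descriptors; the function name `distance_histogram` and its parameters are hypothetical and chosen for illustration only. The point is that the resulting vector does not change when the structure is translated or the atoms are listed in a different order, a property that a useful descriptor typically has.

```python
import itertools
import math


def distance_histogram(positions, nb_bins=5, max_dist=5.0):
    """Toy descriptor: histogram of all pairwise distances (finite cluster, no periodicity)."""
    histogram = [0] * nb_bins
    bin_width = max_dist / nb_bins
    for pos_a, pos_b in itertools.combinations(positions, 2):
        dist = math.dist(pos_a, pos_b)
        if dist < max_dist:
            histogram[int(dist / bin_width)] += 1
    return histogram


# four atoms on the corners of a square with side 1.5 (arbitrary units)
square = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 0.0)]

# same structure, translated and with the atoms listed in reverse order
shifted = [(x + 1.5, y - 0.5, z + 3.0) for x, y, z in reversed(square)]

descriptor_square = distance_histogram(square)
descriptor_shifted = distance_histogram(shifted)
print(descriptor_square == descriptor_shifted)
```

Both calls return the same vector because only relative distances enter the histogram: four sides of length 1.5 and two diagonals of length about 2.12.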

Starting from a crystal structure, provided as an ASE (Atomistic Simulation Environment) Atoms object, the code can calculate different representations. Currently, the following descriptors (i.e. functions that represent crystal structures) are implemented:

  • ai4materials.descriptors.atomic_features returns the atomic features corresponding to the chemical species of the system [1]
  • ai4materials.descriptors.diffraction2d calculates the two-dimensional diffraction fingerprint [2]
  • ai4materials.descriptors.diffraction3d calculates the three-dimensional diffraction fingerprint [3]
  • ai4materials.descriptors.prdf calculates the partial radial distribution function [4]
  • ai4materials.descriptors.SOAP calculates the SOAP descriptor [5]

For examples of descriptor usage and the corresponding references, see below.

Example: atomic features

It was recently shown in Ref. [1] that the crystal structure of binary compounds can be predicted with compressed-sensing techniques using atomic features only.

The code below illustrates how to retrieve atomic features for one crystal structure. It performs the following steps:

  • build a NaCl crystal structure using the ASE package
  • calculate atomic features using the descriptor ai4materials.descriptors.atomic_features.AtomicFeatures
  • retrieve the atomic features of this crystal structure as the pandas DataFrame nacl_atomic_features
  • save this table to file.
import sys
import os.path

atomic_data_dir = os.path.abspath(os.path.normpath("/home/ziletti/nomad/nomad-lab-base/analysis-tools/atomic-data"))
sys.path.insert(0, atomic_data_dir)

from ase.spacegroup import crystal
from ai4materials.utils.utils_config import set_configs
from ai4materials.utils.utils_config import setup_logger
from ai4materials.descriptors.atomic_features import AtomicFeatures
from nomadcore.local_meta_info import loadJsonFile, InfoKindEl

# setup configs
configs = set_configs(main_folder='./desc_atom_features_ai4materials')
logger = setup_logger(configs, level='INFO', display_configs=False)

desc_file_name = 'atomic_features_try1'

# build atomic structure
structure = crystal(['Na', 'Cl'], [(0, 0, 0), (0.5, 0.5, 0.5)], spacegroup=225, cellpar=[5.64, 5.64, 5.64, 90, 90, 90])

selected_feature_list = ['atomic_ionization_potential', 'atomic_electron_affinity',
                         'atomic_rs_max', 'atomic_rp_max', 'atomic_rd_max']


# define and calculate descriptor
kwargs = {'feature_order_by': 'atomic_mulliken_electronegativity', 'energy_unit': 'eV', 'length_unit': 'angstrom'}

descriptor = AtomicFeatures(configs=configs, **kwargs)

structure_result = descriptor.calculate(structure, selected_feature_list=selected_feature_list)
nacl_atomic_features = structure_result.info['descriptor']['atomic_features_table']

# write table to file
nacl_atomic_features.to_csv('nacl_atomic_features_table.csv', float_format='%.4f')

This is the table (saved in the file nacl_atomic_features_table.csv) containing the atomic features obtained using the code above:

  ordered_chemical_symbols atomic_ionization_potential(A) atomic_electron_affinity(A) atomic_rs_max(A) atomic_rp_max(A) atomic_rd_max(A) atomic_ionization_potential(B) atomic_electron_affinity(B) atomic_rs_max(B) atomic_rp_max(B) atomic_rd_max(B)
0 NaCl -5.2231 -0.7157 1.7100 2.6000 6.5700 -13.9018 -3.9708 0.6800 0.7600 1.6700
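
As a sketch of how the saved table can be consumed downstream, the snippet below parses a CSV with the same columns using only the Python standard library. The CSV text is inlined (values copied from the table above) so that the example is self-contained; it assumes the default `pandas.DataFrame.to_csv` layout, in which the first, unnamed column holds the row index. The helper name `split_by_species` is hypothetical.

```python
import csv
import io

csv_text = (
    ",ordered_chemical_symbols,atomic_ionization_potential(A),atomic_electron_affinity(A),"
    "atomic_rs_max(A),atomic_rp_max(A),atomic_rd_max(A),"
    "atomic_ionization_potential(B),atomic_electron_affinity(B),"
    "atomic_rs_max(B),atomic_rp_max(B),atomic_rd_max(B)\n"
    "0,NaCl,-5.2231,-0.7157,1.7100,2.6000,6.5700,-13.9018,-3.9708,0.6800,0.7600,1.6700\n"
)


def split_by_species(row):
    """Group the features of one table row by species label, 'A' or 'B'."""
    features = {'A': {}, 'B': {}}
    for column, value in row.items():
        for label in features:
            suffix = '({})'.format(label)
            if column.endswith(suffix):
                # strip the '(A)' / '(B)' suffix to recover the feature name
                features[label][column[:-len(suffix)]] = float(value)
    return features


row = next(csv.DictReader(io.StringIO(csv_text)))
nacl_features = split_by_species(row)
print(row['ordered_chemical_symbols'], nacl_features['A']['atomic_rs_max'])
```

Grouping the columns by the `(A)` and `(B)` suffixes gives one small dictionary per species, which is often more convenient than the flat table when features of the two species are compared.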

Example: two-dimensional diffraction fingerprint

The two-dimensional diffraction fingerprint was introduced in Ref. [2].

The code below illustrates how to calculate the two-dimensional diffraction fingerprint for a supercell of face-centered-cubic aluminium containing approximately 256 atoms, performing the following steps:

  • build a face-centered-cubic aluminium crystal structure using the ASE package
  • create a supercell using the function ai4materials.utils.utils_crystals.create_supercell
  • calculate the two-dimensional diffraction fingerprint of this crystal structure as the numpy.array intensity_rgb
  • convert the two-dimensional diffraction fingerprint to an RGB image and write it to file.
from ase.spacegroup import crystal
from ase.io import write
from ai4materials.descriptors.diffraction2d import Diffraction2D
from ai4materials.utils.utils_config import set_configs
from ai4materials.utils.utils_crystals import create_supercell
import numpy as np
from PIL import Image

# setup configs
configs = set_configs(main_folder='./desc_2d_diff_ai4materials')

# create the fcc aluminium structure
fcc_al = crystal('Al', [(0, 0, 0)], spacegroup=225, cellpar=[4.05, 4.05, 4.05, 90, 90, 90])
structure = create_supercell(fcc_al, target_nb_atoms=256)

# calculate the two-dimensional diffraction fingerprint
descriptor = Diffraction2D(configs=configs)
structure_result = descriptor.calculate(structure)
intensity_rgb = structure_result.info['descriptor']['diffraction_2d_intensity']

# write the diffraction fingerprint as png image
rgb_array = np.zeros((intensity_rgb.shape[0], intensity_rgb.shape[1], intensity_rgb.shape[2]), 'uint8')
current_img = list(intensity_rgb.reshape(-1, intensity_rgb.shape[0], intensity_rgb.shape[1]))
for ix_ch in range(len(current_img)):
    rgb_array[..., ix_ch] = current_img[ix_ch] * 255
img = Image.fromarray(rgb_array)
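# Image.LANCZOS is the same filter as the older Image.ANTIALIAS alias, which was removed in Pillow 10
img = img.resize((256, 256), Image.LANCZOS)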
img.save('fcc_al_diffraction2d_fingerprint.png')

This is the calculated two-dimensional diffraction fingerprint for face-centered-cubic aluminium:

_images/fcc_al_diffraction2d_fingerprint.png

Implementation details of the two-dimensional diffraction fingerprint can be found at ai4materials.descriptors.diffraction2d.
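
The supercell in this example contains only approximately the requested number of atoms because a supercell is built from whole unit cells. The sketch below shows the arithmetic under the assumption that the conventional cell is repeated the same number of times along each lattice vector. This is an illustrative assumption, not necessarily how `create_supercell` chooses the repetitions, and the helper name `isotropic_repetitions` is hypothetical.

```python
def isotropic_repetitions(atoms_per_cell, target_nb_atoms):
    """Repetitions n per lattice vector such that atoms_per_cell * n**3 is closest to the target."""
    n = 1
    # largest n whose supercell does not exceed the target (at least 1)
    while atoms_per_cell * (n + 1) ** 3 <= target_nb_atoms:
        n += 1
    lower = atoms_per_cell * n ** 3
    upper = atoms_per_cell * (n + 1) ** 3
    # pick whichever of the two neighbouring sizes is closer to the target
    return n if abs(lower - target_nb_atoms) <= abs(upper - target_nb_atoms) else n + 1


# conventional fcc cell: 4 atoms; conventional bcc cell: 2 atoms
supercell_sizes = {}
for name, atoms_per_cell in (('fcc', 4), ('bcc', 2)):
    n = isotropic_repetitions(atoms_per_cell, target_nb_atoms=256)
    supercell_sizes[name] = (n, atoms_per_cell * n ** 3)
    print(name, n, atoms_per_cell * n ** 3)
```

With 4 atoms per conventional fcc cell, a 4 x 4 x 4 supercell contains exactly 256 atoms, whereas for bcc (2 atoms per cell) the closest isotropic choice is 5 x 5 x 5 with 250 atoms.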

Example: three-dimensional diffraction fingerprint

The three-dimensional diffraction fingerprint was introduced in Ref. [3].

The code below illustrates how to calculate the three-dimensional diffraction fingerprint for a supercell of face-centered-cubic aluminium containing approximately 256 atoms, performing the following steps:

  • build a face-centered-cubic aluminium crystal structure using the ASE package
  • create a supercell using the function ai4materials.utils.utils_crystals.create_supercell
  • calculate the three-dimensional diffraction fingerprint of this crystal structure as the numpy.array diff3d_spectrum
  • convert the three-dimensional diffraction fingerprint to a heatmap image and write it to file.
from ase.spacegroup import crystal
import matplotlib.pyplot as plt
from ai4materials.descriptors.diffraction3d import DISH
from ai4materials.utils.utils_config import set_configs
from ai4materials.utils.utils_crystals import create_supercell
from scipy import ndimage

# setup configs
configs = set_configs(main_folder='./dish_ai4materials')

# create the fcc aluminium structure
fcc_al = crystal('Al', [(0, 0, 0)], spacegroup=225, cellpar=[4.05, 4.05, 4.05, 90, 90, 90])
structure = create_supercell(fcc_al, target_nb_atoms=256, random_rotation=True, cell_type='standard', optimal_supercell=False)

# calculate the three-dimensional diffraction fingerprint
descriptor = DISH(configs=configs)
structure_result = descriptor.calculate(structure)
# reuse the result computed above instead of calculating the descriptor a second time
diff3d_spectrum = structure_result.info['descriptor']['diffraction_3d_sh_spectrum']

# plot the (enlarged) array as image (enlarging is unphysical, only for visualization purposes)
plt.imsave('fcc_al_diffraction3d_fingerprint.png', ndimage.zoom(diff3d_spectrum, (4, 4)))

This is the calculated three-dimensional diffraction fingerprint for face-centered-cubic aluminium (zoomed for visualization purposes):

_images/fcc_al_diffraction3d_fingerprint.png

Implementation details of the three-dimensional diffraction fingerprint can be found at ai4materials.descriptors.diffraction3d.

[1] L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, and M. Scheffler, “Big Data of Materials Science: Critical Role of the Descriptor,” Physical Review Letters, vol. 114, no. 10, p. 105503, 2015. [Link to article]
[2] A. Ziletti, D. Kumar, M. Scheffler, and L. M. Ghiringhelli, “Insightful classification of crystal structures using deep learning,” Nature Communications, vol. 9, p. 2775, 2018. [Link to article]
[3] A. Ziletti, A. Leitherer, M. Scheffler, and L. M. Ghiringhelli, in preparation (2018)
[4] K. T. Schütt, H. Glawe, F. Brockherde, A. Sanna, K. R. Müller, and E. K. U. Gross, “How to represent crystal structures for machine learning: Towards fast prediction of electronic properties,” Physical Review B, vol. 89, p. 205118, 2014. [Link to article]
[5] A. P. Bartók, R. Kondor, and G. Csányi, “On representing chemical environments,” Physical Review B, vol. 87, no. 18, p. 184115, 2013. [Link to article]

Section author: Angelo Ziletti <angelo.ziletti@gmail.com>

Submodules

ai4materials.descriptors.atomic_features module

ai4materials.descriptors.base_descriptor module

class ai4materials.descriptors.base_descriptor.Descriptor(configs=None, **params)[source]

Bases: object

calculate(structure, **kwargs)[source]

Method that calculates the descriptor

static params(self)[source]

write(structure, **kwargs)[source]

Method to write the descriptor to file

write_desc_info(desc_info_file, ase_atoms_result)[source]

ai4materials.descriptors.base_descriptor.is_descriptor_consistent(structure, descriptor)[source]

ai4materials.descriptors.diffraction1d module

class ai4materials.descriptors.diffraction1d.Diffraction1D(configs, wavelength='CuKa')[source]

Bases: ai4materials.descriptors.base_descriptor.Descriptor

calculate(structure, show=False)[source]

Method that calculates the descriptor

write(structure, tar, write_xrd_pattern=True)[source]

Method to write the descriptor to file

ai4materials.descriptors.diffraction2d module

ai4materials.descriptors.diffraction3d module

ai4materials.descriptors.ft_soap_descriptor module

ai4materials.descriptors.prdf module

class ai4materials.descriptors.prdf.PRDF(configs=None, cutoff_radius=20, rdf_only=False)[source]

Bases: ai4materials.descriptors.base_descriptor.Descriptor

Compute the partial radial distribution of a given crystal structure.

Cell vectors v1,v2,v3 with values in the columns: [[v1x,v2x,v3x],[v1y,v2y,v3y],[v1z,v2z,v3z]]

Parameters:

cutoff_radius: float, optional (default=20)
Atoms within a sphere of cut-off radius (in Angstrom) are considered.
rdf_only: bool, optional (default=`False`)
If False calculates partial radial distribution function. If True calculates radial distribution function (all atom types are considered as the same)

Code author: Fawzi Mohamed <mohamed@fhi-berlin.mpg.de> and Angelo Ziletti <angelo.ziletti@gmail.com>
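
The difference between the two settings of `rdf_only` can be sketched with a simplified, non-periodic example. The code below only counts pair distances within the cut-off radius for a finite set of atoms; it ignores periodic images and any normalization, so it is not the implementation used by `PRDF`. The function name `pair_distance_counts` and its arguments (including `bin_width`) are hypothetical.

```python
import itertools
import math
from collections import defaultdict


def pair_distance_counts(symbols, positions, cutoff_radius=20.0, bin_width=1.0, rdf_only=False):
    """Count pair distances below cutoff_radius, per species pair unless rdf_only is True."""
    counts = defaultdict(lambda: defaultdict(int))
    atoms = list(zip(symbols, positions))
    for (sym_a, pos_a), (sym_b, pos_b) in itertools.combinations(atoms, 2):
        dist = math.dist(pos_a, pos_b)
        if dist >= cutoff_radius:
            continue
        # with rdf_only all atom types are treated as the same species
        key = ('X', 'X') if rdf_only else tuple(sorted((sym_a, sym_b)))
        counts[key][int(dist / bin_width)] += 1
    return {key: dict(bins) for key, bins in counts.items()}


# a linear Na-Cl-Na-Cl chain with 2.82 Angstrom spacing (illustrative geometry)
symbols = ['Na', 'Cl', 'Na', 'Cl']
positions = [(0.0, 0.0, 0.0), (2.82, 0.0, 0.0), (5.64, 0.0, 0.0), (8.46, 0.0, 0.0)]

partial = pair_distance_counts(symbols, positions)
total = pair_distance_counts(symbols, positions, rdf_only=True)
print(sorted(partial))
print(total)
```

In the partial case the counts are kept separately for the Na-Na, Na-Cl and Cl-Cl pairs; with `rdf_only=True` the same distances are merged into a single histogram.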

calculate(structure, **kwargs)[source]

Calculate the descriptor for the given ASE structure.

Parameters:

structure: ase.Atoms object
Atomic structure.

Code author: Fawzi Mohamed <mohamed@fhi-berlin.mpg.de>

write(structure, tar, op_id=0, write_geo=True, format_geometry='aims')[source]

Write the descriptor to file.

Parameters:

structure: ase.Atoms object
Atomic structure.
tar: TarFile object
TarFile archive where the descriptor is added. This is created internally with tarfile.open.
op_id: int, optional (default=0)
Number of the applied operation to the descriptor. At present always set to zero in the code.
write_geo: bool, optional (default=`True`)
If True, write a coordinate file of the structure for which the descriptor is calculated.

Code author: Angelo Ziletti <angelo.ziletti@gmail.com>

ai4materials.descriptors.prdf.get_design_matrix(structures, total_bins=50, max_dist=25)[source]

Starting from atomic structures calculate the design matrix for the partial radial distribution function.

The list of structures must contain the calculated ai4materials.descriptors.prdf.PRDF. The discretization is performed using a logarithmic grid as follows:

bins = np.logspace(0, np.log10(max_dist), num=total_bins + 1) - 1
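
To see what this grid looks like, the sketch below reproduces the expression with the Python standard library only (the helper name `log_bin_edges` is hypothetical). The edges start at 0 and end at `max_dist - 1`, and the bins grow wider with distance, so short distances are resolved more finely than long ones.

```python
import math


def log_bin_edges(total_bins=50, max_dist=25.0):
    """Pure-Python equivalent of np.logspace(0, np.log10(max_dist), num=total_bins + 1) - 1."""
    stop = math.log10(max_dist)
    return [10.0 ** (stop * idx / total_bins) - 1.0 for idx in range(total_bins + 1)]


edges = log_bin_edges()
widths = [upper - lower for lower, upper in zip(edges[:-1], edges[1:])]
print(len(edges), round(edges[0], 6), round(edges[-1], 6))
print(round(widths[0], 4), round(widths[-1], 4))
```

Note that, because of the final subtraction of 1, the last edge lies at `max_dist - 1` rather than at `max_dist`.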

Parameters:

structures: ase.Atoms object or list of ase.Atoms objects
Atomic structure or list of atomic structures.
total_bins: int, optional (default=50)
Total number of bins to be used in the discretization of the partial radial distribution function.
max_dist: float, optional (default=25)
Maximum distance to consider in the partial radial distribution function when the design matrix is calculated, in Angstrom (the same unit used by ai4materials.descriptors.prdf.PRDF).

Return:

scipy.sparse.csr.csr_matrix, shape [n_samples, largest_atomic_nb * largest_atomic_nb * total_bins]
Returns a sparse row-compressed matrix.
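
The returned shape suggests why a sparse format is used: the number of columns grows with the square of the largest atomic number in the dataset, while each structure populates only the columns of the species pairs it contains. The sketch below works through the arithmetic. The row-major column layout in `flat_column` is an assumption made for illustration and may differ from the ordering used by `get_design_matrix`; both helper names are hypothetical.

```python
def design_matrix_width(largest_atomic_nb, total_bins=50):
    """Number of columns: one block of total_bins per ordered pair of atomic numbers."""
    return largest_atomic_nb * largest_atomic_nb * total_bins


def flat_column(z_a, z_b, bin_idx, largest_atomic_nb, total_bins=50):
    """Hypothetical row-major column index for the pair (z_a, z_b) and a distance bin."""
    return ((z_a - 1) * largest_atomic_nb + (z_b - 1)) * total_bins + bin_idx


# NaCl: Z(Na) = 11, Z(Cl) = 17, so the largest atomic number is 17
largest_z = 17
n_columns = design_matrix_width(largest_z)

# ordered species pairs present in NaCl: Na-Na, Na-Cl, Cl-Na, Cl-Cl
n_populated = 4 * 50
fill_fraction = n_populated / n_columns
print(n_columns, round(fill_fraction, 4))
```

Under these assumptions a single NaCl structure can fill at most about 1.4% of its row, and the fraction shrinks further as heavier elements enter the dataset.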

Code author: Angelo Ziletti <angelo.ziletti@gmail.com>

ai4materials.descriptors.prdf.get_unique_chemical_species(structures)[source]

Get the set of unique chemical species from a list of atomic structures.

The list of structures must contain the calculated ai4materials.descriptors.prdf.PRDF.

Parameters:

structures: ase.Atoms object or list of ase.Atoms objects
Atomic structure or list of atomic structures.

Code author: Angelo Ziletti <angelo.ziletti@gmail.com>

ai4materials.descriptors.quippy_soap_descriptor module

ai4materials.descriptors.soap_model module

Module contents

Creating and loading materials science datasets

Before performing any data analysis, pre-processing steps (e.g. descriptor calculation) are often needed to transform materials science data into a form suitable for the algorithm of choice, for example a neural network. This pre-processing is usually computationally demanding, especially if hundreds of thousands of structures need to be processed, possibly for different parameter settings.

Since hyperparameter tuning of the regression/classification algorithm typically requires running the model several times on a given pre-processed dataset, it is highly beneficial to be able to save and re-load the pre-processed results in a consistent and traceable manner.

Here we provide functions to pre-process (in parallel), save, and subsequently load materials science datasets; this not only eases the traceability and reproducibility of data analysis on materials science data, but also speeds up the prototyping of new models.
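
The save-and-reload pattern can be sketched with the standard library alone. The example below stores pre-processed results together with the parameters that produced them in a compressed archive and reads them back. The archive layout, the file names and the helper names `save_results` and `load_results` are hypothetical and do not reflect the format used by ai4materials.

```python
import io
import json
import os
import tarfile
import tempfile


def save_results(archive_path, results, parameters):
    """Write results and the parameters used to compute them into a .tar.gz archive."""
    payload = {'results.json': results, 'parameters.json': parameters}
    with tarfile.open(archive_path, 'w:gz') as tar:
        for name, content in payload.items():
            data = json.dumps(content, sort_keys=True).encode('utf-8')
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return archive_path


def load_results(archive_path):
    """Read back the results and parameters stored by save_results."""
    with tarfile.open(archive_path, 'r:gz') as tar:
        results = json.load(tar.extractfile('results.json'))
        parameters = json.load(tar.extractfile('parameters.json'))
    return results, parameters


results = {'fcc_al': [0.0, 0.5, 1.0], 'bcc_fe': [0.2, 0.4, 0.6]}  # placeholder values
parameters = {'descriptor': 'toy_descriptor', 'target_nb_atoms': 128}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = save_results(os.path.join(tmp_dir, 'toy_descriptor.tar.gz'), results, parameters)
    loaded_results, loaded_parameters = load_results(path)
```

Storing the parameters next to the results is what makes a cached dataset traceable: a later run can check which settings produced the file before reusing it.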

Example: diffraction fingerprint calculation for multiple structures

The code below illustrates how to compute a descriptor for multiple crystal structures using multiple processors, save the results to file, and reload the file for later use (e.g. for classification).

As an illustrative example, we calculate the two-dimensional diffraction fingerprint [1] of pristine (i.e. perfect) and highly defective (50% of atoms missing) crystal structures. The four crystal structures considered are body-centered cubic (bcc), face-centered cubic (fcc), diamond (diam), and hexagonal close-packed (hcp); more than 80% of elemental solids adopt one of these four crystal structures under standard conditions.

The steps performed in the code below are the following:

  • define the folders where the results are going to be saved
  • build the four crystal structures (bcc, fcc, diam, hcp) using the ASE package
  • create a pristine supercell using the function ai4materials.utils.utils_crystals.create_supercell
  • create a defective supercell (50% of atoms missing) using the function ai4materials.utils.utils_crystals.create_vacancies
  • calculate the two-dimensional diffraction fingerprint for all (eight) crystal structures using ai4materials.wrappers.calc_descriptor
  • save the results to file
  • reload the results from file
  • generate a texture atlas with the two-dimensional diffraction fingerprints of all structures and write it to file.

Implementation details of the two-dimensional diffraction fingerprint can be found at ai4materials.descriptors.diffraction2d.

from ase.spacegroup import crystal
from ai4materials.descriptors.diffraction2d import Diffraction2D
from ai4materials.utils.utils_config import set_configs
from ai4materials.utils.utils_config import setup_logger
from ai4materials.utils.utils_crystals import create_supercell
from ai4materials.utils.utils_crystals import create_vacancies
from ai4materials.utils.utils_data_retrieval import generate_facets_input
from ai4materials.wrappers import calc_descriptor
from ai4materials.wrappers import load_descriptor
import os.path

# set configs
configs = set_configs(main_folder='./multiple_2d_diff_ai4materials/')
logger = setup_logger(configs, level='INFO', display_configs=False)

# setup folder and files
desc_file_name = 'fcc_bcc_diam_hcp_example'

# build crystal structures
fcc_al = crystal('Al', [(0, 0, 0)], spacegroup=225, cellpar=[4.05, 4.05, 4.05, 90, 90, 90])
bcc_fe = crystal('Fe', [(0, 0, 0)], spacegroup=229, cellpar=[2.87, 2.87, 2.87, 90, 90, 90])
diamond_c = crystal('C', [(0, 0, 0)], spacegroup=227, cellpar=[3.57, 3.57, 3.57, 90, 90, 90])
hcp_mg = crystal('Mg', [(1. / 3., 2. / 3., 3. / 4.)], spacegroup=194, cellpar=[3.21, 3.21, 5.21, 90, 90, 120])
# create supercells - pristine
fcc_al_supercell = create_supercell(fcc_al, target_nb_atoms=128, cell_type='standard_no_symmetries')
bcc_fe_supercell = create_supercell(bcc_fe, target_nb_atoms=128, cell_type='standard_no_symmetries')
diamond_c_supercell = create_supercell(diamond_c, target_nb_atoms=128, cell_type='standard_no_symmetries')
hcp_mg_supercell = create_supercell(hcp_mg, target_nb_atoms=128, cell_type='standard_no_symmetries')
# create supercells - vacancies
fcc_al_supercell_vac = create_vacancies(fcc_al, target_vacancy_ratio=0.50, target_nb_atoms=128, cell_type='standard_no_symmetries')
bcc_fe_supercell_vac = create_vacancies(bcc_fe, target_vacancy_ratio=0.50, target_nb_atoms=128, cell_type='standard_no_symmetries')
diamond_c_supercell_vac = create_vacancies(diamond_c, target_vacancy_ratio=0.50, target_nb_atoms=128, cell_type='standard_no_symmetries')
hcp_mg_supercell_vac = create_vacancies(hcp_mg, target_vacancy_ratio=0.50, target_nb_atoms=128, cell_type='standard_no_symmetries')

ase_atoms_list = [fcc_al_supercell, fcc_al_supercell_vac,
                  bcc_fe_supercell, bcc_fe_supercell_vac,
                  diamond_c_supercell, diamond_c_supercell_vac,
                  hcp_mg_supercell, hcp_mg_supercell_vac]

# calculate the descriptor for the list of structures and save it to file
descriptor = Diffraction2D(configs=configs)
desc_file_path = calc_descriptor(descriptor=descriptor, configs=configs, ase_atoms_list=ase_atoms_list,
                                 desc_file=str(desc_file_name)+'.tar.gz', format_geometry='aims',
                                 nb_jobs=-1)

# load the previously saved file containing the crystal structures and their corresponding descriptor
target_list, structure_list = load_descriptor(desc_files=desc_file_path, configs=configs)

# create a texture atlas with all the two-dimensional diffraction fingerprints
df, texture_atlas = generate_facets_input(structure_list=structure_list, desc_metadata='diffraction_2d_intensity',
                                          target_list=target_list,
                                          sprite_atlas_filename=desc_file_name,
                                          configs=configs, normalize=True)

These are the calculated two-dimensional diffraction fingerprints for all crystal structures in the list:

_images/fcc_bcc_diam_hcp_example.png
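
The defective structures in this example have 50% of their atoms removed. A minimal sketch of that operation, assuming atoms are removed uniformly at random, is shown below. It operates on a plain list of positions rather than on an ASE `Atoms` object, and the helper name `remove_random_atoms` is hypothetical; it is not the implementation of `create_vacancies`.

```python
import random


def remove_random_atoms(positions, target_vacancy_ratio=0.5, seed=None):
    """Return a copy of positions with a random fraction of the atoms removed."""
    rng = random.Random(seed)
    nb_to_remove = int(round(target_vacancy_ratio * len(positions)))
    removed = set(rng.sample(range(len(positions)), nb_to_remove))
    return [pos for idx, pos in enumerate(positions) if idx not in removed]


# 4 x 4 x 4 simple cubic grid of 64 sites (illustrative positions, arbitrary units)
pristine = [(i, j, k) for i in range(4) for j in range(4) for k in range(4)]
defective = remove_random_atoms(pristine, target_vacancy_ratio=0.5, seed=0)
print(len(pristine), len(defective))
```

Fixing the seed makes the defective structure reproducible, which matters when a dataset is regenerated and compared with a previously saved one.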

Example: atomic feature retrieval for multiple structures

It was recently shown in Ref. [2] that the crystal structure of binary compounds can be predicted with compressed-sensing techniques using atomic features only.

The code below illustrates how to retrieve atomic features, performing the following steps:

  • build a list of crystal structures using the ASE package
  • retrieve atomic features using the descriptor ai4materials.descriptors.atomic_features.AtomicFeatures for all crystal structures
  • save the results to file
  • reload the results from file
  • construct a table df_atomic_features containing the atomic features using the function ai4materials.descriptors.atomic_features.get_table_atomic_features
  • write the atomic feature table to a CSV file
  • build a heatmap of the atomic feature table
import sys
import os.path

atomic_data_dir = os.path.abspath(os.path.normpath("/home/ziletti/nomad/nomad-lab-base/analysis-tools/atomic-data"))
sys.path.insert(0, atomic_data_dir)

from ase.spacegroup import crystal
import matplotlib.pyplot as plt
from ai4materials.utils.utils_config import set_configs
from ai4materials.utils.utils_config import setup_logger
from ai4materials.utils.utils_crystals import get_spacegroup_old
from ai4materials.utils.utils_binaries import get_binaries_dict_delta_e
from ai4materials.wrappers import calc_descriptor
from ai4materials.wrappers import load_descriptor
from ai4materials.descriptors.atomic_features import AtomicFeatures
from ai4materials.descriptors.atomic_features import get_table_atomic_features
import seaborn as sns

# set configs
configs = set_configs(main_folder='./dataset_atomic_features_ai4materials/')
logger = setup_logger(configs, level='INFO', display_configs=False)

desc_file_name = 'atomic_features_try1'

# build atomic structures
group_1a = ['Li', 'Na', 'K', 'Rb']
group_1b = ['F', 'Cl', 'Br', 'I']
group_2a = ['Be', 'Mg', 'Ca', 'Sr']
group_2b = ['O', 'S', 'Se', 'Te']

ase_atoms_list = []
for el_1a in group_1a:
    for el_1b in group_1b:
        ase_atoms_list.append(crystal([el_1a, el_1b], [(0, 0, 0), (0.5, 0.5, 0.5)], spacegroup=225, cellpar=[5.64, 5.64, 5.64, 90, 90, 90]))

for el_2a in group_2a:
    for el_2b in group_2b:
        ase_atoms_list.append(crystal([el_2a, el_2b], [(0, 0, 0), (0.5, 0.5, 0.5)], spacegroup=225, cellpar=[5.64, 5.64, 5.64, 90, 90, 90]))

selected_feature_list = ['atomic_ionization_potential', 'atomic_electron_affinity',
                         'atomic_rs_max', 'atomic_rp_max', 'atomic_rd_max']


# define and calculate descriptor
kwargs = {'feature_order_by': 'atomic_mulliken_electronegativity', 'energy_unit': 'eV', 'length_unit': 'angstrom'}

descriptor = AtomicFeatures(configs=configs, **kwargs)

desc_file_path = calc_descriptor(descriptor=descriptor, configs=configs, ase_atoms_list=ase_atoms_list,
                                 desc_file=str(desc_file_name)+'.tar.gz', format_geometry='aims',
                                 selected_feature_list=selected_feature_list,
                                 nb_jobs=-1)

target_list, ase_atoms_list = load_descriptor(desc_files=desc_file_path, configs=configs)
df_atomic_features = get_table_atomic_features(ase_atoms_list)

# write table to file
df_atomic_features.to_csv('atomic_features_table.csv', float_format='%.4f')

# plot the table with seaborn
df_atomic_features = df_atomic_features.set_index('ordered_chemical_symbols')
mask = df_atomic_features.isnull()
fig = plt.figure()
sns.set(font_scale=0.5)
sns_plot = sns.heatmap(df_atomic_features, annot=True, mask=mask)
fig = sns_plot.get_figure()
fig.tight_layout()
fig.savefig('atomic_features_plot.png', dpi=200)

This is the table containing the atomic features obtained using the code above:

  ordered_chemical_symbols atomic_ionization_potential(A) atomic_electron_affinity(A) atomic_rs_max(A) atomic_rp_max(A) atomic_rd_max(A) atomic_ionization_potential(B) atomic_electron_affinity(B) atomic_rs_max(B) atomic_rp_max(B) atomic_rd_max(B)
0 LiF -5.3291 -0.6981 1.6500 2.0000 6.9300 -19.4043 -4.2735 0.4100 0.3700 1.4300
1 KBr -4.4332 -0.6213 2.1300 2.4400 1.7900 -12.6496 -3.7393 0.7500 0.8800 1.8700
2 KI -4.4332 -0.6213 2.1300 2.4400 1.7900 -11.2571 -3.5135 0.9000 1.0700 1.7200
3 RbF -4.2889 -0.5904 2.2400 3.2000 1.9600 -19.4043 -4.2735 0.4100 0.3700 1.4300
4 RbCl -4.2889 -0.5904 2.2400 3.2000 1.9600 -13.9018 -3.9708 0.6800 0.7600 1.6700
5 RbBr -4.2889 -0.5904 2.2400 3.2000 1.9600 -12.6496 -3.7393 0.7500 0.8800 1.8700
6 RbI -4.2889 -0.5904 2.2400 3.2000 1.9600 -11.2571 -3.5135 0.9000 1.0700 1.7200
7 BeO -9.4594 0.6305 1.0800 1.2100 2.8800 -16.4332 -3.0059 0.4600 0.4300 2.2200
8 BeS -9.4594 0.6305 1.0800 1.2100 2.8800 -11.7951 -2.8449 0.7400 0.8500 2.3700
9 BeSe -9.4594 0.6305 1.0800 1.2100 2.8800 -10.9460 -2.7510 0.8000 0.9500 2.1800
10 BeTe -9.4594 0.6305 1.0800 1.2100 2.8800 -9.8667 -2.6660 0.9400 1.1400 1.8300
11 LiCl -5.3291 -0.6981 1.6500 2.0000 6.9300 -13.9018 -3.9708 0.6800 0.7600 1.6700
12 MgO -8.0371 0.6925 1.3300 1.9000 3.1700 -16.4332 -3.0059 0.4600 0.4300 2.2200
13 MgS -8.0371 0.6925 1.3300 1.9000 3.1700 -11.7951 -2.8449 0.7400 0.8500 2.3700
14 MgSe -8.0371 0.6925 1.3300 1.9000 3.1700 -10.9460 -2.7510 0.8000 0.9500 2.1800
15 MgTe -8.0371 0.6925 1.3300 1.9000 3.1700 -9.8667 -2.6660 0.9400 1.1400 1.8300
16 CaO -6.4280 0.3039 1.7600 2.3200 0.6800 -16.4332 -3.0059 0.4600 0.4300 2.2200
17 CaS -6.4280 0.3039 1.7600 2.3200 0.6800 -11.7951 -2.8449 0.7400 0.8500 2.3700
18 CaSe -6.4280 0.3039 1.7600 2.3200 0.6800 -10.9460 -2.7510 0.8000 0.9500 2.1800
19 CaTe -6.4280 0.3039 1.7600 2.3200 0.6800 -9.8667 -2.6660 0.9400 1.1400 1.8300
20 SrO -6.0316 0.3431 1.9100 2.5500 1.2000 -16.4332 -3.0059 0.4600 0.4300 2.2200
21 SrS -6.0316 0.3431 1.9100 2.5500 1.2000 -11.7951 -2.8449 0.7400 0.8500 2.3700
22 LiBr -5.3291 -0.6981 1.6500 2.0000 6.9300 -12.6496 -3.7393 0.7500 0.8800 1.8700
23 SrSe -6.0316 0.3431 1.9100 2.5500 1.2000 -10.9460 -2.7510 0.8000 0.9500 2.1800
24 SrTe -6.0316 0.3431 1.9100 2.5500 1.2000 -9.8667 -2.6660 0.9400 1.1400 1.8300
25 LiI -5.3291 -0.6981 1.6500 2.0000 6.9300 -11.2571 -3.5135 0.9000 1.0700 1.7200
26 NaF -5.2231 -0.7157 1.7100 2.6000 6.5700 -19.4043 -4.2735 0.4100 0.3700 1.4300
27 NaCl -5.2231 -0.7157 1.7100 2.6000 6.5700 -13.9018 -3.9708 0.6800 0.7600 1.6700
28 NaBr -5.2231 -0.7157 1.7100 2.6000 6.5700 -12.6496 -3.7393 0.7500 0.8800 1.8700
29 NaI -5.2231 -0.7157 1.7100 2.6000 6.5700 -11.2571 -3.5135 0.9000 1.0700 1.7200
30 KF -4.4332 -0.6213 2.1300 2.4400 1.7900 -19.4043 -4.2735 0.4100 0.3700 1.4300
31 KCl -4.4332 -0.6213 2.1300 2.4400 1.7900 -13.9018 -3.9708 0.6800 0.7600 1.6700

and this is its corresponding heatmap:

_images/atomic_features_plot.png
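
The 32 rows of the table correspond to all combinations of the element groups defined in the code above. The sketch below enumerates them with `itertools.product`; the labels are formed by simple concatenation of the two symbols, which coincides with the `ordered_chemical_symbols` column for these compounds.

```python
import itertools

group_1a = ['Li', 'Na', 'K', 'Rb']
group_1b = ['F', 'Cl', 'Br', 'I']
group_2a = ['Be', 'Mg', 'Ca', 'Sr']
group_2b = ['O', 'S', 'Se', 'Te']

# 4 x 4 combinations from each pair of groups, 32 compounds in total
binaries = [cation + anion
            for cations, anions in ((group_1a, group_1b), (group_2a, group_2b))
            for cation, anion in itertools.product(cations, anions)]
print(len(binaries))
```

The row order in the table differs from this enumeration order, so rows are best matched by their chemical symbols rather than by their index.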

Example: dataset creation for data analytics

The code below illustrates how to compute a descriptor (the two-dimensional diffraction fingerprint [1]) for multiple crystal structures, save the results to file, and reload the file for later use (e.g. for classification).

The steps performed in the code below are the following:

  • define the folders where the results are going to be saved
  • build the four crystal structures (bcc, fcc, diam, hcp) using the ASE package
  • create a pristine supercell using the function ai4materials.utils.utils_crystals.create_supercell
  • create a defective supercell (50% of atoms missing) using the function ai4materials.utils.utils_crystals.create_vacancies
  • calculate the two-dimensional diffraction fingerprint for all (eight) crystal structures
  • save the results to file
  • reload the results from file
  • define a user-specified target variable (i.e. the variable that one wants to predict with the classification/regression model); in this case this variable is the crystal structure type (‘fcc’, ‘bcc’, ‘diam’, ‘hcp’)
  • create a dataset containing the specified desc_metadata (this needs to be compatible with the descriptor choice)
  • save the dataset to file in the folder dataset_folder, including data (numpy array), target variable (numpy array), and metadata regarding the dataset (JSON format)
  • re-load from file the saved dataset to be used for example in a classification task
from ase.spacegroup import crystal
from ai4materials.dataprocessing.preprocessing import load_dataset_from_file
from ai4materials.dataprocessing.preprocessing import prepare_dataset
from ai4materials.descriptors.diffraction2d import Diffraction2D
from ai4materials.utils.utils_config import set_configs
from ai4materials.utils.utils_config import setup_logger
from ai4materials.utils.utils_crystals import create_supercell
from ai4materials.utils.utils_crystals import create_vacancies
from ai4materials.wrappers import calc_descriptor
from ai4materials.wrappers import load_descriptor
import os.path

# set configs
configs = set_configs(main_folder='./dataset_2d_diff_ai4materials/')
logger = setup_logger(configs, level='INFO', display_configs=False)

# setup folder and files
dataset_folder = os.path.join(configs['io']['main_folder'], 'my_datasets')
desc_file_name = 'fcc_bcc_diam_hcp_example'

# build crystal structures
fcc_al = crystal('Al', [(0, 0, 0)], spacegroup=225, cellpar=[4.05, 4.05, 4.05, 90, 90, 90])
bcc_fe = crystal('Fe', [(0, 0, 0)], spacegroup=229, cellpar=[2.87, 2.87, 2.87, 90, 90, 90])
diamond_c = crystal('C', [(0, 0, 0)], spacegroup=227, cellpar=[3.57, 3.57, 3.57, 90, 90, 90])
hcp_mg = crystal('Mg', [(1. / 3., 2. / 3., 3. / 4.)], spacegroup=194, cellpar=[3.21, 3.21, 5.21, 90, 90, 120])
# create supercells - pristine
fcc_al_supercell = create_supercell(fcc_al, target_nb_atoms=128, cell_type='standard_no_symmetries')
bcc_fe_supercell = create_supercell(bcc_fe, target_nb_atoms=128, cell_type='standard_no_symmetries')
diamond_c_supercell = create_supercell(diamond_c, target_nb_atoms=128, cell_type='standard_no_symmetries')
hcp_mg_supercell = create_supercell(hcp_mg, target_nb_atoms=128, cell_type='standard_no_symmetries')
# create supercells - vacancies
fcc_al_supercell_vac = create_vacancies(fcc_al, target_vacancy_ratio=0.50, target_nb_atoms=128, cell_type='standard_no_symmetries')
bcc_fe_supercell_vac = create_vacancies(bcc_fe, target_vacancy_ratio=0.50, target_nb_atoms=128, cell_type='standard_no_symmetries')
diamond_c_supercell_vac = create_vacancies(diamond_c, target_vacancy_ratio=0.50, target_nb_atoms=128, cell_type='standard_no_symmetries')
hcp_mg_supercell_vac = create_vacancies(hcp_mg, target_vacancy_ratio=0.50, target_nb_atoms=128, cell_type='standard_no_symmetries')

ase_atoms_list = [fcc_al_supercell, fcc_al_supercell_vac,
                  bcc_fe_supercell, bcc_fe_supercell_vac,
                  diamond_c_supercell, diamond_c_supercell_vac,
                  hcp_mg_supercell, hcp_mg_supercell_vac]

# calculate the descriptor for the list of structures and save it to file
descriptor = Diffraction2D(configs=configs)
desc_file_path = calc_descriptor(descriptor=descriptor, configs=configs, ase_atoms_list=ase_atoms_list,
                                 desc_file=str(desc_file_name)+'.tar.gz', format_geometry='aims',
                                 nb_jobs=-1)

# load the previously saved file containing the crystal structures and their corresponding descriptor
target_list, structure_list = load_descriptor(desc_files=desc_file_path, configs=configs)

# add as target the spacegroup (using spacegroup of the "parental" structure for the defective structure)
targets = ['fcc', 'fcc', 'bcc', 'bcc', 'diam', 'diam', 'hcp', 'hcp']
for idx, item in enumerate(target_list):
    item['data'][0]['target'] = targets[idx]

path_to_x, path_to_y, path_to_summary = prepare_dataset(
    structure_list=structure_list,
    target_list=target_list,
    desc_metadata='diffraction_2d_intensity',
    dataset_name='bcc-fcc-diam-hcp',
    target_name='target',
    target_categorical=True,
    input_dims=(64, 64),
    configs=configs,
    dataset_folder=dataset_folder,
    main_folder=configs['io']['main_folder'],
    desc_folder=configs['io']['desc_folder'],
    tmp_folder=configs['io']['tmp_folder'],
    notes="Dataset with bcc, fcc, diam and hcp structures, pristine and with 50% of defects.")

x, y, dataset_info = load_dataset_from_file(path_to_x=path_to_x, path_to_y=path_to_y,
                                            path_to_summary=path_to_summary)

In the code above, the numpy array x contains the specified desc_metadata, the numpy array y contains the specified targets, and dataset_info is a dictionary containing information on the dataset that was just loaded:

    {
      "data": [
        {
          "target_name": "target",
          "n_bins": 100,
          "path_to_summary": "/home/ziletti/Documents/calc_xray/2d_nature_comm/my_datasets/bcc-fcc-diam-hcp_summary.json",
          "creation_date": "2018-06-20T18:42:07.110239",
          "numerical_labels": [2, 2, 0, 0, 1, 1, 3, 3],
          "classes": ["bcc", "diam", "fcc", "hcp"],
          "nb_classes": 4,
          "path_to_y": "/home/ziletti/Documents/calc_xray/2d_nature_comm/my_datasets/bcc-fcc-diam-hcp_y.pkl",
          "path_to_x": "/home/ziletti/Documents/calc_xray/2d_nature_comm/my_datasets/bcc-fcc-diam-hcp_x.pkl",
          "text_labels": ["fcc", "fcc", "bcc", "bcc", "diam", "diam", "hcp", "hcp"],
          "target_categorical": true,
          "disc_type": null,
          "notes": "Dataset with bcc, fcc, diam and hcp structures, pristine and with 50% of defects.",
          "dataset_name": "bcc-fcc-diam-hcp"
        }
      ]
    }
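
As a small illustration of how the summary can be read, the sketch below copies a subset of the values printed above into a plain dictionary and decodes the numerical labels back into class names. It only assumes that each numerical label is an index into the "classes" list, which is consistent with the values shown; it does not call ai4materials.

```python
# Subset of the dataset summary shown above (values copied from the example output).
dataset_info = {
    "data": [{
        "classes": ["bcc", "diam", "fcc", "hcp"],
        "numerical_labels": [2, 2, 0, 0, 1, 1, 3, 3],
        "text_labels": ["fcc", "fcc", "bcc", "bcc", "diam", "diam", "hcp", "hcp"],
        "nb_classes": 4,
    }]
}

summary = dataset_info["data"][0]
classes = summary["classes"]

# Assumption: each numerical label is an index into the list of classes.
decoded_labels = [classes[idx] for idx in summary["numerical_labels"]]

# Count how many structures belong to each class.
class_counts = {name: decoded_labels.count(name) for name in classes}

print(decoded_labels)
print(class_counts)
```

In this example every class contains two structures: the pristine supercell and its defective counterpart.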
[1] A. Ziletti, D. Kumar, M. Scheffler, and L. M. Ghiringhelli, “Insightful classification of crystal structures using deep learning,” Nature Communications, vol. 9, pp. 2775, 2018.
[2] L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, and M. Scheffler, “Big Data of Materials Science: Critical Role of the Descriptor,” Physical Review Letters, vol. 114, no. 10, p. 105503, 2015.

Section author: Angelo Ziletti <angelo.ziletti@gmail.com>

Regression and classification models

ai4materials makes it possible to apply state-of-the-art data-analytics models to materials science data. Below, we present an example based on compressed sensing.

Example regression: LASSO+l0 method

This example shows how to find descriptive parameters (short formulas) that predict crystal structure, using the example of octet binary compounds that have either rocksalt (RS) or zincblende (ZB) structure. It is based on Ref. [1], and it reproduces the results presented in Fig. 2 of this reference.

Starting from simple physical quantities (“building blocks”, here properties of the constituent free atoms such as orbital radii), thousands of candidate formulas are generated by applying arithmetic operations combining building blocks, for example forming sums and products of them. These candidate formulas constitute the so-called “feature space”. Then, a sparse regression method is used to select only a few of these formulas that explain the data.

The code below performs the following steps:

  • read the dataset containing binary materials from file
  • calculate the atomic features using the descriptor ai4materials.descriptors.atomic_features.AtomicFeatures
  • calculate the descriptive parameters using the LASSO+l0 method with ai4materials.wrappers.calc_model
  • plot the results.
import sys
import os.path

atomic_data_dir = os.path.normpath('/home/ziletti/nomad/nomad-lab-base/analysis-tools/atomic-data')
sys.path.insert(0, atomic_data_dir)

import matplotlib.pyplot as plt
from ai4materials.utils.utils_config import set_configs
from ai4materials.utils.utils_config import setup_logger
from ai4materials.utils.utils_data_retrieval import read_ase_db
from ai4materials.wrappers import load_descriptor
from ai4materials.wrappers import calc_model
from ai4materials.wrappers import calc_descriptor
from ai4materials.descriptors.atomic_features import AtomicFeatures
from ai4materials.descriptors.atomic_features import get_table_atomic_features
from ai4materials.utils.utils_config import get_data_filename
from ai4materials.visualization.viewer import read_control_file
import numpy as np
import pandas as pd

# modify this path if you want to save the calculation results in another location
configs = set_configs(main_folder='./l1_l0_example')
logger = setup_logger(configs, level='INFO')

# setup folder and files
lookup_file = os.path.join(configs['io']['main_folder'], 'lookup.dat')
materials_map_plot_file = os.path.join(configs['io']['main_folder'], 'binaries_l1_l0_map_prl2015.png')

# define descriptor - atomic features in this case
kwargs = {'energy_unit': 'eV', 'length_unit': 'angstrom'}
descriptor = AtomicFeatures(configs=configs, **kwargs)

# =============================================================================
# Descriptor calculation
# =============================================================================

desc_file_name = 'atomic_features_binaries'
ase_db_file = get_data_filename('data/db_ase/binaries_lowest_energy_ghiringhelli2015.json')
ase_atoms_list = read_ase_db(db_path=ase_db_file)

selected_feature_list = ['atomic_ionization_potential', 'atomic_electron_affinity', 'atomic_rs_max',
                         'atomic_rp_max', 'atomic_rd_max']
allowed_operations = ['+', '-', '/', '|-|', 'exp', '^2']

desc_file_path = calc_descriptor(descriptor=descriptor, configs=configs, ase_atoms_list=ase_atoms_list,
                                 desc_file='lasso_l0_binaries_example.tar.gz',
                                 format_geometry='aims',
                                 selected_feature_list=selected_feature_list,
                                 nb_jobs=-1)

# load descriptor
target_list, structure_list = load_descriptor(desc_files=desc_file_path, configs=configs)
df_atomic_features = get_table_atomic_features(structure_list)

# =============================================================================
# Model calculation
# =============================================================================

chemical_formulas = [structure.get_chemical_formula(mode='hill') for structure in structure_list]
df_atomic_features['chemical_formula'] = chemical_formulas
df_atomic_features = df_atomic_features.sort_values(by='chemical_formula').reset_index(drop=True)

# target values to predict
dict_delta_e = dict(SeZn=0.2631369195046646, BaTe=-0.37538683850924387, BN=1.7120803923951688,
                    CGe=0.8114429425515818, GaP=0.3487518245522925, MgS=-0.08669951164989079,
                    GaN=0.4334452723999156, AlAs=0.21326186549251072, BP=1.019225239514441, FK=-0.14640610974868423,
                    BrLi=-0.03274621540254649, BSb=0.5808491589999847, CaTe=-0.3504563060008138,
                    ClK=-0.16446069285018655, BrCs=-0.1558673149861294, BrCu=0.15244265149855352,
                    ILi=-0.021660938008450818, CuF=-0.01702227364862989, FNa=-0.14578814899027592,
                    C2=2.6286038411199026, AgBr=-0.030033419005850936, CuI=0.20467459898973175,
                    GaSb=0.15462529698986593, ClLi=-0.03838148564873346, AsIn=0.13404758548892423,
                    OZn=0.10196818460305757, MgO=-0.2322747421651549, InP=0.17919330099729866,
                    Ge2=0.20085254149716641, InN=0.15372030450150198, CSn=0.45353800899655555,
                    CdTe=0.11453954098812649, TeZn=0.24500131400199776, MgTe=-0.004591286999846332,
                    BaS=-0.3197624539995756, CaSe=-0.36079776214906895, FRb=-0.1355957874033439,
                    BeO=0.6918376303948839, AsB=0.8749782510022386, CaS=-0.36913322290101264,
                    CaO=-0.2652190617003161, BaO=-0.09299856100784433, AlSb=0.15686874600534004,
                    SrTe=-0.3792947550252322, BeS=0.5063277134499351, InSb=0.0780598790169251,
                    SZn=0.27581334679854935, OSr=-0.2203066401004525, BrRb=-0.1638205440075271,
                    BeSe=0.4949404808020511, ClRb=-0.16050356640655905, BrNa=-0.1264287376032476,
                    MgSe=-0.05530180620975655, GeSn=0.08166336650886348, GeSi=0.2632101904042582,
                    CsF=-0.10826332699038382, CdSe=0.08357195550137826, FLi=-0.059488321434879074,
                    AlN=0.07294907877519896, Si2=0.2791658430004932, SiSn=0.13510880949563495,
                    ClNa=-0.13299199530041886, CdO=-0.0841613645001312, SSr=-0.36843415824218,
                    IK=-0.16703915799644553, BaSe=-0.3434451604764059, BrK=-0.1661759769597461,
                    BeTe=0.4685859464949282, CdS=0.07267280149604124, CsI=-0.16238748698990838,
                    INa=-0.11483823100687315, AlP=0.2189583583002711, AsGa=0.27427779349540243,
                    SeSr=-0.3745109805057823, CSi=0.669023778644634, AgCl=-0.04279728149250233,
                    AgI=0.03692542249419624, AgF=-0.15375768499313544, ClCs=-0.1503461689991465,
                    Sn2=0.016963900503544026, ClCu=0.15625872520000064, IRb=-0.16720145498980848)

df_atomic_features['target'] = df_atomic_features['chemical_formula'].map(dict_delta_e)
target = np.asarray(df_atomic_features['target'].values.astype(float))

cols_to_drop = ['chemical_formula', 'target', 'ordered_chemical_symbols']

# use the l1-l0 method proposed in Ghiringhelli et al. (2015)
calc_model(method='l1_l0', df_features=df_atomic_features, cols_to_drop=cols_to_drop,
           target=target, max_dim=2, allowed_operations=allowed_operations,
           tmp_folder=configs['io']['tmp_folder'], results_folder=configs['io']['results_folder'],
           lookup_file=lookup_file, control_file=configs['io']['control_file'], energy_unit='eV',
           length_unit='angstrom')

# read the results for the two-dimensional descriptor
viewer_filename = 'l1_l0_dim1_for_viewer.csv'
viewer_filepath = os.path.join(configs['io']['results_folder'], viewer_filename)
df_viewer = pd.read_csv(viewer_filepath)
x_axis_label, y_axis_label = read_control_file(configs['io']['control_file'])

# plot the results for the two-dimensional descriptor
fig, ax = plt.subplots()
x = df_viewer['coord_0']
y = df_viewer['coord_1']
color = df_viewer['y_true']
chemical_formula = df_viewer['chemical_formula']
cm = plt.get_cmap('rainbow')
sc = plt.scatter(x, y, c=color, cmap=cm)

# annotate the points
for i, txt in enumerate(chemical_formula):
    ax.annotate(txt, (x[i], y[i]),  size=4)

plt.xlabel(x_axis_label)
plt.ylabel(y_axis_label)
cbar = plt.colorbar(sc)
cbar.set_label('Reference E(RS)-E(ZB)', rotation=90)
plt.title("l1/l0 structure map for binary compounds\n ")
plt.subplots_adjust(bottom=0.2)
plt.figtext(0.5, 0.02, "Compare with Fig. 2 in Ghiringhelli et al., Phys. Rev. Lett 114 (10), 105503 (2015)",
            horizontalalignment='center', style='italic')

plt.savefig(materials_map_plot_file, dpi=300)

This is the plot showing the calculated energy differences between rocksalt and zincblende structures of the 82 octet binary AB materials used in Ref. [1] according to the two-dimensional descriptor found via the LASSO+l0 procedure:

_images/binaries_l1_l0_map_prl2015.png

Implementation details of how atomic features are automatically constructed can be found at ai4materials.descriptors.atomic_features. Implementation details of the LASSO+l0 method can be found at ai4materials.wrappers.calc_model and at ai4materials.models.l1_l0.
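
To make the construction of the feature space more concrete, here is a minimal sketch that applies one level of the operations listed in allowed_operations to a handful of building blocks. The building-block names and values, as well as the function build_feature_space, are made up for illustration; the actual implementation in ai4materials.models.l1_l0 may differ (for instance in how units and exponentials are handled).

```python
import itertools
import math

# Hypothetical building blocks (names and values are illustrative only).
primary = {"r_s(A)": 1.2, "r_p(A)": 1.5, "r_s(B)": 0.9}

# Binary operations applied to every unordered pair of building blocks.
binary_ops = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "|-|": lambda a, b: abs(a - b),
    "/": lambda a, b: a / b,
}


def build_feature_space(features):
    """Return a dict {formula string: value} with one level of combinations."""
    space = dict(features)
    for (name_a, a), (name_b, b) in itertools.combinations(features.items(), 2):
        for symbol, op in binary_ops.items():
            if symbol == "/" and b == 0:
                continue  # skip divisions by zero
            if symbol == "|-|":
                formula = "|{}-{}|".format(name_a, name_b)
            else:
                formula = "({}{}{})".format(name_a, symbol, name_b)
            space[formula] = op(a, b)
    # Unary operations applied to every building block.
    for name, a in features.items():
        space["({})^2".format(name)] = a ** 2
        space["exp({})".format(name)] = math.exp(a)
    return space


feature_space = build_feature_space(primary)
print(len(feature_space))
```

With three building blocks this already yields 21 candidate formulas; applying the combinations repeatedly is what makes the feature space grow to thousands of candidates.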

Example classification: convolutional neural network for crystal-structure classification

This example shows how to load a dataset of crystal structures (represented by the diffraction fingerprint [2]), train a convolutional neural network on pristine (perfect) crystal structures, and use this neural network to predict the crystal class of highly defective crystal structures. This method, introduced in Ref. [2], correctly classifies heavily defective crystal structures. In this particular case, even if 25% of the atoms are removed from each structure, the model still retains an accuracy of 100%.

The code below performs the following steps:

  • read the dataset for crystal-structure classification used in Ref. [2]
  • train a convolutional neural network for crystal-structure classification using ai4materials.models.cnn_nature_comm_ziletti2018.train_neural_network
  • predict the class of each crystal structure with the neural network trained in Ref. [2], using ai4materials.models.cnn_nature_comm_ziletti2018.predict
from functools import partial
from ai4materials.utils.utils_config import set_configs
from ai4materials.dataprocessing.preprocessing import load_dataset_from_file
from ai4materials.models.cnn_architectures import cnn_nature_comm_ziletti2018
from ai4materials.models.cnn_nature_comm_ziletti2018 import load_datasets
from ai4materials.models.cnn_nature_comm_ziletti2018 import predict
from ai4materials.models.cnn_nature_comm_ziletti2018 import train_neural_network
from ai4materials.utils.utils_config import setup_logger
import numpy as np
import os

configs = set_configs()
logger = setup_logger(configs, level='DEBUG', display_configs=False)
dataset_folder = configs['io']['main_folder']

# =============================================================================
# Download the dataset from the online repository and load it
# =============================================================================

x_pristine, y_pristine, dataset_info_pristine, x_vac25, y_vac25, dataset_info_vac25 = load_datasets(dataset_folder)

train_set_name = 'pristine_dataset'
path_to_x_pristine = os.path.join(dataset_folder, train_set_name + '_x.pkl')
path_to_y_pristine = os.path.join(dataset_folder, train_set_name + '_y.pkl')
path_to_summary_pristine = os.path.join(dataset_folder, train_set_name + '_summary.json')

test_set_name = 'vac25_dataset'
path_to_x_vac25 = os.path.join(dataset_folder, test_set_name + '_x.pkl')
path_to_y_vac25 = os.path.join(dataset_folder, test_set_name + '_y.pkl')
path_to_summary_vac25 = os.path.join(dataset_folder, test_set_name + '_summary.json')

x_pristine, y_pristine, dataset_info_pristine = load_dataset_from_file(path_to_x_pristine, path_to_y_pristine,
                                                                       path_to_summary_pristine)

x_vac25, y_vac25, dataset_info_vac25 = load_dataset_from_file(path_to_x_vac25, path_to_y_vac25,
                                                              path_to_summary_vac25)


# =============================================================================
# Train the convolutional neural network
# =============================================================================

# load the convolutional neural network architecture from Ziletti et al., Nature Communications 9, pp. 2775 (2018)
partial_model_architecture = partial(cnn_nature_comm_ziletti2018, conv2d_filters=[32, 32, 16, 16, 8, 8],
                                     kernel_sizes=[3, 3, 3, 3, 3, 3], max_pool_strides=[2, 2],
                                     hidden_layer_size=128)

# use x_train also for validation - this is only to run the test
results = train_neural_network(x_train=x_pristine, y_train=y_pristine, x_val=x_pristine, y_val=y_pristine,
                               configs=configs, partial_model_architecture=partial_model_architecture,
                               nb_epoch=1)

text_labels = np.asarray(dataset_info_vac25["data"][0]["text_labels"])[:100]
numerical_labels = np.asarray(dataset_info_vac25["data"][0]["numerical_labels"])[:100]

# =============================================================================
# Predict the crystal class of a material using the trained neural network
# =============================================================================

# load the convolutional neural network architecture from Ziletti et al., Nature Communications 9, pp. 2775 (2018)
# you can also use your own neural network to predict, passing it to the variable 'model'
results = predict(x_vac25, y_vac25, configs=configs, numerical_labels=numerical_labels,
                  text_labels=text_labels, model=None)

This is the confusion matrix obtained using the convolutional neural network to predict the class of structures with 25% of missing atoms:

_images/cnn_nature_comm2018_confusion_matrix.png

The model has an accuracy of 100%, even in the presence of defects (25% of the atoms missing in this case). The neural network’s training and prediction are performed with Keras. Implementation details on the convolutional neural network used can be found at ai4materials.models.cnn_nature_comm_ziletti2018.
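
For readers unfamiliar with confusion matrices, the following sketch shows how such a matrix and the corresponding accuracy can be computed from true and predicted numerical labels. The labels are made up, and the helper functions are illustrative; they are not the routines used internally by ai4materials.

```python
def confusion_matrix(y_true, y_pred, nb_classes):
    """Rows correspond to true classes, columns to predicted classes."""
    matrix = [[0] * nb_classes for _ in range(nb_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up labels: one structure of class 1 is misclassified as class 2.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

matrix = confusion_matrix(y_true, y_pred, nb_classes=3)
print(matrix)
print(accuracy(matrix))
```

An accuracy of 100% corresponds to a purely diagonal confusion matrix.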

[1] L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, and M. Scheffler, “Big Data of Materials Science: Critical Role of the Descriptor,” Physical Review Letters, vol. 114, no. 10, p. 105503, 2015.
[2] A. Ziletti, D. Kumar, M. Scheffler, and L. M. Ghiringhelli, “Insightful classification of crystal structures using deep learning,” Nature Communications, vol. 9, pp. 2775, 2018.

Section author: Angelo Ziletti <angelo.ziletti@gmail.com>

Submodules

ai4materials.models.clustering module

ai4materials.models.cnn_architectures module

ai4materials.models.cnn_architectures.cnn_architecture_polycrystals(learning_rate=0.0003, conv2d_filters=[32, 16, 8, 8, 16, 32], kernel_sizes=[3, 3, 3, 3, 3, 3], hidden_layer_size=64, n_rows=50, n_columns=32, nb_classes=5, dropout=0.125, img_channels=1)[source]

Deep convolutional neural network model for crystal structure recognition.

This neural network architecture was used to classify crystal structures - represented by the three-dimensional diffraction fingerprint - in Ref. [1].

[1]A. Ziletti et al., “Automatic structure identification in polycrystals via Bayesian deep learning”, in preparation (2018)

Code author: Angelo Ziletti <angelo.ziletti@gmail.com>

ai4materials.models.cnn_architectures.cnn_nature_comm_ziletti2018(conv2d_filters, kernel_sizes, max_pool_strides, hidden_layer_size, n_rows, n_columns, img_channels, nb_classes)[source]

Deep convolutional neural network model for crystal structure recognition.

This neural network architecture was used to classify crystal structures - represented by the two-dimensional diffraction fingerprint - in Ref. [2]

[2]A. Ziletti, D. Kumar, M. Scheffler, and L. M. Ghiringhelli, “Insightful classification of crystal structures using deep learning”, Nature Communications, vol. 9, pp. 2775 (2018)

Code author: Angelo Ziletti <angelo.ziletti@gmail.com>

ai4materials.models.cnn_architectures.model_architecture_3d(dim1, dim2, dim3, img_channels, nb_classes)[source]

ai4materials.models.cnn_nature_comm_ziletti2018 module

ai4materials.models.cnn_polycrystals module

ai4materials.models.cnn_polycrystals.normalize_images(images)[source]
ai4materials.models.cnn_polycrystals.predict(x, y, configs, numerical_labels, text_labels, nb_classes=3, results_file=None, model=None, batch_size=32, conf_matrix_file=None, verbose=1, with_uncertainty=True, mc_samples=50, consider_memory=True, max_length=1000000.0)[source]
ai4materials.models.cnn_polycrystals.predict_with_uncertainty(data, model, model_type='classification', n_iter=1000)[source]

This function calculates the uncertainty of a neural network model using dropout.

This follows Chap. 3 in Yarin Gal’s PhD thesis: http://mlg.eng.cam.ac.uk/yarin/thesis/thesis.pdf

We calculate the uncertainty of the neural network predictions in the three ways proposed in Gal’s PhD thesis,
as presented on pp. 51-54:
  • variation_ratio: defined in Eq. 3.19
  • predictive_entropy: defined in Eq. 3.20
  • mutual_information: defined on p. 53 (no Eq. number)
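
The three measures listed above can be sketched in a few lines for a single input, given the class probabilities returned by several stochastic forward passes (dropout active at prediction time). This is a minimal illustration based on our reading of the definitions in the thesis; the function names and the input layout (a list of T probability vectors) are assumptions, and this is not the code of predict_with_uncertainty.

```python
import math


def entropy(probabilities):
    """Shannon entropy (natural logarithm) of a probability vector."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


def uncertainty_measures(mc_probs):
    """mc_probs: list of T probability vectors, one per stochastic forward pass."""
    n_passes = len(mc_probs)
    n_classes = len(mc_probs[0])

    # Variation ratio: 1 - (number of passes voting for the most frequent class) / T
    votes = [max(range(n_classes), key=lambda c: p[c]) for p in mc_probs]
    mode_count = max(votes.count(c) for c in range(n_classes))
    variation_ratio = 1.0 - mode_count / n_passes

    # Predictive entropy: entropy of the probabilities averaged over the passes
    mean_probs = [sum(p[c] for p in mc_probs) / n_passes for c in range(n_classes)]
    predictive_entropy = entropy(mean_probs)

    # Mutual information: predictive entropy minus the average entropy of each pass
    expected_entropy = sum(entropy(p) for p in mc_probs) / n_passes
    mutual_information = predictive_entropy - expected_entropy

    return variation_ratio, predictive_entropy, mutual_information


# Made-up example: the passes disagree completely between two classes.
disagreeing = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(uncertainty_measures(disagreeing))
```

When all passes return the same confident prediction, all three measures are zero; they grow as the passes start to disagree.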
ai4materials.models.cnn_polycrystals.reshape_images(images, target_shape)[source]

Reshape images according to the target shape

ai4materials.models.cnn_polycrystals.train_neural_network(x_train, y_train, x_val, y_val, configs, partial_model_architecture, batch_size=32, nb_epoch=5, normalize=True, checkpoint_dir=None, neural_network_name='my_neural_network', training_log_file='training.log', early_stopping=False, data_augmentation=True)[source]

Train a neural network to classify crystal structures represented as two-dimensional diffraction fingerprints.

This model was introduced in [1].

x_train: np.array, [batch, width, height, channels]

[1]A. Ziletti, A. Leitherer, M. Scheffler, and L. M. Ghiringhelli, “Crystal-structure identification via Bayesian deep learning: towards superhuman performance”, in preparation (2018)

Code author: Angelo Ziletti <angelo.ziletti@gmail.com>

ai4materials.models.embedding module

ai4materials.models.l1_l0 module

ai4materials.models.l1_l0.choose_atomic_features(selected_feature_list=None, atomic_data_file=None, binary_data_file=None)[source]

Choose primary features for the extended lasso procedure.

ai4materials.models.l1_l0.classify_rs_zb(structure)[source]

Classify whether a structure is rocksalt or zincblende, from a list of NoMaD structures (one json file). Supports multiple frames (TO DO: check that). Hard-coded.

rocksalt: atom_frac1 0.0 0.0 0.0 atom_frac2 0.5 0.5 0.5

zincblende: atom_frac1 0.0 0.0 0.0 atom_frac2 0.25 0.25 0.25

Labels: zincblende --> label=0; rocksalt --> label=1
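
The rule above can be illustrated with a short sketch that looks only at the fractional coordinates of the second atom, assuming the first atom sits at the origin. The function name, its signature and the tolerance are hypothetical; the actual classify_rs_zb operates on a structure object.

```python
def classify_rs_zb_sketch(frac_second_atom, tol=1e-3):
    """Return 0 for zincblende and 1 for rocksalt (first atom assumed at 0, 0, 0)."""
    rocksalt = (0.5, 0.5, 0.5)
    zincblende = (0.25, 0.25, 0.25)

    def matches(reference):
        return all(abs(x - r) < tol for x, r in zip(frac_second_atom, reference))

    if matches(zincblende):
        return 0
    if matches(rocksalt):
        return 1
    raise ValueError("Structure is neither rocksalt nor zincblende.")


print(classify_rs_zb_sketch((0.5, 0.5, 0.5)))
print(classify_rs_zb_sketch((0.25, 0.25, 0.25)))
```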

ai4materials.models.l1_l0.combine_features(df=None, energy_unit=None, length_unit=None, metadata_info=None, allowed_operations=None, derived_features=None)[source]

Generate combination of features given a dataframe and a list of allowed operations.

For the exponentials, we introduce a characteristic energy/length converting the

Todo

Fix under/overflow errors, and introduce handling of exceptions.

ai4materials.models.l1_l0.e_sqrt_z(row)[source]

Calculates e/sqrt(val_Z).

Es/sqrt(Zval) and Ep/sqrt(Zval) from Phys. Rev. B 85, 104104 (2012). Input: Es(A) or Ep(A), val(A) (A-->B). They need to be given in this order.

ai4materials.models.l1_l0.get_energy_diff(chemical_formula_list, energy_list, label_list)[source]

Obtain the difference in energy (eV) between the rocksalt and zincblende structures of a given binary.

From a list of chemical formulas, energies and labels, returns a dictionary {material: delta_e}, where delta_e is the difference between the energy with label 1 and the energy with label 0, grouped by material. Each element of these lists corresponds to a json file. The delta_e is exactly what is reported in PRL 114, 105503 (2015).
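
The grouping described above can be sketched as follows. The function name and the energies are made up, and the sketch assumes a single energy per (material, label) pair; it is not the library implementation.

```python
def energy_diff_sketch(chemical_formula_list, energy_list, label_list):
    """Return {material: E(label 1) - E(label 0)} for materials having both labels."""
    energies = {}
    for formula, energy, label in zip(chemical_formula_list, energy_list, label_list):
        energies.setdefault(formula, {})[label] = energy
    return {formula: by_label[1] - by_label[0]
            for formula, by_label in energies.items()
            if 0 in by_label and 1 in by_label}


# Made-up energies in eV (label 1: rocksalt, label 0: zincblende).
formulas = ["ClNa", "ClNa", "BN", "BN"]
energies_ev = [-3.50, -3.25, -6.00, -7.50]
labels = [1, 0, 1, 0]

delta_e = energy_diff_sketch(formulas, energies_ev, labels)
print(delta_e)
```

With this sign convention, a negative delta_e means that rocksalt is the more stable structure.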

Todo

Check if it works for multiple frames.

ai4materials.models.l1_l0.get_lowest_energy_structures(structure, dict_delta_e)[source]

Get lowest energy structure for each material and label type.

Works only with two possible labels for a given material.

Todo

Check if it works for multiple frames.

ai4materials.models.l1_l0.l1_l0_minimization(y_true, D, features, energy_unit=None, print_lasso=False, lambda_grid=None, lassonumber=25, max_dim=3, lambda_grid_points=100, lambda_max_factor=1.0, lambda_min_factor=0.001)[source]

Select an optimal descriptor using a combined l1-l0 procedure.

  1. step (l1): Solve the LASSO minimization problem

\[\underset{c}{\operatorname{argmin}} \left\{ \|P - Dc\|^2 + \lambda \|c\|_1 \right\}\]

for different lambdas, starting from a ‘high’ lambda. Collect all indices (features) i appearing with nonzero coefficients c_i, while decreasing lambda, until the size of the collection equals lassonumber.

  2. step (l0): Check the least-squares errors for all single features/pairs/triples/… of the collection from the 1. step. Choose the single/pair/triple/… with the lowest mean squared error (MSE) to be the best 1D/2D/3D descriptor.
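
The l0 step described above amounts to an exhaustive search over small subsets of the collected features. A minimal sketch with NumPy is given below; the function l0_step, its arguments and the synthetic data are assumptions made for illustration, and the l1 (LASSO) step that produces the candidate indices is not shown.

```python
import itertools

import numpy as np


def l0_step(y_true, D, candidate_indices, dim):
    """Exhaustive search over all `dim`-tuples of candidate columns of D.

    Returns (mse, subset, coefficients) of the best least-squares fit;
    the last coefficient is the intercept.
    """
    best = None
    ones = np.ones((D.shape[0], 1))
    for subset in itertools.combinations(candidate_indices, dim):
        D_sub = np.hstack([D[:, list(subset)], ones])  # last column: intercept
        coef = np.linalg.lstsq(D_sub, y_true, rcond=None)[0]
        mse = float(np.mean((y_true - D_sub.dot(coef)) ** 2))
        if best is None or mse < best[0]:
            best = (mse, subset, coef)
    return best


# Synthetic data: the target depends linearly on columns 1 and 4 only.
rng = np.random.RandomState(0)
D = rng.normal(size=(30, 6))
y_true = 2.0 * D[:, 1] - 1.5 * D[:, 4] + 0.5

mse, subset, coef = l0_step(y_true, D, candidate_indices=range(6), dim=2)
print(subset, coef)
```

The cost of this search grows combinatorially with the number of candidates, which is why the l1 step is used first to reduce the collection to lassonumber features.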

Parameters:

y_true : array, [n_samples]
Array with the target property (ground truth)
D : array, [n_samples, n_features]
Matrix with the data.
features : list of strings
List of feature names. Needs to be in the same order as the feature vectors in D
dimrange : list of int
Specify for which dimensions the optimal descriptor is calculated. It is the number of feature vectors used in the linear combination
lassonumber : int, default 25
The number of features that will be collected in the l1-step
lambda_grid_points : int, default 100
Number of lambdas between lambda_max and lambda_min for which the l1-problem shall be solved. Sometimes a denser grid could be needed, if the lambda steps are too large. This can be checked with ‘print_lasso’. lambda_max and lambda_min are chosen as in the paper by Friedman, Hastie, and Tibshirani, “Regularization Paths for Generalized Linear Models via Coordinate Descent”. The values in between are generated on the log scale.
lambda_min_factor : float, default 0.001
Sets lam_min = lambda_min_factor * lam_max.
lambda_max_factor : float, default 1.0
Sets calculated lam_max = lam_max * lambda_max_factor.
print_lasso: bool, default False
Prints the indices of the columns of D with nonzero coefficients for each lambda.
lambda_grid: array
The list/array of lambda values for the l1-problem can be chosen by the user. The list/array should start from the highest number and lambda_i > lambda_i+1 should hold. (?) lambda_grid_points is then ignored. (?)
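
To illustrate how the grid parameters interact, the sketch below generates a decreasing, log-spaced lambda grid from a given lam_max. It assumes that lambda_max_factor is applied before lam_min is derived, which is one possible reading of the parameter descriptions; the function name is hypothetical, and the way lam_max itself is computed from the data is not shown.

```python
import math


def lambda_grid_sketch(lam_max, lambda_grid_points=100, lambda_min_factor=0.001,
                       lambda_max_factor=1.0):
    """Return a decreasing list of lambdas, equally spaced on the log scale."""
    lam_max = lam_max * lambda_max_factor   # assumption: rescale lam_max first
    lam_min = lambda_min_factor * lam_max
    log_max, log_min = math.log(lam_max), math.log(lam_min)
    step = (log_min - log_max) / (lambda_grid_points - 1)
    return [math.exp(log_max + i * step) for i in range(lambda_grid_points)]


grid = lambda_grid_sketch(1.0, lambda_grid_points=4)
print(grid)
```

A user-supplied lambda_grid would replace such a generated list, provided it is sorted from the highest to the lowest value.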

Returns:

list of pandas dataframes (D’, c’, selected_features) :

A list of tuples (D’,c’,selected_features) for each dimension. selected_features is a list of strings. D’*c’ is the selected linear model/fit where the last column of D is a vector with ones.

References:

[1]Luca M. Ghiringhelli, Jan Vybiral, Sergey V. Levchenko, Claudia Draxl, and Matthias Scheffler, “Big Data of Materials Science: Critical Role of the Descriptor” Phys. Rev. Lett. 114, 105503 (2015)
ai4materials.models.l1_l0.r_pi(row)[source]

Calculates r_pi.

John-Bloch’s indicator2: |rp(A) - rs(A)| + |rp(B) - rs(B)| from Phys. Rev. Lett. 33, 1095 (1974). Input: rp(A), rs(A), rp(B), rs(B). They need to be given in this order.

ai4materials.models.l1_l0.r_sigma(row)[source]

Calculates r_sigma.

John-Bloch’s indicator1: |rp(A) + rs(A) - rp(B) -rs(B)| from Phys. Rev. Lett. 33, 1095 (1974).

Input: rp(A), rs(A), rp(B), rs(B). They need to be given in this order.
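
The two indicators can be written directly from the formulas quoted in the docstrings of r_sigma and r_pi. The functions below take the four radii as separate arguments rather than a dataframe row, and the radii are made-up numbers; both choices are for illustration only.

```python
def r_sigma_sketch(rp_a, rs_a, rp_b, rs_b):
    """|rp(A) + rs(A) - rp(B) - rs(B)|"""
    return abs(rp_a + rs_a - rp_b - rs_b)


def r_pi_sketch(rp_a, rs_a, rp_b, rs_b):
    """|rp(A) - rs(A)| + |rp(B) - rs(B)|"""
    return abs(rp_a - rs_a) + abs(rp_b - rs_b)


# Made-up radii, given in the order rp(A), rs(A), rp(B), rs(B).
radii = (1.5, 1.0, 0.75, 0.5)
print(r_sigma_sketch(*radii))
print(r_pi_sketch(*radii))
```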

ai4materials.models.l1_l0.write_atomic_features(structure, selected_feature_list, df, dict_delta_e=None, path=None, filename_suffix='.json', json_file=None)[source]

Given the chemical composition, build the descriptor made of atomic features only.

Includes all the frames in the same json file.

Todo

Check if it works for multiple frames.

ai4materials.models.sis module

class ai4materials.models.sis.SIS(P, D, feature_list, feature_unit_classes=None, target_unit='eV', control=None, output_log_file='/home/beaker/.beaker/v1/web/tmp/output.log', rm_existing_files=False, if_print=True, check_only_control=False)[source]

Bases: object

Python interface with the fortran SIS+(Sure Independent Screening)+L0/L1L0 code.

The SIS+(Sure Independent Screening)+L0/L1L0 is a greedy algorithm. It enhances the OMP, by considering not only the closest feature vector to the residual in each step, but collects the closest ‘n_SIS’ features vectors. The final model is then built after a given number of iterations by determining the (approximately) best linear combination of the collected features using the L0 (L1-L0) algorithm.
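
The greedy procedure described above can be sketched in a few lines of NumPy. This is only a simplified illustration under our own assumptions (screening by the absolute projection of centered features on the residual, plain exhaustive L0 search, no feature construction, no L1 pre-selection); the function names are hypothetical and the Fortran code wrapped by this class is considerably more elaborate.

```python
import itertools

import numpy as np


def least_squares_fit(D_sub, P):
    """Least-squares fit with an intercept; returns (coefficients, residual)."""
    A = np.hstack([D_sub, np.ones((D_sub.shape[0], 1))])
    coef = np.linalg.lstsq(A, P, rcond=None)[0]
    return coef, P - A.dot(coef)


def sis_l0_sketch(P, D, n_sis=3, max_dim=2):
    """Return the indices of the best `max_dim`-dimensional descriptor."""
    D_centered = D - D.mean(axis=0)
    norms = np.linalg.norm(D_centered, axis=0)
    collected = []                  # indices of the features gathered so far
    residual = P - P.mean()
    best_subset = ()
    for dim in range(1, max_dim + 1):
        # SIS step: collect the n_sis unused features closest to the residual
        scores = np.abs(D_centered.T.dot(residual)) / norms
        if collected:
            scores[collected] = -np.inf
        collected.extend(np.argsort(scores)[::-1][:n_sis].tolist())
        # L0 step: best `dim`-tuple among all collected features
        best_rss = np.inf
        for subset in itertools.combinations(collected, dim):
            _, res = least_squares_fit(D[:, list(subset)], P)
            rss = float(res.dot(res))
            if rss < best_rss:
                best_rss, best_subset, residual = rss, subset, res
    return best_subset


# Synthetic data: the target depends linearly on columns 1 and 4 only.
rng = np.random.RandomState(0)
D = rng.normal(size=(30, 6))
P = 2.0 * D[:, 1] - 1.5 * D[:, 4] + 0.5

print(sis_l0_sketch(P, D, n_sis=3, max_dim=2))
```

In contrast to OMP, which keeps a single feature per iteration, several candidates are retained at each step, so a feature that is only second-best for the current residual can still enter the final model.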

To execute the code, besides the SIS code parameters also folder paths are needed as well as account information of a remote machine to let the code be executed on it.

P : array, [n_sample]; list; [n_sample]
P refers to the target (label). If ptype = ‘quali’ list of ints is required
D : array, [n_sample, n_features]
D refers to the feature matrix. The SIS code calculates algebraic combinations of the features and then applies the SIS+L0/L1L0 algorithm.
feature_list : list of strings
List of feature names. Needs to be in the same order as the feature vectors (columns) in D. Features must consist of strings which are in F_unit (See above).
feature_unit_classes : None or {list integers or the string: ‘no_unit’}
integers correspond to the unit class of the features from feature_list. ‘no_unit’ is reserved for dimensionless unit.
output_log_file : string
file path for the logger output.
rm_existing_files : bool
If SIS_input_path on local or remote machine (remote_input_path) exists, it is removed. Otherwise it is renamed to SIS_input_path_$number.
control : dict of dicts (of dicts)
Dict tree:

{
    'local_paths': {'local_path': str, 'SIS_input_folder_name': str},
    ('local_run', 'remote_run'): (
        {'SIS_code_path': str, 'mpi_command': str},
        {'SIS_code_path': str, 'username': str, 'hostname': str, 'remote_path': str, 'eos': bool,
         'mpi_command': str, 'nodes': int, ('key_file', 'password'): (str, str)}
    ),
    'parameters': {'n_comb': int, 'n_sis': int, 'max_dim': int, 'OP_list': list},
    'advanced_parameters': {'FC': FC_dic, 'DI': DI_dic, 'FCDI': FCDI_dic}
}

Here the tuples (., .) mean that exactly one of the two keys has to be set. To see the forms of FC_dic, DI_dic and FCDI_dic, check FC_tuplelist, DI_tuplelist and FCDI_tuplelist above in PARAMETERS REFERENCE.

start : -
starts the code
get_results : list [max_dim] of dicts {‘D’, ‘coefficients’, ‘P_pred’}
get_results[model_dim-1][‘D’] : pandas data frame [n_sample, model_dim+1]
Descriptor matrix with the columns being algebraic combinations of the input feature matrix. Column names are thus strings of the algebraic combinations of strings of the input feature_list. The last column is full of ones, corresponding to the intercept.
get_results[model_dim-1][‘coefficients’] : array [model_dim+1]
Optimizing coefficients.
get_results[model_dim-1][‘P_pred’] : array [m_sample]
Fit : np.dot( np.array(D), coefficients)

For remote_run, the library nomad_sim.ssh_code is needed. If the remote machine is eos, the (key: value) pair 'eos': True has to be set in the dict control['remote_run']. In addition, for example, 'nodes': 1 and 'mpi_run -np 32' can then be set.

Paths (say name: path) are all set in the initialization part with self.path and used in other functions with self.path. In general, the other variables are passed directly as arguments to the functions. There are a few exceptions, such as self.ssh.

# >>> import numpy as np
# >>> from nomad_sim.SIS import SIS
# >>> ### Specify where on local machine input files for the SIS fortran code shall be created
# >>> Local_paths = {
# >>>     'local_path' : '/home/beaker/',
# >>>     'SIS_input_folder_name' : 'SIS_input',
# >>> }
# >>> # Information for ssh connection. Instead of password also 'key_file' for rsa key
# >>> # file path is possible.
# >>> Remote_run = {
# >>>     'mpi_command' : '',
# >>>     'remote_path' : '/home/username/',
# >>>     'SIS_code_path' : '/home/username/SIS_code/',
# >>>     'hostname' : 'hostname',
# >>>     'username' : 'username',
# >>>     'password' : 'XXX'
# >>> }
# >>> # Parameters for the SIS fortran code. If at each iteration a different 'OP_list'
# >>> # shall be used, set a list of max_dim lists, e.g. [ ['+','-','*'], ['/','*'] ], if
# >>> # n_comb = 2
# >>> Parameters = {
# >>>     'n_comb' : 2,
# >>>     'OP_list' : ['+','|-|','-','*','/','exp','^2'],
# >>>     'max_dim' : 2,
# >>>     'n_sis' : 10
# >>> }
# >>> # Final control dict for the SIS class. Instead of remote_run also local_run can be set
# >>> # (with different keys). Also advanced_parameters can be set, but should be done only
# >>> # if the parameters of the SIS fortran code are understood.
# >>> SIS_control = {'local_paths':Local_paths, 'remote_run':Remote_run, 'parameters':Parameters}
# >>> # Target (label) vector P , feature_list, feature matrix D. The values are made up.
# >>> P = np.array( [1,2,3,-2,-9] )
# >>> feature_list=['r_p(A)','r_p(B)', 'Z(A)']
# >>> D = np.array([[7,-11,3],
# >>>               [-1,-2,4],
# >>>               [2,20,3],
# >>>               [8,1,8],
# >>>               [-3,4,1]])
# >>> # Use the code
# >>> sis = SIS(P,D,feature_list, control = SIS_control, output_log_file ='/home/ahmetcik/codes/beaker/output.log')
# >>> sis.start()
# >>> results = sis.get_results()
# >>>
# >>> coef_1dim = results[0]['coefficients']
# >>> coef_2dim = results[1]['coefficients']
# >>> D_1dim = results[0]['D']
# >>> D_2dim = results[1]['D']
# >>> print(coef_2dim)
# [-3.1514 -5.9171 3.9697]
# >>>
# >>> print(D_2dim)
#    ((rp(B)/Z(A))/(rp(A)+rp(B)))  ((Z(A)/rp(B))/(rp(B)*Z(A)))  intercept
# 0                      0.916670                     0.008264        1.0
# 1                      0.166670                     0.250000        1.0
# 2                      0.303030                     0.002500        1.0
# 3                      0.013889                     1.000000        1.0
# 4                      4.000000                     0.062500        1.0

ask_periodically(sc, seconds, counter, username)[source]

Recursive function that calls self.check_status periodically, once every given number of seconds.

check_(k)[source]
check_DI(file_path)[source]

Check DI.out to see whether the calculation has finished.

check_FC(file_path)[source]

Check FC.out to see whether the calculation has finished, and read the feature space sizes.

calc_finished : bool
If the calculation has finished, there should be a ‘Have a nice day !’ in the file.
featurespace : integer
Total feature space size generated, before the redundant check.
n_collected : integer
The number of features collected in the current iteration. Should be n_sis.
check_OP_list(control)[source]

Checks form and items of control[‘parameters’][‘OP_list’].

control[‘parameters’][‘OP_list’] must be a list of operation strings or a list of n_comb lists of operation strings. It is also checked whether the operation strings are items of available_OPs (see above).

control : dict

control : with manipulated control[‘parameters’][‘OP_list’]

check_OP_strings(OPs)[source]

Check if all items of OPs are items of available_OPs

check_arrays(P_in, D, feature_list, feature_unit_classes, ptype)[source]

Check arrays/list P, D and feature_list

check_control(par_in, par_ref, par_in_path)[source]

Recursive Function to check input control dict tree.

For example, check_control(control, control_ref, ’control’) walks through the dict tree control and compares it with control_ref, checking that the correct keys are set (mandatory keys, non-mandatory keys, typos in key strings) and that the values are of the correct type or belong to the list of allowed options. It raises errors with hints about what is wrong and what is needed.

par_in : any key
if par_in is dict, then recursion.
par_ref: any key
Compared to par_in to check that both are of the same type. If par_in and par_ref are dicts, their keys are also compared.
par_in_path: string
Gives the dict tree path at which an error occurred, e.g. control[key_1][key_2]… When calling the function from outside, start with the name of the input dict, e.g. ‘control’.
check_feature_space_size(feature_list, n_target=5, upper_bound=300000000)[source]
check_feature_units(feature_unit_classes)[source]

Check feature units

Checks which

feature_unit_classes : list of integers
list must be sorted.
unit_strings : list of strings
In the form [‘(1:3)’,’(4:8)’,..], where the indices start from 1,
check_files(iter_folder_name, dimension)[source]

Check which file is missing and maybe why.

If something went wrong, this function helps to find out where the problem occurred. Returns an error string.

check_keys(par_in, par_ref, par_in_path)[source]

Compares the dicts par_in and par_ref.

Collects which keys are missing (only if the keys are not in not_mandotary) and
which keys are not expected (if, for example, there is a typo).

If there are missing or unexpected keys, an error message listing them is raised.

par_in : dict

par_ref : dict

par_in_path : string
Dictionary path string for error message, e.g ‘control[key_1][key_2]’.
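
The key comparison described for check_keys can be sketched as follows. This is only an illustration under stated assumptions: the helper name compare_keys and its optional_keys argument are hypothetical, and the actual ai4materials implementation may organise the reference dict differently.

```python
def compare_keys(par_in, par_ref, optional_keys=()):
    """Return (missing, unexpected) keys of par_in relative to par_ref.

    Hypothetical helper: optional_keys stands in for the keys that are
    allowed to be absent from par_in.
    """
    missing = [k for k in par_ref if k not in par_in and k not in optional_keys]
    unexpected = [k for k in par_in if k not in par_ref]
    return missing, unexpected


# Reference layout and a user input containing a typo ('OP_lst')
reference = {"n_comb": int, "OP_list": list, "max_dim": int, "n_sis": int}
user_input = {"n_comb": 2, "OP_lst": ["+", "-"], "max_dim": 2, "n_sis": 10}

missing, unexpected = compare_keys(user_input, reference)
print("missing:", missing)
print("unexpected:", unexpected)
```

A typo in a key therefore shows up twice: once as a missing key and once as an unexpected one, which is what makes the resulting error message informative.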
check_l0_steps(max_dim, n_sis, upper_limit=10000)[source]

Check if the number of l0 steps is larger than upper_limit.

check_quali_dim(control)[source]

Check that desc_dim=2 is also set when a qualitative (‘quali’) run is requested.

check_status(filename, username)[source]

Check if calculation on eos is finished

filename: str
The qstat output will be written into this file. The file will then be read.
username: str
The username to search for in filename. If it does not appear, the calculation is finished.
status : bool
True if the calculation is still running.
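
A minimal sketch of the status check described above, assuming the queue listing has already been written to a text file. The helper name is_still_running and the sample listing are hypothetical; how the listing is produced on the remote machine is not shown.

```python
import os
import tempfile


def is_still_running(filename, username):
    """Return True if username occurs anywhere in the saved queue listing."""
    with open(filename) as handle:
        return any(username in line for line in handle)


# Throwaway file standing in for the saved queue listing (made-up content)
with tempfile.TemporaryDirectory() as tmp_dir:
    listing = os.path.join(tmp_dir, "qstat.txt")
    with open(listing, "w") as handle:
        handle.write("1234 sis_job alice r\n5678 other_job bob qw\n")
    alice_running = is_still_running(listing, "alice")
    carol_running = is_still_running(listing, "carol")

print(alice_running, carol_running)
```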
check_type(par_in, par_ref, par_in_path, if_also_none=False)[source]

Check type of par_in and par_ref.

If par_ref is a tuple, par_in must be an item of par_ref; otherwise, par_in and par_ref must have the same type.
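
The rule can be illustrated with a short sketch. It assumes that par_ref is either a tuple of allowed values or a reference value whose type is compared; the helper name type_matches is hypothetical and the real function reports errors rather than returning a boolean.

```python
def type_matches(par_in, par_ref):
    """Apply the tuple-or-same-type rule to a single value."""
    if isinstance(par_ref, tuple):
        # A tuple lists the allowed options
        return par_in in par_ref
    # Otherwise both values must have exactly the same type
    return type(par_in) is type(par_ref)


ok_choice = type_matches("quanti", ("quanti", "quali"))
bad_choice = type_matches("quant", ("quanti", "quali"))
ok_type = type_matches(10, 1)
bad_type = type_matches("10", 1)
print(ok_choice, bad_choice, ok_type, bad_type)
```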

convert_2_fortran(parameter, parameter_value)[source]

Convert parameters to SIS fortran code style.

Converts e.g. True to string ‘.true.’ or a string ‘s’ to “‘s’”, and other special formats. Returns the converted parameter.
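
A minimal sketch of the two conversions named above (booleans and strings). The helper name to_fortran_literal is hypothetical, and the "other special formats" handled by the real function are not covered here.

```python
def to_fortran_literal(value):
    """Convert a Python value to a Fortran-style literal string."""
    # bool is tested first because bool is a subclass of int in Python
    if isinstance(value, bool):
        return ".true." if value else ".false."
    if isinstance(value, str):
        # Strings are wrapped in single quotes
        return "'{}'".format(value)
    # Fallback (assumption): plain string representation for numbers
    return str(value)


converted = [to_fortran_literal(v) for v in (True, False, "s", 10)]
print(converted)
```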

convert_feature_strings(feature_list)[source]

Convert feature strings.

Puts an ‘sr’ for reals and an ‘si’ for integers at the beginning of a string. Returns the list with the changed strings.

do_transfer(ssh=None, eos=None, username=None, CPUs=None)[source]

Run the calculation on the remote machine.

First checks whether the folder self.remote_input_path already exists on the remote machine; if it does, the folder is deleted or renamed. Then copies the file system self.SIS_input_path with the SIS Fortran code files into the folder self.remote_input_path. Finally, runs the calculations on the remote machine and copies back the file system with the results. If the remote machine is eos, writes the submission script, submits it, and checks qstat to see whether the calculation has finished.

ssh : object
Must be from code nomad_sim.ssh_code.
eos : bool
If remote machine is eos. To write submission script and submit …
username: string
needed to check qstat on eos
CPUs : int
To reserve the right number of CPUs in the eos submission script.
estimate_calculation_expense(feature_list)[source]

Check the expense of the SIS+l0 calculations

estimate_feature_space(n_comb, n_features, ops, rate=1.0, n_comb_start=0)[source]
flatten(list_in)[source]

Returns the list_in collapsed into a one dimensional list

list_in : list/tuple of lists/tuples of …
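
A recursive flattening of this kind can be sketched as below. The name flatten_nested is chosen to make clear that this is an illustration, not the library function itself.

```python
def flatten_nested(list_in):
    """Collapse arbitrarily nested lists/tuples into a one-dimensional list."""
    flat = []
    for item in list_in:
        if isinstance(item, (list, tuple)):
            # Recurse into nested containers
            flat.extend(flatten_nested(item))
        else:
            flat.append(item)
    return flat


flat_ops = flatten_nested([["+", "-"], ("*", ["/", "exp"])])
print(flat_ops)
```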

get_OPs(OP_list)[source]

Convert OP_list to the special format for the SIS Fortran input.

get_arrays_of_top_descriptors(top_indices)[source]
get_des(x)[source]

Change the descriptor strings read from the output DI.out. Remove characters such as ‘:’, ‘si’, ‘sr’. Then convert the feature strings for printing.

get_indices_of_top_descriptors()[source]
get_next_size(n_features, ops)[source]
get_results(ith_descriptor=0)[source]

Attribute to get results from the file system.

ith_descriptor: int
Return the ith best descriptor.

out : list [max_dim] of dicts {‘D’, ‘coefficients’, ‘P_pred’}

out[model_dim-1][‘D’] : pandas data frame [n_sample, model_dim+1]
Descriptor matrix with the columns being algebraic combinations of the input feature matrix. Column names are thus strings of the algebraic combinations of the strings of the input feature_list. The last column is full of ones, corresponding to the intercept.
out[model_dim-1][‘coefficients’] : array [model_dim+1]
Optimizing coefficients.
out[model_dim-1][‘P_pred’] : array [n_sample]
Fit : np.dot( np.array(D) , coefficients)
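
The fit formula can be checked by hand with the made-up numbers printed in the SIS class example (the two-dimensional descriptor matrix and its coefficients). The sketch below uses a plain NumPy array in place of the pandas data frame returned by get_results.

```python
import numpy as np

# Two-dimensional descriptor matrix from the class example;
# the last column of ones corresponds to the intercept
D = np.array([[0.916670, 0.008264, 1.0],
              [0.166670, 0.250000, 1.0],
              [0.303030, 0.002500, 1.0],
              [0.013889, 1.000000, 1.0],
              [4.000000, 0.062500, 1.0]])
coefficients = np.array([-3.1514, -5.9171, 3.9697])

# Fit: one predicted value per sample
P_pred = np.dot(D, coefficients)
print(np.round(P_pred, 3))
```

The predictions can then be compared with the target vector P = [1, 2, 3, -2, -9] of the same example.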
get_strings_of_top_descriptors(top_indices)[source]
get_type(value)[source]
get_value_from_dic(dictionary, key_tree_path)[source]

Returns value of the dict tree

dictionary: dict or ‘dict tree’ as control_ref
dict_tree is when key is tuple of keys and value is tuple of corresponding values.
key_tree_path: list of keys
Must be in the correct order beginning from the top of the tree/dict.

Examples
--------
>>> print(get_value_from_dic(control_ref, ['local_run', 'SIS_code_path']))
<type 'str'>
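
Walking a key path through a nested dict can be sketched as follows. The sketch covers plain nested dicts only; the tuple-keyed "dict tree" form mentioned above is not handled, and the helper name value_at_path is hypothetical.

```python
def value_at_path(dictionary, key_tree_path):
    """Follow key_tree_path from the top of a nested dict and return the value."""
    value = dictionary
    for key in key_tree_path:
        value = value[key]
    return value


# Small control dict with made-up values
control = {"local_run": {"SIS_code_path": "~/codes/SIS_code/", "mpi_command": ""}}
code_path = value_at_path(control, ["local_run", "SIS_code_path"])
print(code_path)
```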

manipulate_descriptor_string(d)[source]
ncr(n, r)[source]

Binomial coefficient
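
For reference, the binomial coefficient counts the ways of choosing r items out of n. A minimal sketch with the standard library (the helper name binomial is hypothetical):

```python
import math


def binomial(n, r):
    """Number of ways to choose r items out of n (0 if r is out of range)."""
    if r < 0 or r > n:
        return 0
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))


# e.g. the number of distinct pairs that can be formed from 10 candidates
n_pairs = binomial(10, 2)
print(n_pairs)
```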

read_results(iter_folder_name, dimension, task, tsizer)[source]

Read results from DI.out.

iter_folder : string
Name of the folder containing the outputs of the corresponding iteration of SIS+l1/l1l0, e.g. ‘iter01’, ‘iter02’.
dimension : integer
In iteration three, for example, DI.out provides 1- to 3-dimensional descriptors. This argument selects which dimension should be returned.
task : integer < 100
For multi task, must be worked on.
tsizer : integer
Number of samples, e.g. number of rows of D or P.
RMSE : float
Root mean square error of the model
Des : list of strings
List of the descriptors
coef : array [model_dim+1]
Coefficients including the intercept
D : array [n_sample, model_dim+1]
Matrix with columns being the selected features (descriptors) for the model. The last column is full of ones corresponding to the intercept
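
As a reminder of what the returned RMSE measures, the sketch below computes a root mean square error with NumPy. The numbers are made up, and the normalisation (mean over all samples) is the textbook definition, which is assumed rather than confirmed to match the SIS Fortran code.

```python
import numpy as np

P = np.array([1.0, 2.0, 3.0, -2.0, -9.0])       # target values (made up)
P_pred = np.array([1.0, 2.5, 3.0, -2.0, -9.5])  # model predictions (made up)

# Root mean square error: square root of the mean squared residual
rmse = np.sqrt(np.mean((P - P_pred) ** 2))
print(rmse)
```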
read_results_quali()[source]

Read results for the 2D descriptor from calculations with a qualitative run.

results: list of lists
Each sublist characterizes a separate model (if multiple models have the same score/cost, all of them are returned). Each sublist contains [descriptor_strings, D, n_overlap], where D (D.shape = (n_sample, 2)) is an array with the descriptor vectors.
return_OP_error()[source]

Error message if control[‘parameters’][‘OP_list’] has wrong form

set_SIS_parameters(desc_dim=2, subs_sis=100, rung=1, opset=['+', '-', '/', '^2', 'exp'], ptype='quanti', advanced_parameters=None)[source]

Set the SIS fortran code parameters

If advanced parameters is passed, they will be used, otherwise default values will be used. Also max_dim, n_sis, n_comb, and OP_list can be overwritten by advanced_parameters if specified.

set_local_run(SIS_code_path='~/codes/SIS_code/', mpi_command='')[source]

Set and check the local environment if local_run is used.

set_logger(output_log_file)[source]

Set logger for outputs as errors, warnings, infos.

set_main_settings(P, D, feature_list, feature_unit_classes, local_path='/home/beaker/', SIS_input_folder_name='input_folder')[source]

Set local environment and P, D and feature_list.

set_ssh_connection(hostname=None, username=None, port=22, key_file=None, password=None, remote_path=None, SIS_code_path=None, eos=False, nodes=1, mpi_command='')[source]

Set the ssh connection. Set and check the remote environment if remote_run is used.

start()[source]

Attribute which starts the calculations after init.

string_descriptor(RMSE, features, coefficients, target_unit)[source]

Make string for output in the terminal with model and its RMSE.

write_P_D(P, D, feature_list)[source]

Writes ‘train.dat’ as SIS fortran code input with P, D and feature strings

write_parameters()[source]

Write parameters into the SIS fortran code input files. Convert the parameters into the special format before.

write_submission_script(CPUs)[source]

Writes the eos job submission script.

ai4materials.models.sis.converted_2_standard = {'disA': 'd(A)', 'disAB': 'd(AB)', 'disB': 'd(B)', 'eaA': 'EA(A)', 'eaB': 'EA(B)', 'ebA': 'E_b(A)', 'ebAB': 'E_b(AB)', 'ebB': 'E_b(B)', 'hlgapA': 'HL_gap(A)', 'hlgapAB': 'HL_gap(AB)', 'hlgapB': 'HL_gap(B)', 'homoA': 'E_HOMO(A)', 'homoB': 'E_HOMO(B)', 'ipA': 'IP(A)', 'ipB': 'IP(B)', 'lumoA': 'E_LUMO(A)', 'lumoB': 'E_LUMO(B)', 'periodA': 'period(A)', 'periodB': 'period(B)', 'rdA': 'r_d(A)', 'rdB': 'r_d(B)', 'rpA': 'r_p(A)', 'rpB': 'r_p(B)', 'rpiAB': 'r_pi(AB)', 'rsA': 'r_s(A)', 'rsB': 'r_s(B)', 'rsigmaAB': 'r_sigma(AB)', 'valA': 'Z_val(A)', 'valB': 'Z_val(B)', 'zA': 'Z(A)', 'zB': 'Z(B)'}

Set logger for outputs as errors, warnings, infos.

ai4materials.models.strided_pattern_matching module

Module contents

Neural network interpretation

Understanding why a machine learning algorithm arrives at the classification decision is of paramount importance, especially in the natural sciences. For deep learning models this is particularly challenging because of their tendency to represent information in a highly distributed manner, and the presence of non-linearities in the network’s layers.

Here we provide a materials science use case of interpretable machine learning for crystal-structure classification from Ziletti et al. (2018) [1].

Example: attentive response maps in deep-learning-driven crystal recognition

This example shows how to identify the regions in the image that are the most important in the neural network’s classification decision. In particular, attentive response maps are calculated using the fractionally strided convolutional technique introduced by Zeiler and Fergus (2014) [2], and applied for the first time in materials science by Ziletti et al. (2018) [1].

The steps performed in the code below are the following:

  • define the folders where the results are going to be saved
  • build four crystal structures (bcc, fcc, diam, hcp) using the ASE package
  • create a pristine supercell using the function ai4materials.utils.utils_crystals.create_supercell
  • calculate the two-dimensional diffraction fingerprint (an RGB image) for all four crystal structures with ai4materials.descriptors.diffraction2d.Diffraction2D
  • obtain the attentive response maps for each diffraction fingerprint with ai4materials.interpretation.deconv_resp_maps.plot_att_response_maps. These identify the parts of the image that are most important in the classification decision.
from ase.spacegroup import crystal
from ai4materials.descriptors.diffraction2d import Diffraction2D
from ai4materials.interpretation.deconv_resp_maps import plot_att_response_maps
from ai4materials.utils.utils_config import get_data_filename
from ai4materials.utils.utils_config import set_configs
from ai4materials.utils.utils_config import setup_logger
from ai4materials.utils.utils_crystals import create_supercell
import numpy as np
import os.path

# set configs
configs = set_configs(main_folder='./nn_interpretation_ai4materials/')
logger = setup_logger(configs, level='INFO', display_configs=False)

# setup folder and files
# checkpoint_folder = os.path.join(configs['io']['main_folder'], 'saved_models')
figure_folder = os.path.join(configs['io']['main_folder'], 'attentive_resp_maps')

# build crystal structures
fcc_al = crystal('Al', [(0, 0, 0)], spacegroup=225, cellpar=[4.05, 4.05, 4.05, 90, 90, 90])
bcc_fe = crystal('Fe', [(0, 0, 0)], spacegroup=229, cellpar=[2.87, 2.87, 2.87, 90, 90, 90])
diamond_c = crystal('C', [(0, 0, 0)], spacegroup=227, cellpar=[3.57, 3.57, 3.57, 90, 90, 90])
hcp_mg = crystal('Mg', [(1. / 3., 2. / 3., 3. / 4.)], spacegroup=194, cellpar=[3.21, 3.21, 5.21, 90, 90, 120])

# create supercells - pristine
fcc_al_supercell = create_supercell(fcc_al, target_nb_atoms=128, cell_type='standard_no_symmetries')
bcc_fe_supercell = create_supercell(bcc_fe, target_nb_atoms=128, cell_type='standard_no_symmetries')
diamond_c_supercell = create_supercell(diamond_c, target_nb_atoms=128, cell_type='standard_no_symmetries')
hcp_mg_supercell = create_supercell(hcp_mg, target_nb_atoms=128, cell_type='standard_no_symmetries')

ase_atoms_list = [fcc_al_supercell, bcc_fe_supercell, diamond_c_supercell, hcp_mg_supercell]

# calculate the two-dimensional diffraction fingerprint for all four structures
descriptor = Diffraction2D(configs=configs)
diffraction_fingerprints_rgb = [descriptor.calculate(ase_atoms).info['descriptor']['diffraction_2d_intensity'] for ase_atoms in ase_atoms_list]

model_weights_file = get_data_filename('data/nn_models/ziletti_et_2018_rgb.h5')
model_arch_file = get_data_filename('data/nn_models/ziletti_et_2018_rgb.json')

# convert the list of diffraction fingerprint images to a numpy array
# images needs to be a numpy array with shape (n_images, dim1, dim2, channels)
images = np.asarray(diffraction_fingerprints_rgb)

plot_att_response_maps(images, model_arch_file, model_weights_file, figure_folder, nb_conv_layers=6, nb_top_feat_maps=4,
                       layer_nb='all', plot_all_filters=False, plot_filter_sum=True, plot_summary=True)

In each image below we show:

  • (left) original image to be classified corresponding to the two-dimensional diffraction fingerprint of a given structure
  • (center) attentive response maps from the top four most activated filters (red channel) for the diffraction fingerprint. The brighter the pixel, the more important that location is for classification
  • (right) sum of the last convolutional layer attentive response maps

for the case of a face-centered-cubic structure:

_images/attentive_resp_maps_fcc_red.png

and a body-centered-cubic structure:

_images/attentive_resp_maps_bcc_red.png

From the attentive response maps (center), we notice that the convolutional neural network filters are composed in a hierarchical fashion, increasing their complexity from one layer to another. At the third convolutional layer, the neural network discovers that the diffraction peaks, and their relative arrangement, are the most effective way to predict crystal classes (as a human expert would do). Furthermore, from the sum of the last convolutional layer attentive response maps, we observe that the neural network learned crystal templates automatically from the data.

[1] A. Ziletti, D. Kumar, M. Scheffler, and L. M. Ghiringhelli, “Insightful classification of crystal structures using deep learning,” Nature Communications, vol. 9, p. 2775, 2018. [Link to article]
[2] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” European Conference on Computer Vision, Springer, pp. 818-833, 2014. [Link to article]

Section author: Angelo Ziletti <angelo.ziletti@gmail.com>

Submodules

ai4materials.interpretation.deconv_resp_maps module

class ai4materials.interpretation.deconv_resp_maps.DeconvNet(model)[source]

Bases: object

DeconvNet class. Code taken from: https://github.com/tdeboissiere/DeepLearningImplementations/blob/master/DeconvNet/KerasDeconv.py

get_deconv(X, target_layer, feat_map=None)[source]
get_layers()[source]
ai4materials.interpretation.deconv_resp_maps.deconv_visualize(model, target_layer, input_data, nb_top_feat_maps)[source]

Obtain attentive response maps back-projected to image space using transposed convolutions (sometimes referred to as deconvolutions in machine learning).

Parameters:

model: instance of the Keras model
The ConvNet model to be used.
target_layer: str
Name of the layer for which we want to obtain the attentive response maps. The names of the layers are defined in the Keras instance model.
input_data: ndarray
The image data to be passed through the network. Shape: (n_samples, n_channels, img_dim1, img_dim2)
nb_top_feat_maps: int
Top-n filter you want to visualize, e.g. nb_top_feat_maps = 25 will visualize top 25 filters in target layer

Code author: Devinder Kumar <d22kumar@uwaterloo.ca>

ai4materials.interpretation.deconv_resp_maps.get_deconv_imgs(img_index, data, dec_layer, target_layer, feat_maps)[source]

Return the attentive response maps of the images specified in img_index for the target layer and feature maps specified in the arguments.

Parameters:

img_index: list or ndarray
Array or list of index. These are the indices of the images (contained in data) for which we want to obtain the attentive response maps.
data: ndarray
The image data. Shape : (n_samples, n_channels, img_dim1, img_dim2)
Dec: instance of class ai4materials.interpretation.deconv_resp_maps.DeconvNet
DeconvNet model: instance of the DeconvNet class
target_layer: str
Name of the layer for which we want to obtain the attentive response maps. The names of the layers are defined in the Keras instance model.
feat_map: int
Index of the attentive response map to visualise.

Code author: Devinder Kumar <d22kumar@uwaterloo.ca>

ai4materials.interpretation.deconv_resp_maps.get_max_activated_filter_for_layer(target_layer, model, input_data, nb_top_feat_maps, img_index)[source]

Find the indices of the most activated filters for a given image in the specified target layer of a Keras model.

Parameters:

target_layer: str
Name of the layer for which we want to obtain the attentive response maps. The names of the layers are defined in the Keras instance model.
model: instance of the Keras model
The ConvNet model to be used.
input_data: ndarray
The image data to be passed through the network. Shape: (n_samples, n_channels, img_dim1, img_dim2)
nb_top_feat_maps: int
Number of the top attentive response maps to be calculated and plotted. It must be <= to the minimum number of filters used in the neural network layers. This is not checked by the code, and respecting this criterion is up to the user.
img_index: list or ndarray
Array or list of index. These are the indices of the images (contained in data) for which we want to obtain the attentive response maps.
Returns: list of int
List containing the indices of the filters with the highest response (activation) for the given image.

Code author: Devinder Kumar <d22kumar@uwaterloo.ca>

Code author: Angelo Ziletti <angelo.ziletti@gmail.com>
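
Selecting the most activated filters for one image can be sketched with NumPy as below. This is only an illustration: the ranking criterion (maximum activation over all pixels of a filter's feature map) is an assumption, the helper name top_activated_filters is hypothetical, and no Keras model is involved.

```python
import numpy as np


def top_activated_filters(activations, nb_top_feat_maps):
    """Return the indices of the most activated filters for one image.

    activations: array of shape (n_filters, dim1, dim2) holding the
    feature maps of a single layer for a single image.
    """
    # Assumption: a filter's score is its maximum activation over all pixels
    scores = activations.reshape(activations.shape[0], -1).max(axis=1)
    # Sort in descending order of score and keep the first nb_top_feat_maps
    order = np.argsort(scores)[::-1][:nb_top_feat_maps]
    return [int(i) for i in order]


# Made-up activations in which filters 5 and 2 clearly dominate
rng = np.random.RandomState(0)
activations = rng.rand(8, 4, 4)
activations[5] += 10.0
activations[2] += 5.0

top_two = top_activated_filters(activations, 2)
print(top_two)
```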

ai4materials.interpretation.deconv_resp_maps.load_model(model_arch_file, model_weights_file)[source]

Load Keras model from .json and .h5 files

ai4materials.interpretation.deconv_resp_maps.plot_att_response_maps(data, model_arch_file, model_weights_file, figure_dir, nb_conv_layers, layer_nb='all', nb_top_feat_maps=4, filename_maps='attentive_response_maps', cmap=<matplotlib.colors.LinearSegmentedColormap object>, plot_all_filters=False, plot_filter_sum=True, plot_summary=True)[source]

Plot attentive response maps given a Keras trained model and input images.

Parameters:

data: ndarray, shape (n_images, dim1, dim2, channels)
Array of input images that will be used to calculate the attentive response maps.
model_arch_file: string
Full path to the model architecture file (.json format) written by Keras after the neural network training. This is used by the load_model function to load the neural network architecture.
model_weights_file: string
Full path to the model weights file (.h5 format) written by Keras after the neural network training. This is used by the load_model function to load the neural network weights.
figure_dir: string
Full path of the directory where the images resulting from the transposed convolution procedure will be saved.
nb_conv_layers: int
Number of Convolution2D layers in the neural network architecture.
layer_nb: list of int, or ‘all’
List with the layer number which will be deconvolved starting from 0. E.g. layer_nb=[0, 1, 4] will deconvolve the 1st, 2nd, and 5th convolution2d layer. Only up to 6 conv_2d layers are supported. If ‘all’ is selected, all conv_2d layers will be deconvolved, up to nb_conv_layers.
nb_top_feat_maps: int
Number of the top attentive response maps to be calculated and plotted. It must be <= to the minimum number of filters used in the neural network layers. This is not checked by the code, and respecting this criterion is up to the user.
filename_maps: str
Base filename (without extension and path) of the files where the attentive response maps will be saved.
cmap: Matplotlib cmap, optional, default=`cm.hot`
Type of coloring for the heatmap, if images are greyscale. Possible cmaps can be found here: https://matplotlib.org/examples/color/colormaps_reference.html If images are RGB, then an RGB color map is used. The RGB colormap can be found at ai4materials.utils.utils_plotting.rgb_colormaps.
plot_all_filters: bool
If True, plot and save the nb_top_feat_maps for each layer. The files will be saved in different folders according to the layer:
  • “convolution2d_1” for the 1st layer
  • “convolution2d_2” for the 2nd layer
  • etc.
plot_filter_sum: bool
If True, plot and save the sum of all the filters for a given layer.
plot_summary: bool
If True, plot and save a summary figure containing:
  • (left) input image
  • (center) nb_top_feat_maps filters for each deconvolved layer
  • (right) sum of all the filters of the last layer
If set to True, plot_filter_sum must also be set to True.

Code author: Angelo Ziletti <angelo.ziletti@gmail.com>

Module contents

Visualization

The ai4materials Viewer combines Bokeh (interactive visualization of large datasets) and jsmol (3D visualization of chemical structures) to allow the interactive exploration of materials science datasets. Users can interactively visualize crystal structures and properties of a possibly large number of materials in one webpage.

Below we present an example of how to create an interactive Viewer using ai4materials. The code generates an interactive plot of the results of Ref. [1], in particular Fig. 2 in the article.

from ai4materials.visualization.viewer import Viewer
from ai4materials.utils.utils_config import set_configs
from ai4materials.utils.utils_data_retrieval import read_ase_db
from ai4materials.visualization.viewer import read_control_file
from ai4materials.utils.utils_config import get_data_filename
import pandas as pd
import webbrowser

# read data: crystal structures, information on the plot, name of the axis
ase_db_file_binaries = get_data_filename('data/db_ase/binaries_lowest_energy_ghiringhelli2015.json')
results_binaries_lasso = get_data_filename('data/viewer_files/l1_l0_dim2_for_viewer.csv')
control_file_binaries = get_data_filename('data/viewer_files/binaries_control.json')

configs = set_configs()

ase_atoms_binaries = read_ase_db(db_path=ase_db_file_binaries)

# from the table, extract the coordinates for the plot, the true and the predicted value
df_viewer = pd.read_csv(results_binaries_lasso)
x = df_viewer['coord_0']
y = df_viewer['coord_1']
target = df_viewer['y_true']
target_pred = df_viewer['y_pred']

# define titles in the plot
legend_title = 'Reference E(RS)-E(ZB)'
target_name = 'E(RS)-E(ZB)'
plot_title = 'SISSO(L0) structure map'

# create an instance of the ai4materials Viewer
viewer = Viewer(configs=configs)

# read x and y axis labels from control file
x_axis_label, y_axis_label = read_control_file(control_file_binaries)

# generate interactive plot
file_html_link, file_html_name = viewer.plot_with_structures(x=x, y=y, target=target, target_pred=target_pred,
                                                             ase_atoms_list=ase_atoms_binaries, target_unit='eV',
                                                             target_name=target_name, legend_title=legend_title,
                                                             is_classification=False, x_axis_label=x_axis_label,
                                                             y_axis_label=y_axis_label, plot_title=plot_title,
                                                             tmp_folder=configs['io']['tmp_folder'])

# open the interactive plot in a web browser
webbrowser.open(file_html_name)

This is a screenshot of the interactive ai4materials Viewer generated with the code above:

_images/viewer_binaries_example.png

Implementation details of the ai4materials Viewer can be found at ai4materials.visualization.viewer. On some systems, Google Chrome does not load jmol correctly: the interactive plot loads, but the crystal structures cannot be explored in 3D. To use all functionalities of the ai4materials Viewer, we recommend Firefox; in particular, the Viewer was tested on Firefox Quantum 61.0.1.

[1] L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, and M. Scheffler, “Big Data of Materials Science: Critical Role of the Descriptor,” Physical Review Letters, vol. 114, no. 10, p. 105503, 2015. [Link to article]

Section author: Angelo Ziletti <angelo.ziletti@gmail.com>

Submodules

ai4materials.visualization.viewer module

Module contents

Utils

This package contains utility functions for data analytics applied to materials science data. Specifically,

Utils crystals

This package contains functions to build pristine and defective supercells, starting from an ASE (Atomistic Simulation Environment) Atoms object [link]. It can also be used to obtain the spacegroup of a given structure, or to get the standard conventional cell (using Pymatgen).

Pristine and defective supercell generation

The main functions available to modify crystal structures are:

For additional details on each function, see their respective descriptions below.

Example: pristine supercell creation

Starting from a given ASE structure, the script below uses ai4materials.utils.utils_crystals.create_supercell to generate a supercell of (approximately) 128 atoms:

from ase.io import write
from ase.build import bulk
import matplotlib.pyplot as plt
from ase.visualize.plot import plot_atoms
from ai4materials.utils.utils_crystals import create_supercell
cu_fcc = bulk('Cu', 'fcc', a=3.6, orthorhombic=True)
supercell_cu_fcc =  create_supercell(cu_fcc, create_replicas_by='nb_atoms', target_nb_atoms=128)
write('cu_fcc.png', cu_fcc)
write('cu_fcc_supercell.png', supercell_cu_fcc)

This is the original structure:

_images/cu_fcc.png

and this is the supercell obtained by replicating the unit cell up to a target number of atoms (target_nb_atoms):

_images/cu_fcc_supercell.png
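
The "(approximately) 128 atoms" wording reflects the fact that a supercell can only contain an integer number of unit cells. The arithmetic can be sketched as below, under two assumptions that are not taken from ai4materials: the cell is replicated the same number of times along each axis, and the orthorhombic Cu cell built above contains 2 atoms. The helper name cubic_replication_factor is hypothetical.

```python
def cubic_replication_factor(n_atoms_cell, target_nb_atoms):
    """Repetitions per axis so that n**3 * n_atoms_cell is close to the target."""
    n = round((target_nb_atoms / n_atoms_cell) ** (1.0 / 3.0))
    # Always keep at least the original cell
    return max(1, int(n))


n_atoms_cell = 2  # assumption: atoms in the orthorhombic fcc Cu cell
n_rep = cubic_replication_factor(n_atoms_cell, target_nb_atoms=128)
nb_atoms_supercell = n_atoms_cell * n_rep ** 3
print(n_rep, nb_atoms_supercell)
```

For target values that are not of the form n_atoms_cell * n**3, the resulting number of atoms will differ from the target, hence "approximately".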

Example: defective supercell creation

Starting from a given ASE structure, the script below uses ai4materials.utils.utils_crystals.create_vacancies to generate a defective supercell of (approximately) 128 atoms with 25% vacancies:

from ase.io import write
from ase.build import bulk
import matplotlib.pyplot as plt
from ase.visualize.plot import plot_atoms
from ai4materials.utils.utils_crystals import create_vacancies
cu_fcc = bulk('Cu', 'fcc', a=3.6, orthorhombic=True)
supercell_vac25_cu_fcc =  create_vacancies(cu_fcc, target_vacancy_ratio=0.25, create_replicas_by='nb_atoms', target_nb_atoms=128)
write('cu_fcc.png', cu_fcc)
write('cu_fcc_supercell_vac25.png', supercell_vac25_cu_fcc)
_images/cu_fcc_supercell_vac25.png
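
The effect of a vacancy ratio can be sketched on a bare array of atomic positions, without ASE. The snippet is an illustration only: the helper name remove_random_atoms, the rounding of the number of removed atoms, and the uniform random choice are assumptions, not the ai4materials implementation.

```python
import numpy as np


def remove_random_atoms(positions, target_vacancy_ratio, seed=0):
    """Return positions with a random fraction of the atoms removed."""
    rng = np.random.RandomState(seed)
    n_atoms = len(positions)
    n_remove = int(round(target_vacancy_ratio * n_atoms))
    # Choose which atoms survive, without repetition, and keep their order
    keep = np.sort(rng.choice(n_atoms, size=n_atoms - n_remove, replace=False))
    return positions[keep]


# 128 made-up positions standing in for a pristine supercell
positions = np.random.RandomState(1).rand(128, 3) * 14.4
defective = remove_random_atoms(positions, target_vacancy_ratio=0.25)
print(defective.shape)
```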

Similarly, it is possible to generate a supercell with randomly displaced atoms with ai4materials.utils.utils_crystals.random_displace_atoms. In the script below, we generate a defective supercell of (approximately) 256 atoms with displacements sampled from a Gaussian distribution with a standard deviation of 0.5 Angstrom:

from ase.io import write
from ase.build import bulk
import matplotlib.pyplot as plt
from ase.visualize.plot import plot_atoms
from ai4materials.utils.utils_crystals import random_displace_atoms
cu_fcc = bulk('Cu', 'fcc', a=3.6, orthorhombic=True)
supercell_rand_disp_cu_fcc =  random_displace_atoms(cu_fcc, displacement=0.5, create_replicas_by='nb_atoms', noise_distribution='gaussian', target_nb_atoms=256)
write('cu_fcc.png', cu_fcc)
write('supercell_rand_disp_cu_fcc_05A.png', supercell_rand_disp_cu_fcc)
_images/supercell_rand_disp_cu_fcc_05A.png
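
Gaussian displacements of this kind can be sketched on a bare array of positions. The snippet assumes that the standard deviation applies independently to each Cartesian component, which may differ from the convention used in ai4materials; the helper name displace_gaussian is hypothetical.

```python
import numpy as np


def displace_gaussian(positions, displacement, seed=0):
    """Add zero-mean Gaussian noise with standard deviation `displacement`."""
    rng = np.random.RandomState(seed)
    noise = rng.normal(loc=0.0, scale=displacement, size=positions.shape)
    return positions + noise


# 256 atoms placed at the origin so that the result shows the noise alone
positions = np.zeros((256, 3))
displaced = displace_gaussian(positions, displacement=0.5)
print(displaced.shape, displaced.std())
```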

Section author: Angelo Ziletti <angelo.ziletti@gmail.com>

Submodules

ai4materials.utils.unit_conversion module

Module for unit conversion routines. Currently uses the Pint unit conversion library (https://pint.readthedocs.org) to do the conversions.

Any new units and constants can be added to the text files “units.txt” and “constants.txt”.

NOTE: this is taken from python-common in nomad-lab-base. It is copied here to remove the dependency from nomad-lab-base. For more info on python-common visit: https://gitlab.mpcdf.mpg.de/nomad-lab/python-common

The author of this code is: Dr. Fawzi Roberto Mohamed E-mail: mohamed@fhi-berlin.mpg.de

class ai4materials.utils.unit_conversion.LazyF(unit, target_unit)[source]

Bases: future.types.newobject.newobject

helper class for lazy evaluation of conversion function

ai4materials.utils.unit_conversion.convert_unit(value, unit, target_unit=None)[source]

Converts the given value from the given units to the target units. For examples see the bottom section.

Args:
value: The numeric value to be converted. Accepts integers, floats, lists and numpy arrays.
unit: The units that the value is currently given in, as a string. All units that have a corresponding declaration in the “units.txt” file and combinations like “meter*second**-2” are supported.
target_unit: The target unit as a string. Same rules as for the unit argument. If this argument is not given, SI units are assumed.
Returns:
The given value in the target units, returned as the same data type as the original values.

Code author: Fawzi Mohamed <mohamed@fhi-berlin.mpg.de>

ai4materials.utils.unit_conversion.convert_unit_function(unit, target_unit=None)[source]

Returns a function that converts scalar floats from unit to target_unit if any of the unit are user defined (usr*), then the conversion is done lazily at the first call (i.e. user defined conversions might be undefined when calling this)

For more details see the convert_unit function. Could be optimized a bit caching the pint quantities

Args:
unit: The units that the value is currently given in as a string. All
units that have a corresponding declaration in the “units.txt” file and combinations like “meter*second**-2” are supported.
target_unit: The target unit as string. Same rules as for the unit
argument. If this argument is not given, SI units are assumed.
Returns:
A function that converts a scalar float from unit to target_unit.

Code author: Fawzi Mohamed <mohamed@fhi-berlin.mpg.de>

ai4materials.utils.unit_conversion.convert_unit_function_immediate(unit, target_unit=None)[source]

Returns a function that converts scalar floats from unit to target_unit. All units need to be already known.

For more details see the convert_unit function. Could be optimized a bit by caching the pint quantities.

Args:
unit: The units that the value is currently given in as a string. All
units that have a corresponding declaration in the “units.txt” file and combinations like “meter*second**-2” are supported.
target_unit: The target unit as string. Same rules as for the unit
argument. If this argument is not given, SI units are assumed.
Returns:
A function that converts a scalar float from unit to target_unit.

Code author: Fawzi Mohamed <mohamed@fhi-berlin.mpg.de>

ai4materials.utils.unit_conversion.register_userdefined_quantity(quantity, units, value=1)[source]

Registers a user-defined quantity, valid until redefined. The quantity is set equal to value expressed in units, with value defaulting to 1.
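The difference between immediate and lazy conversion functions, and the role of user-defined quantities, can be sketched as follows. This is a simplified illustration using plain dictionaries, not the Pint-based code of the module. All names (register_sketch, immediate_converter, lazy_converter, usrCell) are hypothetical. As a further simplification, the lazy converter caches its factor at the first call, so a later redefinition of the quantity would not be picked up.

```python
# Hypothetical unit tables: factor of each unit relative to SI.
_KNOWN = {"meter": 1.0, "angstrom": 1e-10}
_USER_DEFINED = {}


def register_sketch(quantity, units, value=1):
    """Define quantity as `value` expressed in `units`; valid until redefined."""
    _USER_DEFINED[quantity] = value * _KNOWN[units]


def _factor(unit):
    if unit in _KNOWN:
        return _KNOWN[unit]
    return _USER_DEFINED[unit]  # raises KeyError if not registered yet


def immediate_converter(unit, target_unit):
    """Resolve both units right away; fails if any unit is still unknown."""
    factor = _factor(unit) / _factor(target_unit)
    return lambda x: x * factor


def lazy_converter(unit, target_unit):
    """Defer unit resolution to the first call of the returned function."""
    cache = {}

    def convert(x):
        if "factor" not in cache:
            cache["factor"] = _factor(unit) / _factor(target_unit)
        return x * cache["factor"]

    return convert


# Creating the lazy converter works although "usrCell" is not defined yet.
to_meter = lazy_converter("usrCell", "meter")
register_sketch("usrCell", "angstrom", value=4.0)
converted = to_meter(2.0)  # two cells of 4 angstrom each, in meters
```

Calling immediate_converter("usrCell", "meter") before the registration would raise a KeyError in this sketch, which is the situation the lazy variant is meant to avoid.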

ai4materials.utils.utils_binaries module
ai4materials.utils.utils_binaries.get_binaries_dict_delta_e(chemical_formula_list, energy_list, label_list, equiv_spgroups)[source]
ai4materials.utils.utils_binaries.get_chemical_formula_binaries(atoms)[source]
ai4materials.utils.utils_binaries.get_energy_diff_by_spacegroup(ase_atoms_list, target='energy_total', equiv_spgroups=None)[source]
ai4materials.utils.utils_binaries.get_target_diff_dic(df, sample_key=None, energy=None, spacegroup=None)[source]

Get a dictionary of dictionaries: samples -> space group tuples -> energy differences.

Dropping all rows which do not correspond to the minimum energy per sample AND space group, then making a new data frame with space groups as columns. Finally constructing the dictionary of dictionaries.

Parameters:

df: pandas data frame
with columns=[samples_title, energies_title, SG_title]
sample_key: string
Needs to be column title of samples of input df
energy: string
Needs to be column title of energies of input df
spacegroup : string
Needs to be column title of space groups of input df

Returns:

dic_out: dictionary of dictionaries:
In the form: { sample_a: { (SG_1,SG_2):E_diff_a12, (SG_1,SG_3):E_diff_a13,…}, sample_b: { (SG_1,SG_2):E_diff_b12, (SG_1,SG_3):E_diff_b13,… }, … }, where E_diff_a12 = energy_SG_1 - energy_SG_2 of sample a. Both (SG_1,SG_2) and (SG_2,SG_1) are considered. If SG_1 or SG_2 is NaN, the energy difference to it is ignored.
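The construction of the dictionary of dictionaries can be sketched without pandas. The records and the function target_diff_dict below are hypothetical and only illustrate the three steps described above (keep the minimum energy per sample and space group, group by sample, take differences for every ordered pair). The actual function operates on a pandas data frame.

```python
from itertools import permutations

# Hypothetical records: (sample, energy, space group number).
records = [
    ("AlN", -1.20, 186),
    ("AlN", -1.05, 216),
    ("AlN", -1.10, 186),   # higher-energy duplicate of (AlN, 186), dropped
    ("GaAs", -0.90, 216),
    ("GaAs", -0.70, 225),
]


def target_diff_dict(rows):
    # Step 1: keep the minimum energy per (sample, space group).
    min_energy = {}
    for sample, energy, sg in rows:
        key = (sample, sg)
        if key not in min_energy or energy < min_energy[key]:
            min_energy[key] = energy

    # Step 2: group the minima by sample.
    by_sample = {}
    for (sample, sg), energy in min_energy.items():
        by_sample.setdefault(sample, {})[sg] = energy

    # Step 3: energy difference for every ordered pair of space groups,
    # so both (SG_1, SG_2) and (SG_2, SG_1) appear.
    return {
        sample: {
            (sg_1, sg_2): energies[sg_1] - energies[sg_2]
            for sg_1, sg_2 in permutations(sorted(energies), 2)
        }
        for sample, energies in by_sample.items()
    }


dic_out = target_diff_dict(records)
```

In this sketch a space group that is missing for a sample simply produces no entry, which plays the role of the NaN handling mentioned above.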
ai4materials.utils.utils_binaries.select_diff_from_dic(dic, spacegroup_tuples, sample_key='Mat', drop_nan=None)[source]

Get data frame of selected spacegroup_tuples from dictionary of dictionaries.

Creating a pandas data frame with columns of samples and selected space group tuples (energy differences).

Parameters:

dic: dict {samples -> space group tuples -> energy differences.}

spacegroup_tuples: tuple, list of tuples, tuple of tuples
Each tuple has to contain two space group numbers, to be looked up in the input dic.
sample_key: string
Will be the column title of the samples of the created data frame
drop_nan: string, optional {‘rows’, ‘SG_tuples’}
Drops all rows or columns (SG_tuples) containing NaN.
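A minimal sketch of the selection step is given below, assuming an input dictionary in the form returned by get_target_diff_dic. It builds a list of row dictionaries instead of a pandas data frame, and it implements only the drop_nan='rows' option. The function select_rows and the example values are hypothetical.

```python
import math

# Hypothetical input: samples -> space group tuples -> energy differences.
dic = {
    "AlN": {(186, 216): -0.15, (216, 186): 0.15},
    "GaAs": {(216, 225): -0.20, (225, 216): 0.20},
}


def select_rows(dic, spacegroup_tuples, sample_key="Mat", drop_nan=None):
    # Accept a single tuple such as (186, 216) as well as a collection of tuples.
    if isinstance(spacegroup_tuples[0], int):
        spacegroup_tuples = [spacegroup_tuples]
    rows = []
    for sample, diffs in dic.items():
        row = {sample_key: sample}
        for sg_tuple in spacegroup_tuples:
            # Tuples missing for a sample are filled with NaN.
            row[tuple(sg_tuple)] = diffs.get(tuple(sg_tuple), float("nan"))
        rows.append(row)
    if drop_nan == "rows":
        rows = [
            row for row in rows
            if not any(math.isnan(v) for k, v in row.items() if k != sample_key)
        ]
    return rows


table = select_rows(dic, [(186, 216), (216, 225)])
complete = select_rows(dic, (186, 216), drop_nan="rows")
```

Each row dictionary corresponds to one row of the data frame, with sample_key as the title of the sample column.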
ai4materials.utils.utils_config module
class ai4materials.utils.utils_config.SSH(hostname='172.17.0.3', username='tutorial', port=22, key_file='/home/beaker/docker.openmpi/ssh/id_rsa.mpi', password=None)[source]

Bases: object

SSH class to connect to the cluster to perform a calculation.

Code author: Emre Ahmetcik <ahmetcik@fhi-berlin.mpg.de>

close()[source]
command(cmd)[source]
exists(path)[source]
get(remotefile, localfile)[source]
get_all(remotepath, localpath)[source]
isdir(path)[source]
mkdir(path)[source]
open_file(filename)[source]
put(localfile, remotefile)[source]
put_all(localpath, remotepath)[source]
remove(path)[source]
rename(remotefile_1, remotefile_2)[source]
rm(path)[source]
sftp_walk(remotepath)[source]
ai4materials.utils.utils_config.copy_directory(src, dest)[source]
ai4materials.utils.utils_config.get_data_filename(resource, package='ai4materials')[source]

Rewrite of pkgutil.get_data() that returns the file path.

Taken from: https://stackoverflow.com/questions/5003755/how-to-use-pkgutils-get-data-with-csv-reader-in-python

ai4materials.utils.utils_config.get_metadata_info()[source]

Get the descriptor metadata info

ai4materials.utils.utils_config.overwrite_configs(configs, dataset_folder=None, desc_folder=None, main_folder=None, tmp_folder=None)[source]
ai4materials.utils.utils_config.set_configs(main_folder='./', config_file=None)[source]
ai4materials.utils.utils_config.setup_logger(configs=None, level=None, display_configs=False)[source]

Given specified configurations, setup a logger.

ai4materials.utils.utils_crystals module
ai4materials.utils.utils_data_retrieval module
ai4materials.utils.utils_mp module
ai4materials.utils.utils_parsing module
ai4materials.utils.utils_plotting module
ai4materials.utils.utils_plotting.aggregate_struct_trans_data(filename, nb_rows_to_cut=0, nb_samples=None, nb_order_param_steps=None, min_order_param=0.0, max_order_param=None, prob_idxs=None, with_uncertainty=True, uncertainty_types=('variation_ratio', 'predictive_entropy', 'mutual_information'))[source]

Aggregate structural transition data in order to plot it later.

Starting from the results_file of the run_cnn_model function, aggregate the data by a given order parameter and the probabilities of each class. This is used to prepare the data for the structural transition plots, as shown in Fig. 4, Ziletti et al., Nature Communications 9, 2775 (2018).

Parameters:

filename: string,
Full path to the results_file created by the run_cnn_model function. This is a csv file.
nb_samples: int
Number of samples present in results_file for each order parameter step.
nb_order_param_steps: int
Number of order parameter steps. For example, if we are interpolating between structure_1 and structure_2 with 10 steps, nb_order_param_steps=10.
max_order_param: float
Maximum number that the order parameter will take in the dataset. This is used (together with nb_order_param_steps) to create the linear space which will later be used by the plotting function.
prob_idxs: list of int
List of integers which correspond to the classes for which the probabilities will be extracted from the results_file. prob_idxs=[0, 3] will extract only prob_predictions_0 and prob_predictions_3 from the results_file.

Returns:

pandas dataframe

A pandas dataframe with the following columns:

  • a_to_b_index_ : value of the order parameter
  • two columns for each index i in the list prob_idxs, as below:
    • prob_predictions_i_mean : mean of the distribution of classification probability i for the given a_to_b_index_ value of the order parameter.
    • prob_predictions_i_std : standard deviation of the distribution of classification probability i for the given a_to_b_index_ value of the order parameter.
  • [optional]: columns containing uncertainty quantification

Code author: Angelo Ziletti <angelo.ziletti@gmail.com>
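The aggregation can be sketched with the standard library alone. The sketch rests on several assumptions that are not stated in the docstring: rows of the results file are ordered so that each consecutive block of nb_samples rows belongs to one order parameter step, the linear space includes both end points, and the standard deviation is the sample standard deviation. The function aggregate and the example data are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical results: nb_order_param_steps=2 with nb_samples=3 per step.
results = [
    {"prob_predictions_0": 0.9, "prob_predictions_1": 0.1},
    {"prob_predictions_0": 0.8, "prob_predictions_1": 0.2},
    {"prob_predictions_0": 1.0, "prob_predictions_1": 0.0},
    {"prob_predictions_0": 0.2, "prob_predictions_1": 0.8},
    {"prob_predictions_0": 0.4, "prob_predictions_1": 0.6},
    {"prob_predictions_0": 0.3, "prob_predictions_1": 0.7},
]


def aggregate(rows, nb_samples, nb_order_param_steps,
              min_order_param=0.0, max_order_param=1.0, prob_idxs=(0, 1)):
    # Spacing of the linear space of order parameter values (assumed to
    # include both min_order_param and max_order_param).
    if nb_order_param_steps > 1:
        width = (max_order_param - min_order_param) / (nb_order_param_steps - 1)
    else:
        width = 0.0
    aggregated = []
    for step in range(nb_order_param_steps):
        # Assumption: each block of nb_samples consecutive rows is one step.
        chunk = rows[step * nb_samples:(step + 1) * nb_samples]
        out = {"a_to_b_index_": min_order_param + step * width}
        for i in prob_idxs:
            values = [row["prob_predictions_%d" % i] for row in chunk]
            out["prob_predictions_%d_mean" % i] = mean(values)
            out["prob_predictions_%d_std" % i] = stdev(values)
        aggregated.append(out)
    return aggregated


agg = aggregate(results, nb_samples=3, nb_order_param_steps=2)
```

Each dictionary in agg corresponds to one row of the returned dataframe, without the optional uncertainty columns.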

ai4materials.utils.utils_plotting.insert_newlines(string, every=64)[source]
ai4materials.utils.utils_plotting.make_crossover_plot(df_results, filename, filename_suffix, title, labels, nb_order_param_steps, plot_type='probability', prob_idxs=None, uncertainty_type='mutual_information', linewidth=1.0, markersize=1.0, max_nb_ticks=None, palette=None, show_plot=False, style='publication', x_label='Order parameter')[source]

Starting from an aggregated pandas dataframe, plot classification probability distributions as a function of an order parameter.

This will produce a plot along the lines of Fig. 4, Ziletti et al.

Parameters:

df_results: pandas dataframe,
Pandas dataframe returned by the aggregate_struct_trans_data function.
filename: string
Full path to the results_file created by the run_cnn_model function. This is a csv file. Only used to name the generated plot appropriately.
filename_suffix: string
Suffix to be put for the plot filename. This suffix will determine the format of the output plot (e.g. ‘.png’ or ‘.svg’ will create a png or an svg file, respectively.)
title: string
Title of the plot
plot_type: str (options: ‘probability’, ‘uncertainty’)
Plot either probabilities of classification or uncertainty.
uncertainty_type: str (options: ‘mutual_information’, ‘predictive_entropy’)
Type of uncertainty estimation to be plotted. Used only if `plot_type`=’uncertainty’.
prob_idxs: list of int
List of integers which correspond to the classes for which the probabilities will be extracted from the results_file. prob_idxs=[0, 3] will extract only prob_predictions_0 and prob_predictions_3 from the results_file. They should correspond to (or be a subset of) the prob_idxs specified in aggregate_struct_trans_data.
nb_order_param_steps: int
Number of order parameter steps. For example, if we are interpolating between structure_1 and structure_2 with 10 steps, nb_order_param_steps=10. Must be the same as specified in aggregate_struct_trans_data. Different values might work, but could give rise to unexpected behaviour.
show_plot: bool, optional, default: False
If True, it opens the generated plot.
style: string, optional, {‘publication’}
If style==’publication’, load the default matplotlib style (white background). Otherwise, use the ‘fivethirtyeight’ matplotlib style (black background), i.e. plt.style.use(‘fivethirtyeight’).
x_label: string, optional, default: “Order parameter”
Label for the x-axis (the order parameter axis)

Code author: Angelo Ziletti <angelo.ziletti@gmail.com>

ai4materials.utils.utils_plotting.make_multiple_image_plot(data, title='Figure 1', cmap=<matplotlib.colors.LinearSegmentedColormap object>, n_rows=None, n_cols=None, vmin=None, vmax=None, filename=None, save=False)[source]
ai4materials.utils.utils_plotting.make_plot_accuracy(step, train_data, val_data)[source]
ai4materials.utils.utils_plotting.make_plot_cross_entropy_loss(step, train_data, val_data, title=None)[source]
ai4materials.utils.utils_plotting.plot_confusion_matrix(conf_matrix, classes, conf_matrix_file, normalize=False, title='Confusion matrix', title_true_label='True label', title_pred_label='Predicted label', cmap='Blues')[source]

This function prints and plots the confusion matrix. Normalization can be applied by setting normalize=True.
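The docstring does not say along which axis the normalization is applied. A common convention, assumed in the sketch below, is to divide each row by its sum so that the entries for every true label add up to 1. The helper normalize_rows is hypothetical and not part of ai4materials.

```python
def normalize_rows(conf_matrix):
    """Divide each row by its sum (assumption: rows are the true labels)."""
    normalized = []
    for row in conf_matrix:
        total = sum(row)
        # Guard against empty classes to avoid a division by zero.
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


conf = [[8, 2], [1, 9]]
norm = normalize_rows(conf)
```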

ai4materials.utils.utils_plotting.plot_save_cnn_results(filename, accuracy=True, cross_entropy_loss=True, show_plot=False)[source]
Plot and save results of a convolutional neural network calculation
from the .csv file written by Keras CSVLogger.

Code author: Angelo Ziletti <angelo.ziletti@gmail.com>

ai4materials.utils.utils_plotting.plot_sph_harmonics()[source]
ai4materials.utils.utils_plotting.rgb_colormaps(color)[source]

Obtain colormaps for RGB.

For a general overview: https://matplotlib.org/examples/pylab_examples/custom_cmap.html

ai4materials.utils.utils_plotting.show_images(images, filename_png, cols=1, titles=None)[source]

Display a list of images in a single figure with matplotlib.

Taken from https://stackoverflow.com/questions/11159436/multiple-figures-in-a-single-window

Parameters:

images: list of np.arrays
Images to be plotted. It must be compatible with plt.imshow.
cols: int, optional, (default = 1)
Number of columns in figure (number of rows is set to np.ceil(n_images/float(cols))).
titles: list of strings
List of titles corresponding to each image.
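The grid layout follows directly from the rule stated for cols. The helper grid_shape below is a hypothetical illustration of that rule, not a function of the module.

```python
import math


def grid_shape(n_images, cols=1):
    """Number of (rows, cols) needed to place n_images in a figure grid."""
    rows = int(math.ceil(n_images / float(cols)))
    return rows, cols


shape_five_in_two = grid_shape(5, cols=2)   # last row is only partly filled
```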
ai4materials.utils.utils_vol_data module
ai4materials.utils.utils_vol_data.get_shells_from_indices(xyz_r, vol_data)[source]

Obtain concentric shells from volumetric data.

The starting points are an array containing the volumetric data and a list of indices which assign points of the volume to the corresponding concentric shell. Using these indices, we perform two operations:

  1. extract the concentric shells in the volumetric space
  2. transform the concentric shells to spherical coordinates, i.e. project each sphere to a (theta, phi) plane

Point 1) gives volumetric 3d data containing a given shell. Point 2) gives 2d data in the (theta, phi) plane for a given shell; this can be interpreted as a heatmap.

Parameters:

xyz_r: list of list of tuples
The length of the list corresponds to the number of concentric shells considered. Each element in the list - representing a concentric shell - contains a list of 3 dimensional tuples, with the indices of the volume elements which belong to the given concentric shell. This is the list returned by ai4materials.utils.utils_vol_data.get_slice_volume_indices.
vol_data: numpy.ndarray
Volumetric data as numpy.ndarray.

Return: vox_by_slices, theta_phi_by_slices

vox_by_slices: np.ndarray, shape [n_slices, n_px, n_py, n_pz]
4-dimensional array containing each concentric shell obtained from ai4materials.descriptors.diffraction3d.Diffraction3D. n_px, n_py, n_pz are given by the interpolation and the region of the space considered. In our case, n_slices=52, n_px=n_py=n_pz=176.
theta_phi_by_slices: list of tuples
Each element in the list corresponds to a concentric shell. In each concentric shell, there is a list of tuples (theta, phi, intensity) of the non-zero points in the volume considered, as returned by ai4materials.utils.utils_vol_data.shells_to_sph. The length of the tuple list of each concentric shell is different because a different number of points is non-zero for each shell.

Code author: Angelo Ziletti <angelo.ziletti@gmail.com>
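The projection of one shell onto the (theta, phi) plane (point 2 above) can be sketched as follows. The angle conventions are assumptions: theta is taken as the polar angle measured from the z axis and phi as the azimuth in the xy plane. The function shell_to_theta_phi, the explicit center argument and the toy volume are hypothetical, and nested lists stand in for the numpy array.

```python
import math


def shell_to_theta_phi(indices, vol_data, center):
    """Return (theta, phi, intensity) for the non-zero voxels of one shell."""
    cx, cy, cz = center
    points = []
    for ix, iy, iz in indices:
        intensity = vol_data[ix][iy][iz]
        if intensity == 0:
            continue  # only non-zero points are kept
        x, y, z = ix - cx, iy - cy, iz - cz
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0:
            continue  # the angles are undefined at the center
        # Clamp to [-1, 1] to protect acos against rounding errors.
        theta = math.acos(max(-1.0, min(1.0, z / r)))  # polar angle, [0, pi]
        phi = math.atan2(y, x)                         # azimuth, (-pi, pi]
        points.append((theta, phi, intensity))
    return points


# Toy 3x3x3 volume with two non-zero voxels next to the center (1, 1, 1).
vol = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
vol[2][1][1] = 5.0   # one step along +x
vol[1][1][2] = 3.0   # one step along +z
shell = [(2, 1, 1), (1, 1, 2), (0, 1, 1)]
points = shell_to_theta_phi(shell, vol, center=(1, 1, 1))
```

The resulting list has one entry per non-zero voxel, which is why its length differs from shell to shell.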

ai4materials.utils.utils_vol_data.get_slice_volume_indices(vol_data, min_r, max_r, dr=1.0, phi_bins=100, theta_bins=50)[source]

Given 3d volume return the indices of points belonging to the specified concentric shells.

The volume should be centered according to its center of mass to have centered concentric shells. In our case we do not have to do that because the diffraction intensity obtained with the Fourier Transform is centered by definition. For reference, we nevertheless calculate the center of mass within the function.

Parameters:

vol_data:
Numpy 3D array containing the volumetric data to be sliced. In our case, this is the three-dimensional diffraction intensity.
theta_bins: int, optional (default=50)
Bins to be used for the theta angle of the spherical coordinates.
phi_bins: int, optional (default=100)
Bins to be used for the phi angle of the spherical coordinates.
Returns: list
List of length = (max_r - min_r)/dr; the length corresponds to the number of concentric shells considered. Each element in the list - representing a concentric shell - contains a list of 3 dimensional tuples, with the indices of the volume elements which belong to the given concentric shell. For example, let us assume the output of the function is stored in variable xyz_r. xyz_r[0] gives a list of tuples corresponding to the points in the first concentric shell. If xyz_r[0][0] = (82, 97, 119), this means that the element of the volumetric shape with index (82, 97, 119) belongs to the first shell.

Code author: Angelo Ziletti <angelo.ziletti@gmail.com>
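How voxel indices are assigned to concentric shells can be illustrated with a small pure-Python sketch. It assumes that shells are centered on the geometric center of the grid and that a voxel at distance r belongs to shell floor((r - min_r)/dr). The angular binning controlled by theta_bins and phi_bins is left out. The function shell_indices is hypothetical and takes only the grid shape rather than the volumetric data.

```python
import math


def shell_indices(shape, min_r, max_r, dr=1.0):
    """Group the voxel indices of a 3D grid into concentric shells by radius."""
    nx, ny, nz = shape
    # Assumption: the shells are centered on the geometric center of the grid.
    cx, cy, cz = (nx - 1) / 2.0, (ny - 1) / 2.0, (nz - 1) / 2.0
    n_shells = int((max_r - min_r) / dr)
    shells = [[] for _ in range(n_shells)]
    for ix in range(nx):
        for iy in range(ny):
            for iz in range(nz):
                r = math.sqrt((ix - cx) ** 2 + (iy - cy) ** 2 + (iz - cz) ** 2)
                k = int(math.floor((r - min_r) / dr))
                if 0 <= k < n_shells:
                    shells[k].append((ix, iy, iz))
    return shells


xyz_r = shell_indices((5, 5, 5), min_r=0.0, max_r=2.0, dr=1.0)
```

Here xyz_r[0] holds the indices of the innermost shell and xyz_r[1] those of the next one, in the same list-of-lists-of-tuples layout as described in the Returns entry.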

ai4materials.utils.utils_vol_data.interp_theta_phi_surfaces(theta_phi_by_slices_coarse, theta_bins=256, phi_bins=512)[source]

Interpolate the spherical shells in spherical coordinates to a finer grid.

For more information on the interpolation, please refer to: http://scipy-cookbook.readthedocs.io/items/Matplotlib_Gridding_irregularly_spaced_data.html

theta_phi_by_slices_coarse: list of tuples
Each element in the list corresponds to a concentric shell. In each concentric shell, there is a list of tuples (theta, phi, intensity) of the non-zero points in the volume considered, as returned by ai4materials.utils.utils_vol_data.shells_to_sph. The length of the tuple list of each concentric shell is different because a different number of points is non-zero for each shell.
theta_bins: int, optional (default=256)
Bins to be used for the interpolation of the theta angle of the spherical coordinates.
phi_bins: int, optional (default=512)
Bins to be used for the interpolation of the phi angle of the spherical coordinates.
Return: np.ndarray, shape [n_slices, theta_bins_fine, phi_bins_fine]
Three-dimensional array containing each concentric shell in spherical coordinates. n_slices is given by the region of the space considered.

Code author: Angelo Ziletti <angelo.ziletti@gmail.com>

Module contents

Authors

Development Lead

Contributors

  • Andreas Leitherer
    • Contributions:
      • ai4materials.utils.utils_crystals.get_boxes_from_xyz
      • ai4materials.descriptors.ft_soap_descriptor
      • ai4materials.descriptors.quippy_soap_descriptor
  • Emre Ahmetcik

History

0.1.0 (2018-11-10)

  • First release. Code available on github.
