ai4materials.models.l1_l0 module

ai4materials.models.l1_l0.choose_atomic_features(selected_feature_list=None, atomic_data_file=None, binary_data_file=None)[source]

Choose primary features for the extended lasso procedure.

ai4materials.models.l1_l0.classify_rs_zb(structure)[source]

Classify whether a structure is rocksalt or zincblende, given a list of NoMaD structures (one json file). Supports multiple frames (to do: check that). The classification is hard-coded on the fractional coordinates of the two atoms:

rocksalt: atom_frac1 0.0 0.0 0.0 atom_frac2 0.5 0.5 0.5

zincblende: atom_frac1 0.0 0.0 0.0 atom_frac2 0.25 0.25 0.25

zincblende -> label=0, rocksalt -> label=1
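The labelling rule above can be sketched as follows. This is only an illustration, not the library's implementation: the function name `classify_rs_zb_sketch`, its argument (the fractional coordinates of the second atom as a plain sequence), and the tolerance `tol` are assumptions introduced here.

```python
def classify_rs_zb_sketch(frac_second_atom, tol=1e-3):
    """Return 1 for rocksalt, 0 for zincblende (hypothetical helper).

    Assumes the first atom sits at (0.0, 0.0, 0.0) and that
    `frac_second_atom` holds the fractional coordinates of the second atom.
    """
    if all(abs(x - 0.5) < tol for x in frac_second_atom):
        return 1  # rocksalt: second atom at (0.5, 0.5, 0.5)
    if all(abs(x - 0.25) < tol for x in frac_second_atom):
        return 0  # zincblende: second atom at (0.25, 0.25, 0.25)
    raise ValueError("structure is neither rocksalt nor zincblende")
```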

ai4materials.models.l1_l0.combine_features(df=None, energy_unit=None, length_unit=None, metadata_info=None, allowed_operations=None, derived_features=None)[source]

Generate combination of features given a dataframe and a list of allowed operations.
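As a rough illustration of the idea, the sketch below combines every pair of feature columns with each allowed operation. It works on a plain dict of lists rather than a dataframe, and the operation names in `OPERATIONS` and the function name `combine_features_sketch` are hypothetical; the values accepted by `allowed_operations` in the library are not listed here.

```python
from itertools import combinations

# Hypothetical operation table; the names are assumptions for this sketch.
OPERATIONS = {
    "+": lambda a, b: a + b,
    "|-|": lambda a, b: abs(a - b),
    "*": lambda a, b: a * b,
}


def combine_features_sketch(features, allowed_operations):
    """Build derived features from all pairs of primary features.

    `features` maps a feature name to a list of values (one per sample).
    """
    derived = {}
    for (name_a, col_a), (name_b, col_b) in combinations(features.items(), 2):
        for op in allowed_operations:
            func = OPERATIONS[op]
            derived[f"({name_a}{op}{name_b})"] = [
                func(a, b) for a, b in zip(col_a, col_b)
            ]
    return derived
```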

For the exponentials, we introduce a characteristic energy/length converting the

Todo

Fix under/overflow errors, and introduce handling of exceptions.

ai4materials.models.l1_l0.e_sqrt_z(row)[source]

Calculates e/sqrt(val_Z).

Es/sqrt(Zval) and Ep/sqrt(Zval) from Phys. Rev. B 85, 104104 (2012).

Input: Es(A) or Ep(A), val(A) (A -> B). They need to be given in this order.
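A minimal sketch of the formula, assuming the row can be read as a plain two-element sequence (energy, valence charge); the name `e_sqrt_z_sketch` is hypothetical and the row type used by the library may differ.

```python
import math


def e_sqrt_z_sketch(row):
    """Return E / sqrt(Zval) for row = (E, Zval), in that order."""
    energy, z_val = row
    return energy / math.sqrt(z_val)
```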

ai4materials.models.l1_l0.get_energy_diff(chemical_formula_list, energy_list, label_list)[source]

Obtain the difference in energy (eV) between the rocksalt and zincblende structures of a given binary.

From a list of chemical formulas, energies, and labels, returns a dictionary {material: delta_e}, where delta_e is the difference between the energy with label 1 and the energy with label 0, grouped by material. Each element of these lists corresponds to a json file. delta_e is exactly the quantity reported in Phys. Rev. Lett. 114, 105503 (2015).
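The grouping can be sketched as below. This assumes exactly one energy per (material, label) pair and skips materials that lack one of the two labels; the name `energy_diff_sketch` is hypothetical, and the numbers in the usage lines are made up purely for illustration.

```python
def energy_diff_sketch(chemical_formula_list, energy_list, label_list):
    """Return {material: E(label 1) - E(label 0)} (hypothetical helper)."""
    by_material = {}
    for formula, energy, label in zip(
        chemical_formula_list, energy_list, label_list
    ):
        by_material.setdefault(formula, {})[label] = energy
    return {
        formula: energies[1] - energies[0]
        for formula, energies in by_material.items()
        if 0 in energies and 1 in energies
    }


# Illustrative values only, not real data.
delta_e = energy_diff_sketch(
    ["AB", "AB", "CD", "CD"], [-3.0, -2.5, -4.0, -4.5], [1, 0, 1, 0]
)
```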

Todo

Check if it works for multiple frames.

ai4materials.models.l1_l0.get_lowest_energy_structures(structure, dict_delta_e)[source]

Get lowest energy structure for each material and label type.

Works only with two possible labels for a given material.

Todo

Check if it works for multiple frames.

ai4materials.models.l1_l0.l1_l0_minimization(y_true, D, features, energy_unit=None, print_lasso=False, lambda_grid=None, lassonumber=25, max_dim=3, lambda_grid_points=100, lambda_max_factor=1.0, lambda_min_factor=0.001)[source]

Select an optimal descriptor using a combined l1-l0 procedure.

  1. step (l 1): Solve the LASSO minimization problem
\[\operatorname{argmin}_c \left\{ \|P - Dc\|_2^2 + \lambda \|c\|_1 \right\}\]

for different lambdas, starting from a ‘high’ lambda (P is the target property, y_true). Collect all indices (features) i appearing with nonzero coefficients c_i while decreasing lambda, until the size of the collection equals lassonumber.

  2. step (l 0): Check the least-squares errors for all single features/pairs/triples/… of
    the collection from step 1. Choose the single/pair/triple/… with the lowest mean squared error (MSE) to be the best 1D/2D/3D-descriptor.
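The l 0 step can be illustrated with a small pure-Python sketch: every tuple of `dim` candidate features is fitted by ordinary least squares (with an intercept) and the tuple with the lowest MSE is kept. This is not the library's code; the helper names (`solve_linear`, `least_squares_mse`, `l0_step`) and the dict-of-lists input are assumptions made for the example, and the normal equations are solved directly, which is adequate only for small, well-conditioned problems.

```python
from itertools import combinations


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes `a` is square and non-singular.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares_mse(columns, y):
    """Fit y ~ sum_j c_j * columns[j] + intercept; return (mse, coefficients)."""
    cols = list(columns) + [[1.0] * len(y)]  # last column of ones = intercept
    gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    coef = solve_linear(gram, rhs)
    pred = [sum(c * col[i] for c, col in zip(coef, cols)) for i in range(len(y))]
    mse = sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)
    return mse, coef


def l0_step(candidates, y, dim):
    """Test every `dim`-tuple of candidate features; keep the lowest MSE."""
    best = None
    for names in combinations(sorted(candidates), dim):
        mse, coef = least_squares_mse([candidates[n] for n in names], y)
        if best is None or mse < best[0]:
            best = (mse, names, coef)
    return best
```

The exhaustive search grows combinatorially with the number of candidates, which is why the l 1 step first reduces the feature pool to lassonumber entries.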

Parameters:

y_true : array, [n_samples]
Array with the target property (ground truth)
D : array, [n_samples, n_features]
Matrix with the data.
features : list of strings
List of feature names. Needs to be in the same order as the feature vectors in D
max_dim : int, default 3
Maximum dimension for which the optimal descriptor is calculated. The dimension is the number of feature vectors used in the linear combination.
lassonumber : int, default 25
The number of features that will be collected in the l1-step.
lambda_grid_points : int, default 100
Number of lambdas between lambda_max and lambda_min for which the l1-problem shall be solved. A denser grid may be needed if the lambda steps are too large. This can be checked with ‘print_lasso’. lambda_max and lambda_min are chosen as in the paper by Friedman, Hastie, and Tibshirani, “Regularization Paths for Generalized Linear Models via Coordinate Descent”. The values in between are generated on a log scale.
lambda_min_factor : float, default 0.001
Sets lam_min = lambda_min_factor * lam_max.
lambda_max_factor : float, default 1.0
Sets calculated lam_max = lam_max * lambda_max_factor.
print_lasso : bool, default False
Prints the indices of the columns of D with nonzero coefficients for each lambda.
lambda_grid : array
The list/array of lambda values for the l1-problem can be chosen by the user. The list/array should start from the highest number, and lambda_i > lambda_i+1 should hold. (?) lambda_grid_points is then ignored. (?)
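A minimal sketch of how a log-spaced, decreasing lambda grid could be built from the three grid parameters above. The function name `log_lambda_grid` is hypothetical, `lam_max` is taken as a given input rather than computed from the data, and applying lambda_min_factor to the already rescaled lam_max is an assumption about the order of operations.

```python
import math


def log_lambda_grid(lam_max, grid_points=100, max_factor=1.0, min_factor=0.001):
    """Return `grid_points` lambdas, log-spaced from high to low."""
    high = lam_max * max_factor  # rescaled lam_max
    low = high * min_factor      # lam_min (assumed relative to rescaled lam_max)
    if grid_points == 1:
        return [high]
    step = (math.log(low) - math.log(high)) / (grid_points - 1)
    return [math.exp(math.log(high) + i * step) for i in range(grid_points)]
```

A grid built this way satisfies lambda_i > lambda_i+1, so it could also serve as a starting point for a user-supplied lambda_grid.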

Returns:

list of pandas dataframes (D’, c’, selected_features) :

A list of tuples (D’,c’,selected_features) for each dimension. selected_features is a list of strings. D’*c’ is the selected linear model/fit where the last column of D is a vector with ones.

References:

[1] Luca M. Ghiringhelli, Jan Vybiral, Sergey V. Levchenko, Claudia Draxl, and Matthias Scheffler, “Big Data of Materials Science: Critical Role of the Descriptor”, Phys. Rev. Lett. 114, 105503 (2015)

ai4materials.models.l1_l0.r_pi(row)[source]

Calculates r_pi.

John-Bloch’s indicator2: |rp(A) - rs(A)| + |rp(B) - rs(B)| from Phys. Rev. Lett. 33, 1095 (1974).

Input: rp(A), rs(A), rp(B), rs(B). They need to be given in this order.

ai4materials.models.l1_l0.r_sigma(row)[source]

Calculates r_sigma.

John-Bloch’s indicator1: |rp(A) + rs(A) - rp(B) - rs(B)| from Phys. Rev. Lett. 33, 1095 (1974).

Input: rp(A), rs(A), rp(B), rs(B). They need to be given in this order.
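Both John-Bloch indicators can be sketched in a few lines. The names `r_sigma_sketch` and `r_pi_sketch` are hypothetical, and the row is assumed to be a plain four-element sequence in the order rp(A), rs(A), rp(B), rs(B); the row type used by the library may differ.

```python
def r_sigma_sketch(row):
    """Indicator 1: |rp(A) + rs(A) - rp(B) - rs(B)|."""
    rp_a, rs_a, rp_b, rs_b = row
    return abs(rp_a + rs_a - rp_b - rs_b)


def r_pi_sketch(row):
    """Indicator 2: |rp(A) - rs(A)| + |rp(B) - rs(B)|."""
    rp_a, rs_a, rp_b, rs_b = row
    return abs(rp_a - rs_a) + abs(rp_b - rs_b)
```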

ai4materials.models.l1_l0.write_atomic_features(structure, selected_feature_list, df, dict_delta_e=None, path=None, filename_suffix='.json', json_file=None)[source]

Given the chemical composition, build the descriptor made of atomic features only.

Includes all the frames in the same json file.

Todo

Check if it works for multiple frames.