LASSO+L0, performed on a tailor-made feature space, was introduced in Phys. Rev. Lett. 114, 105503 (2015).

The application of the method goes through these steps:

- The *feature space* is generated by creating a list of analytical expressions (the *derived features*), obtained by combining the selected *primary features* and operations.
- The Least Absolute Shrinkage and Selection Operator (LASSO) is applied. In practice, the following minimization is performed:

  $$\underset{\mathbf{c}}{\mathrm{argmin}} \; \|\mathbf{P} - \mathbf{D}\mathbf{c}\|_2^2 + \lambda \|\mathbf{c}\|_1$$

  where **P** is a vector listing the property of interest (here, the RS - ZB difference in energy) for all data points (here, binary materials), **D** is a matrix whose columns are the *derived features* listed for each material, **c** is the (sparse) vector of coefficients that is found upon minimization, $\lambda$ is the regularization parameter that determines the level of sparsity (number of non-zero elements) of **c**, and the subscript 1 stands for the L1 (also known as Manhattan) norm, i.e., unlike the usual Euclidean norm (L2 norm), the sum of the absolute values of the elements of its vector argument. The regularization parameter is decreased in small steps, starting from the largest value that gives one non-zero element in **c**, and 50 distinct features that have a non-zero coefficient in **c** are collected.
- An L0 optimization is performed, formally written as:

  $$\underset{\mathbf{c}}{\mathrm{argmin}} \; \|\mathbf{P} - \mathbf{D'}\mathbf{c}\|_2^2 + \lambda \|\mathbf{c}\|_0$$

  where the subscript 0 stands for the L0 quasinorm, which counts the number of non-zero elements of its argument, and **D'** is the matrix whose columns are the 50 columns of **D** selected in the previous step. In practice, all singletons, pairs, triplets, ..., *n*-tuples (up to the selected maximum dimension of the descriptor) are listed, and for each set a linear least-squares regression (LLSR) is performed. The *n*-tuple that gives the lowest mean square error for the LLSR fit is selected as the resulting *n*-dimensional descriptor.
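The LASSO screening and the exhaustive L0 search can be illustrated with a minimal NumPy sketch on synthetic data. This is not the implementation used in the paper. The function names (`lasso_cd`, `lasso_screen`, `l0_search`), the coordinate-descent solver, the shrink factor for $\lambda$, the intercept term in the least-squares fit, and the toy sizes (30 random features, 8 kept instead of 50) are all assumptions made for illustration. The columns of **D** are assumed to be standardized and **P** centered.

```python
import itertools
import numpy as np

def lasso_cd(D, P, lam, c0=None, n_iter=100):
    """Minimize ||P - D c||_2^2 + lam * ||c||_1 by cyclic coordinate descent."""
    c = np.zeros(D.shape[1]) if c0 is None else c0.copy()
    col_sq = (D ** 2).sum(axis=0)
    r = P - D @ c                                   # current residual
    for _ in range(n_iter):
        for j in range(D.shape[1]):
            rho = D[:, j] @ r + col_sq[j] * c[j]    # d_j . (residual without feature j)
            new = np.sign(rho) * max(abs(rho) - lam / 2, 0.0) / col_sq[j]
            r += D[:, j] * (c[j] - new)             # keep the residual in sync
            c[j] = new
    return c

def lasso_screen(D, P, n_keep, shrink=0.9, max_steps=200):
    """Decrease lambda stepwise and collect the first n_keep distinct active features."""
    lam = 2 * np.max(np.abs(D.T @ P))               # smallest lambda for which c = 0
    c = np.zeros(D.shape[1])
    selected = []
    for _ in range(max_steps):
        lam *= shrink                               # assumed step schedule
        c = lasso_cd(D, P, lam, c0=c)               # warm start from the previous solution
        active = np.flatnonzero(c)
        for j in active[np.argsort(-np.abs(c[active]))]:
            if j not in selected:
                selected.append(int(j))
        if len(selected) >= n_keep:
            break
    return selected[:n_keep]

def l0_search(D_sel, P, dim):
    """Fit every dim-tuple of columns by least squares and keep the lowest MSE."""
    best_mse, best_combo = np.inf, None
    ones = np.ones(len(P))                          # intercept column (an assumption)
    for combo in itertools.combinations(range(D_sel.shape[1]), dim):
        A = np.column_stack([D_sel[:, list(combo)], ones])
        coef, *_ = np.linalg.lstsq(A, P, rcond=None)
        mse = np.mean((P - A @ coef) ** 2)
        if mse < best_mse:
            best_mse, best_combo = mse, combo
    return best_mse, best_combo

# Synthetic stand-in for the feature space: the target depends on features 3 and 17 only.
rng = np.random.default_rng(0)
D = rng.normal(size=(100, 30))
D = (D - D.mean(axis=0)) / D.std(axis=0)
P = 2.0 * D[:, 3] - 1.5 * D[:, 17] + rng.normal(scale=0.05, size=100)
P -= P.mean()

selected = lasso_screen(D, P, n_keep=8)             # the text above uses 50
mse, combo = l0_search(D[:, selected], P, dim=2)
descriptor = [selected[i] for i in combo]           # map back to column indices of D
print("screened features:", selected)
print("2D descriptor:", descriptor, "MSE:", mse)
```

The exhaustive search is only affordable because the LASSO step first reduces the feature space to a short list: the number of *n*-tuples grows combinatorially with the number of retained columns.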