
The procedure starts by creating num_init subsets of size d. A linear model is fitted to each subset, and the model with the smallest loss is selected.

Usage

slise_initialisation_candidates2(
  X,
  Y,
  epsilon,
  weight = NULL,
  beta_max = 20/epsilon^2,
  max_approx = 1.15,
  num_init = 500,
  beta_max_init = 2.5/epsilon^2,
  max_iterations = 300,
  ...
)

Arguments

X

data matrix

Y

response vector

epsilon

error tolerance

weight

weight vector (default: NULL)

beta_max

the maximum sigmoid steepness (default: 20/epsilon^2)

max_approx

the target approximation ratio (default: 1.15)

num_init

the number of initial subsets to generate (default: 500)

beta_max_init

the maximum sigmoid steepness in the initialisation (default: 2.5/epsilon^2)

max_iterations

the maximum number of iterations to use when OLS is replaced by numerical optimisation, which happens when ncol(X) is very large (default: 300)

...

unused parameters

Value

list(alpha, beta)
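
As a minimal sketch of how the function might be invoked, assuming the slise package is installed and the function is accessible (if it is internal, it may need to be reached via `slise:::`); the data, noise level, and epsilon below are illustrative values only:

    # Sketch only: running the initialisation on synthetic data.
    library(slise)

    set.seed(42)
    n <- 200; d <- 5
    X <- matrix(rnorm(n * d), n, d)
    Y <- as.vector(X %*% rnorm(d)) + rnorm(n, sd = 0.1)

    # Returns list(alpha, beta): initial coefficients and sigmoid steepness
    init <- slise_initialisation_candidates2(X, Y, epsilon = 0.1)
    str(init$alpha)
    str(init$beta)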

Details

The probability that at least one of these subsets contains only "clean" (non-noisy) data is: $$1 - \left(1 - (1 - \mathrm{noise\_fraction})^{d}\right)^{\mathrm{num\_init}}$$ This means that high-dimensional data (large d) can cause issues, which is addressed by using LASSO regularisation, since it enables fitting linear models on subsets smaller than d.
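
As a small illustration of this formula (the noise fraction, subset size, and subset count below are made-up values), the probability of drawing at least one clean subset can be computed directly:

    # Probability that at least one of num_init random subsets of size d
    # contains only non-noisy ("clean") observations.
    clean_subset_prob <- function(noise_fraction, d, num_init) {
      1 - (1 - (1 - noise_fraction)^d)^num_init
    }

    clean_subset_prob(noise_fraction = 0.3, d = 5,  num_init = 500)  # ~1, almost certain
    clean_subset_prob(noise_fraction = 0.3, d = 50, num_init = 500)  # ~9e-06, essentially zero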