
The procedure starts by creating num_init subsets of size d. A linear model is fitted to each subset, and the model with the smallest loss is selected.

Usage

slise_initialisation_candidates2(
  X,
  Y,
  epsilon,
  weight = NULL,
  beta_max = 20/epsilon^2,
  max_approx = 1.15,
  num_init = 500,
  beta_max_init = 2.5/epsilon^2,
  max_iterations = 300,
  ...
)

Arguments

X

data matrix

Y

response vector

epsilon

error tolerance

weight

weight vector (default: NULL)

beta_max

the maximum sigmoid steepness (default: 20/epsilon^2)

max_approx

the target approximation ratio (default: 1.15)

num_init

the number of initial subsets to generate (default: 500)

beta_max_init

the maximum sigmoid steepness in the initialisation (default: 2.5/epsilon^2)

max_iterations

the maximum number of iterations to use when OLS is replaced by numerical optimisation, which happens when ncol(X) is very large (default: 300)

...

unused parameters

Value

list(alpha, beta)
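
As a minimal sketch of how the function might be invoked, assuming the slise package is installed and the function is accessible (if it is internal, it may need to be reached via `slise:::`); the data, noise level, and epsilon below are illustrative values only:

    # Sketch only: running the initialisation on synthetic data.
    library(slise)

    set.seed(42)
    n <- 200; d <- 5
    X <- matrix(rnorm(n * d), n, d)
    Y <- as.vector(X %*% rnorm(d)) + rnorm(n, sd = 0.1)

    # Returns list(alpha, beta): initial coefficients and sigmoid steepness
    init <- slise_initialisation_candidates2(X, Y, epsilon = 0.1)
    str(init$alpha)
    str(init$beta)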

Details

The probability that at least one of these subsets contains only "clean" (non-noisy) data is: $$1 - \left(1 - (1 - \mathrm{noise\_fraction})^{d}\right)^{\mathrm{num\_init}}$$ This means that high-dimensional data (large d) can cause issues, which is addressed by using LASSO regularisation, since it enables fitting linear models on subsets smaller than d.
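
As a small illustration of this formula (the noise fraction, subset size, and subset count below are made-up values), the probability of drawing at least one clean subset can be computed directly:

    # Probability that at least one of num_init random subsets of size d
    # contains only non-noisy ("clean") observations.
    clean_subset_prob <- function(noise_fraction, d, num_init) {
      1 - (1 - (1 - noise_fraction)^d)^num_init
    }

    clean_subset_prob(noise_fraction = 0.3, d = 5,  num_init = 500)  # ~1, almost certain
    clean_subset_prob(noise_fraction = 0.3, d = 50, num_init = 500)  # ~9e-06, essentially zero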