slise.data
This script contains functions for modifying data, mainly normalisation and PCA.
DataScaling
Bases: NamedTuple
Container class for scaling information
Source code in slise/data.py
scale_x(x, remove_columns=True)
Scale an x matrix / vector using the stored scaling information. See slise.data.scale_same.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | ndarray | New x matrix / vector. | required |
remove_columns | bool | Remove columns according to the stored information. Defaults to True. | True |
Returns:
Type | Description |
---|---|
ndarray | np.ndarray: Scaled matrix / vector. |
scale_y(y)
Scale a y vector / scalar using the stored scaling information. See slise.data.scale_same.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y | ndarray | New y vector / scalar. | required |
Returns:
Type | Description |
---|---|
Union[float, ndarray] | np.ndarray: Scaled y vector / scalar. |
unscale_model(model)
Unscale a linear model. See slise.data.unscale_model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | ndarray | Linear model operating on scaled data. | required |
Returns:
Type | Description |
---|---|
ndarray | np.ndarray: Linear model operating on unscaled data. |
add_intercept_column(X)
Add a constant column of ones to the matrix.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | ndarray | Matrix or vector. | required |
Returns:
Type | Description |
---|---|
ndarray | np.ndarray: Matrix / vector where the first column / value is one. |
remove_intercept_column(X)
Remove the first column. Used to revert slise.data.add_intercept_column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | ndarray | Matrix or vector. | required |
Returns:
Type | Description |
---|---|
ndarray | np.ndarray: Matrix / vector without the first column / value. |
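The two intercept helpers above can be illustrated with a minimal NumPy sketch. The function names `add_intercept_sketch` and `remove_intercept_sketch` are hypothetical stand-ins written for this example; they follow the documented behaviour (a leading one for vectors, a leading column of ones for matrices) but are not the slise source code.

```python
import numpy as np


def add_intercept_sketch(X: np.ndarray) -> np.ndarray:
    """Prepend a constant one: a value for vectors, a column for matrices."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        return np.concatenate(([1.0], X))
    ones = np.ones((X.shape[0], 1))
    return np.concatenate((ones, X), axis=1)


def remove_intercept_sketch(X: np.ndarray) -> np.ndarray:
    """Drop the first value (vector) or the first column (matrix)."""
    X = np.asarray(X)
    if X.ndim == 1:
        return X[1:]
    return X[:, 1:]


X = np.array([[2.0, 3.0], [4.0, 5.0]])
X_with_intercept = add_intercept_sketch(X)               # shape (2, 3)
X_roundtrip = remove_intercept_sketch(X_with_intercept)  # equal to X again
```

Putting the constant first means that a linear model for such a matrix carries its intercept as the first coefficient.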
remove_constant_columns(X, epsilon=None)
Remove columns that are constant from a matrix. Used to revert slise.data.add_constant_columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | ndarray | Data matrix. | required |
epsilon | Optional[float] | Threshold for constant (std < epsilon). Defaults to machine epsilon. | None |
Returns:
Type | Description |
---|---|
Tuple[ndarray, ndarray] | Tuple[np.ndarray, np.ndarray]: A tuple of the reduced matrix and a mask showing which columns were retained. |
add_constant_columns(X, mask, intercept=False)
Add (back) constant columns to a matrix.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | ndarray | Data matrix. | required |
mask | Optional[ndarray] | A boolean array showing which columns are already in the matrix. | required |
intercept | bool | Whether X has an intercept column (added after the constant columns were removed). Defaults to False. | False |
Returns:
Type | Description |
---|---|
ndarray | np.ndarray: A matrix with new columns filled with zeros. |
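As a rough illustration of how removing and re-adding constant columns fit together, here is a minimal sketch. The names `remove_constant_sketch` and `add_constant_sketch` are hypothetical, and the details (column-wise standard deviation, `True` in the mask meaning "retained") are assumptions read off the parameter descriptions above rather than taken from the slise source.

```python
import numpy as np


def remove_constant_sketch(X, epsilon=None):
    """Drop columns whose standard deviation is below epsilon."""
    X = np.asarray(X, dtype=float)
    if epsilon is None:
        epsilon = np.finfo(float).eps  # assumed meaning of "machine epsilon"
    mask = ~(np.std(X, axis=0) < epsilon)  # True marks a retained column
    return X[:, mask], mask


def add_constant_sketch(X, mask, intercept=False):
    """Re-insert the dropped columns, filled with zeros."""
    if mask is None:
        return X
    if intercept:
        # The intercept column was added after the removal, so it is always present.
        mask = np.concatenate(([True], mask))
    out = np.zeros((X.shape[0], mask.size), dtype=X.dtype)
    out[:, mask] = X
    return out


X = np.array([[1.0, 5.0, 2.0], [1.0, 6.0, 3.0], [1.0, 7.0, 4.0]])
reduced, mask = remove_constant_sketch(X)      # first column is constant
restored = add_constant_sketch(reduced, mask)  # first column comes back as zeros
```

Note that the restored columns hold zeros, not the original constant, so the round trip recovers the shape of the matrix but not the dropped values.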
normalise_robust(x, epsilon=None)
A robust version of normalisation that uses the median and the MAD (median absolute deviation). Any zeros in the scale are replaced by ones to avoid division by zero.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | ndarray | Vector or tensor to normalise. | required |
epsilon | Optional[float] | Threshold for the scale being zero. Defaults to machine epsilon. | None |
Returns:
Type | Description |
---|---|
Tuple[ndarray, Union[float, ndarray], Union[float, ndarray]] | Tuple[np.ndarray, Union[float, np.ndarray], Union[float, np.ndarray]]: Tuple of normalised x, center and scale. |
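The median/MAD normalisation described above can be sketched as follows. This is an illustration under stated assumptions, not the slise implementation: the name `normalise_robust_sketch` is hypothetical, statistics are taken along the first axis, and the MAD is used raw (whether the library applies a consistency constant is not stated here).

```python
import numpy as np


def normalise_robust_sketch(x, epsilon=None):
    """Center by the median and scale by the median absolute deviation."""
    x = np.asarray(x, dtype=float)
    if epsilon is None:
        epsilon = np.finfo(float).eps  # assumed meaning of "machine epsilon"
    center = np.median(x, axis=0)
    scale = np.median(np.abs(x - center), axis=0)
    # Replace (near-)zero scales by one to avoid division by zero.
    scale = np.where(scale < epsilon, 1.0, scale)
    return (x - center) / scale, center, scale


x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
scaled, center, scale = normalise_robust_sketch(x)

# A constant vector has a MAD of zero, so its scale falls back to one.
flat_scaled, flat_center, flat_scale = normalise_robust_sketch(np.array([5.0, 5.0, 5.0]))
```

In this example the outlier at 100 does not move the center (3) or the scale (1), which is the point of using the median and MAD instead of the mean and standard deviation.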
scale_same(x, center, scale, constant_colums=None, remove_columns=True)
Scale a matrix or vector the same way as another.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | ndarray | Matrix or vector to scale. | required |
center | Union[float, ndarray] | The center used for the previous scaling. | required |
scale | Union[float, ndarray] | The scale used for the previous scaling. | required |
constant_colums | Optional[ndarray] | Boolean mask of constant columns. Defaults to None. | None |
remove_columns | bool | Should constant columns be removed. Defaults to True. | True |
Returns:
Type | Description |
---|---|
ndarray | np.ndarray: The scaled matrix/vector. |
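To make "scale the same way as another" concrete, the sketch below fits a center and scale on one matrix and reuses them on a new row. The helper `scale_same_sketch` and its `keep_mask` parameter are hypothetical; in particular, `keep_mask` is assumed to mark the columns to keep, which may not match the exact semantics of `constant_colums` in slise.

```python
import numpy as np


def scale_same_sketch(x, center, scale, keep_mask=None):
    """Apply a previously computed center and scale to new data."""
    x = np.asarray(x, dtype=float)
    if keep_mask is not None:
        # Select the retained columns on the last axis (works for vectors too).
        x = x[..., keep_mask]
    return (x - center) / scale


# Scaling information computed once on the training data (median and MAD).
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
center = np.median(train, axis=0)
scale = np.median(np.abs(train - center), axis=0)

# A new observation is scaled with the stored values, not with its own statistics.
new_row = np.array([4.0, 40.0])
scaled_row = scale_same_sketch(new_row, center, scale)
```

Reusing the stored center and scale keeps new data in the same coordinate system as the data a model was fitted on.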
unscale_model(model, x_center, x_scale, y_center=0.0, y_scale=1.0, columns=None)
Scale a linear model such that it matches unnormalised data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | ndarray | The model for normalised data. | required |
x_center | ndarray | The center used for normalising X. | required |
x_scale | ndarray | The scale used for normalising X. | required |
y_center | float | The center used for normalising y. Defaults to 0.0. | 0.0 |
y_scale | float | The scale used for normalising y. Defaults to 1.0. | 1.0 |
columns | Optional[ndarray] | Mask of removed columns (see remove_constant_columns). Defaults to None. | None |
Returns:
Type | Description |
---|---|
ndarray | np.ndarray: The unscaled model. |
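The algebra behind unscaling a linear model can be sketched as follows. The function `unscale_model_sketch` is a hypothetical illustration, not the slise source; it assumes the first coefficient is the intercept and leaves out the `columns` argument.

```python
import numpy as np


def unscale_model_sketch(model, x_center, x_scale, y_center=0.0, y_scale=1.0):
    """Map a model fitted on normalised data back to the original units.

    With x_s = (x - x_center) / x_scale and y_s = (y - y_center) / y_scale,
    the scaled model y_s = b0 + sum(b_i * x_s_i) becomes
    y = [y_center + y_scale * b0 - sum(c_i * x_center_i)] + sum(c_i * x_i),
    where c_i = y_scale * b_i / x_scale_i.
    """
    model = np.asarray(model, dtype=float)
    intercept, coef = model[0], model[1:]  # assumption: intercept comes first
    coef_raw = coef * y_scale / x_scale
    intercept_raw = y_center + y_scale * intercept - np.sum(coef_raw * x_center)
    return np.concatenate(([intercept_raw], coef_raw))


x_center = np.array([3.0, 2.0])
x_scale = np.array([2.0, 4.0])
y_center, y_scale = 10.0, 5.0
model_scaled = np.array([0.5, 1.0, -2.0])
model_raw = unscale_model_sketch(model_scaled, x_center, x_scale, y_center, y_scale)

# Both routes must give the same predictions in the original units.
X = np.array([[1.0, 2.0], [3.0, 5.0], [4.0, 1.0]])
X_scaled = (X - x_center) / x_scale
pred_via_scaled = y_center + y_scale * (model_scaled[0] + X_scaled @ model_scaled[1:])
pred_via_raw = model_raw[0] + X @ model_raw[1:]
```

Comparing the two prediction vectors is a quick sanity check whenever a model is moved between scaled and unscaled data.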
pca_simple(x, dimensions=10, tolerance=1e-10)
Fit and use PCA for dimensionality reduction.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | ndarray | Matrix to reduce. | required |
dimensions | int | The number of dimensions to return. Defaults to 10. | 10 |
tolerance | float | Threshold for variance being zero. Defaults to 1e-10. | 1e-10 |
Returns:
Type | Description |
---|---|
Tuple[ndarray, ndarray] | Tuple[np.ndarray, np.ndarray]: Tuple of the reduced matrix and PCA rotation matrix. |
pca_rotate(x, v)
Use a trained PCA for dimensionality reduction. See slise.data.pca_simple for how to obtain a rotation matrix.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | ndarray | Matrix to reduce. | required |
v | ndarray | PCA rotation matrix. | required |
Returns:
Type | Description |
---|---|
ndarray | np.ndarray: The reduced matrix. |
pca_invert(x, v)
Revert a PCA dimensionality reduction. See slise.data.pca_simple for how to obtain a rotation matrix.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | ndarray | Matrix to expand. | required |
v | ndarray | PCA rotation matrix. | required |
Returns:
Type | Description |
---|---|
ndarray | np.ndarray: The expanded matrix. |
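The relationship between fitting, applying and reverting a PCA can be sketched with an SVD. The three `*_sketch` functions are hypothetical illustrations, not the slise source. They assume the rotation matrix `v` has one row per retained component, that no centring is done inside the functions, and that `tolerance` is applied relative to the largest singular value.

```python
import numpy as np


def pca_simple_sketch(x, dimensions=10, tolerance=1e-10):
    """Fit a PCA via SVD and return the reduced data plus the rotation matrix."""
    x = np.asarray(x, dtype=float)
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    # Keep at most `dimensions` components with non-negligible singular values.
    keep = max(1, min(dimensions, int(np.sum(s > tolerance * s[0]))))
    v = vt[:keep]  # shape (keep, n_features)
    return x @ v.T, v


def pca_rotate_sketch(x, v):
    """Project data onto the retained components."""
    return x @ v.T


def pca_invert_sketch(x, v):
    """Map reduced data back to the original feature space."""
    return x @ v


# A rank-2 matrix with three features: the first row equals row 3 plus twice row 4.
x = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
reduced, v = pca_simple_sketch(x, dimensions=3)
restored = pca_invert_sketch(reduced, v)
```

Because the example matrix has rank 2, two components capture it completely and the inversion is lossless; with fewer components than the rank, the inversion would only be an approximation.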
pca_rotate_model(model, v)
Transform a linear model to work in PCA reduced space. See slise.data.pca_simple for how to obtain a rotation matrix.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | ndarray | Linear model coefficients. | required |
v | ndarray | PCA rotation matrix. | required |
Returns:
Type | Description |
---|---|
ndarray | np.ndarray: The transformed linear model. |
pca_invert_model(model, v)
Transform a linear model from PCA space to "normal" space. See slise.data.pca_simple for how to obtain a rotation matrix.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | ndarray | Linear model coefficients (in PCA space). | required |
v | ndarray | PCA rotation matrix. | required |
Returns:
Type | Description |
---|---|
ndarray | np.ndarray: The transformed linear model. |
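Moving a linear model between the original space and PCA space follows from the data transforms: if the reduced data is `z = x @ v.T`, a model `w_pca` on `z` predicts `z @ w_pca = x @ (v.T @ w_pca)`. The sketch below illustrates this under assumptions: the function names are hypothetical, `v` has one row per component with orthonormal rows, and a model with one extra coefficient is treated as having a leading intercept that passes through unchanged.

```python
import numpy as np


def pca_rotate_model_sketch(model, v):
    """Express a model on the original features in PCA coordinates."""
    model = np.asarray(model, dtype=float)
    if model.size > v.shape[1]:  # extra entry: assume a leading intercept
        return np.concatenate((model[:1], v @ model[1:]))
    return v @ model


def pca_invert_model_sketch(model, v):
    """Express a model on PCA coordinates in terms of the original features."""
    model = np.asarray(model, dtype=float)
    if model.size > v.shape[0]:  # extra entry: assume a leading intercept
        return np.concatenate((model[:1], v.T @ model[1:]))
    return v.T @ model


# Two orthonormal components in a three-feature space (made up for the example).
v = np.array([[0.6, 0.8, 0.0], [0.0, 0.0, 1.0]])
model_pca = np.array([0.5, 2.0, -1.0])  # intercept followed by two coefficients
model_full = pca_invert_model_sketch(model_pca, v)
model_back = pca_rotate_model_sketch(model_full, v)

x = np.array([1.0, 2.0, 3.0])
pred_pca = model_pca[0] + (x @ v.T) @ model_pca[1:]
pred_full = model_full[0] + x @ model_full[1:]
```

Both predictions agree, so an explanation fitted in the reduced space can be reported in terms of the original features.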