flexcv.model_selection
This module implements customization of the objective function used for hyperparameter optimization. To support a custom objective function, the inner CV loop is implemented as follows (pseudocode):
    objective_cv(
        if n_jobs == -1:
            parallel_objective(some_kind_of_scorer)
        else:
            objective(some_kind_of_scorer)
    )
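The dispatch in the pseudocode can be sketched in Python as below. This is a minimal illustration, not the flexcv implementation: the `*_sketch` functions, their simplified signatures, and `mean_square` are hypothetical, and `multiprocessing.pool.ThreadPool` stands in for a process pool so the example runs without pickling concerns.

```python
from multiprocessing.pool import ThreadPool


def objective_sketch(fold_values, scorer):
    """Score a single fold (hypothetical, simplified signature)."""
    return scorer(fold_values)


def parallel_objective_sketch(fold_values, scorer):
    """Worker for starmap; it simply delegates to the sequential function."""
    return objective_sketch(fold_values, scorer)


def objective_cv_sketch(folds, scorer, n_jobs):
    """Dispatch over folds in parallel or sequentially, then average."""
    if n_jobs == -1:
        # ThreadPool() without arguments uses one worker per available CPU.
        with ThreadPool() as pool:
            results = pool.starmap(
                parallel_objective_sketch, [(fold, scorer) for fold in folds]
            )
    else:
        results = [objective_sketch(fold, scorer) for fold in folds]
    return sum(results) / len(results)


def mean_square(values):
    """Toy scorer: mean of the squared values in one fold."""
    return sum(v * v for v in values) / len(values)


folds = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
sequential = objective_cv_sketch(folds, mean_square, n_jobs=1)
parallel = objective_cv_sketch(folds, mean_square, n_jobs=-1)
```

Both branches call the same per-fold function, so they are expected to return the same averaged value; only the scheduling differs.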
flexcv.model_selection.ObjectiveScorer
Bases: Callable[[ndarray, ndarray, ndarray, ndarray], float]
Callable class that wraps a scorer function to be used as an objective function. The scorer function must match the following signature. Instantiating the class will check the signature.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y_valid | ndarray | The validation target values. | required |
y_pred | ndarray | The predicted target values. | required |
y_train_in | ndarray | The training target values. | required |
y_pred_train | ndarray | The predicted training target values. | required |
Returns:
Type | Description |
---|---|
float | The objective function value. |
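A signature check of this kind could be sketched with the standard `inspect` module. The class below is a hypothetical stand-in written for illustration, not the flexcv `ObjectiveScorer`; it assumes that "matching the signature" means the four parameter names appear in the documented order.

```python
import inspect

import numpy as np

EXPECTED_PARAMS = ("y_valid", "y_pred", "y_train_in", "y_pred_train")


class SignatureCheckedScorer:
    """Hypothetical wrapper that validates a scorer's parameter names."""

    def __init__(self, scorer):
        self.scorer = scorer
        self.check_signature()  # fail early, at construction time

    def check_signature(self):
        names = tuple(inspect.signature(self.scorer).parameters)
        if names != EXPECTED_PARAMS:
            raise ValueError(f"Expected parameters {EXPECTED_PARAMS}, got {names}")

    def __call__(self, y_valid, y_pred, y_train_in, y_pred_train):
        return float(self.scorer(y_valid, y_pred, y_train_in, y_pred_train))


def validation_mse(y_valid, y_pred, y_train_in, y_pred_train):
    """Toy scorer that only looks at the validation error."""
    return np.mean((y_valid - y_pred) ** 2)


wrapped = SignatureCheckedScorer(validation_mse)
value = wrapped(
    np.array([1.0, 2.0]), np.array([1.0, 4.0]), np.array([0.0]), np.array([0.0])
)
```

Checking at instantiation means a mismatched scorer is rejected before any model is fitted, rather than partway through an optimization run.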
Source code in flexcv/model_selection.py
flexcv.model_selection.ObjectiveScorer.check_signature()
flexcv.model_selection.custom_scorer(y_valid, y_pred, y_train_in, y_pred_train)
Objective scorer for the hyperparameter optimization. The function calculates the mean squared error (MSE) for both the validation and training data, and then calculates a weighted sum of the MSEs and their differences. The weights and thresholds used in the calculation are defined in the function. The function returns a float value that represents the objective function value. This function is used in the hyperparameter optimization process to evaluate the performance of different models with different hyperparameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y_valid | ndarray | The validation target values. | required |
y_pred | ndarray | The predicted target values. | required |
y_train_in | ndarray | The inner training target values. | required |
y_pred_train | ndarray | The inner predicted training target values. | required |
Returns:
Type | Description |
---|---|
float | The objective function value. |
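To make the idea of combining the two MSEs with their difference concrete, here is a minimal sketch. The weight, the threshold, and the exact combination rule are assumptions chosen for illustration; they are not the values or formula used by `custom_scorer`.

```python
import numpy as np


def weighted_mse_scorer(y_valid, y_pred, y_train_in, y_pred_train):
    """Illustrative objective: validation MSE plus a penalty on the overfitting gap."""
    gap_weight = 0.5     # hypothetical weight, not taken from flexcv
    gap_threshold = 1.0  # hypothetical tolerance before the gap is penalized

    mse_valid = float(np.mean((np.asarray(y_valid) - np.asarray(y_pred)) ** 2))
    mse_train = float(np.mean((np.asarray(y_train_in) - np.asarray(y_pred_train)) ** 2))

    # Penalize only the part of the validation/training gap above the threshold.
    excess_gap = max(mse_valid - mse_train - gap_threshold, 0.0)
    return mse_valid + gap_weight * excess_gap


overfit_score = weighted_mse_scorer(
    np.array([0.0, 0.0]), np.array([2.0, 2.0]),  # validation MSE = 4.0
    np.array([0.0, 0.0]), np.array([1.0, 1.0]),  # training MSE = 1.0
)
balanced_score = weighted_mse_scorer(
    np.array([0.0, 0.0]), np.array([1.0, 1.0]),  # validation MSE = 1.0
    np.array([0.0, 0.0]), np.array([1.0, 1.0]),  # training MSE = 1.0
)
```

In this sketch a model whose validation error far exceeds its training error scores worse than its validation MSE alone would suggest, which is one way to discourage overfitted hyperparameter choices.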
For hyperparameter tuning (inner CV loop), the following call hierarchy is used:
    objective_cv(
        if n_jobs == -1:
            parallel_objective(some_kind_of_scorer)
        else:
            objective(some_kind_of_scorer)
    )
flexcv.model_selection.objective(X_train_in, y_train_in, X_valid, y_valid, pipe, params, objective_scorer)
Objective function for the hyperparameter optimization. Sets the parameters of the pipeline and fits it to the training data. Predicts the validation data and calculates the MSE for both the validation and training data. Then applies the objective scorer to the validation and training targets and predictions, which returns the objective function value. Returns the negative validation and training MSEs as well as the negative objective function value, since Optuna maximizes the objective function. This function is called from the objective_cv function if n_jobs is set to 1.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X_train_in | DataFrame or ndarray | The training data. | required |
y_train_in | DataFrame or ndarray | The training target values. | required |
X_valid | DataFrame or ndarray | The validation data. | required |
y_valid | DataFrame or ndarray | The validation target values. | required |
pipe | Pipeline | The pipeline to be used for the training. | required |
Returns:
Type | Description |
---|---|
tuple | A tuple containing the negative validation MSE, the negative training MSE and the negative objective function value. |
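The steps described for this function (set parameters, fit, predict, score, negate) can be sketched as follows. This is a sketch under stated assumptions: `ShrunkenMean` is a hypothetical estimator that only mimics the scikit-learn `set_params`/`fit`/`predict` interface so the example needs nothing beyond NumPy, and `objective_sketch` is not the flexcv source.

```python
import numpy as np


class ShrunkenMean:
    """Hypothetical estimator with scikit-learn-style set_params/fit/predict."""

    def __init__(self, shrinkage=0.0):
        self.shrinkage = shrinkage

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        self.mean_ = (1.0 - self.shrinkage) * float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


def mse(y_true, y_pred):
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


def objective_sketch(X_train_in, y_train_in, X_valid, y_valid, pipe, params, objective_scorer):
    pipe.set_params(**params)
    pipe.fit(X_train_in, y_train_in)
    y_pred = pipe.predict(X_valid)
    y_pred_train = pipe.predict(X_train_in)
    score = objective_scorer(y_valid, y_pred, y_train_in, y_pred_train)
    # Negate everything so that larger values are better for a maximizing study.
    return -mse(y_valid, y_pred), -mse(y_train_in, y_pred_train), -score


X_train, y_train = np.zeros((4, 1)), np.array([1.0, 1.0, 3.0, 3.0])
X_valid, y_valid = np.zeros((2, 1)), np.array([2.0, 4.0])
neg_valid_mse, neg_train_mse, neg_score = objective_sketch(
    X_train, y_train, X_valid, y_valid,
    ShrunkenMean(), {"shrinkage": 0.0},
    lambda yv, yp, yt, ypt: mse(yv, yp),  # toy scorer: validation MSE only
)
```

Any object exposing the same three methods, such as a scikit-learn `Pipeline`, could be passed in place of the toy estimator.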
flexcv.model_selection.objective_cv(trial, cross_val_split, pipe, params, X, y, run, n_jobs, objective_scorer)
Objective function for the hyperparameter optimization with cross validation. n_jobs is the number of processes to use for the parallelization. If n_jobs is -1, the number of processes is set to the number of available CPUs. If n_jobs is 1, the objective function is called sequentially.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
trial | trial | Optuna trial object. | required |
cross_val_split | function | Function that returns the indices for the cross validation split. | required |
pipe | Pipeline | The pipeline to be used for the training. | required |
params | dict | Dictionary containing the parameters to be set in the pipeline. | required |
X | DataFrame or ndarray | Features. | required |
y | DataFrame or ndarray | Target. | required |
run | run | Neptune run object. | required |
n_jobs | int | Sklearn n_jobs parameter to control if CV is run in parallel or sequentially. | required |
objective_scorer | ObjectiveScorer | Callable class that wraps a scorer function to be used as an objective function. | required |
Returns:
Type | Description |
---|---|
float | The mean objective function value. Note: Fold values are averaged by default. To use the RMSE as the objective function, average the MSEs across folds first and then take the square root. |
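The order of averaging and taking the square root matters, as this small worked example shows. The fold MSE values are made up for illustration.

```python
import math

# Hypothetical per-fold MSEs from an inner cross validation.
fold_mses = [1.0, 9.0]

# Average the MSEs first, then take the square root.
rmse_from_mean_mse = math.sqrt(sum(fold_mses) / len(fold_mses))

# Taking the square root per fold and averaging afterwards gives a different number.
mean_of_fold_rmses = sum(math.sqrt(m) for m in fold_mses) / len(fold_mses)
```

Here the first quantity is the square root of 5 (about 2.236), while the second is 2.0, so the two procedures are not interchangeable.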
flexcv.model_selection.parallel_objective(train_idx, valid_idx, X, y, pipe, params_, objective_scorer)
Objective function for the hyperparameter optimization to be used with multiprocessing.Pool.starmap. Receives the training and validation indices together with the data, slices the data accordingly, and calls the objective function. It is called from the objective_cv function if n_jobs is set to -1.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train_idx | ndarray | The training indices. | required |
valid_idx | ndarray | The validation indices. | required |
X | DataFrame or ndarray | The data. | required |
y | DataFrame or ndarray | The target values. | required |
pipe | Pipeline | The pipeline to be used for the training. | required |
Returns:
Type | Description |
---|---|
tuple | A tuple containing the validation MSE, the training MSE and the objective function value. |
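The index-based fan-out over folds with `starmap` might look like the sketch below. Two things are assumptions made for illustration: `fold_objective` is a hypothetical worker with a toy mean predictor, and `multiprocessing.pool.ThreadPool` is used because it offers the same `starmap` interface as `multiprocessing.Pool` while avoiding pickling requirements in a short example.

```python
from multiprocessing.pool import ThreadPool

import numpy as np


def fold_objective(train_idx, valid_idx, X, y):
    """Hypothetical per-fold worker: slice by index, fit, return negative validation MSE."""
    X_train, X_valid = X[train_idx], X[valid_idx]  # sliced like y; unused by the toy predictor
    y_train, y_valid = y[train_idx], y[valid_idx]
    prediction = y_train.mean()  # toy model: predict the training mean
    return -float(np.mean((y_valid - prediction) ** 2))


X = np.arange(8, dtype=float).reshape(4, 2)
y = np.array([1.0, 1.0, 3.0, 3.0])
folds = [
    (np.array([0, 1]), np.array([2, 3])),
    (np.array([2, 3]), np.array([0, 1])),
]

# One argument tuple per fold; starmap unpacks each tuple into the worker call.
args = [(train_idx, valid_idx, X, y) for train_idx, valid_idx in folds]
with ThreadPool(processes=2) as pool:
    scores = pool.starmap(fold_objective, args)

mean_score = float(np.mean(scores))
```

Passing indices rather than pre-sliced arrays keeps the argument tuples small and lets each worker do its own slicing; with a process pool, the worker must be defined at module level so it can be pickled.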