Welcome to flexcv-earth
See our repository here: flexcv-earth
This Python package provides wrapper classes for the earth function from the R package earth.
The wrapped model can be used as a scikit-learn estimator in Python, in particular with the flexcv package.
Installation
Additional dependencies of rpy2
The EarthRegressor model class wraps rpy2 code and runs embedded R under the hood.
You therefore need a recent R version installed, and you need to run our install_rpackages.py script.
From the command line, change to your flexcv-earth installation directory. This can be the folder that you created with venv.
Then run our Python script, which installs the remaining R dependencies.
You have now installed everything you need to use the EarthRegressor with flexcv-earth.
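Before running the install script, it can help to confirm that R is reachable from the environment Python runs in. The following is a minimal sketch using only the standard library. It assumes that R's Rscript executable is on your PATH, which is common but not guaranteed, and the helper names find_rscript and r_version are ours, not part of flexcv-earth.

```python
import shutil
import subprocess


def find_rscript():
    """Return the full path of the Rscript executable, or None if it is not on PATH."""
    return shutil.which("Rscript")


def r_version(rscript_path):
    """Ask R for its version banner and return the text it printed."""
    result = subprocess.run(
        [rscript_path, "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Depending on the R release, the banner is written to stdout or stderr.
    return (result.stdout or result.stderr).strip()


rscript = find_rscript()
if rscript is None:
    print("Rscript not found on PATH; install R before running install_rpackages.py")
else:
    print(r_version(rscript))
```

If the check reports that Rscript is missing, install R first; rpy2 also needs to locate the R installation when it is imported.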
Use with flexcv
You can use flexcv to perform cross-validation with the EarthRegressor class.
Define a model configuration as a YAML string and pass it to CrossValidation as follows:
from flexcv import CrossValidation
from flexcv.synthesizer import generate_data
from flexcv_earth import EarthRegressor, EarthModelPostProcessor

# Generate a synthetic data set for demonstration.
X, y, _, _ = generate_data(10, 100)

# Model configuration: tuning settings plus the hyperparameter search space.
yaml_config = """
EarthRegressor:
  requires_inner_cv: True
  n_trials: 200
  allows_n_jobs: False
  model: EarthRegressor
  params:
    degree: !Int
      low: 1
      high: 5
    nprune: !Int
      low: 1
      high: 300
    fast_k: !Int
      low: 0
      high: 20
    newvar_penalty: !Float
      low: 0.01
      high: 0.2
  post_processor: EarthModelPostProcessor
  add_merf: True
"""

# Attach the data and the model configuration to the cross-validation object.
cv = (
    CrossValidation()
    .set_data(X, y)
    .set_models(yaml_string=yaml_config)
)
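Each !Int or !Float entry under params describes a range, bounded by low and high, from which candidate hyperparameter values are drawn during tuning. The sketch below only illustrates that idea with Python's random module. It is not how flexcv samples internally, and the search_space dictionary and sample_params helper are hypothetical names. It also assumes both bounds are inclusive, which the configuration itself does not state.

```python
import random

# Hypothetical plain-Python mirror of the ranges in the YAML configuration.
search_space = {
    "degree": ("int", 1, 5),
    "nprune": ("int", 1, 300),
    "fast_k": ("int", 0, 20),
    "newvar_penalty": ("float", 0.01, 0.2),
}


def sample_params(space, rng):
    """Draw one candidate value per parameter from its (kind, low, high) range."""
    params = {}
    for name, (kind, low, high) in space.items():
        if kind == "int":
            # randint includes both end points.
            params[name] = rng.randint(low, high)
        else:
            params[name] = rng.uniform(low, high)
    return params


rng = random.Random(0)  # fixed seed so repeated runs draw the same candidate
candidate = sample_params(search_space, rng)
print(candidate)
```

With n_trials set to 200, a tuner would evaluate up to 200 such candidates in the inner cross-validation loop and keep the best one.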
Reference
flexcv_earth.models.EarthRegressor
Bases: BaseEstimator, RegressorMixin
Wrapper class for the Earth regressor in R. For more details, see https://cran.r-project.org/web/packages/earth/earth.pdf.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
degree | int | Maximum degree of interaction between terms. 1 builds an additive model without interaction terms. | 1 |
nprune | int or None | Maximum number of terms (including the intercept) in the pruned model. If None, all terms created by the forward pass are candidates for the pruned model. | None |
nk | int or None | Maximum number of model terms before pruning, that is, the maximum number of terms created by the forward pass (including the intercept). If None, the value is calculated semi-automatically from the number of predictors, but it may need adjusting. | None |
thresh | float | Forward stepping threshold. | 0.001 |
minspan | int | Minimum number of observations between knots. | 0 |
endspan | int | Minimum number of observations before the first and after the final knot. | 0 |
newvar_penalty | float | Penalty for adding a new variable in the forward pass. | 0.0 |
fast_k | int | Maximum number of parent terms considered at each step of the forward pass. | 20 |
fast_beta | float | Fast MARS ageing coefficient, as described in the Fast MARS paper, section 3.1. A value of 0 sometimes gives better results. | 1.0 |
pmethod | str | Pruning method. One of "backward", "none", "exhaustive", "forward", "seqrep", or "cv". Specify pmethod="cv" to use cross-validation to select the number of terms; this selects the number of terms that gives the maximum mean out-of-fold RSq on the fold models and requires the nfold argument. Use "none" to retain all the terms created by the forward pass. If y has multiple columns, only "backward" or "none" is allowed. Pruning can take a while if "exhaustive" is chosen and the model is big (more than about 30 terms). The current version of the leaps package used during pruning does not allow user interrupts: you have to kill your R session to interrupt it (on Windows, use the Task Manager or taskkill from the command line). | 'backward' |
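For intuition about what knots and terms mean, MARS models are built from hinge functions of the form max(0, x - c) and max(0, c - x), where c is a knot, and products of hinge functions on different predictors form interaction terms. The following pure-Python sketch illustrates these building blocks only. It does not call R or EarthRegressor, and the knot values and function names are made up for the example.

```python
def hinge(x, knot, direction=1):
    """Hinge basis function: max(0, x - knot), or max(0, knot - x) when direction is -1."""
    return max(0.0, direction * (x - knot))


def interaction_term(x1, x2, knot1, knot2):
    """Product of two hinge functions on different predictors."""
    return hinge(x1, knot1) * hinge(x2, knot2)


print(hinge(3.0, 1.0))                       # 2.0
print(hinge(0.5, 1.0))                       # 0.0
print(hinge(0.5, 1.0, direction=-1))         # 0.5
print(interaction_term(3.0, 4.0, 1.0, 2.0))  # 4.0
```

A fitted model is a weighted sum of such terms plus an intercept; the parameters in the table control how many terms are created and how they are pruned.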
Source code in flexcv_earth/models.py
flexcv_earth.models.EarthRegressor.calc_variable_importance()
Calculates the variable importance of the model.
Returns:
Type | Description |
---|---|
DataFrame | A DataFrame containing the variable importance. |
Source code in flexcv_earth/models.py
flexcv_earth.models.EarthRegressor.fit(X, y)
Fit an Earth model to the given training data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like | Features. | required |
y | array-like | Target values. | required |
Returns:
Type | Description |
---|---|
object | Returns self. |
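Because fit returns the estimator itself, as is conventional for scikit-learn estimators, fitting and predicting can be chained in a single expression. The sketch below demonstrates the pattern with a small stand-in class, MeanRegressor, which is hypothetical and only mimics the fit/predict contract so that the example runs without R. With flexcv-earth installed, an EarthRegressor instance would be expected to take its place.

```python
class MeanRegressor:
    """Hypothetical stand-in that follows the same fit/predict contract."""

    def fit(self, X, y):
        # Store the mean of the training targets as the only learned state.
        self.mean_ = sum(y) / len(y)
        return self  # returning self is what makes chaining possible

    def predict(self, X):
        # Predict the stored mean for every row of X.
        return [self.mean_ for _ in X]


X_train = [[0.0], [1.0], [2.0]]
y_train = [1.0, 2.0, 3.0]
X_test = [[3.0], [4.0]]

predictions = MeanRegressor().fit(X_train, y_train).predict(X_test)
print(predictions)  # [2.0, 2.0]
```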
Source code in flexcv_earth/models.py
flexcv_earth.models.EarthRegressor.get_params(deep=False)
Returns the parameters of the model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
deep | bool | This argument is not used. | False |
Returns:
Type | Description |
---|---|
dict | Parameter names mapped to their values. |
Source code in flexcv_earth/models.py
flexcv_earth.models.EarthRegressor.get_rmodel()
flexcv_earth.models.EarthRegressor.get_variable_importance(features)
Returns the variable importance of the model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
features | array-like | The feature names. | required |
Returns:
Type | Description |
---|---|
DataFrame | A DataFrame containing the variable importance. |
Source code in flexcv_earth/models.py
flexcv_earth.models.EarthRegressor.make_r_plots()
Creates plots of the model in R and saves them to disk in the tmp_imgs folder.
Source code in flexcv_earth/models.py
flexcv_earth.models.EarthRegressor.predict(X)
Make predictions using the fitted model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like | Features. | required |
Returns:
Type | Description |
---|---|
array-like | An array of predicted values. |
Source code in flexcv_earth/models.py
flexcv_earth.model_postprocessing.EarthModelPostProcessor
Source code in flexcv_earth/model_postprocessing.py
flexcv_earth.model_postprocessing.EarthModelPostProcessor.__call__(results_all_folds, fold_result, features, run, *args, **kwargs)
Postprocessing function for the MARS model. Logs the parameters to Neptune, creates a variable importance table, and logs bar plots to Neptune.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
results_all_folds | | A dict of results for all folds. | required |
fold_result | | A dataclass containing the results for the current fold. | required |
features | | List of features. | required |
run | | Neptune run object. | required |
*args | | Any additional arguments. | () |
**kwargs | | Any additional keyword arguments. | {} |
Returns:
Type | Description |
---|---|
dict | Updated results dictionary. |
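The call signature above suggests the general shape of a post-processor: a callable object that receives the accumulated results, the current fold's result, the feature names, and a run object, and returns the updated results dictionary. The following is a minimal sketch of that shape, not the library's implementation. The class CountingPostProcessor, the FoldResult dataclass with its fold_id field, the "processed_folds" and "n_features" keys, and passing run=None are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FoldResult:
    """Hypothetical stand-in for the per-fold result object."""
    fold_id: int


class CountingPostProcessor:
    """Hypothetical post-processor that records which folds it has seen."""

    def __call__(self, results_all_folds, fold_result, features, run, *args, **kwargs):
        # Append the current fold's id to a running list in the results dict.
        processed = results_all_folds.setdefault("processed_folds", [])
        processed.append(fold_result.fold_id)
        # Store the number of features; a real post-processor might log to `run` here.
        results_all_folds["n_features"] = len(features)
        return results_all_folds


results = {}
post_processor = CountingPostProcessor()
for fold_id in range(3):
    results = post_processor(results, FoldResult(fold_id), ["x1", "x2"], run=None)

print(results)  # {'processed_folds': [0, 1, 2], 'n_features': 2}
```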