# Logistic regression:
pnca
input: numerical features
output: dm/om: target

grid search/base estimator with a single model and a set of hyperparameter choices: gives you the best model based on a SINGLE metric!
-- question: which metric should we optimise for?
base estimator with multiple models and multiple hyperparams: returns the OVERALL best model-hyperparam combo, based on a single score?
-- question: which metric should we optimise for?

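A minimal sketch of the single-model case described above, assuming a scikit-learn Pipeline whose final step is named `clf` (which would explain the `clf__...` keys in the results below). The synthetic data from `make_classification`, the solver list and `scoring="accuracy"` are assumptions for illustration; they stand in for the real pnca features and target, which are not part of these notes.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Assumption: synthetic stand-in for the numerical features and binary target
X, y = make_classification(n_samples=200, n_features=8, random_state=42)

# The step name "clf" is what produces parameter keys such as clf__solver
pipe = Pipeline([("clf", LogisticRegression(random_state=42))])

param_grid = {
    "clf__solver": ["liblinear", "lbfgs", "newton-cg"],
    "clf__max_iter": [100, 200],
}

# One metric only: every candidate is ranked by this single score
gscv = GridSearchCV(pipe, param_grid, scoring="accuracy", cv=5)
gscv.fit(X, y)

print("Best model:\n", gscv.best_params_)
print("Best models score:\n", gscv.best_score_)
```

Changing the `scoring` argument is the only thing needed to rerun the same search for a different metric.
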
# Demonstration

###################
# Metric1: accuracy
###################

Best model:
{'clf__max_iter': 100, 'clf__solver': 'liblinear'}

Best models score:
0.7145320197044336

###################
# Metric2: F1
###################
Best model:
{'clf__max_iter': 100, 'clf__solver': 'saga'}
Best models score:
0.7550294183111348

###################
# Metric3: Recall
###################
Best model:
{'clf__max_iter': 100, 'clf__solver': 'saga'}
Best models score:
0.8216666666666667

###################
# Metric4: ROC_AUC
###################

Best model:
{'clf__max_iter': 200, 'clf__solver': 'sag'}
Best models score:
0.7711904761904762

###################
# Metric5: MCC
###################

Best model:
{'clf__max_iter': 100, 'clf__solver': 'saga'}
Best models score:
0.4322970173039572

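The five separate runs above can, in principle, be collapsed into one multi-metric search, which makes the "which metric to optimise for?" question explicit through the `refit` argument. This is a sketch under assumptions: the data are synthetic, the grid is a small stand-in, and refitting on MCC is an arbitrary choice made only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Assumption: synthetic stand-in for the real data
X, y = make_classification(n_samples=200, n_features=8, random_state=42)

pipe = Pipeline([("clf", LogisticRegression(random_state=42))])
param_grid = {"clf__solver": ["liblinear", "lbfgs"], "clf__max_iter": [100, 200]}

# All five metrics are computed for every candidate in a single search
scoring = {
    "accuracy": "accuracy",
    "f1": "f1",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "mcc": make_scorer(matthews_corrcoef),
}

# refit names the ONE metric that decides best_params_ and best_score_
gscv = GridSearchCV(pipe, param_grid, scoring=scoring, refit="mcc", cv=5)
gscv.fit(X, y)

print("Best model (by MCC):\n", gscv.best_params_)
for name in scoring:
    # rank 1 marks the best candidate for this particular metric
    best_idx = gscv.cv_results_[f"rank_test_{name}"].argmin()
    print(name,
          gscv.cv_results_["params"][best_idx],
          gscv.cv_results_[f"mean_test_{name}"][best_idx])
```

The per-metric loop shows whether different metrics would pick different candidates, as happened with the solvers in the runs above.
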
sklearn/linear_model/_sag.py:354: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  ConvergenceWarning,

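The warning above comes from the sag/saga code path. The scikit-learn documentation notes that these solvers converge quickly only when features are on roughly the same scale, so one common response is to put a `StandardScaler` in front of `clf` and, if needed, raise `max_iter`. Whether the features here were already scaled is not stated in these notes, so the sketch below is an assumption-laden illustration on synthetic data with deliberately mismatched scales.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data, columns stretched to very different scales
X, y = make_classification(n_samples=200, n_features=8, random_state=42)
X = X * np.logspace(0, 3, X.shape[1])


def count_convergence_warnings(model):
    """Fit the model and return how many ConvergenceWarnings were emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        model.fit(X, y)
    return sum(issubclass(w.category, ConvergenceWarning) for w in caught)


unscaled = Pipeline([
    ("clf", LogisticRegression(solver="sag", max_iter=100, random_state=42)),
])
# Scaled variant: standardise first and allow more iterations
scaled = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(solver="sag", max_iter=1000, random_state=42)),
])

print("warnings without scaling:", count_convergence_warnings(unscaled))
print("warnings with scaling:   ", count_convergence_warnings(scaled))
print("iterations used with scaling:", scaled.named_steps["clf"].n_iter_)
```
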
#####################################
# Same thing but using: CLFSwitcher()

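The notes do not include the definition of CLFSwitcher(). A common pattern that would produce keys like `clf__estimator` and `clf__estimator__max_iter` is a thin wrapper whose only parameter is the estimator to use. The class below is a hypothetical reconstruction under that assumption, not the original code; the data and both grids are likewise illustrative.

```python
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier


class ClfSwitcher(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: the wrapped classifier is itself a grid parameter."""

    def __init__(self, estimator=None):
        self.estimator = estimator

    def fit(self, X, y):
        # Fall back to LogisticRegression when no estimator was supplied
        base = LogisticRegression() if self.estimator is None else self.estimator
        self.estimator_ = clone(base).fit(X, y)
        self.classes_ = self.estimator_.classes_
        return self

    def predict(self, X):
        return self.estimator_.predict(X)

    def predict_proba(self, X):
        return self.estimator_.predict_proba(X)


# Assumption: synthetic stand-in for the real data
X, y = make_classification(n_samples=200, n_features=8, random_state=42)

pipe = Pipeline([("clf", ClfSwitcher())])

# One dict per model family, each with its own hyperparameters
param_grid = [
    {
        "clf__estimator": [LogisticRegression(random_state=42)],
        "clf__estimator__solver": ["liblinear", "lbfgs"],
        "clf__estimator__max_iter": [100, 200],
    },
    {
        "clf__estimator": [DecisionTreeClassifier(random_state=42)],
        "clf__estimator__max_depth": [2, 4],
    },
]

gscv = GridSearchCV(pipe, param_grid, scoring="accuracy", cv=5)
gscv.fit(X, y)

print("Best model:\n", gscv.best_params_)
print("Best models score:\n", gscv.best_score_)
```

The search still ranks every model-hyperparameter combination by one score, so the choice of metric matters here just as much as in the single-model case.
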
###################
# Metric1: Accuracy
###################

Best model:
{'clf__estimator': LogisticRegression(random_state=42, solver='liblinear'), 'clf__estimator__max_iter': 100, 'clf__estimator__solver': 'liblinear'}
Best models score:
0.7219298245614035

###################
# Metric2: F1
###################
Best model:
{'clf__estimator': LogisticRegression(random_state=42, solver='liblinear'), 'clf__estimator__max_iter': 100, 'clf__estimator__solver': 'liblinear'}

print('Best models score:\n', gscv.best_score_)
Best models score:
0.7585724070894442

###################
# Metric3: Recall
###################
Best model:
{'clf__estimator': LogisticRegression(random_state=42, solver='liblinear'), 'clf__estimator__max_iter': 100, 'clf__estimator__solver': 'liblinear'}
Best models score:
0.8198610213316095

###################
# Metric4: ROC_AUC
###################
Best model:
{'clf__estimator': LogisticRegression(solver='newton-cg'), 'clf__estimator__max_iter': 100, 'clf__estimator__solver': 'newton-cg'}

Best models score:
nan

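A nan score from GridSearchCV generally means that a fit or scoring call raised an exception and the search substituted its `error_score` (nan by default), so the "best model" reported next to it should not be trusted. The notes do not say what failed in this run. The sketch below only shows how to surface such a failure with `error_score="raise"`; the invalid value `C=-1.0` is a deliberate stand-in for whatever went wrong, and the data are synthetic.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Assumption: synthetic stand-in for the real data
X, y = make_classification(n_samples=200, n_features=8, random_state=42)

pipe = Pipeline([("clf", LogisticRegression())])
# Assumption: C=-1.0 is invalid on purpose, so fitting that candidate fails
param_grid = {"clf__C": [1.0, -1.0]}

# Default behaviour: failures are swallowed and scored as nan
quiet = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=3, error_score=np.nan)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the fit-failure warnings for brevity
    quiet.fit(X, y)
scores = quiet.cv_results_["mean_test_score"]
print("mean ROC AUC per candidate:", scores)

# Debugging mode: the first failing fit re-raises its original exception
loud = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=3, error_score="raise")
try:
    loud.fit(X, y)
    failure = None
except ValueError as exc:
    failure = exc
print("underlying error:", repr(failure))
```
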
###################
# Metric5: MCC
###################
Best model:
{'clf__estimator': LogisticRegression(random_state=42, solver='liblinear'), 'clf__estimator__max_iter': 100, 'clf__estimator__solver': 'liblin

Best models score:
0.4480248700902755

print('Best model:\n', gs_dt.best_params_)
Best model:
{'criterion': 'entropy', 'max_depth': 2, 'max_features': None, 'max_leaf_nodes': 10}

print('Best models score:\n', gs_dt.best_score_)
Best models score:
0.43290518915746007
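
The construction of `gs_dt` is not shown in these notes. A minimal sketch of a decision-tree grid search over the same four parameter names might look as follows. The candidate values, the synthetic data and the MCC scorer are all assumptions; the metric used for the score above is not stated, and MCC is chosen here only because it is the last metric used earlier.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in for the real data
X, y = make_classification(n_samples=200, n_features=8, random_state=42)

# Parameter names taken from the output above; candidate values are illustrative
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 4, 6],
    "max_features": [None, "sqrt"],
    "max_leaf_nodes": [10, 20],
}

gs_dt = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    scoring=make_scorer(matthews_corrcoef),  # assumed metric
    cv=5,
)
gs_dt.fit(X, y)

print("Best model:\n", gs_dt.best_params_)
print("Best models score:\n", gs_dt.best_score_)
```
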