ML_AI_training/earlier_versions/GSCV_base

# Logistic regression:
pnca
input: numerical features
output: dm/om: target
grid search over a base estimator (a single model with hyperparameter choices): gives you the best model based on a SINGLE metric!
-- question: which metric should be optimised?
base estimator with multiple models and multiple hyperparameters: returns the OVERALL best model-hyperparameter combination, based on a single score?
-- question: which metric should be optimised?
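One way to approach the "which metric" question is to let GridSearchCV score every candidate on several metrics at once and pick a single one for the final refit. The sketch below is a minimal illustration, not the code that produced the results in this file: the synthetic data, the StandardScaler step, and the choice of MCC as the refit metric are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for the numerical features and binary target.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Assumption: the 'clf__' prefix in the logged params implies a Pipeline step
# named 'clf'; the scaler step is added here only for illustration.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(random_state=42)),
])

param_grid = {
    "clf__solver": ["liblinear", "saga"],
    "clf__max_iter": [100, 200],
}

# Every candidate is scored on all of these metrics in one search.
scoring = {
    "accuracy": "accuracy",
    "f1": "f1",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "mcc": make_scorer(matthews_corrcoef),
}

# refit names the single metric that decides best_params_ / best_score_.
gscv = GridSearchCV(pipe, param_grid, scoring=scoring, refit="mcc", cv=5)
gscv.fit(X, y)

print("Best model:\n", gscv.best_params_)
print("Best models score (MCC):\n", gscv.best_score_)

# The other metrics for the same winning candidate remain available.
for name in scoring:
    print(name, gscv.cv_results_[f"mean_test_{name}"][gscv.best_index_])
```

With this setup the search still returns one "best" model, but the per-metric columns in cv_results_ show how much the ranking would change under a different metric.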
# Demonstration
###################
# Metric1: accuracy
###################
Best model:
{'clf__max_iter': 100, 'clf__solver': 'liblinear'}
Best models score:
0.7145320197044336
###################
# Metric2: F1
###################
Best model:
{'clf__max_iter': 100, 'clf__solver': 'saga'}
Best models score:
0.7550294183111348
###################
# Metric3: Recall
###################
Best model:
{'clf__max_iter': 100, 'clf__solver': 'saga'}
Best models score:
0.8216666666666667
###################
# Metric4: ROC_AUC
###################
Best model:
{'clf__max_iter': 200, 'clf__solver': 'sag'}
Best models score:
0.7711904761904762
###################
# Metric5: MCC
###################
Best model:
{'clf__max_iter': 100, 'clf__solver': 'saga'}
Best models score:
0.4322970173039572
sklearn/linear_model/_sag.py:354: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
ConvergenceWarning,
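The ConvergenceWarning above is raised by the sag solver when it uses up max_iter before meeting its tolerance. The scikit-learn documentation notes that sag and saga only converge quickly when the features are on roughly the same scale, so scaling the inputs (or raising max_iter) is the usual thing to try. The sketch below compares iteration counts with and without scaling; the synthetic data and the column scales are assumptions, and no particular outcome is claimed for the real dataset.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

MAX_ITER = 200

# Assumption: synthetic data whose columns are put on very different scales
# to mimic unscaled numerical features.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_unscaled = X * np.logspace(0, 3, X.shape[1])


def fit_and_report(model, X, y):
    """Fit the model; return (iterations used, number of ConvergenceWarnings)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        model.fit(X, y)
    n_warn = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
    # For a pipeline, the logistic regression is the last step.
    final = model.steps[-1][1] if hasattr(model, "steps") else model
    return int(final.n_iter_[0]), n_warn


raw = LogisticRegression(solver="sag", max_iter=MAX_ITER, random_state=42)
scaled = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver="sag", max_iter=MAX_ITER, random_state=42),
)

raw_iters, raw_warn = fit_and_report(raw, X_unscaled, y)
scaled_iters, scaled_warn = fit_and_report(scaled, X_unscaled, y)

print("unscaled: iterations =", raw_iters, "convergence warnings =", raw_warn)
print("scaled:   iterations =", scaled_iters, "convergence warnings =", scaled_warn)
```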
#####################################
# Same thing but using: CLFSwitcher()
###################
# Metric1: Accuracy
###################
Best model:
{'clf__estimator': LogisticRegression(random_state=42, solver='liblinear'), 'clf__estimator__max_iter': 100, 'clf__estimator__solver': 'liblinear'}
Best models score:
0.7219298245614035
###################
# Metric2: F1
###################
Best model:
{'clf__estimator': LogisticRegression(random_state=42, solver='liblinear'), 'clf__estimator__max_iter': 100, 'clf__estimator__solver': 'liblinear'}
print('Best models score:\n', gscv.best_score_)
Best models score:
0.7585724070894442
###################
# Metric3: Recall
###################
Best model:
{'clf__estimator': LogisticRegression(random_state=42, solver='liblinear'), 'clf__estimator__max_iter': 100, 'clf__estimator__solver': 'liblinear'}
Best models score:
0.8198610213316095
###################
# Metric4: ROC_AUC
###################
Best model:
{'clf__estimator': LogisticRegression(solver='newton-cg'), 'clf__estimator__max_iter': 100, 'clf__estimator__solver': 'newton-cg'}
Best models score:
nan
###################
# Metric5: MCC
###################
Best model:
{'clf__estimator': LogisticRegression(random_state=42, solver='liblinear'), 'clf__estimator__max_iter': 100, 'clf__estimator__solver': 'liblin
Best models score:
0.4480248700902755
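The CLFSwitcher() runs above search over the estimator itself as well as its hyperparameters, which is why the logged params use the 'clf__estimator' and 'clf__estimator__...' keys. The definition of CLFSwitcher is not shown in this file, so the class below is a hypothetical reconstruction of that kind of wrapper, with synthetic data and an assumed StandardScaler step.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class CLFSwitcher(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: lets a grid search swap the classifier itself."""

    def __init__(self, estimator=None):
        self.estimator = estimator

    def fit(self, X, y):
        # Fit a fresh copy so the object listed in the grid stays unfitted.
        base = LogisticRegression() if self.estimator is None else self.estimator
        self.estimator_ = clone(base).fit(X, y)
        self.classes_ = self.estimator_.classes_
        return self

    def predict(self, X):
        return self.estimator_.predict(X)

    def predict_proba(self, X):
        # Probability-based scorers such as roc_auc need this method.
        return self.estimator_.predict_proba(X)


# Assumption: synthetic stand-in for the real features and target.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

pipe = Pipeline([("scale", StandardScaler()), ("clf", CLFSwitcher())])

# One dict per model family; more dicts can be appended for other classifiers.
param_grid = [
    {
        "clf__estimator": [LogisticRegression(random_state=42)],
        "clf__estimator__solver": ["liblinear", "newton-cg"],
        "clf__estimator__max_iter": [100, 200],
    },
]

gscv = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=5)
gscv.fit(X, y)

print("Best model:\n", gscv.best_params_)
print("Best models score:\n", gscv.best_score_)
```

A possible explanation for the nan ROC_AUC score logged above, offered only as a guess: if the wrapper exposes neither predict_proba nor decision_function, the roc_auc scorer fails on every fold and GridSearchCV records its default error_score of nan. Passing error_score='raise' to GridSearchCV would surface the underlying exception instead.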
print('Best model:\n', gs_dt.best_params_)
Best model:
{'criterion': 'entropy', 'max_depth': 2, 'max_features': None, 'max_leaf_nodes': 10}
print('Best models score:\n', gs_dt.best_score_)
Best models score:
0.43290518915746007
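The gs_dt output above has the shape of a grid search over a decision tree. The sketch below shows how such a search could be set up. The grid values, the synthetic data, and the use of MCC as the scoring metric are all assumptions, since the file records only the winning parameters and the score, not the metric or the grid.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in for the real features and target.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Assumption: illustrative grid covering the four parameters seen in the log.
param_grid_dt = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 4, 6],
    "max_features": [None, "sqrt"],
    "max_leaf_nodes": [10, 20],
}

# Assumption: MCC as the single metric; the source does not name the metric.
gs_dt = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid_dt,
    scoring=make_scorer(matthews_corrcoef),
    cv=5,
)
gs_dt.fit(X, y)

print("Best model:\n", gs_dt.best_params_)
print("Best models score:\n", gs_dt.best_score_)
```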