LSHTM_analysis/scripts/ml/log_embb_config.txt

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data.py:550: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
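The SettingWithCopyWarning above comes from ml_data.py:550, where mask_check is sorted with inplace=True. mask_check appears to be a filtered slice of a larger DataFrame. A minimal sketch of the usual remedy, assuming mask_check comes from boolean filtering: take an explicit .copy() of the slice and assign the sorted result back instead of sorting in place. The toy frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy frame standing in for the data that mask_check is filtered from
df = pd.DataFrame({
    "mutationinformation": ["M1A", "G2V", "D3N", "S4P"],
    "ligand_distance": [12.5, 3.1, 8.0, 20.2],
    "active_site": [0, 1, 1, 0],
})

# Boolean filtering returns an object pandas may treat as a possible copy of df;
# modifying that object in place is what triggers SettingWithCopyWarning.
# An explicit .copy() plus assigning the sorted result back removes the ambiguity.
mask_check = df[df["active_site"] == 1].copy()
mask_check = mask_check.sort_values(by=["ligand_distance"], ascending=True)
print(mask_check)
```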
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
1.22.4
1.4.1

aaindex_df contains non-numerical data

Total no. of non-numerical columns: 2

Selecting numerical data only

PASS: successfully selected numerical columns only for aaindex_df

Now checking for NA in the remaining aaindex_cols

Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127

Revised df ncols: 123

Checking NA in revised df...

PASS: cols with NA successfully dropped from aaindex_df
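The logged steps are: keep numerical columns, count the columns that contain NA, drop them, then re-check. They can be sketched with pandas as follows. This is an illustration under assumptions, not the script's own code. The helper name numeric_cols_without_na and the toy frame are hypothetical.

```python
import numpy as np
import pandas as pd

def numeric_cols_without_na(df: pd.DataFrame) -> pd.DataFrame:
    """Keep numerical columns only, then drop every column containing any NA."""
    numeric = df.select_dtypes(include="number")
    na_mask = numeric.isna().any()            # True for each column with at least one NA
    na_cols = na_mask[na_mask].index.tolist()
    print(f"ncols with NA: {len(na_cols)} columns")
    cleaned = numeric.drop(columns=na_cols)
    print(f"Original ncols: {numeric.shape[1]}\nRevised df ncols: {cleaned.shape[1]}")
    return cleaned

# Hypothetical toy stand-in for aaindex_df: one text column and one column with an NA
toy = pd.DataFrame({
    "mutationinformation": ["A1B", "C2D", "E3F"],
    "idx_a": [0.1, 0.2, 0.3],
    "idx_b": [1.0, np.nan, 3.0],
    "idx_c": [5, 6, 7],
})
clean = numeric_cols_without_na(toy)
```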
Proceeding with combining aa_df with other features_df

PASS: ncols match
Expected ncols: 123
Got: 123

Total no. of columns in clean aa_df: 123

Proceeding to merge, expected nrows in merged_df: 858

PASS: my_features_df and aa_df successfully combined
nrows: 858
ncols: 269
count of NULL values before imputation

or_mychisq          244
log10_or_mychisq    244
dtype: int64
count of NULL values AFTER imputation

mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64

PASS: OR values imputed, data ready for ML

No. of numerical features: 45
No. of categorical features: 7

index: 0
ind: 1

Mask count check: True

index: 1
ind: 2

Mask count check: False
Original Data
 Counter({0: 353, 1: 95}) Data dim: (448, 52)

-------------------------------------------------------------
Successfully split data: UQ [no aa_index but active site included] training
actual values: training set
imputed values: blind test set
Train data size: (448, 52)
Test data size: (410, 52)
y_train numbers: Counter({0: 353, 1: 95})
y_train ratio: 3.7157894736842105

y_test_numbers: Counter({0: 385, 1: 25})
y_test ratio: 15.4
-------------------------------------------------------------
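The reported ratios are consistent with dividing the majority-class count by the minority-class count. That gives 353 / 95 ≈ 3.716 for training and 385 / 25 = 15.4 for the blind test set. The two sets also add up to the merged data: 448 + 410 = 858 rows. A minimal sketch of the ratio calculation follows. class_ratio is a hypothetical helper, not necessarily the formula the script uses.

```python
from collections import Counter

def class_ratio(labels):
    """Majority-to-minority class ratio, matching the 'ratio' lines in the log."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# Label vectors rebuilt from the class counts printed above
y_train = [0] * 353 + [1] * 95
y_test = [0] * 385 + [1] * 25
print(Counter(y_train), class_ratio(y_train))
print(Counter(y_test), class_ratio(y_test))
```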
Simple Random OverSampling
 Counter({1: 353, 0: 353})
(706, 52)
Simple Random UnderSampling
 Counter({0: 95, 1: 95})
(190, 52)
Simple Combined Over and UnderSampling
 Counter({0: 353, 1: 353})
(706, 52)
SMOTE_NC OverSampling
 Counter({1: 353, 0: 353})
(706, 52)
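Each balanced set has twice the target class count: 2 × 353 = 706 rows after oversampling and 2 × 95 = 190 after undersampling. The labels suggest imbalanced-learn style samplers. The sketch below is a plain NumPy stand-in for the two random strategies only. It does not reproduce SMOTE-NC, which synthesises new rows by interpolation and handles categorical columns separately. The helper names and the zero-filled toy features are hypothetical.

```python
import numpy as np

def random_oversample(X, y, seed=42):
    """Duplicate randomly chosen rows of each smaller class until all classes match the largest."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for cls, n in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        extra = rng.choice(cls_idx, size=target - n, replace=True)  # empty for the largest class
        idx.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

def random_undersample(X, y, seed=42):
    """Keep a random subset of each larger class so all classes match the smallest."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.min()
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == cls), size=target, replace=False) for cls in classes
    ])
    return X[idx], y[idx]

# Toy data with the same class counts and width as the training set in the log
X = np.zeros((448, 52))
y = np.array([0] * 353 + [1] * 95)
X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(X_over.shape, np.bincount(y_over))
print(X_under.shape, np.bincount(y_under))
```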

#####################################################################

Running ML analysis: UQ [without AA index but with active site annotations]
Gene name: embB
Drug name: ethambutol

Output directory: /home/tanu/git/Data/ethambutol/output/ml/uq_v1/

Sanity checks:
Total input features: 52

Training data size: (448, 52)
Test data size: (410, 52)

Target feature numbers (training data): Counter({0: 353, 1: 95})
Target features ratio (training data): 3.7157894736842105

Target feature numbers (test data): Counter({0: 385, 1: 25})
Target features ratio (test data): 15.4

#####################################################################


================================================================

Structural features (n): 36
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================

Evolutionary features (n): 3
These are:
 ['consurf_score', 'snap2_score', 'provean_score']
================================================================

Genomic features (n): 6
These are:
 ['maf', 'logorI']
 ['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================

Categorical features (n): 7
These are:
 ['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================


Pass: No. of features match
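The group sizes above add up to the reported input width. Structural features are 10 + 23 + 3 = 36, numerical features are 36 + 3 + 6 = 45, and the 7 categorical features bring the total to 52. The FoldX list has 23 entries, and volumetric_sm is not among them. The sketch below runs that consistency check on the names printed above. Holding the groups as plain Python lists is an assumption about how the script does it.

```python
# Feature groups exactly as printed in the log
common_stability = ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
                    'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
                    'mcsm_ppi2_affinity', 'interface_dist']
foldx = ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss',
         'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss',
         'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss',
         'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss',
         'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss',
         'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
other_struc = ['rsa', 'kd_values', 'rd_values']
evolutionary = ['consurf_score', 'snap2_score', 'provean_score']
genomic = ['maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion',
           'lineage_count_all', 'lineage_count_unique']
categorical = ['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change',
               'water_change', 'drtype_mode_labels', 'active_site']

structural = common_stability + foldx + other_struc
numerical = structural + evolutionary + genomic
all_features = numerical + categorical

counts = {
    "structural": len(structural),
    "numerical": len(numerical),
    "categorical": len(categorical),
    "total": len(all_features),
}
# A duplicated name would inflate the total, so check uniqueness as well
assert len(set(all_features)) == len(all_features)
print(counts)
```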

#####################################################################


Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])
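The result keys that follow are fit_time, score_time, test_mcc, train_mcc and test_accuracy, with ten values each. These match what scikit-learn's cross_validate returns when called with cv=10, return_train_score=True and a scoring dict keyed 'mcc' and 'accuracy'. That the script calls it this way is an assumption.

The sketch below rebuilds a pipeline of the same shape on hypothetical toy data. It sets max_iter=1000 on LogisticRegression (the default is 100), which is the usual first response to the lbfgs ConvergenceWarning seen in this run. It also uses OneHotEncoder(handle_unknown="ignore"), so a category missing from a training fold does not raise an error.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

rng = np.random.default_rng(42)
n = 200
# Hypothetical toy data: column names borrowed from the log, values are random
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),
})
# Hypothetical label loosely tied to ligand_distance so the model has some signal
y = (X["ligand_distance"] + rng.normal(0, 5, n) < 12).astype(int)

num_cols = X.select_dtypes(include="number").columns
cat_cols = X.select_dtypes(exclude="number").columns

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),                             # scale numeric columns to [0, 1]
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),     # one-hot encode categoricals
    ],
    remainder="passthrough",
)
pipe = Pipeline(steps=[
    ("prep", prep),
    ("model", LogisticRegression(max_iter=1000, random_state=42)),
])

# Scoring dict keys become the 'test_<key>' / 'train_<key>' names in the results
scoring = {"mcc": make_scorer(matthews_corrcoef), "accuracy": "accuracy"}
scores = cross_validate(pipe, X, y, cv=10, scoring=scoring, return_train_score=True)

for key in ("fit_time", "score_time", "test_mcc", "train_mcc", "test_accuracy"):
    print(f"key: {key}\nmean value: {scores[key].mean()}\n")
```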

key: fit_time
value: [0.01869178 0.03414488 0.02512193 0.02518272 0.02411675 0.02404666
 0.02404714 0.02150297 0.02328777 0.0233357 ]

mean value: 0.024347829818725585

key: score_time
value: [0.01090193 0.01089644 0.0107584  0.01090932 0.0108211  0.01080489
 0.0108552  0.01080036 0.01080394 0.01080608]

mean value: 0.010835766792297363

key: test_mcc
value: [0.56660974 0.66143783 0.74285714 0.80295507 0.80295507 0.63936201
 0.78446454 0.70511024 0.78360391 0.70370542]

mean value: 0.7193060978385479

key: train_mcc
value: [0.82306415 0.783378   0.78225437 0.78270798 0.80735444 0.83325019
 0.77718904 0.80913415 0.81084447 0.80068593]

mean value: 0.8009862707554779
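MCC combines all four confusion-matrix counts and ranges from -1 to +1:

MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))

Each "mean value" line is the plain average of the ten fold scores. For test_mcc, the printed values average to about 0.71931. A small pure-Python sketch of both calculations follows. mcc is a hypothetical helper, not the scikit-learn function.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # If any marginal sum is zero the ratio is undefined; this sketch returns 0 then
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Fold-level test MCC values copied from the log above
test_mcc = [0.56660974, 0.66143783, 0.74285714, 0.80295507, 0.80295507,
            0.63936201, 0.78446454, 0.70511024, 0.78360391, 0.70370542]
mean_test_mcc = sum(test_mcc) / len(test_mcc)
print(mean_test_mcc)
```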

key: test_accuracy
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[0.84444444 0.88888889 0.91111111 0.93333333 0.93333333 0.88888889
 0.93333333 0.91111111 0.93181818 0.90909091]

mean value: 0.9085353535353535

key: train_accuracy
value: [0.94292804 0.93052109 0.93052109 0.93052109 0.93796526 0.94540943
 0.9280397  0.93796526 0.93811881 0.93564356]

mean value: 0.9357633343979559

key: test_fscore
value: [0.66666667 0.66666667 0.8        0.82352941 0.82352941 0.70588235
 0.8        0.75       0.8        0.75      ]

mean value: 0.7586274509803922

key: train_fscore
value: [0.85534591 0.82278481 0.81818182 0.82051282 0.8427673  0.86585366
 0.81761006 0.8447205  0.84848485 0.83544304]

mean value: 0.8371704761151999

key: test_precision
value: [0.63636364 1.         0.8        1.         1.         0.75
 1.         0.85714286 1.         0.85714286]

mean value: 0.890064935064935

key: train_precision
value: [0.91891892 0.89041096 0.91304348 0.90140845 0.90540541 0.91025641
 0.89041096 0.90666667 0.88607595 0.91666667]

mean value: 0.9039263864054471

key: test_recall
value: [0.7        0.5        0.8        0.7        0.7        0.66666667
 0.66666667 0.66666667 0.66666667 0.66666667]

mean value: 0.6733333333333333

key: train_recall
value: [0.8        0.76470588 0.74117647 0.75294118 0.78823529 0.8255814
 0.75581395 0.79069767 0.81395349 0.76744186]

mean value: 0.7800547195622435

key: test_roc_auc
value: [0.79285714 0.75       0.87142857 0.85       0.85       0.80555556
 0.83333333 0.81944444 0.83333333 0.81904762]

mean value: 0.8225
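The per-fold `test_roc_auc` values are consistent with ROC AUC being computed from hard class predictions rather than from probabilities. In the first fold, accuracy 0.8444, recall 0.7 and precision 0.636 on 45 rows imply TP=7, FN=3, FP=4 and TN=31. That gives (7/10 + 31/35)/2 = 0.79286, which matches the logged 0.79285714. With 0/1 scores, ROC AUC reduces to balanced accuracy. The sketch below checks that identity on labels built to reproduce those counts. This is an inference from the logged numbers; the log does not confirm how `ml_data.py` builds this scorer.

```python
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Hypothetical fold: 10 positives, 35 negatives, hard 0/1 predictions
# chosen to reproduce TP=7, FN=3, FP=4, TN=31.
y_true = [1] * 10 + [0] * 35
y_pred = [1] * 7 + [0] * 3 + [1] * 4 + [0] * 31

auc_on_labels = roc_auc_score(y_true, y_pred)  # the "scores" are the predicted labels
bal_acc = balanced_accuracy_score(y_true, y_pred)
print(round(auc_on_labels, 8), round(bal_acc, 8))
```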

key: train_roc_auc
value: [0.89056604 0.86977432 0.86115427 0.8654643  0.88311136 0.90174969
 0.86528868 0.88430783 0.8928258  0.87428697]

mean value: 0.8788529257196572

key: test_jcc
value: [0.5        0.5        0.66666667 0.7        0.7        0.54545455
 0.66666667 0.6        0.66666667 0.6       ]

mean value: 0.6145454545454545
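For binary labels, the Jaccard score J = TP/(TP+FP+FN) is a monotone function of F1: J = F1/(2 - F1). As a result, `test_jcc` ranks folds and models in the same order as `test_fscore`. For example, the first fold's F1 of 0.6667 gives J = 0.5, as logged. Here is a quick check on made-up labels that reproduce that fold's counts:

```python
from sklearn.metrics import f1_score, jaccard_score

# Hypothetical labels reproducing TP=7, FN=3, FP=4, TN=31 in a 45-row fold.
y_true = [1] * 10 + [0] * 35
y_pred = [1] * 7 + [0] * 3 + [1] * 4 + [0] * 31

f1 = f1_score(y_true, y_pred)
jcc = jaccard_score(y_true, y_pred)
print(f"F1={f1:.4f}  Jaccard={jcc:.4f}  F1/(2-F1)={f1 / (2 - f1):.4f}")
```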

key: train_jcc
value: [0.74725275 0.69892473 0.69230769 0.69565217 0.72826087 0.76344086
 0.69148936 0.7311828  0.73684211 0.7173913 ]

mean value: 0.7202744641448586
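The `key:` / `value:` / `mean value:` blocks match the shape of the dictionary that scikit-learn's `cross_validate` returns when `return_train_score=True`. That dictionary holds `fit_time` and `score_time`, followed by a `test_<name>` and `train_<name>` pair for each scorer. The sketch below prints the same keys. Several parts of it are assumptions, because the log does not show the script's actual scorer definitions or CV splitter:

- a scoring dict that uses these names;
- 10 stratified, shuffled folds (the 10-versus-9 recall denominators above suggest stratification);
- synthetic data with the logged class balance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in with the same shape and class balance as the log's training data.
X, y = make_classification(n_samples=448, n_features=52, weights=[353 / 448], random_state=42)

# Assumed scorer names, chosen to reproduce the keys seen in the log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),  # make_scorer's default calls predict(), i.e. hard labels
    "jcc": make_scorer(jaccard_score),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)  # assumed splitter
scores = cross_validate(
    LogisticRegression(max_iter=1000, random_state=42),
    X, y, cv=cv, scoring=scoring, return_train_score=True,
)

# Print in the same layout as the log.
for key, value in scores.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}\n")
```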

MCC on Blind test: 0.31

Accuracy on Blind test: 0.88
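The blind-test lines report MCC and accuracy rounded to two decimals. To show what the MCC summarises, the sketch below computes it directly from confusion-matrix counts, MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and cross-checks it against `sklearn.metrics.matthews_corrcoef`. The labels are invented, not the blind test set. They show the same pattern seen here: accuracy near 0.8 alongside a much lower MCC.

```python
import math

from sklearn.metrics import accuracy_score, confusion_matrix, matthews_corrcoef

# Invented blind-test labels and predictions (not the real held-out data).
y_true = [0] * 80 + [1] * 20
y_pred = [0] * 72 + [1] * 8 + [0] * 12 + [1] * 8

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
mcc_manual = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(f"MCC on Blind test: {matthews_corrcoef(y_true, y_pred):.2f}")
print(f"Accuracy on Blind test: {accuracy_score(y_true, y_pred):.2f}")
```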

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
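The printed model list contains `('Naive Bayes', BernoulliNB())` twice. If the list is iterated as-is, BernoulliNB would be evaluated twice under the same name. The following is a small guard, sketched under the assumption that the script stores models as the `(name, estimator)` pairs printed here. The variables `duplicates` and `unique_models` are hypothetical names.

```python
from collections import Counter

from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB, GaussianNB

# Abbreviated copy of the printed list, keeping the duplicated entry.
models = [
    ("Logistic Regression", LogisticRegression(random_state=42)),
    ("Gaussian NB", GaussianNB()),
    ("Naive Bayes", BernoulliNB()),
    ("Naive Bayes", BernoulliNB()),
]

name_counts = Counter(name for name, _ in models)
duplicates = [name for name, count in name_counts.items() if count > 1]
print("Duplicate model names:", duplicates)

# Keep only the first estimator registered under each name, preserving order.
seen = set()
unique_models = []
for name, estimator in models:
    if name not in seen:
        seen.add(name)
        unique_models.append((name, estimator))
```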
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])
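The printed pipeline performs these steps in order:

- min-max scales the numerical columns;
- one-hot encodes the seven categorical columns;
- passes any other columns through unchanged (`remainder='passthrough'`);
- fits the model.

The following is a minimal sketch with the same structure. It uses a tiny made-up DataFrame with a few of the logged column names, and uses `LogisticRegression` as the final step to keep the toy example small. `handle_unknown="ignore"` is an addition here, since the logged encoder uses the default. It prevents errors when a CV test fold contains a category that was not seen during training.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Made-up rows; only a handful of the logged feature names are used.
df = pd.DataFrame({
    "ligand_distance": [3.1, 12.4, 25.0, 7.8, 18.2, 5.5, 30.1, 9.9],
    "rsa": [0.05, 0.40, 0.85, 0.10, 0.60, 0.20, 0.95, 0.30],
    "ss_class": ["H", "E", "C", "H", "C", "E", "C", "H"],
    "active_site": [1, 0, 0, 1, 0, 1, 0, 0],
})
y = [1, 0, 0, 1, 0, 1, 0, 0]

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), ["ligand_distance", "rsa"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["ss_class", "active_site"]),
    ],
    remainder="passthrough",
)
pipe = Pipeline(steps=[("prep", prep), ("model", LogisticRegression(random_state=42))])
pipe.fit(df, y)

# 2 scaled numeric columns + 3 ss_class levels + 2 active_site levels = 7 features.
print(pipe.named_steps["prep"].transform(df).shape)
```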

key: fit_time
value: [0.70116019 0.80249429 0.71138048 0.69454575 0.80476427 0.65118718
 0.66926408 0.86205721 0.67393637 0.69592953]

mean value: 0.7266719341278076

key: score_time
value: [0.01424193 0.0140202  0.01410961 0.01113176 0.01438403 0.01496172
 0.01431608 0.01441383 0.01441216 0.01464701]

mean value: 0.014063835144042969

key: test_mcc
value: [0.64465837 0.73010948 0.76553182 0.93541435 0.86991767 0.55182541
 0.92998111 0.87904907 0.78360391 0.86031746]

mean value: 0.7950408639956386

key: train_mcc
value: [0.91766928 0.89482822 0.90269496 0.91054384 0.91837573 0.94087008
 0.89652263 0.89748849 0.91136463 0.89659207]

mean value: 0.9086949939856763

key: test_accuracy
value: [0.86666667 0.91111111 0.91111111 0.97777778 0.95555556 0.86666667
 0.97777778 0.95555556 0.93181818 0.95454545]

mean value: 0.9308585858585858

key: train_accuracy
value: [0.97270471 0.96526055 0.96774194 0.97022333 0.97270471 0.98014888
 0.96526055 0.96526055 0.97029703 0.96534653]

mean value: 0.969494877527455

key: test_fscore
value: [0.72727273 0.77777778 0.81818182 0.94736842 0.88888889 0.625
 0.94117647 0.9        0.8        0.88888889]

mean value: 0.8314554992650968

key: train_fscore
value: [0.93491124 0.91666667 0.92307692 0.92941176 0.93567251 0.95348837
 0.91860465 0.91954023 0.93023256 0.91860465]

mean value: 0.9280209574116103

key: test_precision
value: [0.66666667 0.875      0.75       1.         1.         0.71428571
 1.         0.81818182 1.         0.88888889]

mean value: 0.8713023088023089

key: train_precision
value: [0.94047619 0.92771084 0.92857143 0.92941176 0.93023256 0.95348837
 0.91860465 0.90909091 0.93023256 0.91860465]

mean value: 0.9286423926915579

key: test_recall
value: [0.8        0.7        0.9        0.9        0.8        0.55555556
 0.88888889 1.         0.66666667 0.88888889]

mean value: 0.81

key: train_recall
value: [0.92941176 0.90588235 0.91764706 0.92941176 0.94117647 0.95348837
 0.91860465 0.93023256 0.93023256 0.91860465]

mean value: 0.927469220246238

key: test_roc_auc
value: [0.84285714 0.83571429 0.90714286 0.95       0.9        0.75
 0.94444444 0.97222222 0.83333333 0.93015873]

mean value: 0.8865873015873016

key: train_roc_auc
value: [0.95684425 0.94350721 0.94938957 0.95527192 0.96115427 0.97043504
 0.94826132 0.95249798 0.95568232 0.94829604]

mean value: 0.9541339911123459

key: test_jcc
value: [0.57142857 0.63636364 0.69230769 0.9        0.8        0.45454545
 0.88888889 0.81818182 0.66666667 0.8       ]

mean value: 0.7228382728382728

key: train_jcc
value: [0.87777778 0.84615385 0.85714286 0.86813187 0.87912088 0.91111111
 0.84946237 0.85106383 0.86956522 0.84946237]

mean value: 0.8658992117799673

MCC on Blind test: 0.26

Accuracy on Blind test: 0.85

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01066494 0.01008058 0.0076077  0.00746608 0.00735378 0.0078814
 0.00793862 0.00814748 0.00798297 0.0080893 ]

mean value: 0.008321285247802734

key: score_time
value: [0.01098418 0.00850654 0.00823236 0.00799155 0.00801158 0.00867128
 0.00872326 0.00870562 0.00858879 0.008641  ]

mean value: 0.008705615997314453

key: test_mcc
value: [0.48483174 0.59030128 0.52378493 0.79539491 0.56447381 0.49897013
 0.74977715 0.58333333 0.74230749 0.35783003]

mean value: 0.5891004788968268

key: train_mcc
value: [0.70574597 0.60320861 0.66220727 0.65220882 0.63952782 0.60127303
 0.65498926 0.65973727 0.65999897 0.63678872]

mean value: 0.6475685731489803

key: test_accuracy
value: [0.8        0.86666667 0.82222222 0.91111111 0.82222222 0.75555556
 0.91111111 0.86666667 0.88636364 0.77272727]

mean value: 0.8414646464646465

key: train_accuracy
value: [0.88337469 0.86848635 0.87344913 0.86600496 0.86600496 0.81141439
 0.86848635 0.87096774 0.87128713 0.85891089]

mean value: 0.863838660540992

key: test_fscore
value: [0.60869565 0.66666667 0.63636364 0.83333333 0.66666667 0.59259259
 0.8        0.66666667 0.7826087  0.5       ]

mean value: 0.675359391011565

key: train_fscore
value: [0.76616915 0.68639053 0.7357513  0.72727273 0.71875    0.6779661
 0.73096447 0.73469388 0.73469388 0.71641791]

mean value: 0.7229069943632542

key: test_precision
value: [0.53846154 0.75       0.58333333 0.71428571 0.57142857 0.44444444
 0.72727273 0.66666667 0.64285714 0.45454545]

mean value: 0.6093295593295593

key: train_precision
value: [0.6637931  0.69047619 0.65740741 0.63716814 0.64485981 0.53333333
 0.64864865 0.65454545 0.65454545 0.62608696]

mean value: 0.6410864503603536

key: test_recall
value: [0.7        0.6        0.7        1.         0.8        0.88888889
 0.88888889 0.66666667 1.         0.55555556]

mean value: 0.78

key: train_recall
value: [0.90588235 0.68235294 0.83529412 0.84705882 0.81176471 0.93023256
 0.8372093  0.8372093  0.8372093  0.8372093 ]

mean value: 0.8361422708618331

key: test_roc_auc
value: [0.76428571 0.77142857 0.77857143 0.94285714 0.81428571 0.80555556
 0.90277778 0.79166667 0.92857143 0.69206349]

mean value: 0.8192063492063492

key: train_roc_auc
value: [0.89162042 0.80029597 0.85947096 0.859064   0.84613393 0.85470618
 0.85709046 0.85866774 0.85885622 0.85099459]

mean value: 0.8536900470036405

key: test_jcc
value: [0.4375     0.5        0.46666667 0.71428571 0.5        0.42105263
 0.66666667 0.5        0.64285714 0.33333333]

mean value: 0.5182362155388471

key: train_jcc
value: [0.62096774 0.52252252 0.58196721 0.57142857 0.56097561 0.51282051
 0.576      0.58064516 0.58064516 0.55813953]

mean value: 0.5666112029042308

MCC on Blind test: 0.31

Accuracy on Blind test: 0.79
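Accuracy is easy to over-read on imbalanced data. Earlier in this log the training split is reported as `Counter({0: 353, 1: 95})`. On that split, always predicting the majority class would already score 353/448 ≈ 0.79 accuracy, while its MCC would be 0. The following sketches that baseline with `DummyClassifier`, using labels with the same counts. The blind-test class balance is not shown in this log, so this is a reference point, not a claim about the blind set.

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Labels with the class counts reported for the training data.
y = [0] * 353 + [1] * 95
X = [[0.0]] * len(y)  # features are ignored by the "most_frequent" strategy

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = baseline.predict(X)

print(f"Baseline accuracy: {accuracy_score(y, y_pred):.2f}")  # majority-class rate
print(f"Baseline MCC: {matthews_corrcoef(y, y_pred):.2f}")    # 0 for a constant predictor
```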

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00779533 0.00809622 0.00823021 0.0081985  0.00811529 0.00841022
 0.00828648 0.00819182 0.00816894 0.00820422]

mean value: 0.008169722557067872

key: score_time
value: [0.00871468 0.00865245 0.00868225 0.00860405 0.00801325 0.00864792
 0.0086062  0.00858259 0.00805998 0.0086143 ]

mean value: 0.008517765998840332

key: test_mcc
value: [0.24896765 0.50799198 0.26726124 0.65547353 0.48483174 0.0919709
 0.53033009 0.34874292 0.35783003 0.2627869 ]

mean value: 0.37561869673775167

key: train_mcc
value: [0.46269974 0.47030687 0.49545247 0.43032136 0.4560545  0.49551509
 0.4750932  0.46293038 0.45477034 0.4913158 ]

mean value: 0.4694459747127293

key: test_accuracy
value: [0.71111111 0.84444444 0.75555556 0.88888889 0.8        0.73333333
 0.86666667 0.8        0.77272727 0.75      ]

mean value: 0.7922727272727272

key: train_accuracy
value: [0.81637717 0.82878412 0.83622829 0.81885856 0.82630273 0.82382134
 0.82878412 0.80645161 0.82178218 0.82425743]

mean value: 0.8231647544407046

key: test_fscore
value: [0.43478261 0.58823529 0.42105263 0.70588235 0.60869565 0.25
 0.57142857 0.47058824 0.5        0.42105263]

mean value: 0.49717179778089726

key: train_fscore
value: [0.57954545 0.57668712 0.59756098 0.5408805  0.5625     0.60773481
 0.58181818 0.58510638 0.56626506 0.60335196]

mean value: 0.5801450436839247

key: test_precision
value: [0.38461538 0.71428571 0.44444444 0.85714286 0.53846154 0.28571429
 0.8        0.5        0.45454545 0.4       ]

mean value: 0.5379209679209679

key: train_precision
value: [0.56043956 0.6025641  0.62025316 0.58108108 0.6        0.57894737
 0.60759494 0.53921569 0.5875     0.58064516]

mean value: 0.5858241061336452

key: test_recall
value: [0.5        0.5        0.4        0.6        0.7        0.22222222
 0.44444444 0.44444444 0.55555556 0.44444444]

mean value: 0.4811111111111111

key: train_recall
value: [0.6        0.55294118 0.57647059 0.50588235 0.52941176 0.63953488
 0.55813953 0.63953488 0.54651163 0.62790698]

mean value: 0.5776333789329685

key: test_roc_auc
value: [0.63571429 0.72142857 0.62857143 0.78571429 0.76428571 0.54166667
 0.70833333 0.66666667 0.69206349 0.63650794]

mean value: 0.6780952380952381

key: train_roc_auc
value: [0.73710692 0.72772845 0.74106548 0.70419904 0.71753607 0.75667596
 0.73017387 0.74563495 0.72136902 0.75263273]

mean value: 0.7334122492545921

key: test_jcc
value: [0.27777778 0.41666667 0.26666667 0.54545455 0.4375     0.14285714
 0.4        0.30769231 0.33333333 0.26666667]

mean value: 0.3394615107115107

key: train_jcc
value: [0.408      0.40517241 0.42608696 0.37068966 0.39130435 0.43650794
 0.41025641 0.41353383 0.39495798 0.432     ]

mean value: 0.4088509537857433

MCC on Blind test: 0.27

Accuracy on Blind test: 0.79
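By default, `BernoulliNB` binarizes its inputs with `binarize=0.0`, so values greater than 0 become 1. The pipeline's `MinMaxScaler` maps each numerical training column into [0, 1], and only the column minimum lands exactly at 0. As a result, almost every scaled value becomes 1 and most of the numerical signal is lost. That may partly explain why this model trails Gaussian NB in this run. The sketch below shows the effect on random data using `Binarizer`, which applies the same thresholding rule. A different `binarize` threshold (e.g. 0.5) would be one option to try, but this run did not test it.

```python
import numpy as np
from sklearn.preprocessing import Binarizer, MinMaxScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # continuous stand-in features

X_scaled = MinMaxScaler().fit_transform(X)                # each column mapped to [0, 1]
X_bin = Binarizer(threshold=0.0).fit_transform(X_scaled)  # same rule as BernoulliNB(binarize=0.0)

print("Zeros per column after binarizing:", (X_bin == 0).sum(axis=0))
print("Fraction of ones:", X_bin.mean())
```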

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])
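The pipeline dump above describes a scikit-learn `Pipeline` whose `prep` step min-max scales the numerical columns and one-hot encodes the categorical ones before passing them to the classifier. The per-key output that follows (`fit_time`, `score_time`, then paired `test_*`/`train_*` entries with 10 values each) appears to match what `sklearn.model_selection.cross_validate` returns when given a scoring dict and `return_train_score=True`. The sketch below reproduces that output shape only; it is not the script's code. The toy data, the reduced column lists, the label rule, the scorer names and the 10-fold `StratifiedKFold` are all assumptions, and `handle_unknown="ignore"` is added only to guard against a category missing from a training fold.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score, make_scorer,
                             matthews_corrcoef, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data: two numerical and one categorical column
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["H", "E", "C"], n),
})
y = (X["ligand_distance"] < 10).astype(int)  # hypothetical label (~25% positives)

num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class"]

pipe = Pipeline(steps=[
    ("prep", ColumnTransformer(
        transformers=[("num", MinMaxScaler(), num_cols),
                      ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols)],
        remainder="passthrough")),
    ("model", KNeighborsClassifier()),
])

# Scorer names chosen so the returned keys look like 'test_mcc', 'train_jcc', ...
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score, zero_division=0),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # scored on hard 0/1 predictions
    "jcc": make_scorer(jaccard_score),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring, return_train_score=True)

# Print in the same "key / value / mean value" layout as this log
for key, value in scores.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {value.mean()}\n")
```

Each "mean value" line is then just the arithmetic mean of the 10 per-fold values.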

key: fit_time
value: [0.00812173 0.00822234 0.00809884 0.00798321 0.00790763 0.00786543
 0.00798416 0.00782347 0.00781441 0.00777555]

mean value: 0.007959675788879395

key: score_time
value: [0.05056334 0.01157475 0.01144004 0.01309729 0.01108718 0.01101208
 0.01113534 0.01105309 0.01099586 0.01101422]

mean value: 0.015297317504882812

key: test_mcc
value: [0.15118579 0.5        0.41931393 0.65547353 0.28571429 0.53452248
 0.42947785 0.6681531  0.78360391 0.15983741]

mean value: 0.458728228665888

key: train_mcc
value: [0.60794624 0.62646908 0.63584249 0.56988091 0.6086966  0.60327572
 0.62166491 0.60374425 0.59538331 0.60392999]

mean value: 0.6076833509798295

key: test_accuracy
value: [0.75555556 0.84444444 0.82222222 0.88888889 0.8        0.86666667
 0.84444444 0.88888889 0.93181818 0.79545455]

mean value: 0.8438383838383838

key: train_accuracy
value: [0.8808933  0.88585608 0.88833747 0.87096774 0.8808933  0.87841191
 0.88337469 0.87841191 0.87623762 0.87871287]

mean value: 0.8802096897034617

key: test_fscore
value: [0.26666667 0.46153846 0.5        0.70588235 0.30769231 0.5
 0.46153846 0.73684211 0.8        0.18181818]

mean value: 0.49219785374584135

key: train_fscore
value: [0.64179104 0.66176471 0.67625899 0.6        0.65217391 0.64233577
 0.6618705  0.64748201 0.64285714 0.64748201]

mean value: 0.6474016098162307

key: test_precision
value: [0.4        1.         0.66666667 0.85714286 0.66666667 1.
 0.75       0.7        1.         0.5       ]

mean value: 0.7540476190476191

key: train_precision
value: [0.87755102 0.88235294 0.87037037 0.86666667 0.8490566  0.8627451
 0.86792453 0.8490566  0.83333333 0.8490566 ]

mean value: 0.8608113769616862

key: test_recall
value: [0.2        0.3        0.4        0.6        0.2        0.33333333
 0.33333333 0.77777778 0.66666667 0.11111111]

mean value: 0.3922222222222222

key: train_recall
value: [0.50588235 0.52941176 0.55294118 0.45882353 0.52941176 0.51162791
 0.53488372 0.52325581 0.52325581 0.52325581]

mean value: 0.5192749658002737

key: test_roc_auc
value: [0.55714286 0.65       0.67142857 0.78571429 0.58571429 0.66666667
 0.65277778 0.84722222 0.83333333 0.54126984]

mean value: 0.6791269841269841
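The K-Nearest Neighbors `test_roc_auc` values appear to be ROC AUC computed on hard 0/1 predictions rather than on predicted probabilities. For a binary task this reduces to (recall + specificity) / 2. The check below rebuilds each fold's confusion-matrix counts from the logged `test_recall` and `test_precision`. The fold sizes are inferred, not read from the script: eight folds of 45 rows and two of 44 (from the accuracy denominators), with 10 positives in folds 1 to 5 and 9 in folds 6 to 10 (from the recall denominators), consistent with the 353/95 class counts of the training data.

```python
import numpy as np

# Fold sizes inferred (assumption) from the logged per-fold values
n_rows = np.array([45] * 8 + [44] * 2)
n_pos = np.array([10] * 5 + [9] * 5)
n_neg = n_rows - n_pos

# K-Nearest Neighbors per-fold values copied from the log
test_recall = np.array([0.2, 0.3, 0.4, 0.6, 0.2, 0.33333333,
                        0.33333333, 0.77777778, 0.66666667, 0.11111111])
test_precision = np.array([0.4, 1.0, 0.66666667, 0.85714286, 0.66666667, 1.0,
                           0.75, 0.7, 1.0, 0.5])
test_roc_auc = np.array([0.55714286, 0.65, 0.67142857, 0.78571429, 0.58571429,
                         0.66666667, 0.65277778, 0.84722222, 0.83333333, 0.54126984])

# Rebuild true and false positive counts from recall and precision
tp = np.rint(test_recall * n_pos)
fp = np.rint(tp / test_precision - tp)
specificity = (n_neg - fp) / n_neg

# ROC AUC of hard 0/1 predictions equals balanced accuracy
auc_hard = (test_recall + specificity) / 2
print(np.allclose(auc_hard, test_roc_auc, atol=1e-6))
```

If this holds, `test_roc_auc` here is better read as balanced accuracy than as a ranking-based AUC.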

key: train_roc_auc
value: [0.74350721 0.75527192 0.7654643  0.7199778  0.75212727 0.74477294
 0.75640085 0.74900961 0.74747696 0.74904929]

mean value: 0.7483058161342697

key: test_jcc
value: [0.15384615 0.3        0.33333333 0.54545455 0.18181818 0.33333333
 0.3        0.58333333 0.66666667 0.1       ]

mean value: 0.3497785547785548
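Assuming `jcc` is the binary Jaccard score (as computed by scikit-learn's `jaccard_score`), it carries no information beyond the F1 score of the same fold. With J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), it follows that J = F1 / (2 - F1). The check below applies this to the K-Nearest Neighbors `test_fscore` and `test_jcc` values logged above.

```python
import numpy as np

# K-Nearest Neighbors per-fold values copied from the log
test_fscore = np.array([0.26666667, 0.46153846, 0.5, 0.70588235, 0.30769231, 0.5,
                        0.46153846, 0.73684211, 0.8, 0.18181818])
test_jcc = np.array([0.15384615, 0.3, 0.33333333, 0.54545455, 0.18181818, 0.33333333,
                     0.3, 0.58333333, 0.66666667, 0.1])

# Binary Jaccard expressed through F1: J = F1 / (2 - F1)
jcc_from_f1 = test_fscore / (2 - test_fscore)
print(np.allclose(jcc_from_f1, test_jcc, atol=1e-6))
print(test_jcc.mean())
```

Because the mapping is monotonic, ranking models by mean `test_jcc` or by mean `test_fscore` should give nearly the same order.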

key: train_jcc
value: [0.47252747 0.49450549 0.51086957 0.42857143 0.48387097 0.47311828
 0.49462366 0.4787234  0.47368421 0.4787234 ]

mean value: 0.4789217883084548

MCC on Blind test: 0.34

Accuracy on Blind test: 0.92

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01309419 0.01144147 0.01072574 0.0107615  0.01076436 0.0107367
 0.01085353 0.01092339 0.01091146 0.01101494]

mean value: 0.011122727394104004

key: score_time
value: [0.00937915 0.00849342 0.00914073 0.00845909 0.00852418 0.00841475
 0.00845885 0.00858498 0.00842404 0.0084095 ]

mean value: 0.00862886905670166

key: test_mcc
value: [0.44223199 0.66143783 0.53452248 0.73010948 0.57655666 0.62103443
 0.78446454 0.70511024 0.70370542 0.49137176]

mean value: 0.6250544835615041

key: train_mcc
value: [0.78203228 0.75720841 0.76610765 0.71535862 0.74092334 0.76796395
 0.73477752 0.76796395 0.76847981 0.74415611]

mean value: 0.7544971629935804

key: test_accuracy
value: [0.8        0.88888889 0.84444444 0.91111111 0.86666667 0.88888889
 0.93333333 0.91111111 0.90909091 0.84090909]

mean value: 0.8794444444444445

key: train_accuracy
value: [0.93052109 0.92307692 0.92555831 0.91066998 0.91811414 0.92555831
 0.91563275 0.92555831 0.92574257 0.91831683]

mean value: 0.9218749232243324

key: test_fscore
value: [0.57142857 0.66666667 0.63157895 0.77777778 0.625      0.66666667
 0.8        0.75       0.75       0.58823529]

mean value: 0.6827353924025751

key: train_fscore
value: [0.81578947 0.79194631 0.80519481 0.75675676 0.78145695 0.80519481
 0.77333333 0.80519481 0.80769231 0.78709677]

mean value: 0.7929656323611789

key: test_precision
value: [0.54545455 1.         0.66666667 0.875      0.83333333 0.83333333
 1.         0.85714286 0.85714286 0.625     ]

mean value: 0.8093073593073593

key: train_precision
value: [0.92537313 0.921875   0.89855072 0.88888889 0.89393939 0.91176471
 0.90625    0.91176471 0.9        0.88405797]

mean value: 0.9042464524573521

key: test_recall
value: [0.6        0.5        0.6        0.7        0.5        0.55555556
 0.66666667 0.66666667 0.66666667 0.55555556]

mean value: 0.6011111111111112

key: train_recall
value: [0.72941176 0.69411765 0.72941176 0.65882353 0.69411765 0.72093023
 0.6744186  0.72093023 0.73255814 0.70930233]

mean value: 0.7064021887824897

key: test_roc_auc
value: [0.72857143 0.75       0.75714286 0.83571429 0.73571429 0.76388889
 0.83333333 0.81944444 0.81904762 0.73492063]

mean value: 0.7777777777777778

key: train_roc_auc
value: [0.85684425 0.83919719 0.85369959 0.81840548 0.83605253 0.85100139
 0.82774558 0.85100139 0.85527278 0.84207255]

mean value: 0.8431292732694862

key: test_jcc
value: [0.4        0.5        0.46153846 0.63636364 0.45454545 0.5
 0.66666667 0.6        0.6        0.41666667]

mean value: 0.5235780885780885

key: train_jcc
value: [0.68888889 0.65555556 0.67391304 0.60869565 0.64130435 0.67391304
 0.63043478 0.67391304 0.67741935 0.64893617]

mean value: 0.6572973882539398

MCC on Blind test: 0.38

Accuracy on Blind test: 0.92

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.72929549 0.56049705 0.5628233  0.42927814 0.6755352  0.6567452
 0.94534802 0.34498525 1.1619606  0.58351111]

mean value: 0.664997935295105

key: score_time
value: [0.0111177  0.01106381 0.01113963 0.01541114 0.01135039 0.01112366
 0.01115203 0.02141356 0.01108098 0.01109648]

mean value: 0.012594938278198242

key: test_mcc
value: [0.56660974 0.49135381 0.41931393 0.76553182 0.53452248 0.55182541
 0.78446454 0.42947785 0.78353876 0.70156665]

mean value: 0.6028204990182109

key: train_mcc
value: [0.70728394 0.72527216 0.59866291 0.72938054 0.58118493 0.75134611
 0.79357239 0.57559177 0.83228784 0.78243871]

mean value: 0.7077021299353916

key: test_accuracy
value: [0.84444444 0.84444444 0.82222222 0.91111111 0.84444444 0.86666667
 0.93333333 0.84444444 0.93181818 0.88636364]

mean value: 0.8729292929292929

key: train_accuracy
value: [0.90818859 0.91066998 0.87841191 0.91066998 0.87344913 0.91811414
 0.93300248 0.87096774 0.94554455 0.92574257]

mean value: 0.9074761074122301

key: test_fscore
value: [0.66666667 0.53333333 0.5        0.81818182 0.63157895 0.625
 0.8        0.46153846 0.82352941 0.76190476]

mean value: 0.6621733400758169

key: train_fscore
value: [0.75167785 0.7804878  0.6259542  0.78571429 0.62773723 0.80239521
 0.83229814 0.61764706 0.86075949 0.82954545]

mean value: 0.7514216720958653

key: test_precision
value: [0.63636364 0.8        0.66666667 0.75       0.66666667 0.71428571
 1.         0.75       0.875      0.66666667]

mean value: 0.752564935064935

key: train_precision
value: [0.875      0.81012658 0.89130435 0.79518072 0.82692308 0.82716049
 0.89333333 0.84       0.94444444 0.81111111]

mean value: 0.8514584112635261

key: test_recall
value: [0.7        0.4        0.4        0.9        0.6        0.55555556
 0.66666667 0.33333333 0.77777778 0.88888889]

mean value: 0.6222222222222222

key: train_recall
value: [0.65882353 0.75294118 0.48235294 0.77647059 0.50588235 0.77906977
 0.77906977 0.48837209 0.79069767 0.84883721]

mean value: 0.6862517099863201

key: test_roc_auc
value: [0.79285714 0.68571429 0.67142857 0.90714286 0.75714286 0.75
 0.83333333 0.65277778 0.87460317 0.88730159]

mean value: 0.7812301587301587

key: train_roc_auc
value: [0.81683315 0.85288568 0.73331484 0.86150573 0.73879023 0.86745286
 0.87691659 0.73156775 0.88905953 0.89768904]

mean value: 0.8266015409642332

key: test_jcc
value: [0.5        0.36363636 0.33333333 0.69230769 0.46153846 0.45454545
 0.66666667 0.3        0.7        0.61538462]

mean value: 0.5087412587412588

key: train_jcc
value: [0.60215054 0.64       0.45555556 0.64705882 0.45744681 0.67
 0.71276596 0.44680851 0.75555556 0.70873786]

mean value: 0.6096079612948346

MCC on Blind test: 0.27

Accuracy on Blind test: 0.86

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01367283 0.01149011 0.01038933 0.00990009 0.00930309 0.00920916
 0.01029682 0.00981116 0.00988531 0.00956535]

mean value: 0.010352325439453126

key: score_time
value: [0.01084828 0.00840163 0.00803304 0.00791907 0.00791812 0.0079155
 0.00788355 0.00793147 0.00789642 0.00787997]

mean value: 0.008262705802917481

key: test_mcc
value: [0.72069583 0.80178373 0.88640526 0.80295507 0.53452248 0.72222222
 0.86111111 0.86111111 0.85775039 0.87831007]

mean value: 0.7926867266324226

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.88888889 0.93333333 0.95555556 0.93333333 0.84444444 0.91111111
 0.95555556 0.95555556 0.95454545 0.95454545]

mean value: 0.9286868686868687

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.7826087  0.84210526 0.90909091 0.82352941 0.63157895 0.77777778
 0.88888889 0.88888889 0.875      0.9       ]

mean value: 0.8319468782589661

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.69230769 0.88888889 0.83333333 1.         0.66666667 0.77777778
 0.88888889 0.88888889 1.         0.81818182]

mean value: 0.8454933954933955

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.9        0.8        1.         0.7        0.6        0.77777778
 0.88888889 0.88888889 0.77777778 1.        ]

mean value: 0.8333333333333334

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.89285714 0.88571429 0.97142857 0.85       0.75714286 0.86111111
 0.93055556 0.93055556 0.88888889 0.97142857]

mean value: 0.893968253968254

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.64285714 0.72727273 0.83333333 0.7        0.46153846 0.63636364
 0.8        0.8        0.77777778 0.81818182]

mean value: 0.7197324897324897

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.24

Accuracy on Blind test: 0.86

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.09645939 0.09477639 0.09699512 0.09707952 0.10095119 0.10206556
 0.1050036  0.09934473 0.09767556 0.09797168]

mean value: 0.09883227348327636

key: score_time
value: [0.01664639 0.01817846 0.01748824 0.01789355 0.0176363  0.01849771
 0.01823759 0.01819301 0.01714015 0.01752734]

mean value: 0.017743873596191406

key: test_mcc
value: [0.64465837 0.50799198 0.59030128 0.80295507 0.73010948 0.63936201
 0.78446454 0.78446454 0.92962225 0.70370542]

mean value: 0.7117634944566859

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.86666667 0.84444444 0.86666667 0.93333333 0.91111111 0.88888889
 0.93333333 0.93333333 0.97727273 0.90909091]

mean value: 0.9064141414141414

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.72727273 0.58823529 0.66666667 0.82352941 0.77777778 0.70588235
 0.8        0.8        0.94117647 0.75      ]

mean value: 0.7580540701128936

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.66666667 0.71428571 0.75       1.         0.875      0.75
 1.         1.         1.         0.85714286]

mean value: 0.8613095238095239

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.8        0.5        0.6        0.7        0.7        0.66666667
 0.66666667 0.66666667 0.88888889 0.66666667]

mean value: 0.6855555555555555

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.84285714 0.72142857 0.77142857 0.85       0.83571429 0.80555556
 0.83333333 0.83333333 0.94444444 0.81904762]

mean value: 0.8257142857142856

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.57142857 0.41666667 0.5        0.7        0.63636364 0.54545455
 0.66666667 0.66666667 0.88888889 0.6       ]

mean value: 0.6192135642135642

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.34

Accuracy on Blind test: 0.88

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00831676 0.00815201 0.00814176 0.00760484 0.00792241 0.00806022
 0.00744224 0.00811696 0.00831771 0.007792  ]

mean value: 0.007986688613891601

key: score_time
value: [0.0079205  0.00823855 0.0081501  0.00784802 0.00789833 0.00781298
 0.00785351 0.00774097 0.00806856 0.00786448]

mean value: 0.007939600944519043

key: test_mcc
value: [0.54554473 0.6681531  0.49629167 0.76553182 0.45049308 0.24525574
 0.72222222 0.53452248 0.53168513 0.57505463]

mean value: 0.5534754596434204

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.77777778 0.88888889 0.77777778 0.91111111 0.82222222 0.77777778
 0.91111111 0.84444444 0.84090909 0.84090909]

mean value: 0.8392929292929293

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.64285714 0.73684211 0.61538462 0.81818182 0.55555556 0.375
 0.77777778 0.63157895 0.63157895 0.66666667]

mean value: 0.6451423576423576

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.5        0.77777778 0.5        0.75       0.625      0.42857143
 0.77777778 0.6        0.6        0.58333333]

mean value: 0.6142460317460318

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.9        0.7        0.8        0.9        0.5        0.33333333
 0.77777778 0.66666667 0.66666667 0.77777778]

mean value: 0.7022222222222222

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.82142857 0.82142857 0.78571429 0.90714286 0.70714286 0.61111111
 0.86111111 0.77777778 0.77619048 0.81746032]

mean value: 0.7886507936507936

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.47368421 0.58333333 0.44444444 0.69230769 0.38461538 0.23076923
 0.63636364 0.46153846 0.46153846 0.5       ]

mean value: 0.48685948554369607

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
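Each `mean value` line above is the arithmetic mean of the ten per-fold scores printed under `value`. The minimal sketch below reproduces the Extra Tree `test_mcc` mean from the printed fold values. Any tiny residual comes from the 8-decimal rounding in the printout. The names `fold_mcc` and `summarise` are illustrative and do not come from the script.

```python
from statistics import mean, stdev

# Per-fold test MCC for the Extra Tree model, copied from the log above
# (the log prints each value rounded to 8 decimals).
fold_mcc = [0.54554473, 0.6681531, 0.49629167, 0.76553182, 0.45049308,
            0.24525574, 0.72222222, 0.53452248, 0.53168513, 0.57505463]

def summarise(scores):
    """Return (mean, sample standard deviation) of per-fold scores."""
    return mean(scores), stdev(scores)

mcc_mean, mcc_sd = summarise(fold_mcc)
print(f"mean={mcc_mean:.6f} sd={mcc_sd:.6f}")
```

The log reports only the mean. The spread is also informative here, because the fold MCC values range from about 0.25 to 0.77.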

MCC on Blind test: 0.15

Accuracy on Blind test: 0.72
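The fold-1 scores of the Extra Tree model are consistent with a single confusion matrix. Assume fold 1 holds 45 of the 448 training rows, which gives eight folds of 45 and two of 44. This matches the accuracies above: 35/45 for fold 1 and 37/44 for the last two folds. Under that assumption, recall 0.9 and precision 0.5 imply TP=9, FN=1, FP=9, TN=26. The sketch below recomputes every logged fold-1 metric from these inferred counts, which the log does not print directly. The `test_roc_auc` value matches (TPR+TNR)/2. That suggests it was scored from hard 0/1 predictions rather than probabilities, but this is an inference and the log does not state it.

```python
from math import sqrt

def binary_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),              # Jaccard index of the positive class
        "mcc": (tp * tn - fp * fn)
               / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        # ROC AUC of a hard 0/1 classifier reduces to the mean of TPR and TNR
        "roc_auc_hard": (recall + specificity) / 2,
    }

# Counts inferred (not logged) for Extra Tree, fold 1.
fold1 = binary_metrics(tp=9, fp=9, fn=1, tn=26)
for name, value in fold1.items():
    print(f"{name}: {value:.8f}")
```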

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.25313497 1.24195075 1.2457087  1.27323222 1.23807859 1.32632828
 1.27514148 1.23370337 1.23768973 1.23494816]

mean value: 1.2559916257858277

key: score_time
value: [0.09639502 0.09044957 0.09679174 0.0888958  0.0955162  0.096071
 0.09548783 0.09246659 0.09599972 0.08963633]

mean value: 0.09377098083496094

key: test_mcc
value: [0.79539491 0.93974299 0.88640526 0.86991767 0.86991767 0.72222222
 0.92998111 0.93541435 0.92962225 0.93503247]

mean value: 0.8813650899545291

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91111111 0.97777778 0.95555556 0.95555556 0.95555556 0.91111111
 0.97777778 0.97777778 0.97727273 0.97727273]

mean value: 0.9576767676767677

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.83333333 0.95238095 0.90909091 0.88888889 0.88888889 0.77777778
 0.94117647 0.94736842 0.94117647 0.94736842]

mean value: 0.9027450533642484

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.71428571 0.90909091 0.83333333 1.         1.         0.77777778
 1.         0.9        1.         0.9       ]

mean value: 0.9034487734487735

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         0.8        0.8        0.77777778
 0.88888889 1.         0.88888889 1.        ]

mean value: 0.9155555555555556

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.94285714 0.98571429 0.97142857 0.9        0.9        0.86111111
 0.94444444 0.98611111 0.94444444 0.98571429]

mean value: 0.9421825396825396

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.71428571 0.90909091 0.83333333 0.8        0.8        0.63636364
 0.88888889 0.9        0.88888889 0.9       ]

mean value: 0.8270851370851371

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
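For a single binary confusion matrix, the Jaccard index and the F1 score satisfy J = F1 / (2 - F1). Writing a = TP and b = FP + FN, F1 = 2a/(2a+b) and J = a/(a+b). Within each fold, therefore, `test_jcc` adds no information beyond `test_fscore`. The minimal check below runs against the Random Forest folds above. Note that the identity holds per fold, not between the two logged means.

```python
# Per-fold Random Forest scores copied from the log (rounded to 8 decimals).
test_fscore = [0.83333333, 0.95238095, 0.90909091, 0.88888889, 0.88888889,
               0.77777778, 0.94117647, 0.94736842, 0.94117647, 0.94736842]
test_jcc = [0.71428571, 0.90909091, 0.83333333, 0.8, 0.8,
            0.63636364, 0.88888889, 0.9, 0.88888889, 0.9]

def jaccard_from_f1(f1):
    """Jaccard index implied by an F1 score for the same confusion matrix."""
    return f1 / (2 - f1)

implied = [jaccard_from_f1(f) for f in test_fscore]
max_err = max(abs(a - b) for a, b in zip(implied, test_jcc))
print(f"largest per-fold discrepancy: {max_err:.2e}")
```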

MCC on Blind test: 0.27

Accuracy on Blind test: 0.87

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])
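The `FutureWarning` emitted for this model says that `max_features='auto'` is deprecated and that `'sqrt'` keeps the past behaviour for classifiers. The minimal sketch below keeps the same `Random Forest2` hyperparameters with that one substitution. It stores them as a plain dict so they can be checked without fitting a model. `rf2_params` and `make_rf2` are illustrative names and are not taken from the script.

```python
# Hyperparameters of the 'Random Forest2' model from the log, with the
# deprecated max_features='auto' replaced by 'sqrt' as the warning advises.
rf2_params = dict(
    max_features="sqrt",
    min_samples_leaf=5,
    n_estimators=1000,
    n_jobs=10,
    oob_score=True,
    random_state=42,
)

def make_rf2():
    """Build the classifier; sklearn is imported lazily so the dict can be inspected alone."""
    from sklearn.ensemble import RandomForestClassifier
    return RandomForestClassifier(**rf2_params)
```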

key: fit_time
value: [1.7415638  0.88944888 0.92626333 0.89779186 0.89715624 0.98574638
 0.93079758 0.91791654 0.86064243 0.91723776]

mean value: 0.9964564800262451

key: score_time
value: [0.22609925 0.25738049 0.15809059 0.23059368 0.22971487 0.24633598
 0.2073884  0.26125479 0.13984299 0.23984051]

mean value: 0.21965415477752687

key: test_mcc
value: [0.83862787 0.93541435 0.74285714 0.86991767 0.86991767 0.72222222
 0.92998111 0.93541435 0.92962225 0.93503247]

mean value: 0.8709007101271089

key: train_mcc
value: [0.93322152 0.93322152 0.93322152 0.93322152 0.94097505 0.95612789
 0.93378477 0.93378477 0.94150783 0.94090976]

mean value: 0.9379976137162074

key: test_accuracy
value: [0.93333333 0.97777778 0.91111111 0.95555556 0.95555556 0.91111111
 0.97777778 0.97777778 0.97727273 0.97727273]

mean value: 0.9554545454545454

key: train_accuracy
value: [0.97766749 0.97766749 0.97766749 0.97766749 0.98014888 0.98511166
 0.97766749 0.97766749 0.98019802 0.98019802]

mean value: 0.9791661548288824

key: test_fscore
value: [0.86956522 0.94736842 0.8        0.88888889 0.88888889 0.77777778
 0.94117647 0.94736842 0.94117647 0.94736842]

mean value: 0.8949578977281226

key: train_fscore
value: [0.94736842 0.94736842 0.94736842 0.94736842 0.95348837 0.96551724
 0.94797688 0.94797688 0.95402299 0.95348837]

mean value: 0.9511944415507063

key: test_precision
value: [0.76923077 1.         0.8        1.         1.         0.77777778
 1.         0.9        1.         0.9       ]

mean value: 0.9147008547008547

key: train_precision
value: [0.94186047 0.94186047 0.94186047 0.94186047 0.94252874 0.95454545
 0.94252874 0.94252874 0.94318182 0.95348837]

mean value: 0.9446243712181964

key: test_recall
value: [1.         0.9        0.8        0.8        0.8        0.77777778
 0.88888889 1.         0.88888889 1.        ]

mean value: 0.8855555555555555

key: train_recall
value: [0.95294118 0.95294118 0.95294118 0.95294118 0.96470588 0.97674419
 0.95348837 0.95348837 0.96511628 0.95348837]

mean value: 0.9578796169630643

key: test_roc_auc
value: [0.95714286 0.95       0.87142857 0.9        0.9        0.86111111
 0.94444444 0.98611111 0.94444444 0.98571429]

mean value: 0.9300396825396825

key: train_roc_auc
value: [0.96860895 0.96860895 0.96860895 0.96860895 0.97449131 0.98206294
 0.96885775 0.96885775 0.9746965  0.97045488]

mean value: 0.9713856946391021

key: test_jcc
value: [0.76923077 0.9        0.66666667 0.8        0.8        0.63636364
 0.88888889 0.9        0.88888889 0.9       ]

mean value: 0.815003885003885

key: train_jcc
value: [0.9        0.9        0.9        0.9        0.91111111 0.93333333
 0.9010989  0.9010989  0.91208791 0.91111111]

mean value: 0.906984126984127

MCC on Blind test: 0.25

Accuracy on Blind test: 0.87
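`Random Forest` fits every training fold perfectly (train MCC 1.0). `Random Forest2`, which sets `min_samples_leaf=5`, does not, and its cross-validated test MCC is only slightly lower. The small sketch below computes the train-minus-test gap from the mean MCC values logged for the two forests. The `cv_mcc` dict and `generalisation_gap` helper are illustrative.

```python
# Mean cross-validated MCC values copied from the log.
cv_mcc = {
    "Random Forest":  {"train": 1.0,                "test": 0.8813650899545291},
    "Random Forest2": {"train": 0.9379976137162074, "test": 0.8709007101271089},
}

def generalisation_gap(scores):
    """Mean train score minus mean test score; a larger gap usually signals more overfitting."""
    return scores["train"] - scores["test"]

for name, scores in cv_mcc.items():
    print(f"{name}: gap = {generalisation_gap(scores):.4f}")
```

On the blind test the two forests remain close, at 0.27 and 0.25 MCC respectively.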

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01952696 0.00753379 0.00747681 0.00779676 0.00800824 0.00813532
 0.00763774 0.0079987  0.00786948 0.00797772]

mean value: 0.008996152877807617

key: score_time
value: [0.01012135 0.00860143 0.00816488 0.00859714 0.0084126  0.00811219
 0.00836134 0.00859356 0.00860572 0.00817752]

mean value: 0.008574771881103515

key: test_mcc
value: [0.24896765 0.50799198 0.26726124 0.65547353 0.48483174 0.0919709
 0.53033009 0.34874292 0.35783003 0.2627869 ]

mean value: 0.37561869673775167

key: train_mcc
value: [0.46269974 0.47030687 0.49545247 0.43032136 0.4560545  0.49551509
 0.4750932  0.46293038 0.45477034 0.4913158 ]

mean value: 0.4694459747127293

key: test_accuracy
value: [0.71111111 0.84444444 0.75555556 0.88888889 0.8        0.73333333
 0.86666667 0.8        0.77272727 0.75      ]

mean value: 0.7922727272727272

key: train_accuracy
value: [0.81637717 0.82878412 0.83622829 0.81885856 0.82630273 0.82382134
 0.82878412 0.80645161 0.82178218 0.82425743]

mean value: 0.8231647544407046

key: test_fscore
value: [0.43478261 0.58823529 0.42105263 0.70588235 0.60869565 0.25
 0.57142857 0.47058824 0.5        0.42105263]

mean value: 0.49717179778089726

key: train_fscore
value: [0.57954545 0.57668712 0.59756098 0.5408805  0.5625     0.60773481
 0.58181818 0.58510638 0.56626506 0.60335196]

mean value: 0.5801450436839247

key: test_precision
value: [0.38461538 0.71428571 0.44444444 0.85714286 0.53846154 0.28571429
 0.8        0.5        0.45454545 0.4       ]

mean value: 0.5379209679209679

key: train_precision
value: [0.56043956 0.6025641  0.62025316 0.58108108 0.6        0.57894737
 0.60759494 0.53921569 0.5875     0.58064516]

mean value: 0.5858241061336452

key: test_recall
value: [0.5        0.5        0.4        0.6        0.7        0.22222222
 0.44444444 0.44444444 0.55555556 0.44444444]

mean value: 0.4811111111111111

key: train_recall
value: [0.6        0.55294118 0.57647059 0.50588235 0.52941176 0.63953488
 0.55813953 0.63953488 0.54651163 0.62790698]

mean value: 0.5776333789329685

key: test_roc_auc
value: [0.63571429 0.72142857 0.62857143 0.78571429 0.76428571 0.54166667
 0.70833333 0.66666667 0.69206349 0.63650794]

mean value: 0.6780952380952381

key: train_roc_auc
value: [0.73710692 0.72772845 0.74106548 0.70419904 0.71753607 0.75667596
 0.73017387 0.74563495 0.72136902 0.75263273]

mean value: 0.7334122492545921

key: test_jcc
value: [0.27777778 0.41666667 0.26666667 0.54545455 0.4375     0.14285714
 0.4        0.30769231 0.33333333 0.26666667]

mean value: 0.3394615107115107

key: train_jcc
value: [0.408      0.40517241 0.42608696 0.37068966 0.39130435 0.43650794
 0.41025641 0.41353383 0.39495798 0.432     ]

mean value: 0.4088509537857433

MCC on Blind test: 0.27

Accuracy on Blind test: 0.79
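In this pipeline the numerical columns pass through `MinMaxScaler` before they reach `BernoulliNB`. By default, scikit-learn's `BernoulliNB` binarizes its input with `binarize=0.0`, which maps values greater than 0 to 1. Each min-max-scaled feature is therefore reduced to one bit: whether the sample sits at that column's training minimum or not. The pure-Python sketch below shows this on one hypothetical column. `min_max_scale` and `binarize` are illustrative helpers, not scikit-learn code. Whether this behaviour contributes to the weaker Naive Bayes scores here is an open question that the log does not answer.

```python
def min_max_scale(column):
    """Scale values to [0, 1], as MinMaxScaler does with its default feature_range."""
    lo, hi = min(column), max(column)
    span = hi - lo
    # MinMaxScaler maps a constant column to 0; mirror that here.
    return [(x - lo) / span if span else 0.0 for x in column]

def binarize(column, threshold=0.0):
    """Map values strictly greater than threshold to 1, all others to 0."""
    return [1 if x > threshold else 0 for x in column]

# Hypothetical raw values for one numerical feature.
raw = [2.0, 5.0, 11.0, 20.0]
scaled = min_max_scale(raw)
print(scaled)
print(binarize(scaled))   # only the column minimum stays 0
```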

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.09265685 0.05299425 0.05342174 0.05338025 0.05346775 0.10898733
 0.04733276 0.04669976 0.04698706 0.05033135]

mean value: 0.06062591075897217

key: score_time
value: [0.01007628 0.00975966 0.00967383 0.01012087 0.01011181 0.01093411
 0.01036572 0.01043487 0.01195836 0.01052809]

mean value: 0.010396361351013184

key: test_mcc
value: [0.79539491 0.93541435 0.88640526 0.86991767 0.93541435 0.63936201
 0.92998111 0.93541435 0.92962225 0.93503247]

mean value: 0.8791958723600636

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91111111 0.97777778 0.95555556 0.95555556 0.97777778 0.88888889
 0.97777778 0.97777778 0.97727273 0.97727273]

mean value: 0.9576767676767677

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.83333333 0.94736842 0.90909091 0.88888889 0.94736842 0.70588235
 0.94117647 0.94736842 0.94117647 0.94736842]

mean value: 0.9009022109641305

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.71428571 1.         0.83333333 1.         1.         0.75
 1.         0.9        1.         0.9       ]

mean value: 0.9097619047619048

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.9        1.         0.8        0.9        0.66666667
 0.88888889 1.         0.88888889 1.        ]

mean value: 0.9044444444444444

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.94285714 0.95       0.97142857 0.9        0.95       0.80555556
 0.94444444 0.98611111 0.94444444 0.98571429]

mean value: 0.9380555555555555

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.71428571 0.9        0.83333333 0.8        0.9        0.54545455
 0.88888889 0.9        0.88888889 0.9       ]

mean value: 0.8270851370851371

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.27

Accuracy on Blind test: 0.87
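For the five models in this excerpt, blind-test MCC is much lower than mean cross-validated test MCC. The three tree ensembles score about 0.87 to 0.88 in cross-validation but only 0.25 to 0.27 on the blind set. The short sketch below tabulates the logged values and the drop for each model. All values are copied from the log, and the blind-test figures are printed there to two decimals only. The `results` dict and `mcc_drop` helper are illustrative.

```python
# (mean CV test MCC, blind-test MCC) per model, copied from the log.
results = {
    "Extra Tree":     (0.5534754596434204, 0.15),
    "Random Forest":  (0.8813650899545291, 0.27),
    "Random Forest2": (0.8709007101271089, 0.25),
    "Naive Bayes":    (0.37561869673775167, 0.27),
    "XGBoost":        (0.8791958723600636, 0.27),
}

def mcc_drop(cv, blind):
    """MCC lost when moving from cross-validation to the blind test."""
    return cv - blind

# Sort models by the size of the drop, largest first.
ranked = sorted(results.items(), key=lambda kv: mcc_drop(*kv[1]), reverse=True)
for name, (cv, blind) in ranked:
    print(f"{name:<15} CV={cv:.3f} blind={blind:.2f} drop={mcc_drop(cv, blind):.3f}")
```

Random Forest, XGBoost and Naive Bayes reach the same blind-test MCC (0.27) despite very different cross-validated scores.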

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.019593   0.02771401 0.03854823 0.0388329  0.03496313 0.0385735
 0.05273533 0.03399873 0.03737974 0.04336548]

mean value: 0.036570405960083006

key: score_time
value: [0.01057959 0.0109818  0.01963091 0.02015376 0.02013206 0.02053571
 0.01113605 0.01271248 0.02074575 0.02328992]

mean value: 0.016989803314208983

key: test_mcc
value: [0.56660974 0.80178373 0.76553182 0.93541435 0.93541435 0.63936201
 0.86111111 0.87904907 0.78360391 0.86031746]

mean value: 0.8028197545072195

key: train_mcc
value: [0.8869427  0.87274633 0.88072512 0.87383838 0.86705728 0.90434373
 0.88174015 0.8749027  0.87498726 0.86704695]

mean value: 0.8784330603510744

key: test_accuracy
value: [0.84444444 0.93333333 0.91111111 0.97777778 0.97777778 0.88888889
 0.95555556 0.95555556 0.93181818 0.95454545]

mean value: 0.9330808080808081

key: train_accuracy
value: [0.96277916 0.95781638 0.96029777 0.95781638 0.95533499 0.96774194
 0.96029777 0.95781638 0.95792079 0.95544554]

mean value: 0.9593267081050537

key: test_fscore
value: [0.66666667 0.84210526 0.81818182 0.94736842 0.94736842 0.70588235
 0.88888889 0.9        0.8        0.88888889]

mean value: 0.8405350720830598

key: train_fscore
value: [0.91017964 0.89940828 0.90588235 0.9005848  0.89534884 0.92485549
 0.90697674 0.9017341  0.9017341  0.89534884]

mean value: 0.9042053191031663

key: test_precision
value: [0.63636364 0.88888889 0.75       1.         1.         0.75
 0.88888889 0.81818182 1.         0.88888889]

mean value: 0.8621212121212121

key: train_precision
value: [0.92682927 0.9047619  0.90588235 0.89534884 0.88505747 0.91954023
 0.90697674 0.89655172 0.89655172 0.89534884]

mean value: 0.9032849094025703

key: test_recall
value: [0.7        0.8        0.9        0.9        0.9        0.66666667
 0.88888889 1.         0.66666667 0.88888889]

mean value: 0.8311111111111111

key: train_recall
value: [0.89411765 0.89411765 0.90588235 0.90588235 0.90588235 0.93023256
 0.90697674 0.90697674 0.90697674 0.89534884]

mean value: 0.9052393980848153

key: test_roc_auc
value: [0.79285714 0.88571429 0.90714286 0.95       0.95       0.80555556
 0.93055556 0.97222222 0.83333333 0.93015873]

mean value: 0.8957539682539682

key: train_roc_auc
value: [0.93762486 0.93448021 0.94036256 0.93879023 0.93721791 0.95407527
 0.94087008 0.93929279 0.93933743 0.93352348]

mean value: 0.9395574805236686

key: test_jcc
value: [0.5        0.72727273 0.69230769 0.9        0.9        0.54545455
 0.8        0.81818182 0.66666667 0.8       ]

mean value: 0.734988344988345

key: train_jcc
value: [0.83516484 0.8172043  0.82795699 0.81914894 0.81052632 0.86021505
 0.82978723 0.82105263 0.82105263 0.81052632]

mean value: 0.8252635244200465

MCC on Blind test: 0.29

Accuracy on Blind test: 0.84

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.00993943 0.00762343 0.00771737 0.00769854 0.00768757 0.00771499
 0.00805521 0.00784445 0.00804305 0.0080626 ]

mean value: 0.008038663864135742

key: score_time
value: [0.00817943 0.00850177 0.00837326 0.00846052 0.0085566  0.00857735
 0.00863457 0.00853038 0.00856733 0.00824285]

mean value: 0.00846240520477295

key: test_mcc
value: [0.44223199 0.57655666 0.15118579 0.59030128 0.26207121 0.45760432
 0.62469505 0.45760432 0.63745526 0.45523656]

mean value: 0.4654942424271857

key: train_mcc
value: [0.55751053 0.53112666 0.56707247 0.50578969 0.51581016 0.49290093
 0.50957787 0.48621959 0.48320568 0.56367163]

mean value: 0.5212885208195438

key: test_accuracy
value: [0.8        0.86666667 0.75555556 0.86666667 0.77777778 0.84444444
 0.88888889 0.84444444 0.88636364 0.84090909]

mean value: 0.8371717171717172

key: train_accuracy
value: [0.86352357 0.8560794  0.86600496 0.85111663 0.85359801 0.84367246
 0.84863524 0.84119107 0.84158416 0.86138614]

mean value: 0.8526791636980076

key: test_fscore
value: [0.57142857 0.625      0.26666667 0.66666667 0.375      0.53333333
 0.61538462 0.53333333 0.70588235 0.53333333]

mean value: 0.5426028873087696

key: train_fscore
value: [0.63087248 0.60810811 0.64       0.57746479 0.58741259 0.57718121
 0.59060403 0.57333333 0.56756757 0.64556962]

mean value: 0.5998113723527961

key: test_precision
value: [0.54545455 0.83333333 0.4        0.75       0.5        0.66666667
 1.         0.66666667 0.75       0.66666667]

mean value: 0.6778787878787879

key: train_precision
value: [0.734375   0.71428571 0.73846154 0.71929825 0.72413793 0.68253968
 0.6984127  0.671875   0.67741935 0.70833333]

mean value: 0.7069138498520194

key: test_recall
value: [0.6        0.5        0.2        0.6        0.3        0.44444444
 0.44444444 0.44444444 0.66666667 0.44444444]

mean value: 0.46444444444444444

key: train_recall
value: [0.55294118 0.52941176 0.56470588 0.48235294 0.49411765 0.5
 0.51162791 0.5        0.48837209 0.59302326]

mean value: 0.521655266757866

key: test_roc_auc
value: [0.72857143 0.73571429 0.55714286 0.77142857 0.60714286 0.69444444
 0.72222222 0.69444444 0.8047619  0.69365079]

mean value: 0.700952380952381

key: train_roc_auc
value: [0.74974103 0.736404   0.75562338 0.71601924 0.72190159 0.71845426
 0.7258455  0.71687697 0.71273951 0.76349276]

mean value: 0.7317098229311422

key: test_jcc
value: [0.4        0.45454545 0.15384615 0.5        0.23076923 0.36363636
 0.44444444 0.36363636 0.54545455 0.36363636]

mean value: 0.381996891996892

key: train_jcc
value: [0.46078431 0.4368932  0.47058824 0.40594059 0.41584158 0.40566038
 0.41904762 0.40186916 0.39622642 0.47663551]

mean value: 0.42894870155185705

MCC on Blind test: 0.34

Accuracy on Blind test: 0.92

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.008708   0.01181459 0.01200914 0.01360393 0.01108837 0.01205063
 0.01200819 0.01373696 0.01307154 0.01215887]

mean value: 0.012025022506713867

key: score_time
value: [0.00815749 0.01002741 0.00992107 0.0105021  0.01042247 0.0105381
 0.01050901 0.01046824 0.0104754  0.01043797]

mean value: 0.010145926475524902

key: test_mcc
value: [0.64465837 0.58434871 0.72069583 0.93541435 0.87142857 0.63936201
 0.85839508 0.80178373 0.70609879 0.86031746]

mean value: 0.7622502886917752

key: train_mcc
value: [0.92034122 0.7429756  0.87903746 0.90352995 0.86545712 0.87195445
 0.79224384 0.89514372 0.87964331 0.84825809]

mean value: 0.8598584753360916

key: test_accuracy
value: [0.86666667 0.86666667 0.88888889 0.97777778 0.95555556 0.88888889
 0.95555556 0.93333333 0.90909091 0.95454545]

mean value: 0.9196969696969697

key: train_accuracy
value: [0.97270471 0.91811414 0.96029777 0.96774194 0.9528536  0.95781638
 0.93300248 0.96526055 0.96039604 0.95049505]

mean value: 0.9538682652384345

key: test_fscore
value: [0.72727273 0.57142857 0.7826087  0.94736842 0.9        0.70588235
 0.875      0.84210526 0.71428571 0.88888889]

mean value: 0.7954840634679778

key: train_fscore
value: [0.93714286 0.76258993 0.90361446 0.92397661 0.89385475 0.89171975
 0.82580645 0.91666667 0.90361446 0.87654321]

mean value: 0.8835529131032591

key: test_precision
value: [0.66666667 1.         0.69230769 1.         0.9        0.75
 1.         0.8        1.         0.88888889]

mean value: 0.8697863247863248

key: train_precision
value: [0.91111111 0.98148148 0.92592593 0.91860465 0.85106383 0.98591549
 0.92753623 0.93902439 0.9375     0.93421053]

mean value: 0.931237364087004

key: test_recall
value: [0.8        0.4        0.9        0.9        0.9        0.66666667
 0.77777778 0.88888889 0.55555556 0.88888889]

mean value: 0.7677777777777778

key: train_recall
value: [0.96470588 0.62352941 0.88235294 0.92941176 0.94117647 0.81395349
 0.74418605 0.89534884 0.87209302 0.8255814 ]

mean value: 0.8492339261285909

key: test_roc_auc
value: [0.84285714 0.7        0.89285714 0.95       0.93571429 0.80555556
 0.88888889 0.91666667 0.77777778 0.93015873]

mean value: 0.864047619047619

key: train_roc_auc
value: [0.96977432 0.81019238 0.93174251 0.95369959 0.94857566 0.90539946
 0.86420659 0.93978798 0.92818488 0.90492906]

mean value: 0.9156492428889091

key: test_jcc
value: [0.57142857 0.4        0.64285714 0.9        0.81818182 0.54545455
 0.77777778 0.72727273 0.55555556 0.8       ]

mean value: 0.6738528138528139

key: train_jcc
value: [0.88172043 0.61627907 0.82417582 0.85869565 0.80808081 0.8045977
 0.7032967  0.84615385 0.82417582 0.78021978]

mean value: 0.7947395639301094

MCC on Blind test: 0.24

Accuracy on Blind test: 0.83

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01214361 0.01189399 0.01183558 0.0117414  0.01215601 0.01234984
 0.01391053 0.01211047 0.01208448 0.01138926]

mean value: 0.012161517143249511

key: score_time
value: [0.01047754 0.01044488 0.01046324 0.01056027 0.01050949 0.01049018
 0.01044488 0.01046228 0.01046324 0.0104301 ]

mean value: 0.010474610328674316

key: test_mcc
value: [0.56660974 0.66143783 0.75592895 0.87142857 0.86991767 0.55182541
 0.87904907 0.93541435 0.85775039 0.70370542]

mean value: 0.7653067401926541

key: train_mcc
value: [0.86239285 0.83069452 0.77202883 0.88457302 0.91366773 0.91009599
 0.90782821 0.91435935 0.83242511 0.80863263]

mean value: 0.8636698252023903

key: test_accuracy
value: [0.84444444 0.88888889 0.88888889 0.95555556 0.95555556 0.86666667
 0.95555556 0.97777778 0.95454545 0.90909091]

mean value: 0.9196969696969697

key: train_accuracy
value: [0.95533499 0.94540943 0.9057072  0.96029777 0.97022333 0.97022333
 0.96774194 0.97022333 0.94554455 0.93811881]

mean value: 0.9528824656659214

key: test_fscore
value: [0.66666667 0.66666667 0.8        0.9        0.88888889 0.625
 0.9        0.94736842 0.875      0.75      ]

mean value: 0.8019590643274854

key: train_fscore
value: [0.8875     0.85714286 0.81372549 0.90909091 0.93181818 0.92592593
 0.9273743  0.93258427 0.8625     0.83660131]

mean value: 0.8884263242702394

key: test_precision
value: [0.63636364 1.         0.66666667 0.9        1.         0.71428571
 0.81818182 0.9        1.         0.85714286]

mean value: 0.8492640692640693

key: train_precision
value: [0.94666667 0.95652174 0.69747899 0.87912088 0.9010989  0.98684211
 0.89247312 0.90217391 0.93243243 0.95522388]

mean value: 0.9050032627229174

key: test_recall
value: [0.7        0.5        1.         0.9        0.8        0.55555556
 1.         1.         0.77777778 0.66666667]

mean value: 0.79

key: train_recall
value: [0.83529412 0.77647059 0.97647059 0.94117647 0.96470588 0.87209302
 0.96511628 0.96511628 0.80232558 0.74418605]

mean value: 0.8842954856361149

key: test_roc_auc
value: [0.79285714 0.75       0.92857143 0.93571429 0.9        0.75
 0.97222222 0.98611111 0.88888889 0.81904762]

mean value: 0.8723412698412698

key: train_roc_auc
value: [0.91135775 0.88351831 0.93163152 0.95329264 0.968202   0.93446922
 0.96678527 0.96836256 0.89330116 0.86737604]

mean value: 0.9278296466729867

key: test_jcc
value: [0.5        0.5        0.66666667 0.81818182 0.8        0.45454545
 0.81818182 0.9        0.77777778 0.6       ]

mean value: 0.6835353535353536

key: train_jcc
value: [0.79775281 0.75       0.68595041 0.83333333 0.87234043 0.86206897
 0.86458333 0.87368421 0.75824176 0.71910112]

mean value: 0.8017056372291307

MCC on Blind test: 0.28

Accuracy on Blind test: 0.85

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.10343218 0.0909369  0.08587861 0.08887339 0.0882163  0.08921981
 0.08564425 0.08561397 0.08643341 0.08570504]

mean value: 0.08899538516998291

key: score_time
value: [0.01581144 0.01447177 0.01518631 0.01536417 0.01544499 0.01527143
 0.01452279 0.01482272 0.01428127 0.01465106]

mean value: 0.014982795715332032

key: test_mcc
value: [0.79539491 0.93541435 0.88640526 0.86991767 0.80295507 0.63936201
 0.92998111 0.93541435 0.92962225 0.87831007]

mean value: 0.8602777044139022

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91111111 0.97777778 0.95555556 0.95555556 0.93333333 0.88888889
 0.97777778 0.97777778 0.97727273 0.95454545]

mean value: 0.950959595959596

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.83333333 0.94736842 0.90909091 0.88888889 0.82352941 0.70588235
 0.94117647 0.94736842 0.94117647 0.9       ]

mean value: 0.8837814679300747

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.71428571 1.         0.83333333 1.         1.         0.75
 1.         0.9        1.         0.81818182]

mean value: 0.9015800865800866

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.9        1.         0.8        0.7        0.66666667
 0.88888889 1.         0.88888889 1.        ]

mean value: 0.8844444444444445

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.94285714 0.95       0.97142857 0.9        0.85       0.80555556
 0.94444444 0.98611111 0.94444444 0.97142857]

mean value: 0.9266269841269841

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.71428571 0.9        0.83333333 0.8        0.7        0.54545455
 0.88888889 0.9        0.88888889 0.81818182]

mean value: 0.7989033189033189

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.3

Accuracy on Blind test: 0.87

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.0306735  0.03133774 0.04422259 0.031358   0.03073716 0.03635859
 0.03435659 0.03395271 0.03568673 0.04747963]

mean value: 0.03561632633209229

key: score_time
value: [0.01692772 0.02699375 0.02704811 0.02176785 0.02640796 0.01848412
 0.02651548 0.02117252 0.03250337 0.03121567]

mean value: 0.02490365505218506

key: test_mcc
value: [0.79539491 0.87142857 0.88640526 0.86991767 0.93541435 0.72222222
 1.         0.63936201 0.92962225 0.86031746]

mean value: 0.851008470726259

key: train_mcc
value: [1.         0.98532572 0.97018128 0.98509064 1.         0.99266683
 0.9854476  0.97029022 0.96293777 0.97045488]

mean value: 0.9822394937971208

key: test_accuracy
value: [0.91111111 0.95555556 0.95555556 0.95555556 0.97777778 0.91111111
 1.         0.88888889 0.97727273 0.95454545]

mean value: 0.9487373737373738

key: train_accuracy
value: [1.         0.99503722 0.99007444 0.99503722 1.         0.99751861
 0.99503722 0.99007444 0.98762376 0.99009901]

mean value: 0.9940501928604771

key: test_fscore
value: [0.83333333 0.9        0.90909091 0.88888889 0.94736842 0.77777778
 1.         0.70588235 0.94117647 0.88888889]

mean value: 0.8792407042561842

key: train_fscore
value: [1.         0.98837209 0.97647059 0.98823529 1.         0.99421965
 0.98850575 0.97647059 0.97076023 0.97674419]

mean value: 0.9859778383881759

key: test_precision
value: [0.71428571 0.9        0.83333333 1.         1.         0.77777778
 1.         0.75       1.         0.88888889]

mean value: 0.8864285714285715

key: train_precision
value: [1.         0.97701149 0.97647059 0.98823529 1.         0.98850575
 0.97727273 0.98809524 0.97647059 0.97674419]

mean value: 0.9848805863382023

key: test_recall
value: [1.         0.9        1.         0.8        0.9        0.77777778
 1.         0.66666667 0.88888889 0.88888889]

mean value: 0.8822222222222222

key: train_recall
value: [1.         1.         0.97647059 0.98823529 1.         1.
 1.         0.96511628 0.96511628 0.97674419]

mean value: 0.9871682626538988

key: test_roc_auc
value: [0.94285714 0.93571429 0.97142857 0.9        0.95       0.86111111
 1.         0.80555556 0.94444444 0.93015873]

mean value: 0.9241269841269841

key: train_roc_auc
value: [1.         0.99685535 0.98509064 0.99254532 1.         0.99842271
 0.99684543 0.98098085 0.97941349 0.98522744]

mean value: 0.9915381221608284

key: test_jcc
value: [0.71428571 0.81818182 0.83333333 0.8        0.9        0.63636364
 1.         0.54545455 0.88888889 0.8       ]

mean value: 0.7936507936507937

key: train_jcc
value: [1.         0.97701149 0.95402299 0.97674419 1.         0.98850575
 0.97727273 0.95402299 0.94318182 0.95454545]

mean value: 0.9725307404437317

MCC on Blind test: 0.25

Accuracy on Blind test: 0.87
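The key names in these result blocks (fit_time, score_time, and a test_/train_ pair for each metric) match what scikit-learn's cross_validate returns when return_train_score=True and a dict of named scorers is passed. The sketch below is an assumption about how such a block could be produced, not the script's actual code: the make_classification data, the StratifiedKFold settings and the scorer names are illustrative stand-ins. Ten folds over 448 rows give test folds of 44 or 45 rows, which is consistent with the fold-level accuracies above (for example 41/45 = 0.91111111 and 43/44 = 0.97727273).

```python
# Minimal sketch, ASSUMING the logged keys come from sklearn's cross_validate
# with return_train_score=True; the data below are synthetic stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic, imbalanced binary data roughly the size of the logged training set
X, y = make_classification(n_samples=448, n_features=20, weights=[0.79],
                           random_state=42)

pipe = Pipeline(steps=[('prep', MinMaxScaler()),
                       ('model', BaggingClassifier(random_state=42))])

# Scorer names chosen so the result keys read test_mcc, train_jcc, etc.
scoring = {'mcc': make_scorer(matthews_corrcoef),
           'accuracy': 'accuracy',
           'fscore': 'f1',
           'precision': 'precision',
           'recall': 'recall',
           'roc_auc': 'roc_auc',
           'jcc': make_scorer(jaccard_score)}

# Hypothetical CV settings; the original splitter is not visible in the log
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same "key / value / mean value" layout as the log
for key, value in scores.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}\n")
```

With shuffle=True the fold assignment depends on random_state; whether the original run shuffles its folds cannot be read from this log.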

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.0795815  0.11382031 0.06929111 0.11211514 0.07584715 0.13550496
 0.15576649 0.17540956 0.1215744  0.10611081]

mean value: 0.11450214385986328

key: score_time
value: [0.0121398  0.01215172 0.0121305  0.01206064 0.01252985 0.02870059
 0.02875829 0.01892948 0.01877236 0.01204371]

mean value: 0.016821694374084473

key: test_mcc
value: [0.15118579 0.28571429 0.39652234 0.39652234 0.58434871 0.53452248
 0.53452248 0.42947785 0.62360956 0.3099003 ]

mean value: 0.4246326143681106

key: train_mcc
value: [0.75122811 0.7429756  0.76894596 0.72633485 0.77573617 0.77127576
 0.74561704 0.77797744 0.7553134  0.77005492]

mean value: 0.7585459249101077

key: test_accuracy
value: [0.75555556 0.8        0.82222222 0.82222222 0.86666667 0.86666667
 0.86666667 0.84444444 0.88636364 0.81818182]

mean value: 0.8348989898989899

key: train_accuracy
value: [0.92059553 0.91811414 0.92555831 0.91315136 0.9280397  0.92555831
 0.91811414 0.9280397  0.92079208 0.92574257]

mean value: 0.9223705869346239

key: test_fscore
value: [0.26666667 0.30769231 0.42857143 0.42857143 0.57142857 0.5
 0.5        0.46153846 0.61538462 0.33333333]

mean value: 0.4413186813186813

key: train_fscore
value: [0.77142857 0.76258993 0.78571429 0.74452555 0.7972028  0.78873239
 0.76595745 0.8        0.77142857 0.79166667]

mean value: 0.777924620911841

key: test_precision
value: [0.4        0.66666667 0.75       0.75       1.         1.
 1.         0.75       1.         0.66666667]

mean value: 0.7983333333333333

key: train_precision
value: [0.98181818 0.98148148 1.         0.98076923 0.98275862 1.
 0.98181818 0.98305085 1.         0.98275862]

mean value: 0.9874455164724013

key: test_recall
value: [0.2        0.2        0.3        0.3        0.4        0.33333333
 0.33333333 0.33333333 0.44444444 0.22222222]

mean value: 0.30666666666666664

key: train_recall
value: [0.63529412 0.62352941 0.64705882 0.6        0.67058824 0.65116279
 0.62790698 0.6744186  0.62790698 0.6627907 ]

mean value: 0.6420656634746922

key: test_roc_auc
value: [0.55714286 0.58571429 0.63571429 0.63571429 0.7        0.66666667
 0.66666667 0.65277778 0.72222222 0.5968254 ]

mean value: 0.6419444444444444

key: train_roc_auc
value: [0.81607473 0.81019238 0.82352941 0.79842767 0.83372179 0.8255814
 0.8123762  0.83563202 0.81395349 0.82982302]

mean value: 0.8199312108020843

key: test_jcc
value: [0.15384615 0.18181818 0.27272727 0.27272727 0.4        0.33333333
 0.33333333 0.3        0.44444444 0.2       ]

mean value: 0.2892229992229992

key: train_jcc
value: [0.62790698 0.61627907 0.64705882 0.59302326 0.6627907  0.65116279
 0.62068966 0.66666667 0.62790698 0.65517241]

mean value: 0.6368657326603456

MCC on Blind test: 0.53

Accuracy on Blind test: 0.96

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.20419002 0.19745517 0.19840646 0.2032311  0.19804025 0.19976878
 0.19917703 0.19644904 0.19782686 0.20109701]

mean value: 0.19956417083740235

key: score_time
value: [0.0086832  0.00846052 0.00847912 0.00850511 0.00864482 0.00882816
 0.00845933 0.00858998 0.00887918 0.00864172]

mean value: 0.008617115020751954

key: test_mcc
value: [0.79539491 0.87142857 0.88640526 0.86991767 0.80178373 0.63936201
 0.92998111 0.80178373 0.92962225 0.87831007]

mean value: 0.8403989305108202

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91111111 0.95555556 0.95555556 0.95555556 0.93333333 0.88888889
 0.97777778 0.93333333 0.97727273 0.95454545]

mean value: 0.9442929292929293

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.83333333 0.9        0.90909091 0.88888889 0.84210526 0.70588235
 0.94117647 0.84210526 0.94117647 0.9       ]

mean value: 0.8703758951746567

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.71428571 0.9        0.83333333 1.         0.88888889 0.75
 1.         0.8        1.         0.81818182]

mean value: 0.8704689754689755

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.9        1.         0.8        0.8        0.66666667
 0.88888889 0.88888889 0.88888889 1.        ]

mean value: 0.8833333333333333

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.94285714 0.93571429 0.97142857 0.9        0.88571429 0.80555556
 0.94444444 0.91666667 0.94444444 0.97142857]

mean value: 0.9218253968253968

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.71428571 0.81818182 0.83333333 0.8        0.72727273 0.54545455
 0.88888889 0.72727273 0.88888889 0.81818182]

mean value: 0.7761760461760462

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.24

Accuracy on Blind test: 0.86
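Within each fold, test_jcc is a deterministic function of test_fscore. For binary labels the Jaccard index is TP / (TP + FP + FN) and F1 is 2TP / (2TP + FP + FN), so J = F1 / (2 - F1). The short check below applies that identity to the Gradient Boosting fold values logged above; it shows that the jcc rows carry no information beyond the fscore rows fold by fold (the mean of the jcc values, however, is not the same transform of the mean F1).

```python
import math

# Per-fold values copied from the Gradient Boosting test_fscore / test_jcc lines above
test_fscore = [0.83333333, 0.9, 0.90909091, 0.88888889, 0.84210526,
               0.70588235, 0.94117647, 0.84210526, 0.94117647, 0.9]
test_jcc = [0.71428571, 0.81818182, 0.83333333, 0.8, 0.72727273,
            0.54545455, 0.88888889, 0.72727273, 0.88888889, 0.81818182]


def jaccard_from_f1(f1):
    """Binary Jaccard index implied by an F1 score: J = F1 / (2 - F1)."""
    return f1 / (2.0 - f1)


derived = [jaccard_from_f1(f) for f in test_fscore]

# Logged values are rounded to 8 decimals, so compare with a small tolerance
matches = all(math.isclose(d, j, abs_tol=1e-6) for d, j in zip(derived, test_jcc))
print(matches)  # True
```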

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.01316524 0.01303029 0.01386619 0.01346922 0.01556659 0.01349592
 0.01358891 0.01323795 0.01327229 0.01385522]

mean value: 0.013654780387878419

key: score_time
value: [0.01129103 0.01097775 0.01097012 0.01095533 0.01109576 0.01304388
 0.01086783 0.01356649 0.01312065 0.01090169]

mean value: 0.01167905330657959

key: test_mcc
value: [0.41931393 0.39652234 0.22857143 0.58434871 0.28571429 0.35355339
 0.2941742  0.16174916 0.63745526 0.3099003 ]

mean value: 0.36713030130627083

key: train_mcc
value: [0.83864579 0.87027877 0.84093096 0.65974353 0.67151674 0.74002844
 0.72151646 0.77656964 0.87964331 0.73577   ]

mean value: 0.7734643657272753

key: test_accuracy
value: [0.82222222 0.82222222 0.73333333 0.86666667 0.8        0.82222222
 0.8        0.8        0.88636364 0.81818182]

mean value: 0.8171212121212121

key: train_accuracy
value: [0.94789082 0.95781638 0.94540943 0.89330025 0.89826303 0.89826303
 0.91066998 0.9280397  0.96039604 0.91584158]

mean value: 0.9255890229221433

key: test_fscore
value: [0.5        0.42857143 0.4        0.57142857 0.30769231 0.42857143
 0.4        0.18181818 0.70588235 0.33333333]

mean value: 0.4257297604356428

key: train_fscore
value: [0.86451613 0.89440994 0.875      0.66141732 0.70921986 0.79396985
 0.7721519  0.81528662 0.90361446 0.77922078]

mean value: 0.8068806857147465

key: test_precision
value: [0.66666667 0.75       0.4        1.         0.66666667 0.6
 0.5        0.5        0.75       0.66666667]

mean value: 0.65

key: train_precision
value: [0.95714286 0.94736842 0.84615385 1.         0.89285714 0.69911504
 0.84722222 0.90140845 0.9375     0.88235294]

mean value: 0.8911120925557183

key: test_recall
value: [0.4        0.3        0.4        0.4        0.2        0.33333333
 0.33333333 0.11111111 0.66666667 0.22222222]

mean value: 0.33666666666666667

key: train_recall
value: [0.78823529 0.84705882 0.90588235 0.49411765 0.58823529 0.91860465
 0.70930233 0.74418605 0.87209302 0.69767442]

mean value: 0.7565389876880985

key: test_roc_auc
value: [0.67142857 0.63571429 0.61428571 0.7        0.58571429 0.63888889
 0.625      0.54166667 0.8047619  0.5968254 ]

mean value: 0.6414285714285715

key: train_roc_auc
value: [0.88940067 0.9172401  0.9309286  0.74705882 0.78468368 0.90567457
 0.83730101 0.86105201 0.92818488 0.83625859]

mean value: 0.8637782929234691

key: test_jcc
value: [0.33333333 0.27272727 0.25       0.4        0.18181818 0.27272727
 0.25       0.1        0.54545455 0.2       ]

mean value: 0.2806060606060606

key: train_jcc
value: [0.76136364 0.80898876 0.77777778 0.49411765 0.54945055 0.65833333
 0.62886598 0.68817204 0.82417582 0.63829787]

mean value: 0.682954342693751

MCC on Blind test: 0.34

Accuracy on Blind test: 0.85
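The repeated "Variables are collinear" warning in the QDA run is raised when the centred training data for a class is rank-deficient. One plausible source here, stated as an assumption, is the one-hot encoded categorical columns: for a given feature every row's dummies sum to 1, so after centring those columns are linearly dependent, and a category absent from one class gives an all-zero column within that class. The sketch below reproduces the first effect on a hypothetical three-level column. QuadraticDiscriminantAnalysis also accepts a reg_param that shrinks the per-class covariance estimates; whether it also silences this warning has not been checked here.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column with three levels
cats = np.array([['a'], ['b'], ['c'], ['a'], ['b']])
dummies = OneHotEncoder().fit_transform(cats).toarray()

# Each row of dummies sums to 1, so after centring the three columns sum to zero
centered = dummies - dummies.mean(axis=0)
rank = np.linalg.matrix_rank(centered)
print(dummies.shape[1], rank)  # 3 2
```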

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.01960707 0.02915788 0.02913284 0.02917004 0.02329302 0.02906346
 0.03248668 0.02917218 0.02937102 0.02931643]

mean value: 0.02797706127166748

key: score_time
value: [0.01943707 0.01073003 0.01065588 0.02089858 0.02081704 0.01065969
 0.01883626 0.01075506 0.02131557 0.02014804]

mean value: 0.016425323486328126

key: test_mcc
value: [0.56660974 0.73379939 0.74285714 0.93541435 0.86991767 0.63936201
 0.85839508 0.86111111 0.78360391 0.86031746]

mean value: 0.7851387860068911

key: train_mcc
value: [0.83918085 0.83252135 0.83979823 0.83252135 0.81720003 0.84123675
 0.83401533 0.83401533 0.84212687 0.85010878]

mean value: 0.8362724873808268

key: test_accuracy
value: [0.84444444 0.91111111 0.91111111 0.97777778 0.95555556 0.88888889
 0.95555556 0.95555556 0.93181818 0.95454545]

mean value: 0.9286363636363636

key: train_accuracy
value: [0.94789082 0.94540943 0.94789082 0.94540943 0.94044665 0.94789082
 0.94540943 0.94540943 0.9480198  0.95049505]

mean value: 0.9464271675306488

key: test_fscore
value: [0.66666667 0.75       0.8        0.94736842 0.88888889 0.70588235
 0.875      0.88888889 0.8        0.88888889]

mean value: 0.8211584107327141

key: train_fscore
value: [0.86956522 0.86585366 0.87116564 0.86585366 0.85365854 0.87272727
 0.86746988 0.86746988 0.8742515  0.88095238]

mean value: 0.8688967624943407

key: test_precision
value: [0.63636364 1.         0.8        1.         1.         0.75
 1.         0.88888889 1.         0.88888889]

mean value: 0.8964141414141414

key: train_precision
value: [0.92105263 0.89873418 0.91025641 0.89873418 0.88607595 0.91139241
 0.9        0.9        0.90123457 0.90243902]

mean value: 0.9029919342987596

key: test_recall
value: [0.7        0.6        0.8        0.9        0.8        0.66666667
 0.77777778 0.88888889 0.66666667 0.88888889]

mean value: 0.7688888888888888

key: train_recall
value: [0.82352941 0.83529412 0.83529412 0.83529412 0.82352941 0.8372093
 0.8372093  0.8372093  0.84883721 0.86046512]

mean value: 0.8373871409028728

key: test_roc_auc
value: [0.79285714 0.8        0.87142857 0.95       0.9        0.80555556
 0.88888889 0.93055556 0.83333333 0.93015873]

mean value: 0.8702777777777777

key: train_roc_auc
value: [0.90233074 0.90506844 0.90664077 0.90506844 0.89761376 0.90756364
 0.90598635 0.90598635 0.91183999 0.91765394]

mean value: 0.9065752441613346

key: test_jcc
value: [0.5        0.6        0.66666667 0.9        0.8        0.54545455
 0.77777778 0.8        0.66666667 0.8       ]

mean value: 0.7056565656565656

key: train_jcc
value: [0.76923077 0.76344086 0.77173913 0.76344086 0.74468085 0.77419355
 0.76595745 0.76595745 0.77659574 0.78723404]

mean value: 0.768247070039765

MCC on Blind test: 0.29

Accuracy on Blind test: 0.85

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_config.py:122: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_config.py:125: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.19244957 0.19756484 0.1905818  0.18681002 0.18788815 0.18729663
 0.18735313 0.18713069 0.18840814 0.19011474]

mean value: 0.18955976963043214

key: score_time
value: [0.01917291 0.0214653  0.01095438 0.01946831 0.0198698  0.01965523
 0.02010727 0.02099466 0.0204978  0.02179575]

mean value: 0.019398140907287597

key: test_mcc
value: [0.56660974 0.80295507 0.81536524 0.93541435 0.86991767 0.63936201
 0.85839508 0.93541435 0.78360391 0.86031746]

mean value: 0.806735487504937

key: train_mcc
value: [0.86316397 0.87183415 0.87183415 0.83252135 0.81720003 0.88082743
 0.84202517 0.85797496 0.84212687 0.86600555]

mean value: 0.8545513633312516

key: test_accuracy
value: [0.84444444 0.93333333 0.93333333 0.97777778 0.95555556 0.88888889
 0.95555556 0.97777778 0.93181818 0.95454545]

mean value: 0.9353030303030303

key: train_accuracy
value: [0.95533499 0.95781638 0.95781638 0.94540943 0.94044665 0.96029777
 0.94789082 0.9528536  0.9480198  0.95544554]

mean value: 0.9521331351497433

key: test_fscore
value: [0.66666667 0.82352941 0.85714286 0.94736842 0.88888889 0.70588235
 0.875      0.94736842 0.8        0.88888889]

mean value: 0.8400735908398448

key: train_fscore
value: [0.8902439  0.89820359 0.89820359 0.86585366 0.85365854 0.90588235
 0.8742515  0.88757396 0.8742515  0.89411765]

mean value: 0.8842240241698736

key: test_precision
value: [0.63636364 1.         0.81818182 1.         1.         0.75
 1.         0.9        1.         0.88888889]

mean value: 0.8993434343434343

key: train_precision
value: [0.92405063 0.91463415 0.91463415 0.89873418 0.88607595 0.91666667
 0.90123457 0.90361446 0.90123457 0.9047619 ]

mean value: 0.9065641217238963

key: test_recall
value: [0.7        0.7        0.9        0.9        0.8        0.66666667
 0.77777778 1.         0.66666667 0.88888889]

mean value: 0.8

key: train_recall
value: [0.85882353 0.88235294 0.88235294 0.83529412 0.82352941 0.89534884
 0.84883721 0.87209302 0.84883721 0.88372093]

mean value: 0.8631190150478796

key: test_roc_auc
value: [0.79285714 0.85       0.92142857 0.95       0.9        0.80555556
 0.88888889 0.98611111 0.83333333 0.93015873]

mean value: 0.8858333333333333

key: train_roc_auc
value: [0.9199778  0.93017018 0.93017018 0.90506844 0.89761376 0.93663341
 0.91180031 0.92342822 0.91183999 0.92928185]

mean value: 0.9195984139382406

key: test_jcc
value: [0.5        0.7        0.75       0.9        0.8        0.54545455
 0.77777778 0.9        0.66666667 0.8       ]

mean value: 0.733989898989899

key: train_jcc
value: [0.8021978  0.81521739 0.81521739 0.76344086 0.74468085 0.82795699
 0.77659574 0.79787234 0.77659574 0.80851064]

mean value: 0.79282857534178

MCC on Blind test: 0.28

Accuracy on Blind test: 0.85

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.04934955 0.02539492 0.02788877 0.02606297 0.02528405 0.02800322
 0.02623415 0.02650237 0.02881813 0.02588701]

mean value: 0.028942513465881347

key: score_time
value: [0.01096201 0.01063871 0.01081228 0.01083779 0.01075006 0.01077032
 0.01072264 0.01072216 0.01076746 0.01077724]

mean value: 0.010776066780090332

key: test_mcc
value: [0.83214239 0.88862624 0.94365079 0.91587302 0.9451949  0.83095238
 0.85749293 0.91766294 0.82992752 0.74560114]

mean value: 0.8707124227917115

key: train_mcc
value: [0.90236595 0.88663261 0.88350545 0.90558532 0.90567611 0.90236595
 0.90573203 0.89315242 0.90567829 0.89644363]

mean value: 0.8987137772379575

key: test_accuracy
value: [0.91549296 0.94366197 0.97183099 0.95774648 0.97183099 0.91549296
 0.92857143 0.95714286 0.91428571 0.87142857]

mean value: 0.934748490945674

key: train_accuracy
value: [0.9511811  0.94330709 0.94173228 0.95275591 0.95275591 0.9511811
 0.95283019 0.94654088 0.95283019 0.94811321]

mean value: 0.9493227851235577

key: test_fscore
value: [0.91891892 0.94594595 0.97222222 0.95774648 0.97222222 0.91428571
 0.92753623 0.95890411 0.91666667 0.86567164]

mean value: 0.9350120152399073

key: train_fscore
value: [0.95102686 0.94339623 0.94191523 0.95253165 0.95238095 0.95133438
 0.953125   0.94620253 0.95268139 0.94867807]

mean value: 0.9493272279338961

key: test_precision
value: [0.89473684 0.92105263 0.97222222 0.94444444 0.94594595 0.91428571
 0.94117647 0.92105263 0.89189189 0.90625   ]

mean value: 0.9253058794641612

key: train_precision
value: [0.95253165 0.94043887 0.9375     0.95859873 0.96153846 0.94984326
 0.94720497 0.9522293  0.9556962  0.93846154]

mean value: 0.9494042974184514

key: test_recall
value: [0.94444444 0.97222222 0.97222222 0.97142857 1.         0.91428571
 0.91428571 1.         0.94285714 0.82857143]

mean value: 0.946031746031746

key: train_recall
value: [0.94952681 0.94637224 0.94637224 0.94654088 0.94339623 0.95283019
 0.9591195  0.94025157 0.94968553 0.9591195 ]

mean value: 0.949321468960181

key: test_roc_auc
value: [0.91507937 0.94325397 0.9718254  0.95793651 0.97222222 0.91547619
 0.92857143 0.95714286 0.91428571 0.87142857]

mean value: 0.9347222222222222

key: train_roc_auc
value: [0.9511785  0.94331191 0.94173958 0.95276571 0.95277067 0.9511785
 0.95283019 0.94654088 0.95283019 0.94811321]

mean value: 0.9493259329801798

key: test_jcc
value: [0.85       0.8974359  0.94594595 0.91891892 0.94594595 0.84210526
 0.86486486 0.92105263 0.84615385 0.76315789]

mean value: 0.8795581208739104

key: train_jcc
value: [0.90662651 0.89285714 0.89020772 0.90936556 0.90909091 0.90718563
 0.91044776 0.8978979  0.90963855 0.90236686]

mean value: 0.9035684537974702

MCC on Blind test: 0.27

Accuracy on Blind test: 0.81

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
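The printed list uses the label `'Naive Bayes'` twice, both times for `BernoulliNB()`, so that model is likely trained and reported twice per run. Below is a small duplicate check. It assumes the list is a plain Python list of `(name, estimator)` tuples, as the printout suggests. Only the labels are copied from the log, so the snippet does not need scikit-learn.

```python
from collections import Counter

# Model labels copied from the printed list, in order.
model_names = [
    "Logistic Regression", "Logistic RegressionCV", "Gaussian NB", "Naive Bayes",
    "K-Nearest Neighbors", "SVM", "MLP", "Decision Tree", "Extra Trees",
    "Extra Tree", "Random Forest", "Random Forest2", "Naive Bayes", "XGBoost",
    "LDA", "Multinomial", "Passive Aggresive", "Stochastic GDescent",
    "AdaBoost Classifier", "Bagging Classifier", "Gaussian Process",
    "Gradient Boosting", "QDA", "Ridge Classifier", "Ridge ClassifierCV",
]

# Any label that occurs more than once points at a duplicated list entry.
duplicates = [name for name, count in Counter(model_names).items() if count > 1]
print(duplicates)
```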
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])
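The printed pipeline does three things before the model step. It scales the numeric columns with `MinMaxScaler`, one-hot encodes the categorical columns, and passes any other columns through unchanged (`remainder='passthrough'`). Below is a minimal sketch of an equivalent construction. It assumes a toy DataFrame with hypothetical values: two numeric columns and one categorical column whose names are borrowed from the log. The real script uses the 45 numeric and 7 categorical columns listed above.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy data (hypothetical values); column names borrowed from the log.
X = pd.DataFrame({
    "ligand_distance": [3.1, 12.4, 7.8, 20.0, 5.5, 15.2],
    "rsa": [0.10, 0.85, 0.40, 0.95, 0.20, 0.60],
    "ss_class": ["H", "E", "L", "H", "E", "L"],
})
y = [1, 0, 1, 0, 1, 0]

num_cols = X.select_dtypes(include="number").columns
cat_cols = X.select_dtypes(exclude="number").columns

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),    # each numeric column mapped to [0, 1]
        ("cat", OneHotEncoder(), cat_cols),   # one 0/1 column per category
    ],
    remainder="passthrough",  # columns in neither list are kept unchanged
)

pipe = Pipeline(steps=[("prep", prep), ("model", LogisticRegression(random_state=42))])
pipe.fit(X, y)
print(pipe.named_steps["prep"].transform(X).shape)  # 2 scaled + 3 one-hot columns
```

The logged `OneHotEncoder()` keeps the default `handle_unknown='error'`. A category that appears only in a validation fold or in the blind set would therefore raise an error at transform time, and `OneHotEncoder(handle_unknown='ignore')` is the usual way to avoid that.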

key: fit_time
value: [0.76378393 0.85567832 0.79737496 0.71720099 0.86780381 0.74783587
 0.76583719 0.92471051 0.7560215  0.84938383]

mean value: 0.8045630931854248

key: score_time
value: [0.01349831 0.01428533 0.01490283 0.01481152 0.01480985 0.01473713
 0.01456404 0.01478338 0.01481318 0.01483154]

mean value: 0.014603710174560547

key: test_mcc
value: [0.88730159 0.88862624 0.97222222 0.88730159 0.91885703 0.88862624
 0.94440028 1.         0.88571429 0.860309  ]

mean value: 0.9133358474476411

key: train_mcc
value: [0.94962551 0.94962551 0.94646152 0.95276028 0.94962452 0.94649802
 0.94968553 0.94025622 0.95912424 0.9528349 ]

mean value: 0.9496496257416195

key: test_accuracy
value: [0.94366197 0.94366197 0.98591549 0.94366197 0.95774648 0.94366197
 0.97142857 1.         0.94285714 0.92857143]

mean value: 0.9561167002012072

key: train_accuracy
value: [0.97480315 0.97480315 0.97322835 0.97637795 0.97480315 0.97322835
 0.97484277 0.97012579 0.97955975 0.97641509]

mean value: 0.9748187490714604

key: test_fscore
value: [0.94444444 0.94594595 0.98591549 0.94285714 0.95890411 0.94117647
 0.97222222 1.         0.94285714 0.92537313]

mean value: 0.955969610579028

key: train_fscore
value: [0.97484277 0.97484277 0.97322835 0.97645212 0.97492163 0.97339593
 0.97484277 0.97017268 0.97959184 0.97645212]

mean value: 0.9748742969391556

key: test_precision
value: [0.94444444 0.92105263 1.         0.94285714 0.92105263 0.96969697
 0.94594595 1.         0.94285714 0.96875   ]

mean value: 0.955665690895954

key: train_precision
value: [0.97178683 0.97178683 0.97169811 0.97492163 0.971875   0.96884735
 0.97484277 0.96865204 0.97805643 0.97492163]

mean value: 0.9727388624377596

key: test_recall
value: [0.94444444 0.97222222 0.97222222 0.94285714 1.         0.91428571
 1.         1.         0.94285714 0.88571429]

mean value: 0.9574603174603175

key: train_recall
value: [0.97791798 0.97791798 0.97476341 0.97798742 0.97798742 0.97798742
 0.97484277 0.97169811 0.98113208 0.97798742]

mean value: 0.9770222010594608

key: test_roc_auc
value: [0.94365079 0.94325397 0.98611111 0.94365079 0.95833333 0.94325397
 0.97142857 1.         0.94285714 0.92857143]

mean value: 0.9561111111111111

key: train_roc_auc
value: [0.97480805 0.97480805 0.97323076 0.97637541 0.97479813 0.97322084
 0.97484277 0.97012579 0.97955975 0.97641509]

mean value: 0.9748184631867151

key: test_jcc
value: [0.89473684 0.8974359  0.97222222 0.89189189 0.92105263 0.88888889
 0.94594595 1.         0.89189189 0.86111111]

mean value: 0.916517732307206

key: train_jcc
value: [0.95092025 0.95092025 0.94785276 0.95398773 0.95107034 0.94817073
 0.95092025 0.94207317 0.96       0.95398773]

mean value: 0.9509903195885676
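Each `key:` above follows the output of scikit-learn's `cross_validate` when it is given a dict of named scorers and `return_train_score=True`. That output has `fit_time` and `score_time`, then a `test_<name>` / `train_<name>` pair per scorer, with one value per fold (ten here). Below is a sketch under the assumption that the script works this way. The scorer names mirror the log, but the log does not show how the script actually builds them, and the data are synthetic. `make_scorer` scores hard predictions from `predict()`, which matches the logged `roc_auc` values sitting next to the accuracy values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score, make_scorer,
                             matthews_corrcoef, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the training data (assumption).
X, y = make_classification(n_samples=400, n_features=10, random_state=42)

# Named scorers; make_scorer feeds each metric the hard labels from predict().
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": make_scorer(jaccard_score),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(LogisticRegression(max_iter=1000), X, y,
                        cv=cv, scoring=scoring, return_train_score=True)

# Print in the same key / value / mean layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```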

MCC on Blind test: 0.24

Accuracy on Blind test: 0.81
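The two blind-test lines are single numbers rather than per-fold arrays. That fits a model refitted on the whole training set, scored once on held-out data, and rounded to two decimals. Below is a sketch of that pattern. It assumes a stratified hold-out split of imbalanced synthetic data as a stand-in for the script's blind set, which is defined elsewhere in the script.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data (assumption).
X, y = make_classification(n_samples=500, n_features=10, weights=[0.8, 0.2],
                           random_state=42)

# Hypothetical blind split; not the script's own blind set.
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_blind)

print("MCC on Blind test:", round(matthews_corrcoef(y_blind, y_pred), 2))
print("Accuracy on Blind test:", round(accuracy_score(y_blind, y_pred), 2))
```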

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.02257991 0.00823998 0.0080502  0.00808787 0.00771427 0.00770688
 0.00799084 0.00788474 0.00776696 0.00769186]

mean value: 0.009371352195739747

key: score_time
value: [0.01607466 0.00837326 0.00837708 0.00820518 0.00806141 0.00812292
 0.00803041 0.00804925 0.00794697 0.008039  ]

mean value: 0.008928012847900391

key: test_mcc
value: [0.76074845 0.63412698 0.80301852 0.72329377 0.75442414 0.7468254
 0.77651637 0.54374562 0.80295507 0.66701701]

mean value: 0.7212671319076379

key: train_mcc
value: [0.77922929 0.78378281 0.74804396 0.75865264 0.72015797 0.76695184
 0.75429609 0.75178056 0.73914559 0.71365299]

mean value: 0.7515693753369681

key: test_accuracy
value: [0.87323944 0.81690141 0.90140845 0.85915493 0.87323944 0.87323944
 0.88571429 0.77142857 0.9        0.82857143]

mean value: 0.8582897384305835

key: train_accuracy
value: [0.88818898 0.89133858 0.87244094 0.87716535 0.85826772 0.88188976
 0.87578616 0.87421384 0.86792453 0.8490566 ]

mean value: 0.8736272470658148

key: test_fscore
value: [0.88607595 0.81690141 0.90410959 0.86486486 0.88       0.87323944
 0.89189189 0.77777778 0.90410959 0.84210526]

mean value: 0.8641075770212132

key: train_fscore
value: [0.89258699 0.88816856 0.87782805 0.88358209 0.86526946 0.88721805
 0.88084465 0.87987988 0.87387387 0.86324786]

mean value: 0.8792499459540104

key: test_precision
value: [0.81395349 0.82857143 0.89189189 0.82051282 0.825      0.86111111
 0.84615385 0.75675676 0.86842105 0.7804878 ]

mean value: 0.8292860200879576

key: train_precision
value: [0.85755814 0.91333333 0.84104046 0.84090909 0.82571429 0.85014409
 0.84637681 0.84195402 0.8362069  0.7890625 ]

mean value: 0.8442299635272792

key: test_recall
value: [0.97222222 0.80555556 0.91666667 0.91428571 0.94285714 0.88571429
 0.94285714 0.8        0.94285714 0.91428571]

mean value: 0.9037301587301587

key: train_recall
value: [0.93059937 0.86435331 0.91798107 0.93081761 0.90880503 0.92767296
 0.91823899 0.92138365 0.91509434 0.95283019]

mean value: 0.9187776521238815

key: test_roc_auc
value: [0.8718254  0.81706349 0.90119048 0.85992063 0.87420635 0.8734127
 0.88571429 0.77142857 0.9        0.82857143]

mean value: 0.8583333333333333

key: train_roc_auc
value: [0.88825566 0.89129615 0.87251255 0.87708073 0.858188   0.88181755
 0.87578616 0.87421384 0.86792453 0.8490566 ]

mean value: 0.8736131777870365

key: test_jcc
value: [0.79545455 0.69047619 0.825      0.76190476 0.78571429 0.775
 0.80487805 0.63636364 0.825      0.72727273]

mean value: 0.7627064195966635

key: train_jcc
value: [0.80601093 0.79883382 0.78225806 0.79144385 0.76253298 0.7972973
 0.78706199 0.78552279 0.776      0.7593985 ]

mean value: 0.7846360220868399
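The per-fold numbers are consistent with metrics computed on hard labels. Take the first Gaussian NB fold. Its test precision 0.81395349 = 35/43 and recall 0.97222222 = 35/36 imply TP = 35, FP = 8 and FN = 1. With 71 samples in the fold, that leaves TN = 27. The sketch below recomputes the logged first-fold values from these counts. The counts are derived here and are not printed by the script. It also shows why `test_roc_auc` tracks accuracy so closely: with 0/1 predictions the ROC area reduces to balanced accuracy.

```python
import math

# Counts inferred from Gaussian NB, first CV fold (derived, not printed by the script).
tp, fp, fn, tn = 35, 8, 1, 27

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)                  # true positive rate
specificity = tn / (tn + fp)             # true negative rate
f1 = 2 * tp / (2 * tp + fp + fn)
jaccard = tp / (tp + fp + fn)
# With hard 0/1 predictions the ROC curve has a single interior point,
# so its area equals balanced accuracy: (TPR + TNR) / 2.
roc_auc_hard = (recall + specificity) / 2
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("fscore", f1), ("roc_auc", roc_auc_hard),
                    ("jcc", jaccard), ("mcc", mcc)]:
    print(f"{name}: {value:.8f}")
```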

MCC on Blind test: 0.23

Accuracy on Blind test: 0.73

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00863671 0.00801086 0.00807619 0.00798845 0.0079236  0.007936
 0.00849605 0.00788641 0.00797105 0.0078764 ]

mean value: 0.008080172538757324

key: score_time
value: [0.00854659 0.00803828 0.00822711 0.00798297 0.00817943 0.00802064
 0.00861883 0.00801945 0.00802946 0.00808167]

mean value: 0.008174443244934082

key: test_mcc
value: [0.67079854 0.71825397 0.54972312 0.70470171 0.6153057  0.6473892
 0.6350853  0.66701701 0.66701701 0.57166195]

mean value: 0.6446953514858611

key: train_mcc
value: [0.66188316 0.6530534  0.67906111 0.66043489 0.67030115 0.66562282
 0.66739685 0.66332496 0.66391373 0.68010917]

mean value: 0.6665101243381613

key: test_accuracy
value: [0.83098592 0.85915493 0.77464789 0.84507042 0.8028169  0.81690141
 0.81428571 0.82857143 0.82857143 0.78571429]

mean value: 0.818672032193159

key: train_accuracy
value: [0.82677165 0.82362205 0.83622047 0.82677165 0.83149606 0.82992126
 0.83018868 0.82861635 0.82861635 0.83647799]

mean value: 0.8298702520675482

key: test_fscore
value: [0.84615385 0.86111111 0.78378378 0.85714286 0.81578947 0.83116883
 0.82666667 0.84210526 0.84210526 0.78873239]

mean value: 0.8294759490393293

key: train_fscore
value: [0.83918129 0.83431953 0.84660767 0.83870968 0.84333821 0.84070796
 0.84164223 0.83946981 0.83994126 0.84750733]

mean value: 0.8411424970085409

key: test_precision
value: [0.78571429 0.86111111 0.76315789 0.78571429 0.75609756 0.76190476
 0.775      0.7804878  0.7804878  0.77777778]

mean value: 0.7827453287690772

key: train_precision
value: [0.78201635 0.78551532 0.79501385 0.78571429 0.7890411  0.79166667
 0.78846154 0.78947368 0.78787879 0.79395604]

mean value: 0.7888737622301876

key: test_recall
value: [0.91666667 0.86111111 0.80555556 0.94285714 0.88571429 0.91428571
 0.88571429 0.91428571 0.91428571 0.8       ]

mean value: 0.8840476190476191

key: train_recall
value: [0.90536278 0.88958991 0.90536278 0.89937107 0.90566038 0.89622642
 0.90251572 0.89622642 0.89937107 0.90880503]

mean value: 0.900849155804218

key: test_roc_auc
value: [0.8297619  0.85912698 0.77420635 0.84642857 0.80396825 0.81825397
 0.81428571 0.82857143 0.82857143 0.78571429]

mean value: 0.8188888888888889

key: train_roc_auc
value: [0.82689522 0.82372577 0.83632919 0.82665714 0.83137908 0.82981668
 0.83018868 0.82861635 0.82861635 0.83647799]

mean value: 0.8298702458187013

key: test_jcc
value: [0.73333333 0.75609756 0.64444444 0.75       0.68888889 0.71111111
 0.70454545 0.72727273 0.72727273 0.65116279]

mean value: 0.7094129038541971

key: train_jcc
value: [0.72292191 0.71573604 0.73401535 0.72222222 0.72911392 0.72519084
 0.72658228 0.72335025 0.72405063 0.73536896]

mean value: 0.7258552408145388
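For binary labels, the Jaccard score is a monotonic function of F1. It follows from J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN) that J = F1 / (2 - F1). Fold by fold, `test_jcc` therefore ranks models the same way as `test_fscore`. The snippet checks the identity against the Bernoulli NB test folds logged above.

```python
# Per-fold test F1 and Jaccard scores for Bernoulli NB, copied from the log.
test_fscore = [0.84615385, 0.86111111, 0.78378378, 0.85714286, 0.81578947,
               0.83116883, 0.82666667, 0.84210526, 0.84210526, 0.78873239]
test_jcc = [0.73333333, 0.75609756, 0.64444444, 0.75, 0.68888889,
            0.71111111, 0.70454545, 0.72727273, 0.72727273, 0.65116279]

# J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN) imply J = F1 / (2 - F1).
predicted_jcc = [f / (2 - f) for f in test_fscore]
max_gap = max(abs(p - j) for p, j in zip(predicted_jcc, test_jcc))
print(f"largest difference: {max_gap:.1e}")
```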

MCC on Blind test: 0.25

Accuracy on Blind test: 0.7

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00882816 0.00808144 0.008111   0.00824952 0.00814581 0.00827622
 0.00815296 0.00838614 0.00819111 0.00840569]

mean value: 0.008282804489135742

key: score_time
value: [0.0142138  0.01573038 0.01161218 0.01225567 0.01171994 0.01176977
 0.0117259  0.01164389 0.01177073 0.01233864]

mean value: 0.012478089332580567

key: test_mcc
value: [0.71917468 0.64082051 0.6656213  0.80588933 0.64082051 0.71825397
 0.6350853  0.69985421 0.71545476 0.71545476]

mean value: 0.6956429316837078

key: train_mcc
value: [0.83074055 0.84991022 0.80531087 0.81373251 0.8080633  0.81862293
 0.83183549 0.81660412 0.81537301 0.80994411]

mean value: 0.8200137110997675

key: test_accuracy
value: [0.85915493 0.81690141 0.83098592 0.90140845 0.81690141 0.85915493
 0.81428571 0.84285714 0.85714286 0.85714286]

mean value: 0.8455935613682093
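Each "mean value" line appears to be the unweighted arithmetic mean of the ten per-fold scores printed above it. A quick standard-library check using the K-Nearest Neighbors test_accuracy values exactly as logged (the variable names are illustrative, not taken from ml_data.py):

```python
from statistics import fmean

# Per-fold test accuracy for K-Nearest Neighbors, copied from the log above.
knn_test_accuracy = [0.85915493, 0.81690141, 0.83098592, 0.90140845, 0.81690141,
                     0.85915493, 0.81428571, 0.84285714, 0.85714286, 0.85714286]

# "mean value" is assumed to be the plain arithmetic mean over the 10 folds.
mean_acc = fmean(knn_test_accuracy)
print(f"mean value: {mean_acc}")
```

The small difference from the logged mean comes only from the per-fold values being printed to 8 decimal places.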

key: train_accuracy
value: [0.91496063 0.92440945 0.9023622  0.90551181 0.90393701 0.90866142
 0.91509434 0.9072327  0.9072327  0.90408805]

mean value: 0.9093490318427178

key: test_fscore
value: [0.86486486 0.80597015 0.84210526 0.90410959 0.82666667 0.85714286
 0.82666667 0.85714286 0.86111111 0.85294118]

mean value: 0.8498721201518333

key: train_fscore
value: [0.91666667 0.92615385 0.90402477 0.90936556 0.90513219 0.91131498
 0.91768293 0.91047041 0.9093702  0.90715373]

mean value: 0.9117335282395541

key: test_precision
value: [0.84210526 0.87096774 0.8        0.86842105 0.775      0.85714286
 0.775      0.78571429 0.83783784 0.87878788]

mean value: 0.8290976917207817

key: train_precision
value: [0.89728097 0.9039039  0.88753799 0.875      0.89538462 0.88690476
 0.89053254 0.8797654  0.88888889 0.87905605]

mean value: 0.8884255118241281

key: test_recall
value: [0.88888889 0.75       0.88888889 0.94285714 0.88571429 0.85714286
 0.88571429 0.94285714 0.88571429 0.82857143]

mean value: 0.8756349206349207

key: train_recall
value: [0.93690852 0.94952681 0.92113565 0.94654088 0.91509434 0.93710692
 0.94654088 0.94339623 0.93081761 0.93710692]

mean value: 0.9364174751502887

key: test_roc_auc
value: [0.85873016 0.81785714 0.83015873 0.90198413 0.81785714 0.85912698
 0.81428571 0.84285714 0.85714286 0.85714286]

mean value: 0.8457142857142856

key: train_roc_auc
value: [0.91499514 0.92444894 0.90239172 0.9054471  0.90391941 0.90861655
 0.91509434 0.9072327  0.9072327  0.90408805]

mean value: 0.9093466658730631

key: test_jcc
value: [0.76190476 0.675      0.72727273 0.825      0.70454545 0.75
 0.70454545 0.75       0.75609756 0.74358974]

mean value: 0.7397955702833752
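For a single positive class, the Jaccard score (jcc) is fully determined by the F1 score: with F1 = 2TP/(2TP+FP+FN) and J = TP/(TP+FP+FN), it follows that J = F1/(2 - F1). The logged KNN folds satisfy this identity, which is a useful consistency check on the two columns. A minimal sketch (jaccard_from_f1 is a hypothetical helper, not part of the script):

```python
# Per-fold test F1 and Jaccard scores for K-Nearest Neighbors, copied from the log.
knn_test_fscore = [0.86486486, 0.80597015, 0.84210526, 0.90410959, 0.82666667,
                   0.85714286, 0.82666667, 0.85714286, 0.86111111, 0.85294118]
knn_test_jcc = [0.76190476, 0.675, 0.72727273, 0.825, 0.70454545,
                0.75, 0.70454545, 0.75, 0.75609756, 0.74358974]

def jaccard_from_f1(f1: float) -> float:
    """Binary, positive-class identity: J = F1 / (2 - F1)."""
    return f1 / (2.0 - f1)

derived_jcc = [jaccard_from_f1(f) for f in knn_test_fscore]
max_abs_diff = max(abs(d - j) for d, j in zip(derived_jcc, knn_test_jcc))
print(f"largest |derived - logged| Jaccard: {max_abs_diff:.2e}")
```

Because the mapping is monotonic, ranking models by test_jcc gives the same order as ranking them by test_fscore.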

key: train_jcc
value: [0.84615385 0.86246418 0.82485876 0.83379501 0.82670455 0.83707865
 0.84788732 0.8356546  0.83380282 0.83008357]

mean value: 0.8378483299992395

MCC on Blind test: 0.26

Accuracy on Blind test: 0.76
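KNN reaches 0.76 accuracy on the blind test but an MCC of only 0.26. On an imbalanced set (the data summary at the top of this log shows 353 vs 95 examples for the two classes), accuracy can stay high by favouring the majority class, while MCC uses all four confusion-matrix cells. The sketch below computes both from counts; the confusion-matrix numbers are hypothetical and not taken from the log.

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when any row/column marginal is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical blind-test counts (NOT from the log): 20 positives, 80 negatives.
tp, fn, tn, fp = 10, 10, 60, 20
print(f"accuracy={accuracy(tp, tn, fp, fn):.2f}  mcc={mcc(tp, tn, fp, fn):.2f}")
```

With these made-up counts, accuracy is 0.70 while MCC is only about 0.22, a pattern similar to the blind-test lines above.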

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01908469 0.01650476 0.01644874 0.01647878 0.01608276 0.01604009
 0.0166142  0.01631188 0.01670742 0.01612663]

mean value: 0.01663999557495117

key: score_time
value: [0.00963759 0.00929117 0.00938487 0.00942397 0.00930023 0.00936937
 0.00929046 0.00934219 0.00984812 0.00929856]

mean value: 0.009418654441833495

key: test_mcc
value: [0.77565853 0.83095238 0.83214239 0.88880092 0.89315217 0.88730159
 0.80032673 0.80829038 0.8340361  0.74316054]

mean value: 0.8293821713525972

key: train_mcc
value: [0.88047545 0.88998365 0.88047545 0.88357096 0.88033094 0.88381426
 0.88065992 0.88680999 0.88368712 0.89658557]

mean value: 0.8846393313783368

key: test_accuracy
value: [0.88732394 0.91549296 0.91549296 0.94366197 0.94366197 0.94366197
 0.9        0.9        0.91428571 0.87142857]

mean value: 0.9135010060362173
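Per-fold accuracies are exact fractions k/n of the test-fold size n, so fold sizes can be recovered from them. Assuming each test fold holds 70 or 71 rows (an assumption checked against the numbers, not stated by the script), the SVM accuracies resolve to six folds of 71 and four of 70, i.e. 706 rows in the cross-validated data. That is more than the 448 rows reported when the data were loaded, and equals 2 x 353, which would be consistent with the minority class being resampled up to the majority count before cross-validation. This is an inference from the logged numbers only. infer_fold_size is a hypothetical helper.

```python
# Per-fold SVM test accuracy, copied from the log above.
svm_test_accuracy = [0.88732394, 0.91549296, 0.91549296, 0.94366197, 0.94366197,
                     0.94366197, 0.9, 0.9, 0.91428571, 0.87142857]

def infer_fold_size(score, candidate_sizes, tol=1e-6):
    """Return the first candidate n for which score equals some k/n within tol."""
    for n in candidate_sizes:
        k = round(score * n)
        if abs(k / n - score) < tol:
            return n
    return None

# Assumption: every test fold has either 71 or 70 rows.
sizes = [infer_fold_size(s, candidate_sizes=(71, 70)) for s in svm_test_accuracy]
print(sizes, sum(sizes))
```

The tolerance of 1e-6 is loose enough for 8-decimal rounding but tight enough that 71 and 70 never both match the same logged accuracy here.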

key: train_accuracy
value: [0.94015748 0.94488189 0.94015748 0.94173228 0.94015748 0.94173228
 0.94025157 0.94339623 0.9418239  0.94811321]

mean value: 0.9422403803298173

key: test_fscore
value: [0.89189189 0.91666667 0.91891892 0.94444444 0.94594595 0.94285714
 0.90140845 0.90666667 0.91891892 0.86956522]

mean value: 0.9157284264406126

key: train_fscore
value: [0.940625   0.94539782 0.940625   0.94227769 0.94043887 0.94263566
 0.94080997 0.94357367 0.94209703 0.94883721]

mean value: 0.9427317909873709

key: test_precision
value: [0.86842105 0.91666667 0.89473684 0.91891892 0.8974359  0.94285714
 0.88888889 0.85       0.87179487 0.88235294]

mean value: 0.8932073222475699

key: train_precision
value: [0.93188854 0.93518519 0.93188854 0.93498452 0.9375     0.92966361
 0.93209877 0.940625   0.9376947  0.93577982]

mean value: 0.9347308689650702

key: test_recall
value: [0.91666667 0.91666667 0.94444444 0.97142857 1.         0.94285714
 0.91428571 0.97142857 0.97142857 0.85714286]

mean value: 0.9406349206349206

key: train_recall
value: [0.94952681 0.95583596 0.94952681 0.94968553 0.94339623 0.95597484
 0.94968553 0.94654088 0.94654088 0.96226415]

mean value: 0.9508977640219828

key: test_roc_auc
value: [0.88690476 0.91547619 0.91507937 0.94404762 0.94444444 0.94365079
 0.9        0.9        0.91428571 0.87142857]

mean value: 0.913531746031746

key: train_roc_auc
value: [0.94017221 0.94489911 0.94017221 0.94171974 0.94015237 0.94170982
 0.94025157 0.94339623 0.9418239  0.94811321]

mean value: 0.9422410372398469

key: test_jcc
value: [0.80487805 0.84615385 0.85       0.89473684 0.8974359  0.89189189
 0.82051282 0.82926829 0.85       0.76923077]

mean value: 0.8454108408793903

key: train_jcc
value: [0.8879056  0.8964497  0.8879056  0.89085546 0.88757396 0.8914956
 0.88823529 0.89317507 0.89053254 0.90265487]

mean value: 0.8916783716415699

MCC on Blind test: 0.32

Accuracy on Blind test: 0.84

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [1.91041517 1.81590009 1.9913187  1.93741298 1.92132735 1.84885836
 1.98798418 2.0122838  1.8185308  1.97723055]

mean value: 1.9221261978149413

key: score_time
value: [0.01374125 0.01442313 0.01120591 0.01360011 0.01358199 0.01433253
 0.01412654 0.01376438 0.02158213 0.01124787]

mean value: 0.014160585403442384

key: test_mcc
value: [0.94365079 0.86205133 0.9451949  0.91587302 0.91885703 0.91580648
 0.91465912 0.94285714 0.91465912 0.860309  ]

mean value: 0.9133917944367449

key: train_mcc
value: [0.99372055 0.99372055 0.99372055 1.         0.99372043 0.99372043
 0.99371069 0.99686027 0.99373035 0.99373035]

mean value: 0.9946634157813034

key: test_accuracy
value: [0.97183099 0.92957746 0.97183099 0.95774648 0.95774648 0.95774648
 0.95714286 0.97142857 0.95714286 0.92857143]

mean value: 0.9560764587525151

key: train_accuracy
value: [0.99685039 0.99685039 0.99685039 1.         0.99685039 0.99685039
 0.99685535 0.99842767 0.99685535 0.99685535]

mean value: 0.9973245679195761

key: test_fscore
value: [0.97222222 0.93333333 0.97142857 0.95774648 0.95890411 0.95652174
 0.95774648 0.97142857 0.95774648 0.92537313]

mean value: 0.9562451118080251

key: train_fscore
value: [0.99685535 0.99685535 0.99685535 1.         0.9968652  0.9968652
 0.99685535 0.99843014 0.9968652  0.9968652 ]

mean value: 0.9973312339982106

key: test_precision
value: [0.97222222 0.8974359  1.         0.94444444 0.92105263 0.97058824
 0.94444444 0.97142857 0.94444444 0.96875   ]

mean value: 0.953481089129309

key: train_precision
value: [0.99373041 0.99373041 0.99373041 1.         0.99375    0.99375
 0.99685535 0.9968652  0.99375    0.99375   ]

mean value: 0.9949911772244238

key: test_recall
value: [0.97222222 0.97222222 0.94444444 0.97142857 1.         0.94285714
 0.97142857 0.97142857 0.97142857 0.88571429]

mean value: 0.9603174603174603

key: train_recall
value: [1.         1.         1.         1.         1.         1.
 0.99685535 1.         1.         1.        ]

mean value: 0.999685534591195

key: test_roc_auc
value: [0.9718254  0.92896825 0.97222222 0.95793651 0.95833333 0.95753968
 0.95714286 0.97142857 0.95714286 0.92857143]

mean value: 0.9561111111111111

key: train_roc_auc
value: [0.99685535 0.99685535 0.99685535 1.         0.99684543 0.99684543
 0.99685535 0.99842767 0.99685535 0.99685535]

mean value: 0.9973250600162689

key: test_jcc
value: [0.94594595 0.875      0.94444444 0.91891892 0.92105263 0.91666667
 0.91891892 0.94444444 0.91891892 0.86111111]

mean value: 0.9165422000948317

key: train_jcc
value: [0.99373041 0.99373041 0.99373041 1.         0.99375    0.99375
 0.99373041 0.9968652  0.99375    0.99375   ]

mean value: 0.99467868338558

MCC on Blind test: 0.28

Accuracy on Blind test: 0.83
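The ten identical ConvergenceWarning lines printed during the MLP run (seemingly one per cross-validation fold) mean MLPClassifier hit max_iter=500 in every fold. A minimal standard-library sketch of counting such warnings instead of echoing each one: ConvergenceWarning below is a local stand-in class and fit_one_fold is a hypothetical placeholder, neither is taken from ml_data.py or scikit-learn.

```python
import warnings

class ConvergenceWarning(UserWarning):
    """Local stand-in for scikit-learn's ConvergenceWarning (assumption for this sketch)."""

def fit_one_fold(fold: int) -> None:
    """Hypothetical placeholder for fitting one CV fold; always warns, as the MLP did."""
    warnings.warn(f"fold {fold}: Maximum iterations (500) reached", ConvergenceWarning)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, not only the first per location
    for fold in range(10):
        fit_one_fold(fold)

n_conv = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print(f"ConvergenceWarning raised in {n_conv} of 10 folds")
```

If folds are fitted in separate worker processes, warnings raised there would not reach this catch_warnings block. Raising max_iter or passing early_stopping=True to MLPClassifier targets the non-convergence itself rather than how it is logged.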

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01854157 0.01431727 0.01315069 0.01308203 0.01279688 0.01326585
 0.01374698 0.01271224 0.01384306 0.01365948]

mean value: 0.013911604881286621

key: score_time
value: [0.01067591 0.00834036 0.00812936 0.00806141 0.00822115 0.00790977
 0.00792432 0.00795054 0.00802064 0.00805449]

mean value: 0.008328795433044434

key: test_mcc
value: [0.8594125  0.97220047 0.88880092 0.86237318 0.91587302 0.83095238
 0.85749293 1.         0.94440028 0.8871639 ]

mean value: 0.9018669569382461

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.92957746 0.98591549 0.94366197 0.92957746 0.95774648 0.91549296
 0.92857143 1.         0.97142857 0.94285714]

mean value: 0.9504828973843058

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.93150685 0.98630137 0.94285714 0.93150685 0.95774648 0.91428571
 0.92957746 1.         0.97222222 0.94117647]

mean value: 0.9507180562108437

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.91891892 0.97297297 0.97058824 0.89473684 0.94444444 0.91428571
 0.91666667 1.         0.94594595 0.96969697]

mean value: 0.9448256710331013

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.94444444 1.         0.91666667 0.97142857 0.97142857 0.91428571
 0.94285714 1.         1.         0.91428571]

mean value: 0.9575396825396825

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.92936508 0.98571429 0.94404762 0.93015873 0.95793651 0.91547619
 0.92857143 1.         0.97142857 0.94285714]

mean value: 0.9505555555555555
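For every model, test_roc_auc tracks test_accuracy almost exactly. That is what one would expect if ROC AUC were computed from hard 0/1 predictions on near-balanced folds: a single operating point gives AUC = (sensitivity + specificity) / 2, i.e. balanced accuracy. Decision Tree fold 1 reproduces the logged value under that reading. Its counts are reconstructed from the log, assuming 71 test rows with 36 positives: recall 0.94444444 gives tp = 34, and accuracy 0.92957746 gives 66 correct, so tn = 32 of 35 negatives. This is an inference; the log does not say which scorer input was used.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (recall) and specificity.

    For hard 0/1 predictions this equals the area under the ROC curve
    drawn through the single operating point (FPR, TPR).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

# Decision Tree, fold 1: counts reconstructed from the logged metrics (assumed 36 pos / 35 neg).
auc_fold1 = balanced_accuracy(tp=34, fn=2, tn=32, fp=3)
print(f"{auc_fold1:.8f}")  # logged test_roc_auc for this fold: 0.92936508
```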

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.87179487 0.97297297 0.89189189 0.87179487 0.91891892 0.84210526
 0.86842105 1.         0.94594595 0.88888889]

mean value: 0.9072734677997836

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: -0.08

Accuracy on Blind test: 0.3

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.10807514 0.10659599 0.1065073  0.10630512 0.10647535 0.10710096
 0.10619712 0.10621285 0.10633135 0.10575533]

mean value: 0.1065556526184082

key: score_time
value: [0.01735711 0.01735687 0.01724958 0.01735258 0.01745963 0.01735044
 0.01732707 0.01728344 0.01737738 0.0173161 ]

mean value: 0.017343020439147948

key: test_mcc
value: [0.89282857 0.94365079 0.97222222 0.91587302 0.91885703 0.88880092
 0.85749293 1.         0.91465912 0.82992752]

mean value: 0.9134312117371046

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.94366197 0.97183099 0.98591549 0.95774648 0.95774648 0.94366197
 0.92857143 1.         0.95714286 0.91428571]

mean value: 0.956056338028169

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94736842 0.97222222 0.98591549 0.95774648 0.95890411 0.94444444
 0.92957746 1.         0.95774648 0.91176471]

mean value: 0.956568981868365

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.9        0.97222222 1.         0.94444444 0.92105263 0.91891892
 0.91666667 1.         0.94444444 0.93939394]

mean value: 0.9457143267669583

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.97222222 0.97222222 0.97142857 1.         0.97142857
 0.94285714 1.         0.97142857 0.88571429]

mean value: 0.9687301587301587

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.94285714 0.9718254  0.98611111 0.95793651 0.95833333 0.94404762
 0.92857143 1.         0.95714286 0.91428571]

mean value: 0.9561111111111111

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.9        0.94594595 0.97222222 0.91891892 0.92105263 0.89473684
 0.86842105 1.         0.91891892 0.83783784]

mean value: 0.9178054370159634

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.32

Accuracy on Blind test: 0.85
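Every model in this section scores far higher in cross-validation than on the blind test, and Decision Tree, which fits its training folds perfectly (all train scores 1.0), drops below zero MCC on the blind set. A small sketch that tabulates the drop using only values copied from this part of the log; it restates the numbers and does not diagnose the cause.

```python
# (mean CV test MCC, blind-test MCC) per model, copied from this section of the log.
scores = {
    "K-Nearest Neighbors": (0.6956429316837078, 0.26),
    "SVM": (0.8293821713525972, 0.32),
    "MLP": (0.9133917944367449, 0.28),
    "Decision Tree": (0.9018669569382461, -0.08),
    "Extra Trees": (0.9134312117371046, 0.32),
}

# Gap = MCC lost when moving from the CV test folds to the blind test set.
gaps = {name: cv - blind for name, (cv, blind) in scores.items()}
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} CV MCC - blind MCC = {gap:.2f}")
```

A gap of this size across all model families suggests the CV folds are not representative of the blind set, which is worth checking before comparing models on CV scores alone.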

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])
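The pipeline printed above scales the numeric columns with `MinMaxScaler`, one-hot encodes the categorical columns, passes any remaining columns through unchanged, and then fits the classifier. A small sketch of the same structure, assuming a toy DataFrame that borrows four column names from the log (`ligand_distance`, `rsa`, `ss_class`, `active_site`); the values are made up for illustration and are not the study's data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.tree import ExtraTreeClassifier

# Hypothetical toy frame; only the column names come from the log
X = pd.DataFrame({
    "ligand_distance": [5.2, 12.8, 20.1, 8.4, 15.0, 3.3],
    "rsa": [0.10, 0.55, 0.80, 0.25, 0.60, 0.05],
    "ss_class": ["helix", "sheet", "loop", "helix", "loop", "sheet"],
    "active_site": [1, 0, 0, 1, 0, 1],
})
y = [1, 0, 0, 1, 0, 1]

num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class", "active_site"]

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),   # rescale each numeric column to [0, 1]
        ("cat", OneHotEncoder(), cat_cols),  # one indicator column per category level
    ],
    remainder="passthrough",  # unlisted columns are passed through unchanged
)
pipe = Pipeline(steps=[("prep", prep), ("model", ExtraTreeClassifier(random_state=42))])
pipe.fit(X, y)

# 2 scaled numeric columns + 3 ss_class levels + 2 active_site levels = 7 features
print(pipe.named_steps["prep"].transform(X).shape)
print(pipe.predict(X))
```

A fully grown single tree typically fits its own training rows exactly, which is in line with the train_* scores of 1.0 reported for this model.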

key: fit_time
value: [0.00805402 0.00804996 0.00802207 0.00799775 0.00799274 0.00810194
 0.00802922 0.00801492 0.0081358  0.00799394]

mean value: 0.008039236068725586

key: score_time
value: [0.00802302 0.00803089 0.00806451 0.00799036 0.0080173  0.00841832
 0.0080297  0.00801516 0.00801277 0.00795412]

mean value: 0.008055615425109863

key: test_mcc
value: [0.72329377 0.91580648 0.69292162 0.63412698 0.77460317 0.7468254
 0.77142857 0.68599434 0.77651637 0.63089327]

mean value: 0.7352409975683333
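Each "mean value" line is the arithmetic mean of the ten per-fold scores listed under the key. A short check using the Extra Tree `test_mcc` values copied from the log. The logged mean differs from this one in the last few digits because the array printout is rounded to eight decimals. The standard deviation is not printed in the log and is included here only as a hypothetical addition.

```python
from statistics import mean, stdev

# test_mcc for the 10 CV folds of the Extra Tree model, as printed in the log
test_mcc = [0.72329377, 0.91580648, 0.69292162, 0.63412698, 0.77460317,
            0.7468254, 0.77142857, 0.68599434, 0.77651637, 0.63089327]

fold_mean = mean(test_mcc)
print(f"mean value: {fold_mean}")
# Fold-to-fold spread (not reported by the original script)
print(f"std: {stdev(test_mcc):.3f}")
```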

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.85915493 0.95774648 0.84507042 0.81690141 0.88732394 0.87323944
 0.88571429 0.84285714 0.88571429 0.81428571]

mean value: 0.8668008048289738

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.85294118 0.95890411 0.84057971 0.81690141 0.88571429 0.87323944
 0.88571429 0.84507042 0.89189189 0.80597015]

mean value: 0.8656926876384385

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.90625    0.94594595 0.87878788 0.80555556 0.88571429 0.86111111
 0.88571429 0.83333333 0.84615385 0.84375   ]

mean value: 0.8692316242316243

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.80555556 0.97222222 0.80555556 0.82857143 0.88571429 0.88571429
 0.88571429 0.85714286 0.94285714 0.77142857]

mean value: 0.8640476190476191

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.85992063 0.95753968 0.84563492 0.81706349 0.88730159 0.8734127
 0.88571429 0.84285714 0.88571429 0.81428571]

mean value: 0.8669444444444444

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.74358974 0.92105263 0.725      0.69047619 0.79487179 0.775
 0.79487179 0.73170732 0.80487805 0.675     ]

mean value: 0.765644752124213

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
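All of the per-fold metrics above can be derived from one confusion matrix. For fold 1 of the Extra Tree model, the logged recall (0.80555556 = 29/36), precision (0.90625 = 29/32) and accuracy (0.85915493 = 61/71) imply TP = 29, FP = 3, FN = 7, TN = 32; these counts are inferred, not printed by the script. The sketch below recomputes every fold-1 value from them. Two identities it illustrates: the Jaccard score equals F1 / (2 − F1), and for hard 0/1 scores ROC AUC reduces to balanced accuracy, which matches the logged fold-1 test_roc_auc of 0.85992063.

```python
import math

# Fold-1 counts for Extra Tree, inferred from the logged precision/recall/accuracy
tp, fp, fn, tn = 29, 3, 7, 32

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)                 # true positive rate
specificity = tn / (tn + fp)            # true negative rate
fscore = 2 * tp / (2 * tp + fp + fn)
jcc = tp / (tp + fp + fn)               # Jaccard index of the positive class; equals F1 / (2 - F1)
mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
balanced_acc = (recall + specificity) / 2  # ROC AUC when the scores are hard 0/1 labels

for name, value in [("accuracy", accuracy), ("precision", precision), ("recall", recall),
                    ("fscore", fscore), ("jcc", jcc), ("mcc", mcc),
                    ("balanced_acc", balanced_acc)]:
    print(f"{name}: {value:.8f}")
```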

MCC on Blind test: 0.23

Accuracy on Blind test: 0.77

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.43684602 1.44837141 1.44293976 1.43462992 1.43130136 1.44044256
 1.4325366  1.44482112 1.4325006  1.43315077]

mean value: 1.437754011154175

key: score_time
value: [0.09354448 0.09333181 0.09289074 0.09419799 0.09253526 0.09277344
 0.09239817 0.09314775 0.0924108  0.09247732]

mean value: 0.09297077655792237

key: test_mcc
value: [0.94511009 1.         0.97222222 0.9451949  0.9451949  0.97220047
 0.94440028 1.         0.94440028 0.8871639 ]

mean value: 0.9555887036197296

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97183099 1.         0.98591549 0.97183099 0.97183099 0.98591549
 0.97142857 1.         0.97142857 0.94285714]

mean value: 0.9773038229376257

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97297297 1.         0.98591549 0.97222222 0.97222222 0.98550725
 0.97222222 1.         0.97222222 0.94117647]

mean value: 0.9774461071784655

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.94736842 1.         1.         0.94594595 0.94594595 1.
 0.94594595 1.         0.94594595 0.96969697]

mean value: 0.9700849174533385

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         0.97222222 1.         1.         0.97142857
 1.         1.         1.         0.91428571]

mean value: 0.9857936507936508

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97142857 1.         0.98611111 0.97222222 0.97222222 0.98571429
 0.97142857 1.         0.97142857 0.94285714]

mean value: 0.9773412698412698

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.94736842 1.         0.97222222 0.94594595 0.94594595 0.97142857
 0.94594595 1.         0.94594595 0.88888889]

mean value: 0.9563691887376098

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.24

Accuracy on Blind test: 0.8

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])
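The FutureWarning raised for this model recommends replacing `max_features='auto'` with `max_features='sqrt'`, which the warning says is also the RandomForestClassifier default. A sketch of the Random Forest2 settings with only that change, otherwise copying the logged parameters. The `make_classification` data is synthetic and purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# 'Random Forest2' settings from the log, with 'sqrt' in place of the deprecated 'auto'
rf2 = RandomForestClassifier(
    max_features="sqrt",
    min_samples_leaf=5,
    n_estimators=1000,
    n_jobs=10,
    oob_score=True,   # accuracy estimated on the out-of-bag rows of each bootstrap sample
    random_state=42,
)

# Synthetic data for illustration only, not the study's features
X, y = make_classification(n_samples=200, n_features=20, random_state=42)
rf2.fit(X, y)
print(rf2.get_params()["max_features"], round(rf2.oob_score_, 3))
```

The `min_samples_leaf=5` setting stops leaves from isolating single rows, which is in line with this model's train_* scores sitting below 1.0 in the log, unlike the unconstrained Random Forest.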

key: fit_time
value: [0.87997389 0.93496108 0.96271634 0.90993977 0.95138836 0.95243716
 0.92486715 0.92792821 0.93744898 0.93452334]

mean value: 0.9316184282302856

key: score_time
value: [0.26987338 0.27504158 0.23175645 0.26720405 0.27712655 0.27116489
 0.22560811 0.20714259 0.17259336 0.24573827]

mean value: 0.2443249225616455

key: test_mcc
value: [0.94511009 1.         0.97222222 0.91587302 0.9451949  0.94511009
 0.94440028 1.         0.94440028 0.8871639 ]

mean value: 0.9499474779757923

key: train_mcc
value: [0.96559014 0.9625117  0.9625117  0.96867592 0.96867592 0.96250874
 0.96872591 0.9625688  0.96872591 0.96579568]

mean value: 0.9656290400102543

key: test_accuracy
value: [0.97183099 1.         0.98591549 0.95774648 0.97183099 0.97183099
 0.97142857 1.         0.97142857 0.94285714]

mean value: 0.974486921529175

key: train_accuracy
value: [0.98267717 0.98110236 0.98110236 0.98425197 0.98425197 0.98110236
 0.98427673 0.98113208 0.98427673 0.9827044 ]

mean value: 0.9826878126083296

key: test_fscore
value: [0.97297297 1.         0.98591549 0.95774648 0.97222222 0.97058824
 0.97222222 1.         0.97222222 0.94117647]

mean value: 0.9745066317352978

key: train_fscore
value: [0.98283931 0.98130841 0.98130841 0.98442368 0.98442368 0.98136646
 0.98442368 0.98136646 0.98442368 0.98294574]

mean value: 0.9828829495741059

key: test_precision
value: [0.94736842 1.         1.         0.94444444 0.94594595 1.
 0.94594595 1.         0.94594595 0.96969697]

mean value: 0.9699347673031884

key: train_precision
value: [0.97222222 0.96923077 0.96923077 0.97530864 0.97530864 0.96932515
 0.97530864 0.96932515 0.97530864 0.96941896]

mean value: 0.971998759557811

key: test_recall
value: [1.         1.         0.97222222 0.97142857 1.         0.94285714
 1.         1.         1.         0.91428571]

mean value: 0.9800793650793651

key: train_recall
value: [0.99369085 0.99369085 0.99369085 0.99371069 0.99371069 0.99371069
 0.99371069 0.99371069 0.99371069 0.99685535]

mean value: 0.9940192052060394

key: test_roc_auc
value: [0.97142857 1.         0.98611111 0.95793651 0.97222222 0.97142857
 0.97142857 1.         0.97142857 0.94285714]

mean value: 0.974484126984127

key: train_roc_auc
value: [0.98269448 0.98112216 0.98112216 0.98423705 0.98423705 0.98108248
 0.98427673 0.98113208 0.98427673 0.9827044 ]

mean value: 0.9826885304446165

key: test_jcc
value: [0.94736842 1.         0.97222222 0.91891892 0.94594595 0.94285714
 0.94594595 1.         0.94594595 0.88888889]

mean value: 0.9508093431777642

key: train_jcc
value: [0.96625767 0.96330275 0.96330275 0.96932515 0.96932515 0.96341463
 0.96932515 0.96341463 0.96932515 0.96646341]

mean value: 0.9663456469722573

MCC on Blind test: 0.24

Accuracy on Blind test: 0.76

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.02038693 0.0088098  0.00817513 0.00887132 0.0087924  0.0088861
 0.00894165 0.00874758 0.00866318 0.0089376 ]

mean value: 0.00992116928100586

key: score_time
value: [0.01118541 0.00879335 0.00893068 0.00872803 0.00877929 0.00873256
 0.00876904 0.00859785 0.00878286 0.00883555]

mean value: 0.00901346206665039

key: test_mcc
value: [0.67079854 0.71825397 0.54972312 0.70470171 0.6153057  0.6473892
 0.6350853  0.66701701 0.66701701 0.57166195]

mean value: 0.6446953514858611

key: train_mcc
value: [0.66188316 0.6530534  0.67906111 0.66043489 0.67030115 0.66562282
 0.66739685 0.66332496 0.66391373 0.68010917]

mean value: 0.6665101243381613

key: test_accuracy
value: [0.83098592 0.85915493 0.77464789 0.84507042 0.8028169  0.81690141
 0.81428571 0.82857143 0.82857143 0.78571429]

mean value: 0.818672032193159

key: train_accuracy
value: [0.82677165 0.82362205 0.83622047 0.82677165 0.83149606 0.82992126
 0.83018868 0.82861635 0.82861635 0.83647799]

mean value: 0.8298702520675482

key: test_fscore
value: [0.84615385 0.86111111 0.78378378 0.85714286 0.81578947 0.83116883
 0.82666667 0.84210526 0.84210526 0.78873239]

mean value: 0.8294759490393293

key: train_fscore
value: [0.83918129 0.83431953 0.84660767 0.83870968 0.84333821 0.84070796
 0.84164223 0.83946981 0.83994126 0.84750733]

mean value: 0.8411424970085409

key: test_precision
value: [0.78571429 0.86111111 0.76315789 0.78571429 0.75609756 0.76190476
 0.775      0.7804878  0.7804878  0.77777778]

mean value: 0.7827453287690772

key: train_precision
value: [0.78201635 0.78551532 0.79501385 0.78571429 0.7890411  0.79166667
 0.78846154 0.78947368 0.78787879 0.79395604]

mean value: 0.7888737622301876

key: test_recall
value: [0.91666667 0.86111111 0.80555556 0.94285714 0.88571429 0.91428571
 0.88571429 0.91428571 0.91428571 0.8       ]

mean value: 0.8840476190476191

key: train_recall
value: [0.90536278 0.88958991 0.90536278 0.89937107 0.90566038 0.89622642
 0.90251572 0.89622642 0.89937107 0.90880503]

mean value: 0.900849155804218

key: test_roc_auc
value: [0.8297619  0.85912698 0.77420635 0.84642857 0.80396825 0.81825397
 0.81428571 0.82857143 0.82857143 0.78571429]

mean value: 0.8188888888888889

key: train_roc_auc
value: [0.82689522 0.82372577 0.83632919 0.82665714 0.83137908 0.82981668
 0.83018868 0.82861635 0.82861635 0.83647799]

mean value: 0.8298702458187013

key: test_jcc
value: [0.73333333 0.75609756 0.64444444 0.75       0.68888889 0.71111111
 0.70454545 0.72727273 0.72727273 0.65116279]

mean value: 0.7094129038541971

key: train_jcc
value: [0.72292191 0.71573604 0.73401535 0.72222222 0.72911392 0.72519084
 0.72658228 0.72335025 0.72405063 0.73536896]

mean value: 0.7258552408145388

MCC on Blind test: 0.25

Accuracy on Blind test: 0.7

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.08077264 0.05656648 0.05679369 0.05571246 0.06284809 0.0810225
 0.06032395 0.06238079 0.06300926 0.22280788]

mean value: 0.08022377490997315

key: score_time
value: [0.01013517 0.00978065 0.00984907 0.01045036 0.00996995 0.01013207
 0.00967622 0.00966024 0.00964713 0.01016784]

mean value: 0.009946870803833007

key: test_mcc
value: [0.94511009 1.         0.9451949  0.9451949  0.91885703 0.94511009
 0.94440028 1.         0.94440028 0.8871639 ]

mean value: 0.9475431470899147

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97183099 1.         0.97183099 0.97183099 0.95774648 0.97183099
 0.97142857 1.         0.97142857 0.94285714]

mean value: 0.9730784708249497

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97297297 1.         0.97142857 0.97222222 0.95890411 0.97058824
 0.97222222 1.         0.97222222 0.94117647]

mean value: 0.9731737026539605

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.94736842 1.         1.         0.94594595 0.92105263 1.
 0.94594595 1.         0.94594595 0.96969697]

mean value: 0.9675955860166386

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         0.94444444 1.         1.         0.94285714
 1.         1.         1.         0.91428571]

mean value: 0.9801587301587301

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97142857 1.         0.97222222 0.97222222 0.95833333 0.97142857
 0.97142857 1.         0.97142857 0.94285714]

mean value: 0.9731349206349206

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.94736842 1.         0.94444444 0.94594595 0.92105263 0.94285714
 0.94594595 1.         0.94594595 0.88888889]

mean value: 0.9482449366659893

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.01

Accuracy on Blind test: 0.32

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])
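The pipeline repr above scales the numerical columns with `MinMaxScaler`, one-hot encodes the categorical columns, and passes anything else through unchanged. A minimal sketch of the same structure is below. It uses a hypothetical toy frame with two numerical columns and one categorical column borrowed from the log's feature names; the values are made up.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data; column names borrowed from the log, values invented.
df = pd.DataFrame({
    "ligand_distance": [3.1, 12.4, 7.8, 20.5, 5.0, 15.2],
    "rsa": [0.10, 0.85, 0.40, 0.95, 0.20, 0.70],
    "ss_class": ["H", "E", "L", "H", "E", "L"],
})
y = [1, 0, 1, 0, 1, 0]

num_cols = df.select_dtypes(include="number").columns
cat_cols = df.select_dtypes(exclude="number").columns

prep = ColumnTransformer(
    transformers=[("num", MinMaxScaler(), num_cols),
                  ("cat", OneHotEncoder(), cat_cols)],
    remainder="passthrough",
)
pipe = Pipeline(steps=[("prep", prep), ("model", LinearDiscriminantAnalysis())])
pipe.fit(df, y)
print(pipe.predict(df))
```

Swapping the final `("model", ...)` step is how the same preprocessing can be reused for every classifier in the list.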

key: fit_time
value: [0.01711631 0.05170107 0.04450679 0.04490185 0.04405975 0.04858232
 0.04450536 0.03757882 0.04440331 0.04462457]

mean value: 0.042198014259338376

key: score_time
value: [0.01043272 0.01453161 0.01896262 0.01934981 0.01965714 0.01737046
 0.01697755 0.01947474 0.02058792 0.01093698]

mean value: 0.016828155517578124

key: test_mcc
value: [0.94365079 0.89282857 1.         0.88730159 0.91587302 0.9186708
 0.91465912 0.97182532 0.88571429 0.82857143]

mean value: 0.9159094921471266

key: train_mcc
value: [0.93700772 0.93070849 0.92759921 0.94962452 0.94016229 0.9433251
 0.94341489 0.92771424 0.94025622 0.94341489]

mean value: 0.93832275544477

key: test_accuracy
value: [0.97183099 0.94366197 1.         0.94366197 0.95774648 0.95774648
 0.95714286 0.98571429 0.94285714 0.91428571]

mean value: 0.9574647887323944

key: train_accuracy
value: [0.96850394 0.96535433 0.96377953 0.97480315 0.97007874 0.97165354
 0.97169811 0.96383648 0.97012579 0.97169811]

mean value: 0.9691531718912494

key: test_fscore
value: [0.97222222 0.94736842 1.         0.94285714 0.95774648 0.95522388
 0.95774648 0.98591549 0.94285714 0.91428571]

mean value: 0.9576222974576094

key: train_fscore
value: [0.96845426 0.96529968 0.96354992 0.97492163 0.97007874 0.97178683
 0.97160883 0.96366509 0.97007874 0.97178683]

mean value: 0.9691230561794373

key: test_precision
value: [0.97222222 0.9        1.         0.94285714 0.94444444 1.
 0.94444444 0.97222222 0.94285714 0.91428571]

mean value: 0.9533333333333334

key: train_precision
value: [0.96845426 0.96529968 0.96815287 0.971875   0.97160883 0.96875
 0.97468354 0.96825397 0.97160883 0.96875   ]

mean value: 0.9697436987632612

key: test_recall
value: [0.97222222 1.         1.         0.94285714 0.97142857 0.91428571
 0.97142857 1.         0.94285714 0.91428571]

mean value: 0.962936507936508

key: train_recall
value: [0.96845426 0.96529968 0.95899054 0.97798742 0.96855346 0.97484277
 0.96855346 0.9591195  0.96855346 0.97484277]

mean value: 0.9685197309683947

key: test_roc_auc
value: [0.9718254  0.94285714 1.         0.94365079 0.95793651 0.95714286
 0.95714286 0.98571429 0.94285714 0.91428571]

mean value: 0.9573412698412698

key: train_roc_auc
value: [0.96850386 0.96535424 0.963772   0.97479813 0.97008115 0.97164851
 0.97169811 0.96383648 0.97012579 0.97169811]

mean value: 0.9691516377993373

key: test_jcc
value: [0.94594595 0.9        1.         0.89189189 0.91891892 0.91428571
 0.91891892 0.97222222 0.89189189 0.84210526]

mean value: 0.9196180767233398

key: train_jcc
value: [0.93883792 0.93292683 0.92966361 0.95107034 0.94189602 0.94512195
 0.94478528 0.92987805 0.94189602 0.94512195]

mean value: 0.9401197970934513

MCC on Blind test: 0.3

Accuracy on Blind test: 0.83

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.01068687 0.0080061  0.00796509 0.00772619 0.00784731 0.00785708
 0.00791907 0.00809073 0.00793958 0.00783396]

mean value: 0.008187198638916015

key: score_time
value: [0.00917411 0.00820422 0.00817823 0.00810766 0.00799704 0.00803733
 0.00800204 0.00832915 0.0080092  0.00808311]

mean value: 0.00821220874786377

key: test_mcc
value: [0.72811105 0.6656213  0.66269083 0.69762232 0.69762232 0.64082051
 0.57735027 0.6614769  0.71899664 0.68599434]

mean value: 0.6736306482828506

key: train_mcc
value: [0.68565341 0.6963999  0.68905264 0.68994047 0.69528523 0.69577133
 0.69581242 0.68997016 0.69048219 0.69506299]

mean value: 0.692343074334257

key: test_accuracy
value: [0.85915493 0.83098592 0.83098592 0.84507042 0.84507042 0.81690141
 0.78571429 0.82857143 0.85714286 0.84285714]

mean value: 0.8342454728370221

key: train_accuracy
value: [0.84094488 0.84566929 0.84251969 0.84251969 0.84566929 0.84566929
 0.84591195 0.8427673  0.8427673  0.84433962]

mean value: 0.8438778289506265

key: test_fscore
value: [0.87179487 0.84210526 0.83783784 0.85333333 0.85333333 0.82666667
 0.8        0.83783784 0.86486486 0.84057971]

mean value: 0.8428353718971567

key: train_fscore
value: [0.84857571 0.85416667 0.8502994  0.85163205 0.85373134 0.85416667
 0.85373134 0.85119048 0.85163205 0.85419735]

mean value: 0.8523323053430707

key: test_precision
value: [0.80952381 0.8        0.81578947 0.8        0.8        0.775
 0.75       0.79487179 0.82051282 0.85294118]

mean value: 0.8018639075063224

key: train_precision
value: [0.80857143 0.8084507  0.80911681 0.80617978 0.8125     0.81073446
 0.8125     0.8079096  0.80617978 0.8033241 ]

mean value: 0.808546665999499

key: test_recall
value: [0.94444444 0.88888889 0.86111111 0.91428571 0.91428571 0.88571429
 0.85714286 0.88571429 0.91428571 0.82857143]

mean value: 0.8894444444444445

key: train_recall
value: [0.89274448 0.90536278 0.89589905 0.90251572 0.89937107 0.90251572
 0.89937107 0.89937107 0.90251572 0.91194969]

mean value: 0.9011616372041347

key: test_roc_auc
value: [0.85793651 0.83015873 0.83055556 0.84603175 0.84603175 0.81785714
 0.78571429 0.82857143 0.85714286 0.84285714]

mean value: 0.8342857142857143

key: train_roc_auc
value: [0.84102633 0.84576315 0.84260361 0.84242505 0.84558459 0.84557963
 0.84591195 0.8427673  0.8427673  0.84433962]

mean value: 0.8438768525682995

key: test_jcc
value: [0.77272727 0.72727273 0.72093023 0.74418605 0.74418605 0.70454545
 0.66666667 0.72093023 0.76190476 0.725     ]

mean value: 0.7288349441256418

key: train_jcc
value: [0.73697917 0.74545455 0.73958333 0.74160207 0.74479167 0.74545455
 0.74479167 0.74093264 0.74160207 0.74550129]

mean value: 0.742669298644344

MCC on Blind test: 0.32

Accuracy on Blind test: 0.8
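`MultinomialNB` only accepts non-negative feature values at fit time. The `MinMaxScaler` step in the pipeline above maps each training column into [0, 1], which satisfies that requirement; a centring scaler would not. The sketch below demonstrates this on random data, which is hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))          # hypothetical features with negative values
y = (X[:, 0] > 0).astype(int)

# MinMaxScaler maps each training column into [0, 1], so MultinomialNB accepts it.
make_pipeline(MinMaxScaler(), MultinomialNB()).fit(X, y)

# StandardScaler leaves negative values, which MultinomialNB rejects at fit time.
try:
    make_pipeline(StandardScaler(), MultinomialNB()).fit(X, y)
    rejected = False
except ValueError:
    rejected = True
print("StandardScaler + MultinomialNB rejected:", rejected)
```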

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.00993681 0.01531029 0.01399684 0.01523709 0.01411676 0.01322913
 0.01689434 0.0154171  0.01349378 0.01362562]

mean value: 0.014125776290893555

key: score_time
value: [0.00818229 0.0100224  0.01002216 0.01056504 0.0105257  0.01053262
 0.01064253 0.01050687 0.0105536  0.01051474]

mean value: 0.010206794738769532

key: test_mcc
value: [0.8365327  0.91580648 0.89315217 0.91885703 0.83214239 0.9186708
 0.91766294 1.         0.80032673 0.80829038]

mean value: 0.8841441619108454

key: train_mcc
value: [0.87618527 0.93389881 0.91532447 0.91736146 0.75092152 0.93072764
 0.92900139 0.92149756 0.93418862 0.94654556]

mean value: 0.9055652313449004

key: test_accuracy
value: [0.91549296 0.95774648 0.94366197 0.95774648 0.91549296 0.95774648
 0.95714286 1.         0.9        0.9       ]

mean value: 0.9405030181086519

key: train_accuracy
value: [0.93543307 0.96692913 0.95748031 0.95748031 0.86299213 0.96535433
 0.96383648 0.96069182 0.96698113 0.97327044]

mean value: 0.9510449165552419

key: test_fscore
value: [0.91176471 0.95890411 0.94117647 0.95890411 0.91176471 0.95522388
 0.95890411 1.         0.89855072 0.89230769]

mean value: 0.9387500508662453

key: train_fscore
value: [0.93155259 0.96671949 0.9568     0.95902883 0.84324324 0.96529968
 0.96477795 0.96099844 0.96661367 0.9733124 ]

mean value: 0.9488346302113415

key: test_precision
value: [0.96875    0.94594595 1.         0.92105263 0.93939394 1.
 0.92105263 1.         0.91176471 0.96666667]

mean value: 0.95746265210468

key: train_precision
value: [0.9893617  0.97133758 0.97077922 0.92668622 0.98734177 0.96835443
 0.94029851 0.95356037 0.97749196 0.97178683]

mean value: 0.9656998596315463

key: test_recall
value: [0.86111111 0.97222222 0.88888889 1.         0.88571429 0.91428571
 1.         1.         0.88571429 0.82857143]

mean value: 0.9236507936507936

key: train_recall
value: [0.88012618 0.96214511 0.94321767 0.99371069 0.73584906 0.96226415
 0.99056604 0.96855346 0.95597484 0.97484277]

mean value: 0.9367249965279845

key: test_roc_auc
value: [0.91626984 0.95753968 0.94444444 0.95833333 0.91507937 0.95714286
 0.95714286 1.         0.9        0.9       ]

mean value: 0.940595238095238

key: train_roc_auc
value: [0.93534611 0.96692161 0.95745789 0.95742317 0.86319267 0.9653592
 0.96383648 0.96069182 0.96698113 0.97327044]

mean value: 0.951048052695276

key: test_jcc
value: [0.83783784 0.92105263 0.88888889 0.92105263 0.83783784 0.91428571
 0.92105263 1.         0.81578947 0.80555556]

mean value: 0.8863353202826887

key: train_jcc
value: [0.871875   0.93558282 0.91717791 0.9212828  0.72897196 0.93292683
 0.93195266 0.92492492 0.93538462 0.94801223]

mean value: 0.9048091762362589

MCC on Blind test: 0.25

Accuracy on Blind test: 0.82

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01513577 0.0140748  0.01494813 0.01315999 0.01602697 0.01350999
 0.01594853 0.01357126 0.01522708 0.01573467]

mean value: 0.014733719825744628

key: score_time
value: [0.0106318  0.01056147 0.01055241 0.01054549 0.01066828 0.01055503
 0.01060414 0.01057744 0.01061797 0.02540255]

mean value: 0.012071657180786132

key: test_mcc
value: [0.85952381 0.88880092 0.91885703 0.91587302 0.91885703 0.88862624
 0.91465912 0.97182532 0.80295507 0.85749293]

mean value: 0.8937470474465333

key: train_mcc
value: [0.94990974 0.90065217 0.92530412 0.95620727 0.95298581 0.9401617
 0.9311123  0.93081761 0.88444772 0.94985462]

mean value: 0.9321453053914295

key: test_accuracy
value: [0.92957746 0.94366197 0.95774648 0.95774648 0.95774648 0.94366197
 0.95714286 0.98571429 0.9        0.92857143]

mean value: 0.9461569416498994

key: train_accuracy
value: [0.97480315 0.9496063  0.96220472 0.97795276 0.97637795 0.97007874
 0.96540881 0.96540881 0.94025157 0.97484277]

mean value: 0.9656935571732779

key: test_fscore
value: [0.92957746 0.94285714 0.95652174 0.95774648 0.95890411 0.94117647
 0.95774648 0.98591549 0.89552239 0.92957746]

mean value: 0.9455545230506246

key: train_fscore
value: [0.97507788 0.94805195 0.96129032 0.97826087 0.97667185 0.97017268
 0.96496815 0.96540881 0.93729373 0.97507788]

mean value: 0.9652274125866556

key: test_precision
value: [0.94285714 0.97058824 1.         0.94444444 0.92105263 0.96969697
 0.94444444 0.97222222 0.9375     0.91666667]

mean value: 0.9519472757204955

key: train_precision
value: [0.96307692 0.97658863 0.98349835 0.96625767 0.96615385 0.96865204
 0.97741935 0.96540881 0.98611111 0.96604938]

mean value: 0.9719216107854822

key: test_recall
value: [0.91666667 0.91666667 0.91666667 0.97142857 1.         0.91428571
 0.97142857 1.         0.85714286 0.94285714]

mean value: 0.9407142857142857

key: train_recall
value: [0.9873817  0.92113565 0.94006309 0.99056604 0.98742138 0.97169811
 0.95283019 0.96540881 0.89308176 0.98427673]

mean value: 0.9593863460508303

key: test_roc_auc
value: [0.9297619  0.94404762 0.95833333 0.95793651 0.95833333 0.94325397
 0.95714286 0.98571429 0.9        0.92857143]

mean value: 0.9463095238095238

key: train_roc_auc
value: [0.97482293 0.94956153 0.96216991 0.97793286 0.97636053 0.97007619
 0.96540881 0.96540881 0.94025157 0.97484277]

mean value: 0.9656835902624844

key: test_jcc
value: [0.86842105 0.89189189 0.91666667 0.91891892 0.92105263 0.88888889
 0.91891892 0.97222222 0.81081081 0.86842105]

mean value: 0.8976213055160424

key: train_jcc
value: [0.95136778 0.90123457 0.92546584 0.95744681 0.95440729 0.94207317
 0.93230769 0.9331307  0.88198758 0.95136778]

mean value: 0.9330789211831344

MCC on Blind test: 0.16

Accuracy on Blind test: 0.65

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.12434745 0.10765958 0.10760546 0.10737205 0.10747123 0.1078012
 0.10726714 0.10751939 0.11066079 0.10946774]

mean value: 0.10971720218658447

key: score_time
value: [0.01442289 0.01422215 0.0144012  0.01428294 0.01433635 0.01430273
 0.01422977 0.0143826  0.01571679 0.01569033]

mean value: 0.014598774909973144

key: test_mcc
value: [0.91580648 1.         0.91885703 0.9451949  0.91587302 0.94511009
 0.91766294 0.97182532 0.94440028 0.8871639 ]

mean value: 0.9361893953436148

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.95774648 1.         0.95774648 0.97183099 0.95774648 0.97183099
 0.95714286 0.98571429 0.97142857 0.94285714]

mean value: 0.9674044265593561

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.95890411 1.         0.95652174 0.97222222 0.95774648 0.97058824
 0.95890411 0.98550725 0.97222222 0.94117647]

mean value: 0.9673792833885365

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.94594595 1.         1.         0.94594595 0.94444444 1.
 0.92105263 1.         0.94594595 0.96969697]

mean value: 0.96730318835582

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.97222222 1.         0.91666667 1.         0.97142857 0.94285714
 1.         0.97142857 1.         0.91428571]

mean value: 0.9688888888888889

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.95753968 1.         0.95833333 0.97222222 0.95793651 0.97142857
 0.95714286 0.98571429 0.97142857 0.94285714]

mean value: 0.9674603174603175

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.92105263 1.         0.91666667 0.94594595 0.91891892 0.94285714
 0.92105263 0.97142857 0.94594595 0.88888889]

mean value: 0.9372757343809975

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.11

Accuracy on Blind test: 0.65
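The gap between mean CV test MCC and blind-test MCC is one quick way to compare the models in this section. The values below are copied from the log, not recomputed. The model preceding LDA is left out because its name is not shown here. Note that AdaBoost reaches train_mcc 1.0 in every fold and also shows the largest gap.

```python
# Mean CV test MCC and blind-test MCC copied from this section of the log.
results = {
    "LDA": (0.9159094921471266, 0.3),
    "Multinomial": (0.6736306482828506, 0.32),
    "Passive Aggresive": (0.8841441619108454, 0.25),
    "Stochastic GDescent": (0.8937470474465333, 0.16),
    "AdaBoost Classifier": (0.9361893953436148, 0.11),
}

# Larger gap = CV estimate transfers worse to the blind set.
gaps = {name: cv - blind for name, (cv, blind) in results.items()}
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    cv, blind = results[name]
    print(f"{name:<20} CV MCC {cv:.2f}  blind MCC {blind:.2f}  gap {gap:.2f}")
```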

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]

key: fit_time
value: [0.03845501 0.04855371 0.05894065 0.03629017 0.03888655 0.0367496
 0.03712749 0.03992295 0.06053734 0.04424524]

mean value: 0.04397087097167969

key: score_time
value: [0.02625251 0.03578377 0.03374958 0.03522325 0.01706123 0.01737881
 0.02301645 0.02451253 0.030761   0.02602267]

mean value: 0.026976180076599122

key: test_mcc
value: [0.94365079 0.97220047 0.91587302 0.91587302 0.9451949  0.94365079
 0.91766294 1.         0.94440028 0.8871639 ]

mean value: 0.9385670099602731

key: train_mcc
value: [0.99685535 0.99055612 0.99370077 0.98425689 0.99685531 0.98425673
 0.99373035 0.99057094 0.99371069 0.99061012]

mean value: 0.9915103266748221

key: test_accuracy
value: [0.97183099 0.98591549 0.95774648 0.95774648 0.97183099 0.97183099
 0.95714286 1.         0.97142857 0.94285714]

mean value: 0.9688329979879275

key: train_accuracy
value: [0.9984252  0.99527559 0.99685039 0.99212598 0.9984252  0.99212598
 0.99685535 0.99528302 0.99685535 0.99528302]

mean value: 0.9957505076016441

key: test_fscore
value: [0.97222222 0.98630137 0.95774648 0.95774648 0.97222222 0.97142857
 0.95890411 1.         0.97222222 0.94117647]

mean value: 0.9689970145882008

key: train_fscore
value: [0.9984252  0.99527559 0.99684543 0.99212598 0.99843014 0.99215071
 0.99684543 0.99529042 0.99685535 0.99530516]

mean value: 0.9957549405205315

key: test_precision
value: [0.97222222 0.97297297 0.97142857 0.94444444 0.94594595 0.97142857
 0.92105263 1.         0.94594595 0.96969697]

mean value: 0.9615138275664592

key: train_precision
value: [0.99685535 0.99371069 0.99684543 0.99369085 0.9968652  0.99059561
 1.         0.99373041 0.99685535 0.99065421]

mean value: 0.9949803089428332

key: test_recall
value: [0.97222222 1.         0.94444444 0.97142857 1.         0.97142857
 1.         1.         1.         0.91428571]

mean value: 0.9773809523809524

key: train_recall
value: [1.         0.99684543 0.99684543 0.99056604 1.         0.99371069
 0.99371069 0.99685535 0.99685535 1.        ]

mean value: 0.9965388964942563

key: test_roc_auc
value: [0.9718254  0.98571429 0.95793651 0.95793651 0.97222222 0.9718254
 0.95714286 1.         0.97142857 0.94285714]

mean value: 0.9688888888888889

key: train_roc_auc
value: [0.99842767 0.99527806 0.99685039 0.99212844 0.99842271 0.99212348
 0.99685535 0.99528302 0.99685535 0.99528302]

mean value: 0.9957507489633554

key: test_jcc
value: [0.94594595 0.97297297 0.91891892 0.91891892 0.94594595 0.94444444
 0.92105263 1.         0.94594595 0.88888889]

mean value: 0.940303461356093

key: train_jcc
value: [0.99685535 0.99059561 0.99371069 0.984375   0.9968652  0.98442368
 0.99371069 0.990625   0.99373041 0.99065421]

mean value: 0.9915545833750219
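
The key / value / mean value blocks above have the shape of scikit-learn's cross_validate output when it is given a dictionary of scorers and return_train_score=True: fit_time and score_time first, then interleaved test_<name> and train_<name> arrays, one entry per fold. The sketch below assumes that setup; the scorer names are chosen to mirror the logged keys, and the decision tree, the 10-fold StratifiedKFold and the synthetic data are illustrative stand-ins, not the original script. The logged test roc_auc values track accuracy closely, which is what you would expect if ROC AUC were computed from hard class predictions, so the sketch uses make_scorer(roc_auc_score) on that assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Scorer names mirror the keys printed in this log (an assumption about the script)
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    # make_scorer(roc_auc_score) scores hard predictions, not probabilities
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": make_scorer(jaccard_score),
}

# Synthetic stand-in data; the real feature matrix is not shown in the log
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

scores = cross_validate(
    DecisionTreeClassifier(random_state=42),
    X, y, cv=cv, scoring=scoring, return_train_score=True,
)

# Print in the same "key / value / mean value" layout as the log
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Comparing each train_<name> mean with its test_<name> mean is the quickest way to spot models that fit the training folds much better than the held-out folds.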

MCC on Blind test: 0.0

Accuracy on Blind test: 0.38
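
The Bagging Classifier's "Some inputs do not have OOB scores" UserWarning and the accompanying "invalid value encountered in true_divide" RuntimeWarning are consistent with its default n_estimators=10. A row is left out of a single bootstrap sample with probability of roughly 0.37, so it ends up inside all ten bags with probability of roughly 0.63^10, about 1%. With a few hundred training rows per fold, a handful of rows are therefore expected to have no out-of-bag prediction, and their OOB probability row becomes 0/0. A minimal sketch, on synthetic data, of checking that a larger ensemble clears the warning; n_estimators=100 is an illustrative value, not a tuned one.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

# Synthetic stand-in for one training fold
X, y = make_classification(n_samples=640, n_features=20, random_state=42)


def fit_bagging(n_estimators):
    """Fit a bagged ensemble with OOB scoring and collect any OOB warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        clf = BaggingClassifier(
            n_estimators=n_estimators, oob_score=True, random_state=42
        ).fit(X, y)
    oob_msgs = [str(w.message) for w in caught if "OOB" in str(w.message)]
    return clf, oob_msgs


# Default-sized ensemble (10 estimators), as in the log
small, small_msgs = fit_bagging(10)
# Larger ensemble: every row is almost surely out-of-bag for some estimator
large, large_msgs = fit_bagging(100)

print("10 estimators, OOB warnings:", len(small_msgs))
print("100 estimators, OOB warnings:", len(large_msgs), "oob_score_:", large.oob_score_)
```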

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.25946164 0.30284429 0.27964544 0.27268171 0.31926012 0.17955041
 0.16414642 0.25722194 0.18691826 0.15706849]

mean value: 0.23787987232208252

key: score_time
value: [0.02183843 0.0219655  0.02176881 0.02186322 0.02178836 0.01367235
 0.02136087 0.01341414 0.01887727 0.02596784]

mean value: 0.02025167942047119

key: test_mcc
value: [0.77565853 0.66190476 0.74662454 0.88880092 0.86802778 0.83240693
 0.6882472  0.81649658 0.82992752 0.65714286]

mean value: 0.7765237623635132

key: train_mcc
value: [0.87752313 0.88987659 0.86815344 0.87737406 0.88046834 0.88367504
 0.88078191 0.88078191 0.87771008 0.89068168]

mean value: 0.8807026177428205

key: test_accuracy
value: [0.88732394 0.83098592 0.87323944 0.94366197 0.92957746 0.91549296
 0.84285714 0.9        0.91428571 0.82857143]

mean value: 0.8865995975855131

key: train_accuracy
value: [0.93858268 0.94488189 0.93385827 0.93858268 0.94015748 0.94173228
 0.94025157 0.94025157 0.93867925 0.94496855]

mean value: 0.9401946218986778

key: test_fscore
value: [0.89189189 0.83333333 0.87671233 0.94444444 0.93333333 0.91666667
 0.84931507 0.90909091 0.91666667 0.82857143]

mean value: 0.8900026071258949

key: train_fscore
value: [0.93934681 0.94522692 0.93478261 0.93934681 0.94080997 0.94245723
 0.94099379 0.94099379 0.93953488 0.94607088]

mean value: 0.9409563689601331

key: test_precision
value: [0.86842105 0.83333333 0.86486486 0.91891892 0.875      0.89189189
 0.81578947 0.83333333 0.89189189 0.82857143]

mean value: 0.8622016189121453

key: train_precision
value: [0.92638037 0.9378882  0.9204893  0.92923077 0.93209877 0.93230769
 0.92944785 0.92944785 0.9266055  0.92749245]

mean value: 0.9291388747701107

key: test_recall
value: [0.91666667 0.83333333 0.88888889 0.97142857 1.         0.94285714
 0.88571429 1.         0.94285714 0.82857143]

mean value: 0.921031746031746

key: train_recall
value: [0.95268139 0.95268139 0.94952681 0.94968553 0.94968553 0.95283019
 0.95283019 0.95283019 0.95283019 0.96540881]

mean value: 0.9530990218836181

key: test_roc_auc
value: [0.88690476 0.83095238 0.87301587 0.94404762 0.93055556 0.91587302
 0.84285714 0.9        0.91428571 0.82857143]

mean value: 0.8867063492063492

key: train_roc_auc
value: [0.93860484 0.94489415 0.9338829  0.93856516 0.94014245 0.94171478
 0.94025157 0.94025157 0.93867925 0.94496855]

mean value: 0.9401955240759479

key: test_jcc
value: [0.80487805 0.71428571 0.7804878  0.89473684 0.875      0.84615385
 0.73809524 0.83333333 0.84615385 0.70731707]

mean value: 0.8040441746956509

key: train_jcc
value: [0.8856305  0.89614243 0.87755102 0.8856305  0.88823529 0.89117647
 0.88856305 0.88856305 0.88596491 0.89766082]

mean value: 0.8885118046116812

MCC on Blind test: 0.36

Accuracy on Blind test: 0.83
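
Each "Running model pipeline:" repr describes the same preprocessing: a ColumnTransformer that min-max scales the numerical columns, one-hot encodes the categorical columns and passes any other column through unchanged, followed by the model step. A minimal reconstruction under stated assumptions: only a few of the logged column names are used, and the toy DataFrame values and labels are invented for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# A small subset of the column names shown in the log; values are made up
num_cols = ["ligand_distance", "rsa", "consurf_score"]
cat_cols = ["ss_class", "active_site"]

df = pd.DataFrame(
    {
        "ligand_distance": [3.1, 12.4, 8.0, 25.2, 5.5, 18.9, 9.7, 30.1],
        "rsa": [0.05, 0.40, 0.22, 0.91, 0.10, 0.66, 0.35, 0.80],
        "consurf_score": [-1.2, 0.3, -0.5, 1.8, -0.9, 1.1, 0.0, 2.0],
        "ss_class": ["helix", "sheet", "loop", "loop", "helix", "sheet", "helix", "loop"],
        "active_site": ["yes", "no", "no", "no", "yes", "no", "yes", "no"],
    }
)
y = [1, 0, 1, 0, 1, 0, 1, 0]

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),   # rescale each numeric column to [0, 1]
        ("cat", OneHotEncoder(), cat_cols),  # one indicator column per category
    ],
    remainder="passthrough",  # any unlisted columns are kept unchanged
)
pipe = Pipeline(steps=[("prep", prep), ("model", GaussianProcessClassifier(random_state=42))])

pipe.fit(df, y)
print(pipe.named_steps["prep"].transform(df).shape)  # 3 scaled + 3 + 2 one-hot columns
print(pipe.predict(df))
```

Because the scaler and encoder sit inside the Pipeline, cross-validation refits them on each training fold, so the held-out fold's minimum, maximum and categories do not leak into preprocessing.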

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.31332707 0.30789495 0.30552244 0.30341172 0.30553842 0.30712295
 0.30548239 0.31058264 0.30930758 0.30872202]

mean value: 0.30769121646881104

key: score_time
value: [0.00987029 0.0085175  0.00868106 0.00901246 0.00944805 0.00861239
 0.00868559 0.00857615 0.00932789 0.00859332]

mean value: 0.00893247127532959

key: test_mcc
value: [0.94511009 0.97220047 0.91885703 0.88880092 0.9451949  0.94511009
 0.91766294 1.         0.94440028 0.860309  ]

mean value: 0.9337645710169541

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97183099 0.98591549 0.95774648 0.94366197 0.97183099 0.97183099
 0.95714286 1.         0.97142857 0.92857143]

mean value: 0.9659959758551308

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97297297 0.98630137 0.95652174 0.94444444 0.97222222 0.97058824
 0.95890411 1.         0.97222222 0.92537313]

mean value: 0.9659550450066827

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.94736842 0.97297297 1.         0.91891892 0.94594595 1.
 0.92105263 1.         0.94594595 0.96875   ]

mean value: 0.9620954836415363

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         0.91666667 0.97142857 1.         0.94285714
 1.         1.         1.         0.88571429]

mean value: 0.9716666666666667

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97142857 0.98571429 0.95833333 0.94404762 0.97222222 0.97142857
 0.95714286 1.         0.97142857 0.92857143]

mean value: 0.966031746031746

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.94736842 0.97297297 0.91666667 0.89473684 0.94594595 0.94285714
 0.92105263 1.         0.94594595 0.86111111]

mean value: 0.9348657680236627

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
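
Gradient Boosting scores 1.0 on every training metric in every fold. Its CV test scores stay high, so the much lower blind-test result may reflect a difference between the training and blind sets as much as overfitting. If regularising the booster is worth trying, scikit-learn's GradientBoostingClassifier can hold out part of each training set and stop adding stages once the validation loss stops improving (validation_fraction, n_iter_no_change), and subsample below 1 gives stochastic gradient boosting. A hedged sketch on synthetic data; the parameter values are illustrative, and whether they narrow the blind-test gap here is untested.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic, slightly noisy data standing in for the real features
X, y = make_classification(n_samples=600, n_features=20, flip_y=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Same configuration as the log: 100 stages, no early stopping
default_gb = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)

regularised_gb = GradientBoostingClassifier(
    n_estimators=500,         # upper bound; early stopping may use fewer stages
    learning_rate=0.05,
    max_depth=2,
    subsample=0.8,            # stochastic gradient boosting
    validation_fraction=0.2,  # internal hold-out used only for early stopping
    n_iter_no_change=10,      # stop after 10 stages without improvement
    random_state=42,
).fit(X_train, y_train)

for name, model in [("default", default_gb), ("regularised", regularised_gb)]:
    print(name, "stages:", model.n_estimators_,
          "train acc:", round(model.score(X_train, y_train), 3),
          "test acc:", round(model.score(X_test, y_test), 3))
```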

MCC on Blind test: -0.04

Accuracy on Blind test: 0.34
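
Bagging (MCC 0.0, accuracy 0.38) and Gradient Boosting (MCC -0.04, accuracy 0.34) both score near-zero MCC on the blind test despite CV test MCC above 0.93. One pattern that gives an MCC of exactly 0 is a classifier that predicts the same class for every sample: scikit-learn's matthews_corrcoef returns 0.0 when the MCC denominator is zero, and accuracy then equals the share of that class in the blind set. The labels below are invented to mirror the 0.38 figure. The log does not show the blind-set predictions, so this is one possible reading rather than a diagnosis; the confusion matrix of the real blind predictions would settle it.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, matthews_corrcoef

# Invented blind-test labels: 38 of 100 samples are class 1
y_true = np.array([1] * 38 + [0] * 62)
# A degenerate model that predicts class 1 for every sample
y_pred = np.ones_like(y_true)

acc = accuracy_score(y_true, y_pred)
mcc = matthews_corrcoef(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)

print("accuracy:", acc)  # 0.38
print("MCC:", mcc)       # 0.0
print(cm)                # every prediction falls in the class-1 column
```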

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")

key: fit_time
value: [0.01238108 0.0148325  0.0149672  0.01481032 0.03123617 0.01486897
 0.0147841  0.01480579 0.01492763 0.02490044]

mean value: 0.017251420021057128

key: score_time
value: [0.01073623 0.01106596 0.01105785 0.01112366 0.01126671 0.01110649
 0.01109028 0.01363516 0.01109719 0.01138449]

mean value: 0.011356401443481445

key: test_mcc
value: [0.66068747 0.25082639 0.61746548 0.62970191 0.57247871 0.45738492
 0.69954392 0.61036794 0.63245553 0.71899664]

mean value: 0.5849908918897713

key: train_mcc
value: [0.62752065 0.59587004 0.6984327  0.7396315  0.63814678 0.6036901
 0.66988593 0.6066425  0.61612462 0.91194969]

mean value: 0.6707894510450889

key: test_accuracy
value: [0.8028169  0.6056338  0.77464789 0.8028169  0.76056338 0.67605634
 0.82857143 0.77142857 0.78571429 0.85714286]

mean value: 0.7665392354124748

key: train_accuracy
value: [0.78267717 0.76220472 0.83622047 0.85354331 0.79055118 0.76692913
 0.80974843 0.77044025 0.77515723 0.95597484]

mean value: 0.810344673896895

key: test_fscore
value: [0.75862069 0.48148148 0.71428571 0.76666667 0.69090909 0.5106383
 0.79310345 0.7037037  0.72727273 0.84848485]

mean value: 0.6995166668607607

key: train_fscore
value: [0.72177419 0.6873706  0.81021898 0.82872928 0.73663366 0.69672131
 0.76504854 0.70325203 0.70993915 0.95597484]

mean value: 0.7615662595724322

key: test_precision
value: [1.         0.72222222 1.         0.92       0.95       1.
 1.         1.         1.         0.90322581]

mean value: 0.9495448028673835

key: train_precision
value: [1.         1.         0.96103896 1.         0.99465241 1.
 1.         0.99425287 1.         0.95597484]

mean value: 0.9905919083786587

key: test_recall
value: [0.61111111 0.36111111 0.55555556 0.65714286 0.54285714 0.34285714
 0.65714286 0.54285714 0.57142857 0.8       ]

mean value: 0.5642063492063492

key: train_recall
value: [0.56466877 0.52365931 0.70031546 0.70754717 0.58490566 0.53459119
 0.61949686 0.54402516 0.55031447 0.95597484]

mean value: 0.6285498879034978

key: test_roc_auc
value: [0.80555556 0.60912698 0.77777778 0.80079365 0.75753968 0.67142857
 0.82857143 0.77142857 0.78571429 0.85714286]

mean value: 0.7665079365079365

key: train_roc_auc
value: [0.78233438 0.76182965 0.83600679 0.85377358 0.79087554 0.7672956
 0.80974843 0.77044025 0.77515723 0.95597484]

mean value: 0.8103436303394639

key: test_jcc
value: [0.61111111 0.31707317 0.55555556 0.62162162 0.52777778 0.34285714
 0.65714286 0.54285714 0.57142857 0.73684211]

mean value: 0.5484267056346646

key: train_jcc
value: [0.56466877 0.52365931 0.6809816  0.70754717 0.5830721  0.53459119
 0.61949686 0.54231975 0.55031447 0.91566265]

mean value: 0.6222313856468585

MCC on Blind test: 0.25

Accuracy on Blind test: 0.91
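
QDA estimates a separate covariance matrix per class and warns "Variables are collinear" when a class's centred feature matrix is rank-deficient. The pipeline's OneHotEncoder() (default drop=None) is one guaranteed source of this: the indicator columns for a categorical feature always sum to 1, so after centring they are linearly dependent. A small sketch with an invented three-category column standing in for a feature such as ss_class. drop='first' removes that particular dependency, although other collinear or within-class constant features in the real data could still trigger the warning; QuadraticDiscriminantAnalysis(reg_param=...) is the estimator's own option for shrinking ill-conditioned per-class covariances.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column standing in for a feature such as 'ss_class'
X_cat = np.array([["helix"], ["sheet"], ["loop"], ["helix"], ["loop"], ["sheet"]])

full = OneHotEncoder().fit_transform(X_cat).toarray()                 # one column per category
dropped = OneHotEncoder(drop="first").fit_transform(X_cat).toarray()  # first category dropped


def centred_rank(m):
    """Rank after centring, which is what QDA's collinearity check looks at."""
    return np.linalg.matrix_rank(m - m.mean(axis=0))


print(full.shape[1], centred_rank(full))        # 3 columns, rank 2: collinear
print(dropped.shape[1], centred_rank(dropped))  # 2 columns, rank 2: full rank
```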

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.02410054 0.01189899 0.01190758 0.01197672 0.01203632 0.02087998
 0.03131342 0.03137183 0.03720117 0.03229856]

mean value: 0.022498512268066408

key: score_time
value: [0.02143836 0.01087689 0.01085854 0.01089215 0.01084137 0.02098989
 0.02022457 0.01981997 0.01118922 0.01599312]

mean value: 0.015312409400939942

key: test_mcc
value: [0.94365079 0.9186708  1.         0.88730159 0.9451949  0.88862624
 0.91465912 0.94440028 0.91465912 0.80295507]

mean value: 0.9160117910547262

key: train_mcc
value: [0.93078099 0.92126383 0.92442685 0.94646152 0.94330695 0.93700772
 0.93712545 0.92454659 0.93396688 0.94025622]

mean value: 0.9339142994473958

key: test_accuracy
value: [0.97183099 0.95774648 1.         0.94366197 0.97183099 0.94366197
 0.95714286 0.97142857 0.95714286 0.9       ]

mean value: 0.9574446680080483

key: train_accuracy
value: [0.96535433 0.96062992 0.96220472 0.97322835 0.97165354 0.96850394
 0.96855346 0.96226415 0.96698113 0.97012579]

mean value: 0.9669499331451493

key: test_fscore
value: [0.97222222 0.96       1.         0.94285714 0.97222222 0.94117647
 0.95774648 0.97222222 0.95774648 0.89552239]

mean value: 0.9571715625918226

key: train_fscore
value: [0.96507937 0.96050553 0.96202532 0.97322835 0.97169811 0.96855346
 0.96845426 0.96214511 0.96692913 0.97017268]

mean value: 0.9668791316946547

key: test_precision
value: [0.97222222 0.92307692 1.         0.94285714 0.94594595 0.96969697
 0.94444444 0.94594595 0.94444444 0.9375    ]

mean value: 0.9526134038634039

key: train_precision
value: [0.97124601 0.96202532 0.96507937 0.97476341 0.97169811 0.96855346
 0.97151899 0.96518987 0.96845426 0.96865204]

mean value: 0.9687180824244073

key: test_recall
value: [0.97222222 1.         1.         0.94285714 1.         0.91428571
 0.97142857 1.         0.97142857 0.85714286]

mean value: 0.962936507936508

key: train_recall
value: [0.95899054 0.95899054 0.95899054 0.97169811 0.97169811 0.96855346
 0.96540881 0.9591195  0.96540881 0.97169811]

mean value: 0.9650556514493185

key: test_roc_auc
value: [0.9718254  0.95714286 1.         0.94365079 0.97222222 0.94325397
 0.95714286 0.97142857 0.95714286 0.9       ]

mean value: 0.9573809523809523

key: train_roc_auc
value: [0.96534432 0.96062734 0.96219967 0.97323076 0.97165347 0.96850386
 0.96855346 0.96226415 0.96698113 0.97012579]

mean value: 0.9669483959288138

key: test_jcc
value: [0.94594595 0.92307692 1.         0.89189189 0.94594595 0.88888889
 0.91891892 0.94594595 0.91891892 0.81081081]

mean value: 0.9190344190344191
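For a single binary confusion matrix the Jaccard index J and the F1 score F are tied by J = F / (2 - F), because F = 2TP/(2TP+FP+FN) and J = TP/(TP+FP+FN). The short check below applies that identity to the per-fold test_fscore and test_jcc values copied from the RidgeClassifier block; note that it holds fold by fold, not for the fold means.

```python
import numpy as np

# Per-fold values copied from the RidgeClassifier output above
test_fscore = np.array([0.97222222, 0.96, 1.0, 0.94285714, 0.97222222,
                        0.94117647, 0.95774648, 0.97222222, 0.95774648, 0.89552239])
test_jcc = np.array([0.94594595, 0.92307692, 1.0, 0.89189189, 0.94594595,
                     0.88888889, 0.91891892, 0.94594595, 0.91891892, 0.81081081])

# Jaccard recovered from F1 via J = F / (2 - F)
jcc_from_f1 = test_fscore / (2 - test_fscore)
print(np.allclose(jcc_from_f1, test_jcc, atol=1e-6))
```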

key: train_jcc
value: [0.93251534 0.92401216 0.92682927 0.94785276 0.94495413 0.93902439
 0.93883792 0.92705167 0.93597561 0.94207317]

mean value: 0.9359126415900797
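The key / value / mean value layout above matches what scikit-learn's cross_validate returns when it is given a dict of named scorers and return_train_score=True: fit_time and score_time, then a test_ and train_ entry per scorer. The sketch below reproduces that layout on synthetic data; the scorer definitions (in particular how mcc, jcc and roc_auc are scored) and the fold settings are assumptions, since the script's own scoring dict is not shown in the log.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic binary data standing in for the real training set
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Assumed scorer names chosen to mirror the keys in the log
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": make_scorer(jaccard_score),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
results = cross_validate(RidgeClassifier(random_state=42), X, y, cv=cv,
                         scoring=scoring, return_train_score=True)

# Print in the same key / value / mean value style as the log
for key, value in results.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```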

MCC on Blind test: 0.28

Accuracy on Blind test: 0.83
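The blind-test lines report MCC and accuracy on held-out data rounded to two decimals. A minimal sketch of that computation with scikit-learn's metric functions follows; the label and prediction lists are hypothetical, and rounding with round(..., 2) is an assumption based on the printed precision.

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Hypothetical blind-test labels and model predictions
y_bts = [0, 0, 0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]

# TP=2, FN=1, FP=1, TN=6 for these lists
bts_mcc = round(matthews_corrcoef(y_bts, y_pred), 2)
bts_acc = round(accuracy_score(y_bts, y_pred), 2)

print("MCC on Blind test:", bts_mcc)       # 0.52
print("Accuracy on Blind test:", bts_acc)  # 0.8
```

MCC uses all four confusion-matrix cells, so on an imbalanced blind set it can sit far below accuracy, as in the 0.28 versus 0.83 reported here.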

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_config.py:143: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_config.py:146: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.14312363 0.11674213 0.22311258 0.18742394 0.09935522 0.20226431
 0.20428991 0.13562512 0.20273662 0.20825934]

mean value: 0.1722932815551758

key: score_time
value: [0.01109982 0.01091933 0.01983857 0.01111674 0.01117826 0.02104044
 0.01099682 0.01117277 0.01107144 0.01106048]

mean value: 0.012949466705322266

key: test_mcc
value: [0.94365079 0.9186708  1.         0.88730159 0.9451949  0.88862624
 0.91465912 0.97182532 0.88571429 0.80295507]

mean value: 0.9158598109706015

key: train_mcc
value: [0.93702568 0.93070849 0.92759921 0.94646152 0.94016229 0.94649802
 0.94029342 0.93400383 0.94025622 0.94025622]

mean value: 0.9383264898322129

key: test_accuracy
value: [0.97183099 0.95774648 1.         0.94366197 0.97183099 0.94366197
 0.95714286 0.98571429 0.94285714 0.9       ]

mean value: 0.9574446680080483

key: train_accuracy
value: [0.96850394 0.96535433 0.96377953 0.97322835 0.97007874 0.97322835
 0.97012579 0.96698113 0.97012579 0.97012579]

mean value: 0.9691531718912494

key: test_fscore
value: [0.97222222 0.96       1.         0.94285714 0.97222222 0.94117647
 0.95774648 0.98591549 0.94285714 0.89552239]

mean value: 0.9570519560637653

key: train_fscore
value: [0.96835443 0.96529968 0.96354992 0.97322835 0.97007874 0.97339593
 0.9699842  0.96682464 0.97007874 0.97017268]

mean value: 0.9690967324816946

key: test_precision
value: [0.97222222 0.92307692 1.         0.94285714 0.94594595 0.96969697
 0.94444444 0.97222222 0.94285714 0.9375    ]

mean value: 0.9550823013323013

key: train_precision
value: [0.97142857 0.96529968 0.96815287 0.97476341 0.97160883 0.96884735
 0.97460317 0.97142857 0.97160883 0.96865204]

mean value: 0.9706393330442624

key: test_recall
value: [0.97222222 1.         1.         0.94285714 1.         0.91428571
 0.97142857 1.         0.94285714 0.85714286]

mean value: 0.9600793650793651

key: train_recall
value: [0.96529968 0.96529968 0.95899054 0.97169811 0.96855346 0.97798742
 0.96540881 0.96226415 0.96855346 0.97169811]

mean value: 0.9675753427375354

key: test_roc_auc
value: [0.9718254  0.95714286 1.         0.94365079 0.97222222 0.94325397
 0.95714286 0.98571429 0.94285714 0.9       ]

mean value: 0.9573809523809523

key: train_roc_auc
value: [0.9684989  0.96535424 0.963772   0.97323076 0.97008115 0.97322084
 0.97012579 0.96698113 0.97012579 0.97012579]

mean value: 0.9691516377993373

key: test_jcc
value: [0.94594595 0.92307692 1.         0.89189189 0.94594595 0.88888889
 0.91891892 0.97222222 0.89189189 0.81081081]

mean value: 0.918959343959344

key: train_jcc
value: [0.93865031 0.93292683 0.92966361 0.94785276 0.94189602 0.94817073
 0.94171779 0.93577982 0.94189602 0.94207317]

mean value: 0.9400627064609138

MCC on Blind test: 0.28

Accuracy on Blind test: 0.83
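The SettingWithCopyWarning raised at embb_config.py lines 143 and 146 comes from calling sort_values(..., inplace=True) on a DataFrame that pandas has flagged as derived from another frame. Two common ways to avoid it are sketched below: take an explicit .copy() before sorting in place, or assign the sorted result instead of using inplace. The results_df table and the column subsets are hypothetical; only the names smnc_CT, smnc_BT, test_mcc and bts_mcc come from the log.

```python
import pandas as pd

# Hypothetical per-model summary table
results_df = pd.DataFrame({
    "model": ["Ridge Classifier", "Ridge ClassifierCV", "Logistic Regression"],
    "test_mcc": [0.916, 0.916, 0.856],
    "bts_mcc": [0.28, 0.28, 0.25],
})

# Option 1: explicit copy, so the in-place sort acts on an independent frame
smnc_CT = results_df[["model", "test_mcc"]].copy()
smnc_CT.sort_values(by=["test_mcc"], ascending=False, inplace=True)

# Option 2: no inplace; bind the sorted result to the name instead
smnc_BT = results_df[["model", "bts_mcc"]].sort_values(by=["bts_mcc"], ascending=False)

print(smnc_CT)
print(smnc_BT)
```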

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.0334816  0.0385375  0.03184867 0.02891684 0.02883768 0.02635932
 0.02719021 0.02624321 0.02558875 0.02810621]

mean value: 0.029510998725891115

key: score_time
value: [0.01067233 0.01144505 0.01100469 0.01092911 0.01091051 0.01090193
 0.0108726  0.01089025 0.0111692  0.01106668]

mean value: 0.010986232757568359

key: test_mcc
value: [0.88730159 0.88862624 0.91587302 0.88880092 0.91885703 0.8594125
 0.77269114 0.94440028 0.80032673 0.6882472 ]

mean value: 0.8564536645618587

key: train_mcc
value: [0.88367504 0.89923119 0.8772708  0.89606666 0.89291312 0.87122165
 0.87757113 0.874283   0.90573203 0.88057281]

mean value: 0.8858537441772321

key: test_accuracy
value: [0.94366197 0.94366197 0.95774648 0.94366197 0.95774648 0.92957746
 0.88571429 0.97142857 0.9        0.84285714]

mean value: 0.9276056338028169

key: train_accuracy
value: [0.94173228 0.9496063  0.93858268 0.9480315  0.94645669 0.93543307
 0.93867925 0.93710692 0.95283019 0.94025157]

mean value: 0.9428710444213342

key: test_fscore
value: [0.94444444 0.94594595 0.95774648 0.94444444 0.95890411 0.92753623
 0.88235294 0.97058824 0.89855072 0.8358209 ]

mean value: 0.9266334451811831

key: train_fscore
value: [0.94098884 0.94968553 0.93799682 0.94819466 0.94654088 0.93460925
 0.93799682 0.93670886 0.953125   0.93987342]

mean value: 0.9425720082879654

key: test_precision
value: [0.94444444 0.92105263 0.97142857 0.91891892 0.92105263 0.94117647
 0.90909091 1.         0.91176471 0.875     ]

mean value: 0.9313929283511326

key: train_precision
value: [0.9516129  0.94670846 0.94551282 0.94670846 0.94654088 0.94822006
 0.94855305 0.94267516 0.94720497 0.94585987]

mean value: 0.9469596652319989

key: test_recall
value: [0.94444444 0.97222222 0.94444444 0.97142857 1.         0.91428571
 0.85714286 0.94285714 0.88571429 0.8       ]

mean value: 0.9232539682539682

key: train_recall
value: [0.93059937 0.95268139 0.93059937 0.94968553 0.94654088 0.92138365
 0.92767296 0.93081761 0.9591195  0.93396226]

mean value: 0.9383062516120072

key: test_roc_auc
value: [0.94365079 0.94325397 0.95793651 0.94404762 0.95833333 0.92936508
 0.88571429 0.97142857 0.9        0.84285714]

mean value: 0.9276587301587301

key: train_roc_auc
value: [0.94171478 0.94961113 0.93857012 0.94802889 0.94645656 0.93545523
 0.93867925 0.93710692 0.95283019 0.94025157]

mean value: 0.942870464059679

key: test_jcc
value: [0.89473684 0.8974359  0.91891892 0.89473684 0.92105263 0.86486486
 0.78947368 0.94285714 0.81578947 0.71794872]

mean value: 0.8657815015709752

key: train_jcc
value: [0.88855422 0.90419162 0.88323353 0.90149254 0.89850746 0.87724551
 0.88323353 0.88095238 0.91044776 0.88656716]

mean value: 0.8914425714809752

MCC on Blind test: 0.25

Accuracy on Blind test: 0.79
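The lbfgs ConvergenceWarning names two remedies: scale the inputs or raise max_iter (default 100). The pipeline already scales numeric columns with MinMaxScaler, so the remaining lever is max_iter, which LogisticRegressionCV also accepts. The sketch below uses synthetic data and an arbitrary max_iter=5000, both assumptions, and counts any ConvergenceWarning raised during fitting.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data standing in for the real training set
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

pipe = Pipeline(steps=[
    ("prep", MinMaxScaler()),
    # Illustrative iteration budget, well above the default of 100
    ("model", LogisticRegression(max_iter=5000, random_state=42)),
])

# Record warnings so convergence failures can be counted
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    pipe.fit(X, y)

n_conv = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print("ConvergenceWarnings:", n_conv)
print("lbfgs iterations used:", pipe.named_steps["model"].n_iter_[0])
```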

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])
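The printed pipeline sends the numerical columns through MinMaxScaler and the categorical columns through OneHotEncoder, then passes the result to the model. Below is a minimal sketch of that layout on a made-up eight-row frame. Only four column names are borrowed from the printed Index objects, and all values are hypothetical. LogisticRegression stands in for LogisticRegressionCV because the latter's default 5-fold inner CV would fail on so few rows.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy frame; column names taken from the log, values invented.
df = pd.DataFrame({
    "ligand_distance": [5.2, 12.8, 3.1, 20.4, 8.7, 15.0, 4.4, 9.9],
    "rsa": [0.10, 0.85, 0.05, 0.60, 0.33, 0.72, 0.20, 0.48],
    "ss_class": ["H", "E", "H", "C", "E", "C", "H", "E"],
    "active_site": [1, 0, 1, 0, 0, 0, 1, 0],
})
y = [1, 0, 1, 0, 1, 0, 1, 0]

num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class", "active_site"]

# Scale numbers to [0, 1], one-hot encode categories, pass anything else through.
prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ],
    remainder="passthrough",
)
pipe = Pipeline(steps=[("prep", prep),
                       ("model", LogisticRegression(random_state=42))])
pipe.fit(df, y)

# 2 scaled numerical columns + 3 ss_class levels + 2 active_site levels.
n_out = pipe.named_steps["prep"].transform(df).shape[1]
print(n_out)  # 7
```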

key: fit_time
value: [0.77223945 0.88186479 0.73250079 0.70630646 0.87280202 0.74433756
 0.76517749 0.89203405 0.77521062 0.80457163]

mean value: 0.7947044849395752

key: score_time
value: [0.01399922 0.01436377 0.01483202 0.01464963 0.01440787 0.01132131
 0.01460695 0.01472092 0.01459765 0.01145387]

mean value: 0.013895320892333984

key: test_mcc
value: [0.88730159 0.86753285 1.         0.9451949  0.9451949  0.97222222
 0.94440028 0.97182532 0.91465912 0.91465912]

mean value: 0.936299028941898

key: train_mcc
value: [0.96867777 0.9625117  0.9625117  0.96558776 0.96558776 0.96250874
 0.96872591 0.9625688  0.97501633 0.96564279]

mean value: 0.9659339262257346

key: test_accuracy
value: [0.94366197 0.92957746 1.         0.97183099 0.97183099 0.98591549
 0.97142857 0.98571429 0.95714286 0.95714286]

mean value: 0.9674245472837022

key: train_accuracy
value: [0.98425197 0.98110236 0.98110236 0.98267717 0.98267717 0.98110236
 0.98427673 0.98113208 0.98742138 0.9827044 ]

mean value: 0.982844797702174

key: test_fscore
value: [0.94444444 0.93506494 1.         0.97222222 0.97222222 0.98591549
 0.97222222 0.98591549 0.95774648 0.95652174]

mean value: 0.9682275250095214

key: train_fscore
value: [0.984375   0.98130841 0.98130841 0.98289269 0.98289269 0.98136646
 0.98442368 0.98136646 0.98753894 0.98289269]

mean value: 0.9830365430046653

key: test_precision
value: [0.94444444 0.87804878 1.         0.94594595 0.94594595 0.97222222
 0.94594595 0.97222222 0.94444444 0.97058824]

mean value: 0.9519808186953094

key: train_precision
value: [0.9752322  0.96923077 0.96923077 0.97230769 0.97230769 0.96932515
 0.97530864 0.96932515 0.97839506 0.97230769]

mean value: 0.97229708239792

key: test_recall
value: [0.94444444 1.         1.         1.         1.         1.
 1.         1.         0.97142857 0.94285714]

mean value: 0.9858730158730159

key: train_recall
value: [0.99369085 0.99369085 0.99369085 0.99371069 0.99371069 0.99371069
 0.99371069 0.99371069 0.99685535 0.99371069]

mean value: 0.9940192052060394

key: test_roc_auc
value: [0.94365079 0.92857143 1.         0.97222222 0.97222222 0.98611111
 0.97142857 0.98571429 0.95714286 0.95714286]

mean value: 0.9674206349206349

key: train_roc_auc
value: [0.98426681 0.98112216 0.98112216 0.98265976 0.98265976 0.98108248
 0.98427673 0.98113208 0.98742138 0.9827044 ]

mean value: 0.9828447711445747

key: test_jcc
value: [0.89473684 0.87804878 1.         0.94594595 0.94594595 0.97222222
 0.94594595 0.97222222 0.91891892 0.91666667]

mean value: 0.9390653490460936

key: train_jcc
value: [0.96923077 0.96330275 0.96330275 0.96636086 0.96636086 0.96341463
 0.96932515 0.96341463 0.97538462 0.96636086]

mean value: 0.9666457879676796
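Each model block prints fit_time, score_time and then a test_/train_ pair per metric. That is the shape of the dict returned by scikit-learn's cross_validate when it is given a named scoring dict and return_train_score=True. The sketch below shows one plausible way such a block could be produced. The scorer names mirror the printed keys, but the exact scorers, the 10-fold StratifiedKFold and the synthetic data are assumptions, because the calling code is not part of this log.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (jaccard_score, make_scorer, matthews_corrcoef,
                             roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate

X, y = make_classification(n_samples=400, random_state=42)

# Assumed scorer names, chosen to reproduce the keys printed in the log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    # make_scorer passes hard predicted labels here, not probabilities.
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": make_scorer(jaccard_score),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(LogisticRegression(max_iter=1000), X, y, cv=cv,
                        scoring=scoring, return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}\n")
```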

MCC on Blind test: 0.25

Accuracy on Blind test: 0.81
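The blind-test lines report a single MCC and accuracy, rounded to two decimals, for data kept out of cross-validation. For Logistic RegressionCV, the logged blind-test MCC of 0.25 is well below the 10-fold mean test MCC of 0.94. This log does not show how the blind set was built. The minimal sketch below uses a stratified train_test_split on synthetic, imbalanced data as a stand-in, so both the split and the data are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (assumption).
X, y = make_classification(n_samples=600, weights=[0.8, 0.2], random_state=42)

# Hold back a stratified blind set that plays no part in training or CV.
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_blind)

# Round to two decimals, matching the log's presentation.
blind_mcc = round(matthews_corrcoef(y_blind, y_pred), 2)
blind_acc = round(accuracy_score(y_blind, y_pred), 2)
print("MCC on Blind test:", blind_mcc)
print("\nAccuracy on Blind test:", blind_acc)
```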

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01122856 0.01021957 0.00833702 0.0081079  0.00787926 0.00794339
 0.00866914 0.00802183 0.0078454  0.00855756]

mean value: 0.008680963516235351

key: score_time
value: [0.01121497 0.00928688 0.00848627 0.00823808 0.00867367 0.00840926
 0.00836754 0.00825548 0.00810933 0.00873709]

mean value: 0.008777856826782227

key: test_mcc
value: [0.69023056 0.57777778 0.85952381 0.75442414 0.7468254  0.77565853
 0.74316054 0.6882472  0.77142857 0.600982  ]

mean value: 0.7208258525066535

key: train_mcc
value: [0.77704336 0.633035   0.7613864  0.77335915 0.75280338 0.7642249
 0.77704083 0.75544945 0.75849571 0.78672387]

mean value: 0.7539562032070855

key: test_accuracy
value: [0.84507042 0.78873239 0.92957746 0.87323944 0.87323944 0.88732394
 0.87142857 0.84285714 0.88571429 0.8       ]

mean value: 0.8597183098591549

key: train_accuracy
value: [0.88818898 0.80944882 0.88031496 0.88661417 0.87559055 0.88188976
 0.88836478 0.87735849 0.87893082 0.89308176]

mean value: 0.8759783093151092

key: test_fscore
value: [0.84931507 0.78873239 0.92957746 0.88       0.87323944 0.88235294
 0.86956522 0.84931507 0.88571429 0.79411765]

mean value: 0.8601929524101833

key: train_fscore
value: [0.89026275 0.78659612 0.88271605 0.88785047 0.87975647 0.88408037
 0.88992248 0.88       0.88135593 0.89506173]

mean value: 0.875760236872007

key: test_precision
value: [0.83783784 0.8        0.94285714 0.825      0.86111111 0.90909091
 0.88235294 0.81578947 0.88571429 0.81818182]

mean value: 0.8577935519653785

key: train_precision
value: [0.87272727 0.892      0.86404834 0.87962963 0.85250737 0.86930091
 0.87767584 0.86144578 0.86404834 0.87878788]

mean value: 0.8712171368478436

key: test_recall
value: [0.86111111 0.77777778 0.91666667 0.94285714 0.88571429 0.85714286
 0.85714286 0.88571429 0.88571429 0.77142857]

mean value: 0.8641269841269841

key: train_recall
value: [0.90851735 0.70347003 0.9022082  0.89622642 0.90880503 0.89937107
 0.90251572 0.89937107 0.89937107 0.91194969]

mean value: 0.8831805646489297

key: test_roc_auc
value: [0.84484127 0.78888889 0.9297619  0.87420635 0.8734127  0.88690476
 0.87142857 0.84285714 0.88571429 0.8       ]

mean value: 0.8598015873015873

key: train_roc_auc
value: [0.88822094 0.80928219 0.88034938 0.88659901 0.87553816 0.88186219
 0.88836478 0.87735849 0.87893082 0.89308176]

mean value: 0.8759587722953

key: test_jcc
value: [0.73809524 0.65116279 0.86842105 0.78571429 0.775      0.78947368
 0.76923077 0.73809524 0.79487179 0.65853659]

mean value: 0.756860143891296

key: train_jcc
value: [0.80222841 0.64825581 0.79005525 0.79831933 0.78532609 0.79224377
 0.80167598 0.78571429 0.78787879 0.81005587]

mean value: 0.7801753573997666

MCC on Blind test: 0.28

Accuracy on Blind test: 0.76
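The per-fold numbers are internally consistent, so a fold's metrics can be recomputed from one confusion matrix. Take Gaussian NB fold 1 and assume 36 positives in its 71-sample fold. The logged precision (0.8378 = 31/37), recall (0.8611 = 31/36) and accuracy (0.8451 = 60/71) then imply TP = 31, FN = 5, TN = 29 and FP = 6. These counts are inferred, not printed.

The sketch below rebuilds label vectors with those counts and recomputes each logged test metric. It also checks two identities that hold for any binary confusion matrix. First, ROC AUC on hard 0/1 predictions equals balanced accuracy, which explains why the logged roc_auc values track accuracy so closely. Second, Jaccard equals F1 / (2 - F1).

```python
import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score, f1_score,
                             jaccard_score, matthews_corrcoef, precision_score,
                             recall_score, roc_auc_score)

# Fold-1 confusion counts for Gaussian NB, back-calculated from the logged
# precision, recall and accuracy (an inference, not values printed in the log).
TP, FN, TN, FP = 31, 5, 29, 6

# Label vectors that produce exactly these counts.
y_true = np.array([1] * (TP + FN) + [0] * (TN + FP))
y_pred = np.array([1] * TP + [0] * FN + [0] * TN + [1] * FP)

metrics = {
    "mcc": matthews_corrcoef(y_true, y_pred),
    "accuracy": accuracy_score(y_true, y_pred),
    "fscore": f1_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "roc_auc": roc_auc_score(y_true, y_pred),  # hard labels, not probabilities
    "jcc": jaccard_score(y_true, y_pred),
}

# Fold-1 test values as printed above for Gaussian NB.
logged = {
    "mcc": 0.69023056, "accuracy": 0.84507042, "fscore": 0.84931507,
    "precision": 0.83783784, "recall": 0.86111111, "roc_auc": 0.84484127,
    "jcc": 0.73809524,
}

for key, value in metrics.items():
    print(f"{key}: computed {value:.8f}  logged {logged[key]:.8f}")

# ROC AUC on 0/1 predictions reduces to (TPR + TNR) / 2, i.e. balanced accuracy.
assert np.isclose(metrics["roc_auc"], balanced_accuracy_score(y_true, y_pred))
# Jaccard = TP / (TP + FP + FN) = F1 / (2 - F1).
f = metrics["fscore"]
assert np.isclose(metrics["jcc"], f / (2 - f))
```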

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])
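The pipeline printed above has three steps. It scales the numerical columns with `MinMaxScaler`, one-hot encodes the categorical columns, and passes any other columns through unchanged (`remainder='passthrough'`). It then fits the classifier. The sketch below reproduces that structure but is not the code in `ml_data.py`. The toy DataFrame and its values are assumptions for illustration, as is the use of only four of the logged column names (`ligand_distance`, `rsa`, `ss_class`, `active_site`).

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data: two numerical and two categorical columns named after the log.
X = pd.DataFrame({
    "ligand_distance": [3.1, 12.4, 7.8, 20.0, 5.5, 15.2],
    "rsa": [0.10, 0.80, 0.45, 0.95, 0.20, 0.60],
    "ss_class": ["helix", "sheet", "coil", "helix", "coil", "sheet"],
    "active_site": ["1", "0", "0", "0", "1", "0"],
})
y = [1, 0, 1, 0, 1, 0]

# Split columns by dtype, mirroring the 'num' / 'cat' groups in the printed pipeline.
num_cols = list(X.select_dtypes(include="number").columns)
cat_cols = list(X.select_dtypes(exclude="number").columns)

prep = ColumnTransformer(
    transformers=[("num", MinMaxScaler(), num_cols),
                  ("cat", OneHotEncoder(), cat_cols)],
    remainder="passthrough",  # columns not listed above are passed through unchanged
)
pipe = Pipeline(steps=[("prep", prep), ("model", BernoulliNB())])
pipe.fit(X, y)
preds = pipe.predict(X)
print(preds)
```

Swapping `BernoulliNB()` for any other estimator in the model list gives the corresponding "Running model pipeline" structure.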

key: fit_time
value: [0.00878906 0.00839543 0.00800848 0.00811028 0.00816107 0.00810719
 0.00804377 0.00802827 0.00797844 0.00811362]

mean value: 0.008173561096191407

key: score_time
value: [0.00841141 0.00832272 0.00822234 0.00826097 0.00850153 0.00808287
 0.00810909 0.00806832 0.00813913 0.00800657]

mean value: 0.00821249485015869

key: test_mcc
value: [0.60881948 0.60555556 0.63940384 0.49285714 0.6153057  0.59007669
 0.45883147 0.6614769  0.57735027 0.46188022]

mean value: 0.5711557273256194

key: train_mcc
value: [0.59860964 0.58693799 0.6055625  0.58980737 0.63409349 0.60603034
 0.63214752 0.60053256 0.63553444 0.6201872 ]

mean value: 0.6109443059939785

key: test_accuracy
value: [0.8028169  0.8028169  0.81690141 0.74647887 0.8028169  0.78873239
 0.72857143 0.82857143 0.78571429 0.72857143]

mean value: 0.7831991951710262

key: train_accuracy
value: [0.7984252  0.79212598 0.8015748  0.79370079 0.81574803 0.8
 0.81289308 0.79874214 0.81761006 0.80974843]

mean value: 0.8040568513841431

key: test_fscore
value: [0.81578947 0.80555556 0.83116883 0.74285714 0.81578947 0.80519481
 0.73972603 0.83783784 0.8        0.70769231]

mean value: 0.7901611455072162

key: train_fscore
value: [0.80547112 0.80120482 0.80966767 0.80300752 0.82406015 0.81350954
 0.82525698 0.80838323 0.82043344 0.8141321 ]

mean value: 0.8125126581130029

key: test_precision
value: [0.775      0.80555556 0.7804878  0.74285714 0.75609756 0.73809524
 0.71052632 0.79487179 0.75       0.76666667]

mean value: 0.7620158079689531

key: train_precision
value: [0.7771261  0.76657061 0.77681159 0.76945245 0.78962536 0.7630854
 0.77410468 0.77142857 0.80792683 0.7957958 ]

mean value: 0.7791927388032522

key: test_recall
value: [0.86111111 0.80555556 0.88888889 0.74285714 0.88571429 0.88571429
 0.77142857 0.88571429 0.85714286 0.65714286]

mean value: 0.8241269841269842

key: train_recall
value: [0.83596215 0.83911672 0.84542587 0.83962264 0.86163522 0.87106918
 0.8836478  0.8490566  0.83333333 0.83333333]

mean value: 0.8492202845068746

key: test_roc_auc
value: [0.80198413 0.80277778 0.81587302 0.74642857 0.80396825 0.79007937
 0.72857143 0.82857143 0.78571429 0.72857143]

mean value: 0.7832539682539683

key: train_roc_auc
value: [0.79848422 0.79219987 0.80164375 0.79362836 0.81567565 0.7998879
 0.81289308 0.79874214 0.81761006 0.80974843]

mean value: 0.8040513461500308

key: test_jcc
value: [0.68888889 0.6744186  0.71111111 0.59090909 0.68888889 0.67391304
 0.58695652 0.72093023 0.66666667 0.54761905]

mean value: 0.6550302096510388

key: train_jcc
value: [0.67430025 0.66834171 0.68020305 0.67085427 0.70076726 0.68564356
 0.7025     0.67839196 0.69553806 0.6865285 ]

mean value: 0.6843068622772353

MCC on Blind test: 0.22

Accuracy on Blind test: 0.63
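Each model block in this log reports the same keys in the same order. First come `fit_time` and `score_time`, then `test_*`/`train_*` pairs for `mcc`, `accuracy`, `fscore`, `precision`, `recall`, `roc_auc` and `jcc`. The blind-test MCC and accuracy follow. That layout matches what scikit-learn's `cross_validate` returns when given a dict of scorers and `return_train_score=True`.

The sketch below reproduces the layout under several assumptions:
- synthetic data from `make_classification`;
- a stratified 80/20 split standing in for the blind set;
- 10-fold `StratifiedKFold`;
- scorer names chosen to match the log's keys.

The scorers and CV scheme actually used to produce this log are not shown here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score,
                             make_scorer, matthews_corrcoef, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split

# Synthetic stand-in data (assumption): the real features are not reproduced here.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Hypothetical scorer names, picked so the result keys match the log's keys.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # scored on hard predict() labels
    "jcc": make_scorer(jaccard_score),
}

model = LogisticRegression(random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(model, X_train, y_train, cv=cv,
                        scoring=scoring, return_train_score=True)

# Print in the same "key / value / mean value" layout as the log.
for key, value in scores.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}\n")

# Refit on the whole training split, then score the held-out "blind" split once.
model.fit(X_train, y_train)
y_pred = model.predict(X_blind)
mcc_blind = matthews_corrcoef(y_blind, y_pred)
acc_blind = accuracy_score(y_blind, y_pred)
print("MCC on Blind test:", round(mcc_blind, 2))
print("\nAccuracy on Blind test:", round(acc_blind, 2))
```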

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00771666 0.00820422 0.00826883 0.008219   0.00830936 0.00839472
 0.00791407 0.00843978 0.00832844 0.00831151]

mean value: 0.00821065902709961

key: score_time
value: [0.01118207 0.01534915 0.01180792 0.01198816 0.01184177 0.01210046
 0.01215458 0.01243091 0.01233387 0.01189065]

mean value: 0.012307953834533692

key: test_mcc
value: [0.75346834 0.84273607 0.69643609 0.8365327  0.8031746  0.77991323
 0.74560114 0.6614769  0.57353933 0.62882815]

mean value: 0.7321706561143749

key: train_mcc
value: [0.83709453 0.83130054 0.82734834 0.80898883 0.7990627  0.80645661
 0.83592055 0.80334707 0.82330288 0.81448419]

mean value: 0.8187306253278919

key: test_accuracy
value: [0.87323944 0.91549296 0.84507042 0.91549296 0.90140845 0.88732394
 0.87142857 0.82857143 0.78571429 0.81428571]

mean value: 0.8638028169014085

key: train_accuracy
value: [0.91653543 0.91338583 0.91181102 0.9007874  0.8976378  0.9007874
 0.91666667 0.89937107 0.91037736 0.90566038]

mean value: 0.9073020353587877

key: test_fscore
value: [0.88311688 0.92307692 0.85714286 0.91891892 0.90140845 0.89189189
 0.87671233 0.83783784 0.79452055 0.8115942 ]

mean value: 0.8696220842300416

key: train_fscore
value: [0.92030075 0.91754123 0.91566265 0.90721649 0.90254873 0.90611028
 0.91981846 0.90447761 0.91376702 0.90963855]

mean value: 0.9117081778217269

key: test_precision
value: [0.82926829 0.85714286 0.80487805 0.87179487 0.88888889 0.84615385
 0.84210526 0.79487179 0.76315789 0.82352941]

mean value: 0.8321791169975116

key: train_precision
value: [0.87931034 0.87428571 0.87608069 0.8531856  0.86246418 0.8611898
 0.88629738 0.86079545 0.88046647 0.87283237]

mean value: 0.8706908004288777

key: test_recall
value: [0.94444444 1.         0.91666667 0.97142857 0.91428571 0.94285714
 0.91428571 0.88571429 0.82857143 0.8       ]

mean value: 0.9118253968253969

key: train_recall
value: [0.96529968 0.96529968 0.95899054 0.96855346 0.94654088 0.95597484
 0.95597484 0.95283019 0.94968553 0.94968553]

mean value: 0.9568835188381644

key: test_roc_auc
value: [0.87222222 0.91428571 0.84404762 0.91626984 0.9015873  0.88809524
 0.87142857 0.82857143 0.78571429 0.81428571]

mean value: 0.8636507936507937

key: train_roc_auc
value: [0.91661211 0.91346745 0.91188521 0.90068052 0.89756066 0.90070036
 0.91666667 0.89937107 0.91037736 0.90566038]

mean value: 0.9072981766958317

key: test_jcc
value: [0.79069767 0.85714286 0.75       0.85       0.82051282 0.80487805
 0.7804878  0.72093023 0.65909091 0.68292683]

mean value: 0.771666717665016

key: train_jcc
value: [0.85236769 0.84764543 0.84444444 0.83018868 0.82240437 0.82833787
 0.85154062 0.82561308 0.84122563 0.83425414]

mean value: 0.837802195297192

MCC on Blind test: 0.25

Accuracy on Blind test: 0.74
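In the blocks above, `test_roc_auc` tracks `test_accuracy` very closely. One consistent explanation is that ROC AUC was computed from hard 0/1 predictions rather than scores. This is an assumption, since the scorer definition is not in this log. If it holds, ROC AUC reduces to balanced accuracy, (TPR + TNR) / 2.

The first K-Nearest Neighbors fold fits this explanation. Its recall of 0.94444444 = 34/36, precision of 0.82926829 = 34/41 and accuracy of 0.87323944 = 62/71 imply TP = 34, FN = 2, FP = 7 and TN = 28. That gives (34/36 + 28/35) / 2 = 0.87222222, which is the logged `test_roc_auc` for that fold. The check below rebuilds those counts and compares the two metrics.

```python
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Confusion counts inferred from the first K-Nearest Neighbors fold in the log:
# recall 34/36, precision 34/41, accuracy 62/71  ->  TP=34, FN=2, FP=7, TN=28.
TP, FN, FP, TN = 34, 2, 7, 28
y_true = [1] * (TP + FN) + [0] * (FP + TN)
y_pred = [1] * TP + [0] * FN + [1] * FP + [0] * TN

# With only 0/1 scores the ROC curve has a single interior point,
# so its area equals (TPR + TNR) / 2, i.e. balanced accuracy.
auc_hard = roc_auc_score(y_true, y_pred)
bal_acc = balanced_accuracy_score(y_true, y_pred)
print(round(auc_hard, 8), round(bal_acc, 8))
```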

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.02249098 0.0178566  0.01815391 0.02023125 0.02107072 0.02131462
 0.01788855 0.01724267 0.01735258 0.01709366]

mean value: 0.019069552421569824

key: score_time
value: [0.01088095 0.01002216 0.01010871 0.00967574 0.01091886 0.01002645
 0.01017642 0.00960231 0.00959897 0.00949121]

mean value: 0.010050177574157715

key: test_mcc
value: [0.81050059 0.88862624 0.8594125  0.85952381 0.91885703 0.88880092
 0.80032673 0.8660254  0.80032673 0.74316054]

mean value: 0.8435560494237176

key: train_mcc
value: [0.87720238 0.88357673 0.88033094 0.89298187 0.88350199 0.88668202
 0.89644363 0.87746696 0.88994151 0.89658557]

mean value: 0.8864713594717671

key: test_accuracy
value: [0.90140845 0.94366197 0.92957746 0.92957746 0.95774648 0.94366197
 0.9        0.92857143 0.9        0.87142857]

mean value: 0.9205633802816902

key: train_accuracy
value: [0.93858268 0.94173228 0.94015748 0.94645669 0.94173228 0.94330709
 0.94811321 0.93867925 0.94496855 0.94811321]

mean value: 0.9431842717773485

key: test_fscore
value: [0.90909091 0.94594595 0.93150685 0.92957746 0.95890411 0.94444444
 0.89855072 0.93333333 0.89855072 0.86956522]

mean value: 0.9219469723174141

key: train_fscore
value: [0.93819334 0.94209703 0.93987342 0.946875   0.94209703 0.94375
 0.94867807 0.93915757 0.94488189 0.94883721]

mean value: 0.9434440551736646

key: test_precision
value: [0.85365854 0.92105263 0.91891892 0.91666667 0.92105263 0.91891892
 0.91176471 0.875      0.91176471 0.88235294]

mean value: 0.9031150657188941

key: train_precision
value: [0.94267516 0.93478261 0.94285714 0.94099379 0.9376947  0.9378882
 0.93846154 0.93188854 0.94637224 0.93577982]

mean value: 0.9389393742030523

key: test_recall
value: [0.97222222 0.97222222 0.94444444 0.94285714 1.         0.97142857
 0.88571429 1.         0.88571429 0.85714286]

mean value: 0.9431746031746031

key: train_recall
value: [0.93375394 0.94952681 0.93690852 0.95283019 0.94654088 0.94968553
 0.9591195  0.94654088 0.94339623 0.96226415]

mean value: 0.9480566632938515

key: test_roc_auc
value: [0.90039683 0.94325397 0.92936508 0.9297619  0.95833333 0.94404762
 0.9        0.92857143 0.9        0.87142857]

mean value: 0.920515873015873

key: train_roc_auc
value: [0.93857508 0.94174454 0.94015237 0.94644664 0.9417247  0.94329703
 0.94811321 0.93867925 0.94496855 0.94811321]

mean value: 0.9431814574529294

key: test_jcc
value: [0.83333333 0.8974359  0.87179487 0.86842105 0.92105263 0.89473684
 0.81578947 0.875      0.81578947 0.76923077]

mean value: 0.8562584345479083

key: train_jcc
value: [0.88358209 0.89053254 0.88656716 0.89910979 0.89053254 0.89349112
 0.90236686 0.88529412 0.89552239 0.90265487]

mean value: 0.8929653495902684

MCC on Blind test: 0.32

Accuracy on Blind test: 0.8
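The `jcc` key is the Jaccard score, TP / (TP + FP + FN). For binary labels it is fully determined by the F1 score. Since F1 = 2TP / (2TP + FP + FN), it follows that J = F1 / (2 - F1). The snippet below checks this identity against the first five paired `test_fscore` and `test_jcc` values copied from the SVM block above.

```python
# First five test_fscore and test_jcc values from the SVM block of this log.
fscore = [0.90909091, 0.94594595, 0.93150685, 0.92957746, 0.95890411]
jcc_logged = [0.83333333, 0.8974359, 0.87179487, 0.86842105, 0.92105263]


def jaccard_from_f1(f1: float) -> float:
    """Binary Jaccard index recovered from F1: J = F1 / (2 - F1)."""
    return f1 / (2.0 - f1)


jcc_derived = [jaccard_from_f1(f) for f in fscore]
for derived, logged in zip(jcc_derived, jcc_logged):
    print(f"{derived:.8f}  {logged:.8f}")
```

Because the map is monotonic, ranking models by `jcc` or by `fscore` gives the same order.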

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [1.88505411 2.62316203 1.86890149 2.08626842 2.09213996 2.03358459
 2.04049277 2.06235981 2.0503335  2.0242188 ]

mean value: 2.0766515493392945

key: score_time
value: [0.02021956 0.01382565 0.01210284 0.01527214 0.01395893 0.01186132
 0.01189065 0.01409435 0.01404357 0.01615286]

mean value: 0.014342188835144043

key: test_mcc
value: [0.97220047 0.9186708  0.97220047 0.9451949  0.91885703 0.97222222
 0.94440028 0.97182532 0.8871639  0.94440028]

mean value: 0.9447135665250179

key: train_mcc
value: [0.99685535 0.99685535 0.99685535 1.         0.99685531 0.99372043
 0.99686027 0.99686027 0.99686027 0.99686027]

mean value: 0.996858287722753

key: test_accuracy
value: [0.98591549 0.95774648 0.98591549 0.97183099 0.95774648 0.98591549
 0.97142857 0.98571429 0.94285714 0.97142857]

mean value: 0.9716498993963782

key: train_accuracy
value: [0.9984252  0.9984252  0.9984252  1.         0.9984252  0.99685039
 0.99842767 0.99842767 0.99842767 0.99842767]

mean value: 0.9984261872926261

key: test_fscore
value: [0.98630137 0.96       0.98630137 0.97222222 0.95890411 0.98591549
 0.97222222 0.98591549 0.94444444 0.97222222]

mean value: 0.9724448946341673

key: train_fscore
value: [0.9984252  0.9984252  0.9984252  1.         0.99843014 0.9968652
 0.99843014 0.99843014 0.99843014 0.99843014]

mean value: 0.9984291500749357

key: test_precision
value: [0.97297297 0.92307692 0.97297297 0.94594595 0.92105263 0.97222222
 0.94594595 0.97222222 0.91891892 0.94594595]

mean value: 0.9491276701803018

key: train_precision
value: [0.99685535 0.99685535 0.99685535 1.         0.9968652  0.99375
 0.9968652  0.9968652  0.9968652  0.9968652 ]

mean value: 0.9968642056544627

key: test_recall
value: [1.         1.         1.         1.         1.         1.
 1.         1.         0.97142857 1.        ]

mean value: 0.9971428571428571

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.98571429 0.95714286 0.98571429 0.97222222 0.95833333 0.98611111
 0.97142857 0.98571429 0.94285714 0.97142857]

mean value: 0.9716666666666667

key: train_roc_auc
value: [0.99842767 0.99842767 0.99842767 1.         0.99842271 0.99684543
 0.99842767 0.99842767 0.99842767 0.99842767]

mean value: 0.9984261849493086

key: test_jcc
value: [0.97297297 0.92307692 0.97297297 0.94594595 0.92105263 0.97222222
 0.94594595 0.97222222 0.89473684 0.94594595]

mean value: 0.9467094624989362

key: train_jcc
value: [0.99685535 0.99685535 0.99685535 1.         0.9968652  0.99375
 0.9968652  0.9968652  0.9968652  0.9968652 ]

mean value: 0.9968642056544627

MCC on Blind test: 0.31

Accuracy on Blind test: 0.84
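The MLP run emits repeated `ConvergenceWarning`s because the stochastic optimizer reached `max_iter=500` before it converged. The sketch below captures and counts such warnings with the standard `warnings` module, so they are not interleaved with the log output. It also shows that `n_iter_` equals `max_iter` when the warning fires. The synthetic data and the deliberately small `max_iter=5` are assumptions chosen to trigger the warning quickly. Raising `max_iter` is one option, but whether it changes the results above is untested.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (assumption), not the features used in this log.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Hypothetical, deliberately tiny max_iter so the optimizer stops before converging.
clf = MLPClassifier(max_iter=5, random_state=42)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning instead of printing it
    clf.fit(X, y)

n_convergence = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print("ConvergenceWarnings recorded:", n_convergence)
print("iterations run:", clf.n_iter_)
```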

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01675367 0.01345181 0.01251197 0.01144505 0.01196003 0.01191974
 0.01172614 0.01275802 0.01214552 0.01232648]

mean value: 0.01269984245300293

key: score_time
value: [0.01150537 0.00836968 0.00818038 0.00795603 0.00822639 0.00796032
 0.00794578 0.00900435 0.00807214 0.00801206]

mean value: 0.008523249626159668

key: test_mcc
value: [0.94511009 0.97220047 0.97222222 0.89315217 0.9451949  0.91587302
 0.91766294 1.         0.94440028 0.97182532]

mean value: 0.9477641393225145

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97183099 0.98591549 0.98591549 0.94366197 0.97183099 0.95774648
 0.95714286 1.         0.97142857 0.98571429]

mean value: 0.9731187122736419

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97297297 0.98630137 0.98591549 0.94594595 0.97222222 0.95774648
 0.95890411 1.         0.97222222 0.98591549]

mean value: 0.9738146307604151

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.94736842 0.97297297 1.         0.8974359  0.94594595 0.94444444
 0.92105263 1.         0.94594595 0.97222222]

mean value: 0.9547388481599008

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         0.97222222 1.         1.         0.97142857
 1.         1.         1.         1.        ]

mean value: 0.9943650793650793

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97142857 0.98571429 0.98611111 0.94444444 0.97222222 0.95793651
 0.95714286 1.         0.97142857 0.98571429]

mean value: 0.9732142857142857

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.94736842 0.97297297 0.97222222 0.8974359  0.94594595 0.91891892
 0.92105263 1.         0.94594595 0.97222222]

mean value: 0.9494085178295705

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.23

Accuracy on Blind test: 0.86

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.10958791 0.1073451  0.10643077 0.10473418 0.10406518 0.10428858
 0.10580897 0.10563993 0.10519648 0.11137867]

mean value: 0.10644757747650146

key: score_time
value: [0.01716733 0.01747584 0.0184536  0.01719213 0.01720023 0.01738763
 0.01717591 0.01825809 0.01838541 0.01861048]

mean value: 0.01773066520690918

key: test_mcc
value: [0.91587302 1.         1.         0.9451949  0.9451949  1.
 0.94440028 1.         0.91465912 0.97182532]

mean value: 0.9637147528126756

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.95774648 1.         1.         0.97183099 0.97183099 1.
 0.97142857 1.         0.95714286 0.98571429]

mean value: 0.981569416498994

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.95774648 1.         1.         0.97222222 0.97222222 1.
 0.97222222 1.         0.95774648 0.98591549]

mean value: 0.9818075117370892

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.97142857 1.         1.         0.94594595 0.94594595 1.
 0.94594595 1.         0.94444444 0.97222222]

mean value: 0.9725933075933075

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.94444444 1.         1.         1.         1.         1.
 1.         1.         0.97142857 1.        ]

mean value: 0.9915873015873016

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.95793651 1.         1.         0.97222222 0.97222222 1.
 0.97142857 1.         0.95714286 0.98571429]

mean value: 0.9816666666666667

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.91891892 1.         1.         0.94594595 0.94594595 1.
 0.94594595 1.         0.91891892 0.97222222]

mean value: 0.9647897897897898

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.38

Accuracy on Blind test: 0.9
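Each model block in this log prints a pipeline repr and then `key:`/`value:`/`mean value:` triples: first `fit_time` and `score_time`, then paired `test_*`/`train_*` metrics. That layout matches what scikit-learn's `cross_validate(..., return_train_score=True)` returns for a dict of scorers. The sketch below is a plausible, unconfirmed reconstruction of the Extra Trees run. The toy columns, the `scoring` dict and the 10-fold `StratifiedKFold` are all assumptions, since the script's real data loading, splitting and scorer definitions do not appear in this log.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data standing in for the real feature table
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["H", "E", "L"], n),
    "active_site": rng.choice(["yes", "no"], n),
})
y = (X["ligand_distance"] < 15).astype(int)

num_cols = X.select_dtypes(include="number").columns
cat_cols = X.select_dtypes(exclude="number").columns

# Same structure as the logged pipeline: scale numeric, one-hot categorical
pipe = Pipeline(steps=[
    ("prep", ColumnTransformer(remainder="passthrough", transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ])),
    ("model", ExtraTreesClassifier(random_state=42)),
])

# Assumed scorer names chosen to reproduce the logged key names
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": make_scorer(jaccard_score),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same key/value/mean layout as the log
for key, value in scores.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {value.mean()}\n")
```

`cross_validate` emits `test_<name>` immediately followed by `train_<name>` for each scorer, which is the ordering seen in every block of this log.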

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00803304 0.00870395 0.00800323 0.0079124  0.00792861 0.00882459
 0.0079906  0.00846434 0.00908756 0.00920701]

mean value: 0.008415532112121583

key: score_time
value: [0.00797582 0.00854325 0.00794959 0.00855184 0.00831938 0.00841022
 0.00789499 0.00829268 0.00895119 0.00839472]

mean value: 0.00832836627960205

key: test_mcc
value: [0.58237159 0.78542356 0.91587302 0.91587302 0.89315217 0.80588933
 0.8660254  0.91766294 0.94285714 0.97182532]

mean value: 0.8596953472278328

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.78873239 0.88732394 0.95774648 0.95774648 0.94366197 0.90140845
 0.92857143 0.95714286 0.97142857 0.98571429]

mean value: 0.9279476861167002

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.80519481 0.8974359  0.95774648 0.95774648 0.94594595 0.90410959
 0.93333333 0.95890411 0.97142857 0.98591549]

mean value: 0.9317760702672916

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.75609756 0.83333333 0.97142857 0.94444444 0.8974359  0.86842105
 0.875      0.92105263 0.97142857 0.97222222]

mean value: 0.9010864285479177

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.86111111 0.97222222 0.94444444 0.97142857 1.         0.94285714
 1.         1.         0.97142857 1.        ]

mean value: 0.9663492063492063

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.78769841 0.88611111 0.95793651 0.95793651 0.94444444 0.90198413
 0.92857143 0.95714286 0.97142857 0.98571429]

mean value: 0.9278968253968254

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.67391304 0.81395349 0.91891892 0.91891892 0.8974359  0.825
 0.875      0.92105263 0.94444444 0.97222222]

mean value: 0.8760859565369703

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.21

Accuracy on Blind test: 0.81
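The `test_jcc` values track `test_fscore` closely because, for binary labels, the Jaccard index and F1 are tied: J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), so J = F1/(2 - F1). The check below assumes `jcc` is the binary Jaccard score of the positive class and tests the identity against the Extra Tree fold values logged above.

```python
import numpy as np

# Per-fold test_fscore and test_jcc copied from the Extra Tree block above
fscore = np.array([0.80519481, 0.8974359, 0.95774648, 0.95774648, 0.94594595,
                   0.90410959, 0.93333333, 0.95890411, 0.97142857, 0.98591549])
jcc = np.array([0.67391304, 0.81395349, 0.91891892, 0.91891892, 0.8974359,
                0.825, 0.875, 0.92105263, 0.94444444, 0.97222222])

# Binary identity: J = F1 / (2 - F1); small tolerance for the 8-digit log rounding
jcc_from_f1 = fscore / (2 - fscore)
print(np.allclose(jcc_from_f1, jcc, atol=1e-6))
```

The identity holds fold by fold, so `test_jcc` adds no information beyond `test_fscore` in these blocks.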

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.38699269 1.40455055 1.47870827 1.44593501 1.40105391 1.46513176
 1.42459655 1.40462375 1.40781617 1.41947818]

mean value: 1.4238886833190918

key: score_time
value: [0.10027957 0.09926486 0.10140967 0.0993166  0.09710431 0.09941864
 0.10116935 0.0992384  0.10060048 0.10078096]

mean value: 0.09985828399658203

key: test_mcc
value: [0.94511009 1.         1.         0.9451949  0.91885703 1.
 0.94440028 1.         0.94440028 0.97182532]

mean value: 0.9669787898507702

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97183099 1.         1.         0.97183099 0.95774648 1.
 0.97142857 1.         0.97142857 0.98571429]

mean value: 0.9829979879275654

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97297297 1.         1.         0.97222222 0.95890411 1.
 0.97222222 1.         0.97222222 0.98591549]

mean value: 0.9834459242186427

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.94736842 1.         1.         0.94594595 0.92105263 1.
 0.94594595 1.         0.94594595 0.97222222]

mean value: 0.9678481112691639

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97142857 1.         1.         0.97222222 0.95833333 1.
 0.97142857 1.         0.97142857 0.98571429]

mean value: 0.9830555555555556

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.94736842 1.         1.         0.94594595 0.92105263 1.
 0.94594595 1.         0.94594595 0.97222222]

mean value: 0.9678481112691639

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.29

Accuracy on Blind test: 0.86

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.93753052 1.00080466 0.96736526 1.01301908 0.96891427 0.94553971
 0.96478963 0.93041205 0.95536923 0.9131546 ]

mean value: 0.9596899032592774

key: score_time
value: [0.14722586 0.28152537 0.28021955 0.2021513  0.26646662 0.22738409
 0.21272445 0.26650643 0.24508262 0.24609566]

mean value: 0.23753819465637208

key: test_mcc
value: [0.94511009 0.97220047 1.         0.9451949  0.91885703 1.
 0.94440028 1.         0.91465912 0.94285714]

mean value: 0.9583279030242763

key: train_mcc
value: [0.96559014 0.96559014 0.96559014 0.96867592 0.96867592 0.96558776
 0.97181825 0.96564279 0.97193362 0.9688601 ]

mean value: 0.9677964760324274

key: test_accuracy
value: [0.97183099 0.98591549 1.         0.97183099 0.95774648 1.
 0.97142857 1.         0.95714286 0.97142857]

mean value: 0.9787323943661972

key: train_accuracy
value: [0.98267717 0.98267717 0.98267717 0.98425197 0.98425197 0.98267717
 0.98584906 0.9827044  0.98584906 0.98427673]

mean value: 0.9837891843708215

key: test_fscore
value: [0.97297297 0.98630137 1.         0.97222222 0.95890411 1.
 0.97222222 1.         0.95774648 0.97142857]

mean value: 0.9791797947171283

key: train_fscore
value: [0.98283931 0.98283931 0.98283931 0.98442368 0.98442368 0.98289269
 0.98595944 0.98289269 0.98600311 0.98447205]

mean value: 0.9839585272255872

key: test_precision
value: [0.94736842 0.97297297 1.         0.94594595 0.92105263 1.
 0.94594595 1.         0.94444444 0.97142857]

mean value: 0.9649158933369459

key: train_precision
value: [0.97222222 0.97222222 0.97222222 0.97530864 0.97530864 0.97230769
 0.97832817 0.97230769 0.97538462 0.97239264]

mean value: 0.9738004762028707

key: test_recall
value: [1.         1.         1.         1.         1.         1.
 1.         1.         0.97142857 0.97142857]

mean value: 0.9942857142857143

key: train_recall
value: [0.99369085 0.99369085 0.99369085 0.99371069 0.99371069 0.99371069
 0.99371069 0.99371069 0.99685535 0.99685535]

mean value: 0.9943336706148443

key: test_roc_auc
value: [0.97142857 0.98571429 1.         0.97222222 0.95833333 1.
 0.97142857 1.         0.95714286 0.97142857]

mean value: 0.9787698412698412

key: train_roc_auc
value: [0.98269448 0.98269448 0.98269448 0.98423705 0.98423705 0.98265976
 0.98584906 0.9827044  0.98584906 0.98427673]

mean value: 0.9837896553776562

key: test_jcc
value: [0.94736842 0.97297297 1.         0.94594595 0.92105263 1.
 0.94594595 1.         0.91891892 0.94444444]

mean value: 0.9596649280859807

key: train_jcc
value: [0.96625767 0.96625767 0.96625767 0.96932515 0.96932515 0.96636086
 0.97230769 0.96636086 0.97239264 0.96941896]

mean value: 0.9684264316010812

MCC on Blind test: 0.31

Accuracy on Blind test: 0.86
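The `Random Forest2` block repeatedly raised the `max_features='auto'` FutureWarning. That block is also the only tree ensemble here whose `train_*` scores sit below 1.0; `min_samples_leaf=5` is one plausible reason, since it stops leaves from isolating single training samples. Below is a minimal sketch that keeps the same hyperparameters but uses `max_features='sqrt'`, which the warning itself names as the equivalent setting. Synthetic data from `make_classification` stands in for the real features.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; the real feature table is not reproduced here
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

# Same hyperparameters as 'Random Forest2' in this log, except that
# max_features='sqrt' replaces the deprecated 'auto'
rf2 = RandomForestClassifier(max_features="sqrt", min_samples_leaf=5,
                             n_estimators=1000, n_jobs=10, oob_score=True,
                             random_state=42)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    rf2.fit(X, y)

# Collect any warnings that still mention max_features (expected: none)
max_features_warnings = [w for w in caught if "max_features" in str(w.message)]
print("max_features warnings:", len(max_features_warnings))
print(f"OOB accuracy: {rf2.oob_score_:.3f}")
```

Because `oob_score=True` is already set, `oob_score_` gives a held-out estimate for free after fitting. Per the warning, 'auto' stops working from scikit-learn 1.3, so the explicit `'sqrt'` also keeps the script working on newer releases.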

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00934434 0.0084703  0.0083704  0.00860596 0.00874734 0.00881553
 0.00802827 0.00889921 0.00797343 0.00873542]

mean value: 0.008599019050598145

key: score_time
value: [0.00872207 0.00881052 0.00853944 0.00848937 0.00812554 0.00880098
 0.00822377 0.00858617 0.00838947 0.00800633]

mean value: 0.008469367027282714

key: test_mcc
value: [0.60881948 0.60555556 0.63940384 0.49285714 0.6153057  0.59007669
 0.45883147 0.6614769  0.57735027 0.46188022]

mean value: 0.5711557273256194

key: train_mcc
value: [0.59860964 0.58693799 0.6055625  0.58980737 0.63409349 0.60603034
 0.63214752 0.60053256 0.63553444 0.6201872 ]

mean value: 0.6109443059939785

key: test_accuracy
value: [0.8028169  0.8028169  0.81690141 0.74647887 0.8028169  0.78873239
 0.72857143 0.82857143 0.78571429 0.72857143]

mean value: 0.7831991951710262

key: train_accuracy
value: [0.7984252  0.79212598 0.8015748  0.79370079 0.81574803 0.8
 0.81289308 0.79874214 0.81761006 0.80974843]

mean value: 0.8040568513841431

key: test_fscore
value: [0.81578947 0.80555556 0.83116883 0.74285714 0.81578947 0.80519481
 0.73972603 0.83783784 0.8        0.70769231]

mean value: 0.7901611455072162

key: train_fscore
value: [0.80547112 0.80120482 0.80966767 0.80300752 0.82406015 0.81350954
 0.82525698 0.80838323 0.82043344 0.8141321 ]

mean value: 0.8125126581130029

key: test_precision
value: [0.775      0.80555556 0.7804878  0.74285714 0.75609756 0.73809524
 0.71052632 0.79487179 0.75       0.76666667]

mean value: 0.7620158079689531

key: train_precision
value: [0.7771261  0.76657061 0.77681159 0.76945245 0.78962536 0.7630854
 0.77410468 0.77142857 0.80792683 0.7957958 ]

mean value: 0.7791927388032522

key: test_recall
value: [0.86111111 0.80555556 0.88888889 0.74285714 0.88571429 0.88571429
 0.77142857 0.88571429 0.85714286 0.65714286]

mean value: 0.8241269841269842

key: train_recall
value: [0.83596215 0.83911672 0.84542587 0.83962264 0.86163522 0.87106918
 0.8836478  0.8490566  0.83333333 0.83333333]

mean value: 0.8492202845068746

key: test_roc_auc
value: [0.80198413 0.80277778 0.81587302 0.74642857 0.80396825 0.79007937
 0.72857143 0.82857143 0.78571429 0.72857143]

mean value: 0.7832539682539683

key: train_roc_auc
value: [0.79848422 0.79219987 0.80164375 0.79362836 0.81567565 0.7998879
 0.81289308 0.79874214 0.81761006 0.80974843]

mean value: 0.8040513461500308

key: test_jcc
value: [0.68888889 0.6744186  0.71111111 0.59090909 0.68888889 0.67391304
 0.58695652 0.72093023 0.66666667 0.54761905]

mean value: 0.6550302096510388

key: train_jcc
value: [0.67430025 0.66834171 0.68020305 0.67085427 0.70076726 0.68564356
 0.7025     0.67839196 0.69553806 0.6865285 ]

mean value: 0.6843068622772353

MCC on Blind test: 0.22

Accuracy on Blind test: 0.63
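`BernoulliNB()` runs here with its default `binarize=0.0`, and the features reaching it have already passed through `MinMaxScaler`. Binarizing at 0.0 maps every strictly positive value to 1, so after scaling only each column's minimum stays 0. That discards most of the numeric signal and could be one contributor to this model's weaker scores; this is a hypothesis, not something the log confirms. The toy sketch below shows the effect and checks that fitting on scaled data matches fitting on pre-binarized data.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.preprocessing import MinMaxScaler

# Toy numeric data (hypothetical), two features, two classes
X = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0], [2.0, 25.0]])
y = np.array([0, 0, 1, 1])

X_scaled = MinMaxScaler().fit_transform(X)

# BernoulliNB's default binarize=0.0 maps every value > 0 to 1, so only the
# per-column minimum (scaled to 0) stays 0 after MinMaxScaler
X_bin = (X_scaled > 0.0).astype(int)
print(X_bin)

nb_default = BernoulliNB().fit(X_scaled, y)              # binarizes internally
nb_prebinned = BernoulliNB(binarize=None).fit(X_bin, y)  # input already binary

# Both models see identical per-class feature counts
print(np.array_equal(nb_default.feature_count_, nb_prebinned.feature_count_))
```

Choosing a different `binarize` threshold, or using `GaussianNB` on the numeric columns, would be the obvious alternatives to try. Both are suggestions, not part of the logged run.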

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.07607102 0.06376791 0.0508194  0.05137157 0.05335784 0.0584743
 0.05639553 0.05657625 0.0571003  0.05305147]

mean value: 0.05769855976104736

key: score_time
value: [0.01018667 0.00977993 0.00984669 0.00992823 0.00973797 0.00999999
 0.00965548 0.00966048 0.00993204 0.00974274]

mean value: 0.009847021102905274

key: test_mcc
value: [0.94511009 1.         0.97222222 0.9451949  0.9451949  1.
 0.94440028 1.         0.94440028 0.97182532]

mean value: 0.966834798557657
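
Each `mean value` line appears to be the plain arithmetic mean of the ten per-fold scores printed above it. The check below uses the XGBoost `test_mcc` values copied from this log. It is a sketch for verifying the summary line, not the pipeline's own code.

```python
from statistics import mean

# Per-fold test MCC for XGBoost, copied from the log above (rounded to 8 d.p.)
xgb_test_mcc = [0.94511009, 1.0, 0.97222222, 0.9451949, 0.9451949, 1.0,
                0.94440028, 1.0, 0.94440028, 0.97182532]

# Mean over the 10 CV folds; differs from the logged value only by rounding
cv_mean = mean(xgb_test_mcc)
print(round(cv_mean, 6))
```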

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97183099 1.         0.98591549 0.97183099 0.97183099 1.
 0.97142857 1.         0.97142857 0.98571429]

mean value: 0.9829979879275654

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97297297 1.         0.98591549 0.97222222 0.97222222 1.
 0.97222222 1.         0.97222222 0.98591549]

mean value: 0.9833692847777354

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.94736842 1.         1.         0.94594595 0.94594595 1.
 0.94594595 1.         0.94594595 0.97222222]

mean value: 0.9703374427058638

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         0.97222222 1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9972222222222222

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97142857 1.         0.98611111 0.97222222 0.97222222 1.
 0.97142857 1.         0.97142857 0.98571429]

mean value: 0.9830555555555556
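
The `test_roc_auc` values are simple fractions that sit close to, but not exactly at, `test_accuracy`. This is consistent with (though the log does not confirm it) the AUC being computed from hard 0/1 predictions rather than probabilities, in which case binary ROC AUC reduces to balanced accuracy, (TPR + TNR) / 2. The sketch assumes XGBoost fold 1 held 36 positives and 35 negatives, with counts inferred from the logged recall (1.0), precision (0.94736842 = 36/38) and accuracy (0.97183099 = 69/71).

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (TPR) and specificity (TNR).

    For hard 0/1 predictions, binary ROC AUC equals this value."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return (tpr + tnr) / 2


# Assumed confusion counts for XGBoost fold 1 (inferred, not logged)
tp, fp, fn, tn = 36, 2, 0, 33

accuracy = (tp + tn) / (tp + fp + fn + tn)
bal_acc = balanced_accuracy(tp, fn, tn, fp)
print(f"accuracy={accuracy:.8f}  balanced_accuracy={bal_acc:.8f}")
```

Under these assumed counts, the two printed values reproduce fold 1 of `test_accuracy` and `test_roc_auc` respectively.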

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.94736842 1.         0.97222222 0.94594595 0.94594595 1.
 0.94594595 1.         0.94594595 0.97222222]

mean value: 0.967559664928086
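
For a binary task, the positive-class Jaccard index and F1 are tied together: with F1 = 2TP/(2TP+FP+FN) and J = TP/(TP+FP+FN), it follows that J = F1 / (2 - F1). The check below uses the XGBoost `test_fscore` and `test_jcc` folds copied from this log; agreement is limited only by the 8-decimal rounding of the printed values.

```python
def jaccard_from_f1(f1: float) -> float:
    """Convert a binary F1 score to the positive-class Jaccard index."""
    return f1 / (2.0 - f1)


# XGBoost per-fold test_fscore and test_jcc, copied from the log above
test_fscore = [0.97297297, 1.0, 0.98591549, 0.97222222, 0.97222222, 1.0,
               0.97222222, 1.0, 0.97222222, 0.98591549]
test_jcc = [0.94736842, 1.0, 0.97222222, 0.94594595, 0.94594595, 1.0,
            0.94594595, 1.0, 0.94594595, 0.97222222]

for f1, jcc in zip(test_fscore, test_jcc):
    # Tolerance covers rounding in both logged columns
    assert abs(jaccard_from_f1(f1) - jcc) < 1e-7
print("test_jcc is consistent with test_fscore in every fold")
```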

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.27

Accuracy on Blind test: 0.85
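
XGBoost reaches a training MCC of 1.0 in every fold, yet its blind-test MCC is 0.27. MCC is computed from the binary confusion matrix as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal sketch follows; the example counts (TP=36, FP=2, FN=0, TN=33) are an assumption inferred from the XGBoost fold-1 precision, recall and accuracy above, not values printed by the pipeline.

```python
import math


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Undefined when any marginal total is zero; return 0.0 by convention
    return 0.0 if denominator == 0 else numerator / denominator


# Assumed XGBoost fold-1 counts, inferred from the logged test metrics
print(f"{mcc(tp=36, fp=2, fn=0, tn=33):.8f}")  # compare with test_mcc fold 1
```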

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])
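
The `prep` step above is a `ColumnTransformer` that rescales the `num` columns with `MinMaxScaler`, one-hot encodes the `cat` columns with `OneHotEncoder`, and passes any other column through unchanged (`remainder='passthrough'`). The pure-Python sketch below mimics the two transforms on a hypothetical three-row toy column each; the values are invented for illustration. Unlike this sketch, scikit-learn fits the scaler and encoder on the training fold only and then applies them to the test fold.

```python
from typing import Dict, List


def min_max_scale(column: List[float]) -> List[float]:
    """Rescale values to [0, 1], as MinMaxScaler does with its default range.

    A constant column maps to 0.0, matching MinMaxScaler's handling of a
    zero data range."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in column]


def one_hot(name: str, column: List[str]) -> Dict[str, List[int]]:
    """Expand a categorical column into one 0/1 indicator column per level.

    Levels are sorted, like OneHotEncoder's default categories='auto', and
    output names follow a '<column>_<level>' pattern."""
    levels = sorted(set(column))
    return {f"{name}_{lvl}": [int(v == lvl) for v in column] for lvl in levels}


# Hypothetical toy data: column names come from the lists above,
# but the values are invented
ligand_distance = [2.0, 10.0, 6.0]
ss_class = ["H", "E", "H"]

scaled = min_max_scale(ligand_distance)
encoded = one_hot("ss_class", ss_class)
print(scaled)   # [0.0, 1.0, 0.5]
print(encoded)  # {'ss_class_E': [0, 1, 0], 'ss_class_H': [1, 0, 1]}
```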

key: fit_time
value: [0.01657271 0.04419208 0.04425573 0.05008912 0.05284476 0.044945
 0.06070256 0.01946783 0.03588033 0.0560317 ]

mean value: 0.04249818325042724

key: score_time
value: [0.0105021  0.01969695 0.01938534 0.01104617 0.0147028  0.0137198
 0.01671791 0.01098347 0.01099586 0.01136303]

mean value: 0.01391134262084961

key: test_mcc
value: [0.91580648 0.83214239 0.97222222 0.91587302 0.9451949  0.94511009
 0.82992752 0.97182532 0.82992752 0.85749293]

mean value: 0.9015522377450673

key: train_mcc
value: [0.92448113 0.91225907 0.91812744 0.95928679 0.96867592 0.92760136
 0.94029342 0.91825715 0.94341489 0.93083602]

mean value: 0.9343233182291167

key: test_accuracy
value: [0.95774648 0.91549296 0.98591549 0.95774648 0.97183099 0.97183099
 0.91428571 0.98571429 0.91428571 0.92857143]

mean value: 0.9503420523138832

key: train_accuracy
value: [0.96220472 0.95590551 0.95905512 0.97952756 0.98425197 0.96377953
 0.97012579 0.9591195  0.97169811 0.96540881]

mean value: 0.967107661070668

key: test_fscore
value: [0.95890411 0.91891892 0.98591549 0.95774648 0.97222222 0.97058824
 0.91176471 0.98550725 0.91176471 0.92753623]

mean value: 0.9500868347880861

key: train_fscore
value: [0.96190476 0.95512821 0.95886076 0.97978227 0.98442368 0.96366509
 0.9699842  0.95899054 0.97160883 0.96529968]

mean value: 0.9669648015872917

key: test_precision
value: [0.94594595 0.89473684 1.         0.94444444 0.94594595 1.
 0.93939394 1.         0.93939394 0.94117647]

mean value: 0.9551037527817714

key: train_precision
value: [0.96805112 0.97068404 0.96190476 0.96923077 0.97530864 0.96825397
 0.97460317 0.96202532 0.97468354 0.96835443]

mean value: 0.9693099764406033

key: test_recall
value: [0.97222222 0.94444444 0.97222222 0.97142857 1.         0.94285714
 0.88571429 0.97142857 0.88571429 0.91428571]

mean value: 0.946031746031746
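
Each fold's `test_fscore` should be the harmonic mean of that fold's `test_precision` and `test_recall`. The check below uses LDA fold 1 (precision 0.94594595, recall 0.97222222), copied from this log, and compares the result with the logged F1 of 0.95890411.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the binary F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# LDA fold 1, copied from the log: test_precision and test_recall
p, r = 0.94594595, 0.97222222
print(f"{f1_from_precision_recall(p, r):.8f}")  # compare with test_fscore fold 1
```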

key: train_recall
value: [0.95583596 0.94006309 0.95583596 0.99056604 0.99371069 0.9591195
 0.96540881 0.95597484 0.96855346 0.96226415]

mean value: 0.96473325000496

key: test_roc_auc
value: [0.95753968 0.91507937 0.98611111 0.95793651 0.97222222 0.97142857
 0.91428571 0.98571429 0.91428571 0.92857143]

mean value: 0.9503174603174603

key: train_roc_auc
value: [0.96219471 0.9558806  0.95905006 0.97951015 0.98423705 0.96378688
 0.97012579 0.9591195  0.97169811 0.96540881]

mean value: 0.9671011646132175

key: test_jcc
value: [0.92105263 0.85       0.97222222 0.91891892 0.94594595 0.94285714
 0.83783784 0.97142857 0.83783784 0.86486486]

mean value: 0.906296597349229

key: train_jcc
value: [0.9266055  0.91411043 0.92097264 0.96036585 0.96932515 0.92987805
 0.94171779 0.92121212 0.94478528 0.93292683]

mean value: 0.9361899652190242

MCC on Blind test: 0.26

Accuracy on Blind test: 0.8

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.02060699 0.00799441 0.00798726 0.00798249 0.00766635 0.00779152
 0.00775051 0.00782704 0.00777411 0.008286  ]

mean value: 0.009166669845581055

key: score_time
value: [0.0106101  0.00825071 0.0082407  0.00789952 0.00786734 0.00836182
 0.00791502 0.00782728 0.00794268 0.00842905]

mean value: 0.00833442211151123

key: test_mcc
value: [0.67079854 0.75346834 0.71917468 0.70470171 0.77239298 0.67233796
 0.57735027 0.69282032 0.71545476 0.65821838]

mean value: 0.693671793586133

key: train_mcc
value: [0.70041161 0.67932093 0.69527344 0.68590643 0.68703227 0.69452345
 0.70268583 0.66332496 0.69063815 0.70408235]

mean value: 0.6903199416989273

key: test_accuracy
value: [0.83098592 0.87323944 0.85915493 0.84507042 0.87323944 0.83098592
 0.78571429 0.84285714 0.85714286 0.82857143]

mean value: 0.8426961770623742

key: train_accuracy
value: [0.84724409 0.83779528 0.84409449 0.83937008 0.84094488 0.84409449
 0.84748428 0.82861635 0.84119497 0.84748428]

mean value: 0.8418323181300451

key: test_fscore
value: [0.84615385 0.88311688 0.86486486 0.85714286 0.88607595 0.84210526
 0.8        0.85333333 0.86111111 0.82352941]

mean value: 0.8517433520012585

key: train_fscore
value: [0.8562963  0.84557721 0.85419735 0.85043988 0.85037037 0.85419735
 0.8579795  0.83946981 0.85255474 0.8588064 ]

mean value: 0.8519888918765984

key: test_precision
value: [0.78571429 0.82926829 0.84210526 0.78571429 0.79545455 0.7804878
 0.75       0.8        0.83783784 0.84848485]

mean value: 0.8055067163924674

key: train_precision
value: [0.80726257 0.80571429 0.80110497 0.7967033  0.80392157 0.8033241
 0.80273973 0.78947368 0.79564033 0.79945799]

mean value: 0.8005342524769464

key: test_recall
value: [0.91666667 0.94444444 0.88888889 0.94285714 1.         0.91428571
 0.85714286 0.91428571 0.88571429 0.8       ]

mean value: 0.9064285714285714

key: train_recall
value: [0.91167192 0.88958991 0.9148265  0.91194969 0.90251572 0.91194969
 0.92138365 0.89622642 0.91823899 0.92767296]

mean value: 0.9106025434993948

key: test_roc_auc
value: [0.8297619  0.87222222 0.85873016 0.84642857 0.875      0.83214286
 0.78571429 0.84285714 0.85714286 0.82857143]

mean value: 0.8428571428571429

key: train_roc_auc
value: [0.8473454  0.83787671 0.8442057  0.8392556  0.84084777 0.84398746
 0.84748428 0.82861635 0.84119497 0.84748428]

mean value: 0.8418298513977343

key: test_jcc
value: [0.73333333 0.79069767 0.76190476 0.75       0.79545455 0.72727273
 0.66666667 0.74418605 0.75609756 0.7       ]

mean value: 0.7425613316537877

key: train_jcc
value: [0.74870466 0.73246753 0.74550129 0.73979592 0.73969072 0.74550129
 0.75128205 0.72335025 0.74300254 0.75255102]

mean value: 0.742184727641747

MCC on Blind test: 0.27

Accuracy on Blind test: 0.73

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01053691 0.01432872 0.014678   0.01592255 0.013062   0.01458049
 0.01466274 0.01540923 0.01347852 0.01427984]

mean value: 0.014093899726867675

key: score_time
value: [0.00840187 0.01055169 0.01028132 0.01054978 0.01052547 0.0104835
 0.01046133 0.01095486 0.01046395 0.01049304]

mean value: 0.010316681861877442

key: test_mcc
value: [0.85952381 0.9186708  0.89315217 0.91587302 0.9451949  0.94511009
 0.80295507 0.97182532 0.8871639  0.78301997]

mean value: 0.8922489037444675

key: train_mcc
value: [0.9401617  0.95944236 0.89984937 0.89436086 0.96558776 0.93386306
 0.86603117 0.9213882  0.97501633 0.87487332]

mean value: 0.9230574140751575

key: test_accuracy
value: [0.92957746 0.95774648 0.94366197 0.95774648 0.97183099 0.97183099
 0.9        0.98571429 0.94285714 0.88571429]

mean value: 0.9446680080482898

key: train_accuracy
value: [0.97007874 0.97952756 0.9496063  0.94645669 0.98267717 0.96692913
 0.93081761 0.96069182 0.98742138 0.93396226]

mean value: 0.9608168672312187

key: test_fscore
value: [0.92957746 0.96       0.94117647 0.95774648 0.97222222 0.97058824
 0.89552239 0.98550725 0.94444444 0.89473684]

mean value: 0.9451521792752767

key: train_fscore
value: [0.9699842  0.97978227 0.94855305 0.94498382 0.98289269 0.96692913
 0.92715232 0.96075353 0.98753894 0.93786982]

mean value: 0.960643978398039

key: test_precision
value: [0.94285714 0.92307692 1.         0.94444444 0.94594595 1.
 0.9375     1.         0.91891892 0.82926829]

mean value: 0.9442011667926302

key: train_precision
value: [0.97151899 0.96625767 0.96721311 0.97333333 0.97230769 0.96845426
 0.97902098 0.95924765 0.97839506 0.88547486]

mean value: 0.9621223605111022

key: test_recall
value: [0.91666667 1.         0.88888889 0.97142857 1.         0.94285714
 0.85714286 0.97142857 0.97142857 0.97142857]

mean value: 0.9491269841269842

key: train_recall
value: [0.96845426 0.99369085 0.93059937 0.91823899 0.99371069 0.96540881
 0.88050314 0.96226415 0.99685535 0.99685535]

mean value: 0.960658095748269

key: test_roc_auc
value: [0.9297619  0.95714286 0.94444444 0.95793651 0.97222222 0.97142857
 0.9        0.98571429 0.94285714 0.88571429]

mean value: 0.9447222222222222

key: train_roc_auc
value: [0.97007619 0.97954983 0.94957641 0.9465012  0.98265976 0.96693153
 0.93081761 0.96069182 0.98742138 0.93396226]

mean value: 0.9608188004682261

key: test_jcc
value: [0.86842105 0.92307692 0.88888889 0.91891892 0.94594595 0.94285714
 0.81081081 0.97142857 0.89473684 0.80952381]

mean value: 0.8974608906187853

key: train_jcc
value: [0.94171779 0.96036585 0.90214067 0.89570552 0.96636086 0.93597561
 0.86419753 0.9244713  0.97538462 0.88300836]

mean value: 0.9249328107238487

MCC on Blind test: 0.21

Accuracy on Blind test: 0.74

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01662755 0.01433229 0.01325893 0.01569057 0.01217318 0.01252699
 0.01344371 0.01365376 0.01214552 0.01519012]

mean value: 0.013904261589050292

key: score_time
value: [0.01091886 0.01051593 0.01051092 0.01051617 0.01046252 0.01055741
 0.01050019 0.01043868 0.01046753 0.01048732]

mean value: 0.010537552833557128

key: test_mcc
value: [0.88730159 0.83095238 0.97220047 0.9451949  0.86802778 0.89315217
 0.91766294 0.94440028 0.8871639  0.94285714]

mean value: 0.9088913538408987

key: train_mcc
value: [0.95279762 0.92778189 0.95028807 0.9533256  0.89981019 0.862672
 0.89131675 0.89152985 0.9179354  0.95321203]

mean value: 0.9200669401462821

key: test_accuracy
value: [0.94366197 0.91549296 0.98591549 0.97183099 0.92957746 0.94366197
 0.95714286 0.97142857 0.94285714 0.97142857]

mean value: 0.9532997987927565

key: train_accuracy
value: [0.97637795 0.96377953 0.97480315 0.97637795 0.9480315  0.92755906
 0.94339623 0.94496855 0.95754717 0.97641509]

mean value: 0.9589256177883425

key: test_fscore
value: [0.94444444 0.91666667 0.98630137 0.97222222 0.93333333 0.94594595
 0.95890411 0.97058824 0.94444444 0.97142857]

mean value: 0.9544279343231801

key: train_fscore
value: [0.97622821 0.96331738 0.9752322  0.97681607 0.95037594 0.93215339
 0.94610778 0.94327391 0.9591528  0.97674419]

mean value: 0.959940187333688

key: test_precision
value: [0.94444444 0.91666667 0.97297297 0.94594595 0.875      0.8974359
 0.92105263 1.         0.91891892 0.97142857]

mean value: 0.9363866049392365

key: train_precision
value: [0.98089172 0.97419355 0.95744681 0.96048632 0.91066282 0.87777778
 0.90285714 0.97324415 0.92419825 0.96330275]

mean value: 0.9425061293853453

key: test_recall
value: [0.94444444 0.91666667 1.         1.         1.         1.
 1.         0.94285714 0.97142857 0.97142857]

mean value: 0.9746825396825397

key: train_recall
value: [0.97160883 0.95268139 0.99369085 0.99371069 0.99371069 0.99371069
 0.99371069 0.91509434 0.99685535 0.99056604]

mean value: 0.9795339563121243

key: test_roc_auc
value: [0.94365079 0.91547619 0.98571429 0.97222222 0.93055556 0.94444444
 0.95714286 0.97142857 0.94285714 0.97142857]

mean value: 0.9534920634920635

key: train_roc_auc
value: [0.97637045 0.96376208 0.97483285 0.97635061 0.94795945 0.92745471
 0.94339623 0.94496855 0.95754717 0.97641509]

mean value: 0.9589057198976251

key: test_jcc
value: [0.89473684 0.84615385 0.97297297 0.94594595 0.875      0.8974359
 0.92105263 0.94285714 0.89473684 0.94444444]

mean value: 0.9135336565599723

key: train_jcc
value: [0.95356037 0.92923077 0.95166163 0.95468278 0.90544413 0.87292818
 0.89772727 0.89263804 0.92151163 0.95454545]

mean value: 0.9233930246483528

MCC on Blind test: 0.13

Accuracy on Blind test: 0.49
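
The cross-validated MCC and the blind-test MCC diverge sharply for every model in this section. The sketch below tabulates the mean CV `test_mcc` against `MCC on Blind test` for the five models whose blocks appear in full here, using numbers copied from this log with model labels kept exactly as the pipeline prints them. Reading a large gap as poor transfer from the CV data to the blind set is an interpretation, not something the log states.

```python
# Mean 10-fold CV test MCC vs blind-test MCC, copied from the log above
results = {
    "XGBoost": (0.966834798557657, 0.27),
    "LDA": (0.9015522377450673, 0.26),
    "Multinomial": (0.693671793586133, 0.27),
    "Passive Aggresive": (0.8922489037444675, 0.21),
    "Stochastic GDescent": (0.9088913538408987, 0.13),
}

# Gap = CV estimate minus blind-test score
gaps = {name: cv - blind for name, (cv, blind) in results.items()}

# Largest gap first
ranked = sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)
for name, gap in ranked:
    cv, blind = results[name]
    print(f"{name:<20} cv_mcc={cv:.3f} blind_mcc={blind:.2f} gap={gap:.3f}")
```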

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.11352706 0.09823751 0.09783912 0.09749627 0.09775257 0.09784293
 0.0982132  0.0982163  0.09822345 0.09824324]

mean value: 0.09955916404724122

key: score_time
value: [0.01440597 0.0141964  0.014431   0.01412749 0.01414037 0.01425576
 0.01428652 0.01439333 0.01421928 0.0141108 ]

mean value: 0.014256691932678223

key: test_mcc
value: [0.94511009 0.97220047 1.         0.9451949  0.9451949  1.
 0.91766294 0.97182532 0.94440028 1.        ]

mean value: 0.9641588882762016

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97183099 0.98591549 1.         0.97183099 0.97183099 1.
 0.95714286 0.98571429 0.97142857 1.        ]

mean value: 0.981569416498994

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97297297 0.98630137 1.         0.97222222 0.97222222 1.
 0.95890411 0.98591549 0.97222222 1.        ]

mean value: 0.9820760612049441

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.94736842 0.97297297 1.         0.94594595 0.94594595 1.
 0.92105263 0.97222222 0.94594595 1.        ]

mean value: 0.9651454085664612

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97142857 0.98571429 1.         0.97222222 0.97222222 1.
 0.95714286 0.98571429 0.97142857 1.        ]

mean value: 0.9815873015873016

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.94736842 0.97297297 1.         0.94594595 0.94594595 1.
 0.92105263 0.97222222 0.94594595 1.        ]

mean value: 0.9651454085664612

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.23

Accuracy on Blind test: 0.84
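The per-fold CV scores can be reproduced from confusion-matrix counts. The sketch below recomputes the first fold's test_mcc, test_accuracy, test_fscore, test_precision, test_roc_auc and test_jcc for AdaBoost. Its counts are inferred from the printed values and are not logged directly: 36 positives and 35 negatives, every positive recovered, and two negatives misclassified.

The sketch also shows why test_jcc equals test_precision whenever test_recall is 1. With no false negatives, TP/(TP+FP+FN) reduces to TP/(TP+FP). The function name is illustrative and is not part of the original script.

```python
import math


def metrics_from_counts(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Binary-classification metrics computed from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": mcc,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "precision": precision,
        "recall": recall,
        # ROC AUC of hard 0/1 predictions reduces to balanced accuracy
        "roc_auc_hard": (recall + specificity) / 2,
        # Jaccard score of the positive class
        "jcc": tp / (tp + fp + fn),
    }


# Assumed counts for AdaBoost fold 1: 36 positives, 35 negatives,
# recall 1.0 and two false positives
fold1 = metrics_from_counts(tp=36, fp=2, tn=33, fn=0)
for name, value in fold1.items():
    print(f"{name}: {value:.8f}")
```

The test_roc_auc values in the log match balanced accuracy computed from these hard counts. This suggests, though it is unconfirmed, that AUC was scored on predicted labels rather than on predicted probabilities.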

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.03574896 0.04156756 0.04362035 0.04388809 0.04569268 0.04029918
 0.04188228 0.0483954  0.04589486 0.04156971]

mean value: 0.04285590648651123

key: score_time
value: [0.02013993 0.02285957 0.02204061 0.03455472 0.01919889 0.02451229
 0.01914024 0.02869654 0.02967858 0.04260755]

mean value: 0.026342892646789552

key: test_mcc
value: [0.97220047 0.97220047 1.         0.89315217 0.9451949  1.
 0.91766294 1.         0.91766294 0.97182532]

mean value: 0.9589899184279953

key: train_mcc
value: [0.99685535 0.99372055 0.99685535 0.99372043 0.99685531 0.99059524
 1.         0.99373035 1.         0.98749951]

mean value: 0.9949832075818474

key: test_accuracy
value: [0.98591549 0.98591549 1.         0.94366197 0.97183099 1.
 0.95714286 1.         0.95714286 0.98571429]

mean value: 0.9787323943661972

key: train_accuracy
value: [0.9984252  0.99685039 0.9984252  0.99685039 0.9984252  0.99527559
 1.         0.99685535 1.         0.99371069]

mean value: 0.9974818006239786

key: test_fscore
value: [0.98630137 0.98630137 1.         0.94594595 0.97222222 1.
 0.95890411 1.         0.95890411 0.98591549]

mean value: 0.9794494620030024

key: train_fscore
value: [0.9984252  0.99685535 0.9984252  0.9968652  0.99843014 0.99530516
 1.         0.9968652  1.         0.99375   ]

mean value: 0.9974921452742781

key: test_precision
value: [0.97297297 0.97297297 1.         0.8974359  0.94594595 1.
 0.92105263 1.         0.92105263 0.97222222]

mean value: 0.9603655274707906

key: train_precision
value: [0.99685535 0.99373041 0.99685535 0.99375    0.9968652  0.99065421
 1.         0.99375    1.         0.98757764]

mean value: 0.9950038148468195

key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.98571429 0.98571429 1.         0.94444444 0.97222222 1.
 0.95714286 1.         0.95714286 0.98571429]

mean value: 0.9788095238095238

key: train_roc_auc
value: [0.99842767 0.99685535 0.99842767 0.99684543 0.99842271 0.99526814
 1.         0.99685535 1.         0.99371069]

mean value: 0.9974813007162272

key: test_jcc
value: [0.97297297 0.97297297 1.         0.8974359  0.94594595 1.
 0.92105263 1.         0.92105263 0.97222222]

mean value: 0.9603655274707906

key: train_jcc
value: [0.99685535 0.99373041 0.99685535 0.99375    0.9968652  0.99065421
 1.         0.99375    1.         0.98757764]

mean value: 0.9950038148468195

MCC on Blind test: 0.25

Accuracy on Blind test: 0.86
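The OOB warnings printed while fitting the Bagging Classifier are expected with scikit-learn's default `n_estimators=10`. Each bootstrap sample draws n rows with replacement, so it contains a given row with probability 1 - (1 - 1/n)^n, which is about 0.632. A row therefore has no out-of-bag estimator with probability roughly 0.632^10, or about 1%.

The estimate below counts how many training rows end up without an OOB score. It assumes a training-fold size of 635, the size implied by a train_accuracy of 0.9984252 = 634/635. It is a back-of-the-envelope sketch, not a rerun of the original pipeline.

```python
def p_never_oob(n_train: int, n_estimators: int) -> float:
    """Probability that one training row appears in every bootstrap sample,
    so no estimator can give it an out-of-bag prediction.

    Each bootstrap draws n_train rows with replacement, so a given row is
    in-bag for one estimator with probability 1 - (1 - 1/n_train) ** n_train.
    """
    p_in_bag = 1.0 - (1.0 - 1.0 / n_train) ** n_train
    return p_in_bag ** n_estimators


n_train = 635  # assumed training-fold size (634/635 = 0.9984252 train accuracy)
for n_estimators in (10, 50, 100):
    expected = n_train * p_never_oob(n_train, n_estimators)
    print(f"n_estimators={n_estimators:>3}: ~{expected:.3g} rows without an OOB score")
```

Under this estimate, a handful of rows per fit have zero OOB votes. Those rows also produce the `invalid value encountered in true_divide` warning, because their OOB vote totals are 0/0. Raising `n_estimators`, for example to 100, should make both warnings effectively disappear; this has not been checked against the original data.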

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.24773479 0.26326966 0.28095245 0.28693724 0.25053453 0.24942899
 0.14829826 0.25048161 0.20905161 0.1913681 ]

mean value: 0.23780572414398193

key: score_time
value: [0.02167845 0.02865052 0.02786064 0.02753568 0.02866054 0.02848601
 0.01395488 0.03657627 0.01424241 0.01387572]

mean value: 0.024152112007141114

key: test_mcc
value: [0.81050059 0.88862624 0.86205133 0.85952381 0.89315217 0.80588933
 0.77142857 0.81649658 0.74316054 0.65821838]

mean value: 0.8109047540709404

key: train_mcc
value: [0.88980159 0.91197105 0.89606666 0.90870311 0.89610428 0.91188492
 0.91509886 0.90902529 0.91202184 0.91509886]

mean value: 0.9065776480751302

key: test_accuracy
value: [0.90140845 0.94366197 0.92957746 0.92957746 0.94366197 0.90140845
 0.88571429 0.9        0.87142857 0.82857143]

mean value: 0.9035010060362173

key: train_accuracy
value: [0.94488189 0.95590551 0.9480315  0.95433071 0.9480315  0.95590551
 0.95754717 0.95440252 0.95597484 0.95754717]

mean value: 0.9532558312286435

key: test_fscore
value: [0.90909091 0.94594595 0.93333333 0.92957746 0.94594595 0.90410959
 0.88571429 0.90909091 0.87323944 0.83333333]

mean value: 0.9069381152904209

key: train_fscore
value: [0.94453249 0.95541401 0.9478673  0.95418641 0.9478673  0.9556962
 0.95748031 0.95389507 0.9556962  0.95761381]

mean value: 0.9530249118234133

key: test_precision
value: [0.85365854 0.92105263 0.8974359  0.91666667 0.8974359  0.86842105
 0.88571429 0.83333333 0.86111111 0.81081081]

mean value: 0.8745640223303894

key: train_precision
value: [0.94904459 0.96463023 0.94936709 0.95873016 0.95238095 0.96178344
 0.95899054 0.96463023 0.96178344 0.95611285]

mean value: 0.957745350378981

key: test_recall
value: [0.97222222 0.97222222 0.97222222 0.94285714 1.         0.94285714
 0.88571429 1.         0.88571429 0.85714286]

mean value: 0.9430952380952381

key: train_recall
value: [0.94006309 0.94637224 0.94637224 0.94968553 0.94339623 0.94968553
 0.95597484 0.94339623 0.94968553 0.9591195 ]

mean value: 0.9483750967204333

key: test_roc_auc
value: [0.90039683 0.94325397 0.92896825 0.9297619  0.94444444 0.90198413
 0.88571429 0.9        0.87142857 0.82857143]

mean value: 0.9034523809523809

key: train_roc_auc
value: [0.94487431 0.95589052 0.94802889 0.95433804 0.94803881 0.95591532
 0.95754717 0.95440252 0.95597484 0.95754717]

mean value: 0.9532557585857985

key: test_jcc
value: [0.83333333 0.8974359  0.875      0.86842105 0.8974359  0.825
 0.79487179 0.83333333 0.775      0.71428571]

mean value: 0.831411702332755

key: train_jcc
value: [0.89489489 0.91463415 0.9009009  0.91238671 0.9009009  0.91515152
 0.918429   0.9118541  0.91515152 0.9186747 ]

mean value: 0.9102978385449625

MCC on Blind test: 0.29

Accuracy on Blind test: 0.8
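The denominators behind the CV scores reveal the size of the cross-validated set. Test accuracies such as 0.97183099 (69/71) and 0.95714286 (67/70) imply test folds of 71 and 70 rows. With 10 folds, six folds of 71 and four of 70 give 706 rows, which equals 2 × 353.

That total is consistent with, though not proof of, the minority class having been upsampled to the majority count. The original counts were Counter({0: 353, 1: 95}). The 36/35 class split inferred for the first AdaBoost fold also fits a stratified split.

The check below uses the fold-size rule documented for scikit-learn's KFold. `n_cv` is the hypothesised CV set size.

```python
def kfold_test_sizes(n_samples: int, n_splits: int) -> list:
    """Test-fold sizes under scikit-learn's documented KFold rule: the first
    n_samples % n_splits folds hold n_samples // n_splits + 1 rows."""
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]


n_cv = 2 * 353  # hypothesis: both classes brought to the majority count
test_sizes = kfold_test_sizes(n_cv, 10)
train_sizes = [n_cv - s for s in test_sizes]
print("test fold sizes:", test_sizes)
print("train fold sizes:", train_sizes)

# Fractions that reproduce values printed in the log
print(round(69 / 71, 8), round(67 / 70, 8), round(634 / 635, 8))
```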

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.27617764 0.26752806 0.26734591 0.26702046 0.26937532 0.26938963
 0.2754612  0.26950097 0.27531576 0.26875448]

mean value: 0.2705869436264038

key: score_time
value: [0.00917101 0.00919962 0.00898099 0.00887442 0.00896692 0.00959826
 0.0095427  0.00884581 0.0090723  0.00909805]

mean value: 0.009135007858276367

key: test_mcc
value: [0.94511009 0.97220047 1.         0.91885703 0.9451949  1.
 0.91766294 1.         0.94440028 0.94285714]

mean value: 0.9586282844964963

key: train_mcc
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.99686027]

mean value: 0.9996860274824667

key: test_accuracy
value: [0.97183099 0.98591549 1.         0.95774648 0.97183099 1.
 0.95714286 1.         0.97142857 0.97142857]

mean value: 0.9787323943661972

key: train_accuracy
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.99842767]

mean value: 0.9998427672955975

key: test_fscore
value: [0.97297297 0.98630137 1.         0.95890411 0.97222222 1.
 0.95890411 1.         0.97222222 0.97142857]

mean value: 0.9792955577887085

key: train_fscore
value: [1.        1.        1.        1.        1.        1.        1.
 1.        1.        0.9984252]

mean value: 0.9998425196850393

key: test_precision
value: [0.94736842 0.97297297 1.         0.92105263 0.94594595 1.
 0.92105263 1.         0.94594595 0.97142857]

mean value: 0.9625767120503963

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.97142857]

mean value: 0.9971428571428571

key: train_recall
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.99685535]

mean value: 0.999685534591195

key: test_roc_auc
value: [0.97142857 0.98571429 1.         0.95833333 0.97222222 1.
 0.95714286 1.         0.97142857 0.97142857]

mean value: 0.9787698412698412

key: train_roc_auc
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.99842767]

mean value: 0.9998427672955975

key: test_jcc
value: [0.94736842 0.97297297 1.         0.92105263 0.94594595 1.
 0.92105263 1.         0.94594595 0.94444444]

mean value: 0.9598782993519835

key: train_jcc
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.99685535]

mean value: 0.999685534591195

MCC on Blind test: 0.25

Accuracy on Blind test: 0.86
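Near-perfect CV scores alongside a much lower blind-test MCC recur across models: 0.96 against 0.25 here. If the upsampling implied by the fold sizes happened before splitting, copies of the same minority row can fall into both the training and the test fold, which would inflate CV estimates. Assuming plain random oversampling with replacement (the log does not say which resampler was used), the sketch below resamples only the training indices of each fold. No test row, or a duplicate of one, is then seen in training.

The helpers `stratified_folds` and `oversample_train` are hypothetical. In a scikit-learn workflow the same effect is usually obtained by placing the sampler inside an imbalanced-learn `Pipeline`, which applies resampling only during `fit`.

```python
import random
from collections import Counter


def stratified_folds(labels, k, seed=42):
    """Hypothetical helper: split row indices into k folds, dealing each
    class out round-robin so every fold keeps the original class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def oversample_train(train_idx, labels, seed=42):
    """Hypothetical helper: randomly duplicate rows of the smaller classes
    (sampling with replacement) until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for i in train_idx:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(idx) for idx in by_class.values())
    balanced = []
    for idx in by_class.values():
        balanced.extend(idx)
        balanced.extend(rng.choice(idx) for _ in range(target - len(idx)))
    return balanced


labels = [0] * 353 + [1] * 95  # class counts from the "Original Data" line
folds = stratified_folds(labels, k=10)

for fold_no, test_idx in enumerate(folds, start=1):
    test_set = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in test_set]
    train_bal = oversample_train(train_idx, labels)  # resample training rows only
    # Duplicates are drawn from training rows, so the test fold stays unseen
    assert test_set.isdisjoint(train_bal)
    print(fold_no,
          Counter(labels[i] for i in train_bal),
          Counter(labels[i] for i in test_idx))
```

The test folds keep the original 353:95 imbalance, so their scores should be closer to what a blind test with the same class ratio would show.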

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])
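The repeated `Variables are collinear` warnings from QDA indicate that a per-class covariance matrix is rank-deficient. The log does not show the transformed matrix, but one likely contributor is the `OneHotEncoder()` step. It keeps every level, so each categorical feature's columns sum to 1 and, after centring, are linearly dependent.

The sketch below demonstrates the rank drop with NumPy on two made-up categorical columns. `OneHotEncoder(drop='first')` or a non-zero `reg_param` on `QuadraticDiscriminantAnalysis` are the usual scikit-learn remedies; neither is tested here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows = 200

# Two hypothetical categorical features with 3 and 2 levels
cat_a = rng.integers(0, 3, size=n_rows)
cat_b = rng.integers(0, 2, size=n_rows)

# Full one-hot encoding (all levels kept), as OneHotEncoder() does by default
full = np.hstack([np.eye(3)[cat_a], np.eye(2)[cat_b]])

# Drop the first level of each feature, as OneHotEncoder(drop='first') would
dropped = np.hstack([np.eye(3)[cat_a][:, 1:], np.eye(2)[cat_b][:, 1:]])

for name, X in (("all levels", full), ("drop first", dropped)):
    cov = np.cov(X, rowvar=False)  # column covariance, as QDA estimates per class
    rank = np.linalg.matrix_rank(cov)
    print(f"{name}: {X.shape[1]} columns, covariance rank {rank}")
```

Scaling the numeric columns with MinMaxScaler does not change rank, so it neither causes nor fixes this warning.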

key: fit_time
value: [0.01353073 0.01510763 0.01533842 0.01545548 0.01502848 0.01551914
 0.01628375 0.01565933 0.01530766 0.01601934]

mean value: 0.015324997901916503

key: score_time
value: [0.01147842 0.01132965 0.01142693 0.01131916 0.01343727 0.01433969
 0.01386809 0.01397896 0.01388669 0.01877761]

mean value: 0.013384246826171875

key: test_mcc
value: [0.48866289 0.69047619 0.8365327  0.67839806 0.78542356 0.65726707
 0.61631563 0.74535599 0.37904902 0.6882472 ]

mean value: 0.656572832612207

key: train_mcc
value: [0.52013388 0.76530718 0.82524897 0.7158657  0.84342302 0.69536527
 0.80154559 0.82054203 0.63315169 0.8819171 ]

mean value: 0.750250042260939

key: test_accuracy
value: [0.69014085 0.84507042 0.91549296 0.83098592 0.88732394 0.8028169
 0.8        0.85714286 0.64285714 0.84285714]

mean value: 0.8114688128772636

key: train_accuracy
value: [0.71496063 0.87244094 0.90866142 0.84251969 0.91968504 0.82992126
 0.89308176 0.90408805 0.78616352 0.94025157]

mean value: 0.861177388203833

key: test_fscore
value: [0.56       0.84507042 0.91176471 0.80645161 0.875      0.75
 0.77419355 0.83333333 0.46808511 0.8358209 ]

mean value: 0.7659719624946587

key: train_fscore
value: [0.6021978  0.85561497 0.90169492 0.81617647 0.91570248 0.79850746
 0.8815331  0.89500861 0.728      0.93851133]

mean value: 0.8332947137085833

key: test_precision
value: [1.         0.85714286 0.96875    0.92592593 0.96551724 1.
 0.88888889 1.         0.91666667 0.875     ]

mean value: 0.9397891580003649

key: train_precision
value: [0.99275362 0.98360656 0.97435897 0.98230088 0.96515679 0.98165138
 0.98828125 0.98859316 1.         0.96666667]

mean value: 0.982336928301226

key: test_recall
value: [0.38888889 0.83333333 0.86111111 0.71428571 0.8        0.6
 0.68571429 0.71428571 0.31428571 0.8       ]

mean value: 0.6711904761904762

key: train_recall
value: [0.43217666 0.75709779 0.83911672 0.69811321 0.87106918 0.67295597
 0.79559748 0.81761006 0.57232704 0.91194969]

mean value: 0.7368013808701863

key: test_roc_auc
value: [0.69444444 0.8452381  0.91626984 0.82936508 0.88611111 0.8
 0.8        0.85714286 0.64285714 0.84285714]

mean value: 0.8114285714285715

key: train_roc_auc
value: [0.714516   0.87225959 0.90855207 0.84274746 0.91976172 0.83016884
 0.89308176 0.90408805 0.78616352 0.94025157]

mean value: 0.8611590579925799

key: test_jcc
value: [0.38888889 0.73170732 0.83783784 0.67567568 0.77777778 0.6
 0.63157895 0.71428571 0.30555556 0.71794872]

mean value: 0.6381256432411759

key: train_jcc
value: [0.43081761 0.74766355 0.82098765 0.68944099 0.8445122  0.66459627
 0.78816199 0.80996885 0.57232704 0.88414634]

mean value: 0.7252622504598514

MCC on Blind test: 0.34

Accuracy on Blind test: 0.93
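The QDA record above reports 0.93 accuracy on the blind test but an MCC of only 0.34. One plausible reading, assuming the blind test set is dominated by one class, is that accuracy mostly rewards predicting the majority class, while MCC also penalises missed minority-class cases. The sketch below is a pure-Python worked example of that effect. The confusion-matrix counts are hypothetical and illustrative only; they are not taken from this run.

```python
import math


def mcc_from_counts(tp: int, fn: int, fp: int, tn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any marginal total is zero
    return (tp * tn - fp * fn) / denom if denom else 0.0


def accuracy_from_counts(tp: int, fn: int, fp: int, tn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fn + fp + tn)


# Hypothetical imbalanced test set: 20 positives, 180 negatives.
# The model recovers only 5 of the 20 positives but almost all negatives.
tp, fn, fp, tn = 5, 15, 2, 178
acc = accuracy_from_counts(tp, fn, fp, tn)
mcc = mcc_from_counts(tp, fn, fp, tn)
print(f"Accuracy: {acc:.3f}  MCC: {mcc:.3f}")
```

Here accuracy is (5 + 178) / 200 = 0.915, while MCC is about 0.39, which is the same pattern of high accuracy and modest MCC seen in the blind-test lines of this log.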

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.01351237 0.02861714 0.03247285 0.0325036  0.0317297  0.03187609
 0.03195882 0.03194857 0.03198266 0.03197289]

mean value: 0.029857468605041505

key: score_time
value: [0.01157928 0.0221467  0.01110148 0.01955748 0.02112865 0.0110445
 0.01989079 0.02152538 0.01919198 0.02169585]

mean value: 0.01788620948791504

key: test_mcc
value: [0.91580648 0.91580648 0.94365079 0.91587302 0.9451949  0.9186708
 0.82992752 0.97182532 0.82992752 0.8871639 ]

mean value: 0.9073846735039883

key: train_mcc
value: [0.92767212 0.94649961 0.92448113 0.95276028 0.94960617 0.92137585
 0.92778765 0.91531613 0.94341489 0.93083602]

mean value: 0.9339749853030713

key: test_accuracy
value: [0.95774648 0.95774648 0.97183099 0.95774648 0.97183099 0.95774648
 0.91428571 0.98571429 0.91428571 0.94285714]

mean value: 0.95317907444668

key: train_accuracy
value: [0.96377953 0.97322835 0.96220472 0.97637795 0.97480315 0.96062992
 0.96383648 0.95754717 0.97169811 0.96540881]

mean value: 0.966951418808498

key: test_fscore
value: [0.95890411 0.95890411 0.97222222 0.95774648 0.97222222 0.95522388
 0.91176471 0.98550725 0.91176471 0.94117647]

mean value: 0.9525436151822534

key: train_fscore
value: [0.96343402 0.9733124  0.96190476 0.97645212 0.97484277 0.96038035
 0.96354992 0.95707472 0.97160883 0.96529968]

mean value: 0.9667859581195395

key: test_precision
value: [0.94594595 0.94594595 0.97222222 0.94444444 0.94594595 1.
 0.93939394 1.         0.93939394 0.96969697]

mean value: 0.9602989352989353

key: train_precision
value: [0.97115385 0.96875    0.96805112 0.97492163 0.97484277 0.96805112
 0.97124601 0.96784566 0.97468354 0.96835443]

mean value: 0.9707900120202521

key: test_recall
value: [0.97222222 0.97222222 0.97222222 0.97142857 1.         0.91428571
 0.88571429 0.97142857 0.88571429 0.91428571]

mean value: 0.9459523809523809

key: train_recall
value: [0.95583596 0.97791798 0.95583596 0.97798742 0.97484277 0.95283019
 0.95597484 0.94654088 0.96855346 0.96226415]

mean value: 0.96285836160546

key: test_roc_auc
value: [0.95753968 0.95753968 0.9718254  0.95793651 0.97222222 0.95714286
 0.91428571 0.98571429 0.91428571 0.94285714]

mean value: 0.9531349206349207

key: train_roc_auc
value: [0.96376704 0.97323572 0.96219471 0.97637541 0.97480309 0.96064222
 0.96383648 0.95754717 0.97169811 0.96540881]

mean value: 0.9669508759399242

key: test_jcc
value: [0.92105263 0.92105263 0.94594595 0.91891892 0.94594595 0.91428571
 0.83783784 0.97142857 0.83783784 0.88888889]

mean value: 0.9103194924247555

key: train_jcc
value: [0.92944785 0.94801223 0.9266055  0.95398773 0.95092025 0.92378049
 0.92966361 0.91768293 0.94478528 0.93292683]

mean value: 0.9357812693762667

MCC on Blind test: 0.26

Accuracy on Blind test: 0.81
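Each record prints the keys fit_time, score_time and a test_/train_ pair per metric, and each "mean value" is the arithmetic mean of the ten per-fold values. That key layout matches what scikit-learn's cross_validate returns when given a named scoring dict and return_train_score=True. The script that produced this log is not shown here, so the sketch below is an assumption: the scorer definitions (in particular "fscore" as F1 and "jcc" as the Jaccard score), the synthetic data and the stratified 10-fold splitter are illustrative stand-ins chosen to mirror the printed key names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in data; the real feature matrix is not reproduced here
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

# Hypothetical scorer names chosen to mirror the keys printed in this log
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": make_scorer(jaccard_score),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(
    RidgeClassifier(random_state=42), X, y,
    cv=cv, scoring=scoring, return_train_score=True,
)

# Print in the same "key / value / mean value" layout as the log
for key, value in scores.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}\n")
```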

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_config.py:163: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_config.py:166: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.11364222 0.12605238 0.21460104 0.20560765 0.13292837 0.14832115
 0.20759034 0.1914463  0.1992662  0.23236299]

mean value: 0.17718186378479003

key: score_time
value: [0.01099443 0.01911712 0.01961541 0.02059603 0.0108881  0.02057672
 0.02168489 0.02046657 0.02121878 0.01984239]

mean value: 0.018500041961669923

key: test_mcc
value: [0.94365079 0.8594125  0.97222222 0.91587302 0.9451949  0.94511009
 0.82992752 0.97182532 0.82992752 0.860309  ]

mean value: 0.9073452880284912

key: train_mcc
value: [0.92767212 0.93072627 0.93700772 0.96547312 0.96228025 0.92760136
 0.94029342 0.91531613 0.946583   0.92771424]

mean value: 0.938066763071008

key: test_accuracy
value: [0.97183099 0.92957746 0.98591549 0.95774648 0.97183099 0.97183099
 0.91428571 0.98571429 0.91428571 0.92857143]

mean value: 0.9531589537223341

key: train_accuracy
value: [0.96377953 0.96535433 0.96850394 0.98267717 0.98110236 0.96377953
 0.97012579 0.95754717 0.97327044 0.96383648]

mean value: 0.9689976724607537

key: test_fscore
value: [0.97222222 0.93150685 0.98591549 0.95774648 0.97222222 0.97058824
 0.91176471 0.98550725 0.91176471 0.92537313]

mean value: 0.9524611293354492

key: train_fscore
value: [0.96343402 0.96518987 0.96845426 0.98283931 0.98125    0.96366509
 0.9699842  0.95707472 0.97314376 0.96366509]

mean value: 0.9688700325564479

key: test_precision
value: [0.97222222 0.91891892 1.         0.94444444 0.94594595 1.
 0.93939394 1.         0.93939394 0.96875   ]

mean value: 0.9629069410319411

key: train_precision
value: [0.97115385 0.96825397 0.96845426 0.9752322  0.97515528 0.96825397
 0.97460317 0.96784566 0.97777778 0.96825397]

mean value: 0.971498409878129

key: test_recall
value: [0.97222222 0.94444444 0.97222222 0.97142857 1.         0.94285714
 0.88571429 0.97142857 0.88571429 0.88571429]

mean value: 0.9431746031746031

key: train_recall
value: [0.95583596 0.96214511 0.96845426 0.99056604 0.98742138 0.9591195
 0.96540881 0.94654088 0.96855346 0.9591195 ]

mean value: 0.9663164890978712

key: test_roc_auc
value: [0.9718254  0.92936508 0.98611111 0.95793651 0.97222222 0.97142857
 0.91428571 0.98571429 0.91428571 0.92857143]

mean value: 0.9531746031746031

key: train_roc_auc
value: [0.96376704 0.96534928 0.96850386 0.98266472 0.9810924  0.96378688
 0.97012579 0.95754717 0.97327044 0.96383648]

mean value: 0.9689944050949348

key: test_jcc
value: [0.94594595 0.87179487 0.97222222 0.91891892 0.94594595 0.94285714
 0.83783784 0.97142857 0.83783784 0.86111111]

mean value: 0.9105900405900406

key: train_jcc
value: [0.92944785 0.93272171 0.93883792 0.96625767 0.96319018 0.92987805
 0.94171779 0.91768293 0.94769231 0.92987805]

mean value: 0.939730446204259

MCC on Blind test: 0.25

Accuracy on Blind test: 0.81
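During the Ridge ClassifierCV run the log shows a pandas SettingWithCopyWarning from embb_config.py lines 163 and 166, where ros_CT and ros_BT are sorted with inplace=True. How those frames are built is not shown in this log; the sketch below assumes they are column subsets of a larger results table (the table and its rows are hypothetical, with MCC values rounded from the records above). It shows two common ways to avoid the warning: take an explicit .copy() before modifying in place, or drop inplace=True and rebind the sorted result.

```python
import pandas as pd

# Hypothetical results table; values rounded from the records in this log
results = pd.DataFrame({
    "model": ["QDA", "Ridge Classifier", "Ridge ClassifierCV"],
    "test_mcc": [0.6566, 0.9074, 0.9073],
    "bts_mcc": [0.34, 0.26, 0.25],
})

# Option 1: an explicit .copy() makes ros_CT an independent DataFrame,
# so pandas no longer flags the in-place sort as writing to a slice.
ros_CT = results[["model", "test_mcc"]].copy()
ros_CT.sort_values(by=["test_mcc"], ascending=False, inplace=True)

# Option 2: avoid inplace=True and rebind the name to the sorted result.
ros_BT = results[["model", "bts_mcc"]]
ros_BT = ros_BT.sort_values(by=["bts_mcc"], ascending=False)

print(ros_CT)
print(ros_BT)
```

Option 2 is often the simpler habit, since it never mutates an object that might share data with another frame.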

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.01629114 0.0173347  0.01738286 0.0171926  0.01575541 0.02157068
 0.01644921 0.01875186 0.01989651 0.02733588]

mean value: 0.018796086311340332

key: score_time
value: [0.01044035 0.01035213 0.01081347 0.01036024 0.01029181 0.01031828
 0.01030374 0.01035714 0.01030326 0.01077175]

mean value: 0.010431218147277831

key: test_mcc
value: [0.68543653 0.89893315 0.9        0.57777778 0.89893315 1.
 0.39056329 0.62994079 1.         0.68888889]

mean value: 0.7670473570531189

key: train_mcc
value: [0.81369939 0.8128591  0.8128591  0.82502766 0.82502766 0.78971132
 0.85964432 0.83645826 0.81310714 0.85964432]

mean value: 0.8248038274086386

key: test_accuracy
value: [0.84210526 0.94736842 0.94736842 0.78947368 0.94736842 1.
 0.68421053 0.78947368 1.         0.84210526]

mean value: 0.8789473684210526

key: train_accuracy
value: [0.90643275 0.90643275 0.90643275 0.9122807  0.9122807  0.89473684
 0.92982456 0.91812865 0.90643275 0.92982456]

mean value: 0.9122807017543859

key: test_fscore
value: [0.82352941 0.94117647 0.94736842 0.77777778 0.94117647 1.
 0.75       0.83333333 1.         0.84210526]

mean value: 0.8856467148262814

key: train_fscore
value: [0.90909091 0.90697674 0.90697674 0.91428571 0.91428571 0.89534884
 0.92941176 0.91666667 0.90697674 0.92941176]

mean value: 0.9129431603508211

key: test_precision
value: [0.875      1.         0.9        0.77777778 1.         1.
 0.64285714 0.71428571 1.         0.88888889]

mean value: 0.8798809523809524

key: train_precision
value: [0.88888889 0.90697674 0.90697674 0.8988764  0.8988764  0.88505747
 0.92941176 0.92771084 0.89655172 0.92941176]

mean value: 0.9068738754437303

key: test_recall
value: [0.77777778 0.88888889 1.         0.77777778 0.88888889 1.
 0.9        1.         1.         0.8       ]

mean value: 0.9033333333333333

key: train_recall
value: [0.93023256 0.90697674 0.90697674 0.93023256 0.93023256 0.90588235
 0.92941176 0.90588235 0.91764706 0.92941176]

mean value: 0.9192886456908345

key: test_roc_auc
value: [0.83888889 0.94444444 0.95       0.78888889 0.94444444 1.
 0.67222222 0.77777778 1.         0.84444444]

mean value: 0.8761111111111111

key: train_roc_auc
value: [0.90629275 0.90642955 0.90642955 0.9121751  0.9121751  0.89480164
 0.92982216 0.91805746 0.90649795 0.92982216]

mean value: 0.912250341997264

key: test_jcc
value: [0.7        0.88888889 0.9        0.63636364 0.88888889 1.
 0.6        0.71428571 1.         0.72727273]

mean value: 0.8055699855699856

key: train_jcc
value: [0.83333333 0.82978723 0.82978723 0.84210526 0.84210526 0.81052632
 0.86813187 0.84615385 0.82978723 0.86813187]

mean value: 0.8399849459983838

MCC on Blind test: 0.19

Accuracy on Blind test: 0.69
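The Logistic RegressionCV record that follows is interrupted by repeated "lbfgs failed to converge" ConvergenceWarnings, even though the numeric features are already MinMax-scaled inside the pipeline. The warning text itself suggests increasing max_iter (the scikit-learn default for LogisticRegressionCV is 100). The sketch below is a minimal illustration under stated assumptions: synthetic data stands in for the real features, and max_iter=5000 is an illustrative value, not one taken from the script. It also counts how many ConvergenceWarnings are raised, which is a quick way to check whether a given setting helps.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real feature matrix
X, y = make_classification(n_samples=200, n_features=20, random_state=42)

# max_iter=5000 is an illustrative assumption; the default is 100
model = Pipeline(steps=[
    ("prep", MinMaxScaler()),
    ("model", LogisticRegressionCV(max_iter=5000, random_state=42)),
])

# Record (rather than print) any ConvergenceWarnings raised during fitting
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    model.fit(X, y)

n_conv = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print(f"ConvergenceWarnings raised: {n_conv}")
print(f"Training accuracy: {model.score(X, y):.2f}")
```

If warnings persist at large C values, switching solver (for example solver="saga" or "liblinear") is another option the warning's linked documentation describes.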

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
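The model list above has 25 (name, estimator) entries but only 24 distinct names, because ('Naive Bayes', BernoulliNB()) appears twice. That estimator is therefore evaluated twice per run. If results were collected in a dict keyed by name, the second entry would overwrite the first; whether the script does this cannot be seen from the log. A small check, using the names copied verbatim from the printed list:

```python
from collections import Counter

# Model names copied verbatim from the printed list (spelling kept as logged).
names = [
    "Logistic Regression", "Logistic RegressionCV", "Gaussian NB", "Naive Bayes",
    "K-Nearest Neighbors", "SVM", "MLP", "Decision Tree", "Extra Trees",
    "Extra Tree", "Random Forest", "Random Forest2", "Naive Bayes", "XGBoost",
    "LDA", "Multinomial", "Passive Aggresive", "Stochastic GDescent",
    "AdaBoost Classifier", "Bagging Classifier", "Gaussian Process",
    "Gradient Boosting", "QDA", "Ridge Classifier", "Ridge ClassifierCV",
]

# Keep only names that occur more than once.
duplicates = {name: count for name, count in Counter(names).items() if count > 1}
print(f"{len(names)} entries, {len(set(names))} unique, duplicates: {duplicates}")
```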
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])
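The pipeline printed above chains a ColumnTransformer with a classifier. The transformer applies MinMaxScaler to the numeric columns and OneHotEncoder to the categorical ones, and passes any remaining columns through. The key/value blocks that follow have the shape of scikit-learn's cross_validate output with return_train_score=True and a named scoring dict.

The sketch below reproduces that setup on a hypothetical toy DataFrame. Several details are assumptions rather than what the original script did:
- Only a handful of column names are borrowed from the log.
- The scorer names are chosen to mirror the logged keys.
- handle_unknown="ignore" and max_iter=1000 are additions that do not appear in the logged pipeline.
- make_scorer(roc_auc_score) scores hard class predictions. Whether the original script did the same is also an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score, make_scorer,
                             matthews_corrcoef, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data; column names borrowed from the log, values random.
rng = np.random.default_rng(42)
n = 100
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "rsa": rng.uniform(0, 1, n),
    "deepddg": rng.normal(0, 1, n),
    "ss_class": rng.choice(["helix", "strand", "loop"], n),
    "active_site": rng.choice(["0", "1"], n),
})
# Noisy toy label so the classes are not perfectly separable.
y = ((X["ligand_distance"] + rng.normal(0, 8, n)) < 15).astype(int)

num_cols = ["ligand_distance", "rsa", "deepddg"]
cat_cols = ["ss_class", "active_site"]

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ],
    remainder="passthrough",
)
pipe = Pipeline(steps=[
    ("prep", prep),
    ("model", LogisticRegressionCV(max_iter=1000, random_state=42)),
])

# Scorer names chosen to mirror the keys printed in the log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score, zero_division=0),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": make_scorer(jaccard_score),
}
scores = cross_validate(pipe, X, y, cv=10, scoring=scoring, return_train_score=True)

for key, value in scores.items():
    print(f"key: {key}\nmean value: {value.mean()}\n")
```

cross_validate returns fit_time and score_time, then a test_<name> and a train_<name> array for each scorer. That is the same key order seen in the log.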

key: fit_time
value: [0.37531638 0.38986826 0.4050014  0.3858676  0.38417578 0.39149809
 0.40014076 0.39357424 0.41155505 0.40910983]

mean value: 0.39461073875427244

key: score_time
value: [0.01066661 0.01065397 0.01060176 0.01059651 0.01084471 0.01105618
 0.01088595 0.01094103 0.01107144 0.01088524]

mean value: 0.010820341110229493

key: test_mcc
value: [0.80507649 0.89893315 0.80903983 0.57777778 0.89893315 0.80507649
 0.80903983 0.78888889 1.         0.68888889]

mean value: 0.8081654497168143
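Each "mean value" line is the arithmetic mean of the ten per-fold values above it. The sketch below recomputes it from the printed (8-decimal) test_mcc and train_mcc values for this LogisticRegressionCV pipeline. It reproduces the logged means and makes the train/test gap explicit:

```python
from statistics import fmean

# Per-fold values copied from the LogisticRegressionCV block of this log.
test_mcc = [0.80507649, 0.89893315, 0.80903983, 0.57777778, 0.89893315,
            0.80507649, 0.80903983, 0.78888889, 1.0, 0.68888889]
train_mcc = [0.85964432, 0.89480164, 0.95321477, 0.89480164, 0.89480164,
             0.94158687, 0.96497948, 0.95321477, 0.94158687, 1.0]

mean_test = fmean(test_mcc)
mean_train = fmean(train_mcc)
print(f"mean test MCC:  {mean_test:.4f}")   # logged: 0.8081654497168143
print(f"mean train MCC: {mean_train:.4f}")  # logged: 0.9298632010943912
print(f"train - test gap: {mean_train - mean_test:.4f}")
```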

key: train_mcc
value: [0.85964432 0.89480164 0.95321477 0.89480164 0.89480164 0.94158687
 0.96497948 0.95321477 0.94158687 1.        ]

mean value: 0.9298632010943912

key: test_accuracy
value: [0.89473684 0.94736842 0.89473684 0.78947368 0.94736842 0.89473684
 0.89473684 0.89473684 1.         0.84210526]

mean value: 0.9

key: train_accuracy
value: [0.92982456 0.94736842 0.97660819 0.94736842 0.94736842 0.97076023
 0.98245614 0.97660819 0.97076023 1.        ]

mean value: 0.9649122807017544

key: test_fscore
value: [0.875      0.94117647 0.9        0.77777778 0.94117647 0.90909091
 0.88888889 0.9        1.         0.84210526]

mean value: 0.8975215780091941

key: train_fscore
value: [0.93023256 0.94736842 0.97674419 0.94736842 0.94736842 0.97076023
 0.98245614 0.97647059 0.97076023 1.        ]

mean value: 0.9649529203766369

key: test_precision
value: [1.         1.         0.81818182 0.77777778 1.         0.83333333
 1.         0.9        1.         0.88888889]

mean value: 0.9218181818181819

key: train_precision
value: [0.93023256 0.95294118 0.97674419 0.95294118 0.95294118 0.96511628
 0.97674419 0.97647059 0.96511628 1.        ]

mean value: 0.9649247606019151

key: test_recall
value: [0.77777778 0.88888889 1.         0.77777778 0.88888889 1.
 0.8        0.9        1.         0.8       ]

mean value: 0.8833333333333333

key: train_recall
value: [0.93023256 0.94186047 0.97674419 0.94186047 0.94186047 0.97647059
 0.98823529 0.97647059 0.97647059 1.        ]

mean value: 0.9650205198358413

key: test_roc_auc
value: [0.88888889 0.94444444 0.9        0.78888889 0.94444444 0.88888889
 0.9        0.89444444 1.         0.84444444]

mean value: 0.8994444444444444

key: train_roc_auc
value: [0.92982216 0.94740082 0.97660739 0.94740082 0.94740082 0.97079343
 0.98248974 0.97660739 0.97079343 1.        ]

mean value: 0.9649316005471956

key: test_jcc
value: [0.77777778 0.88888889 0.81818182 0.63636364 0.88888889 0.83333333
 0.8        0.81818182 1.         0.72727273]

mean value: 0.8188888888888889

key: train_jcc
value: [0.86956522 0.9        0.95454545 0.9        0.9        0.94318182
 0.96551724 0.95402299 0.94318182 1.        ]

mean value: 0.9330014538185453
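The per-fold metrics are consistent with a single confusion matrix. Take the first test fold of the LogisticRegressionCV run: accuracy is 0.89473684 (17/19), precision is 1.0 and recall is 0.77777778 (7/9). Assuming 19 samples in the fold, these values imply TP = 7, FN = 2, FP = 0 and TN = 10. The counts are inferred, not logged.

The sketch below recomputes the other fold-1 values from those counts. The logged test_roc_auc for this fold equals (TPR + TNR) / 2. That is what ROC AUC reduces to when it is computed from hard 0/1 predictions rather than from continuous scores.

```python
from math import sqrt

# Counts inferred (not logged) for fold 1, assuming 19 samples in the test fold.
tp, fn, fp, tn = 7, 2, 0, 10

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)                # true positive rate
specificity = tn / (tn + fp)           # true negative rate
f1 = 2 * tp / (2 * tp + fp + fn)
jaccard = tp / (tp + fp + fn)
hard_auc = (recall + specificity) / 2  # ROC AUC of a single 0/1 threshold
mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

for name, value in [("mcc", mcc), ("accuracy", accuracy), ("fscore", f1),
                    ("precision", precision), ("recall", recall),
                    ("roc_auc", hard_auc), ("jcc", jaccard)]:
    print(f"{name}: {value:.8f}")
```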

MCC on Blind test: 0.2

Accuracy on Blind test: 0.75

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01085544 0.00919461 0.00706816 0.00698352 0.00676513 0.00690603
 0.00674391 0.0070889  0.00666404 0.00664258]

mean value: 0.0074912309646606445

key: score_time
value: [0.01064849 0.00935817 0.00804448 0.00793123 0.00782728 0.00777674
 0.00780153 0.00776029 0.00765848 0.00768161]

mean value: 0.00824882984161377

key: test_mcc
value: [0.48934516 0.71611487 0.9        0.26257545 0.4719399  0.78888889
 0.58655573 0.57777778 0.78888889 0.57777778]

mean value: 0.6159864454279441

key: train_mcc
value: [0.67948707 0.67737019 0.65383223 0.72260902 0.69912629 0.6553202
 0.72095237 0.6878315  0.70870609 0.77152203]

mean value: 0.6976756971665955

key: test_accuracy
value: [0.73684211 0.84210526 0.94736842 0.63157895 0.73684211 0.89473684
 0.78947368 0.78947368 0.89473684 0.78947368]

mean value: 0.8052631578947368

key: train_accuracy
value: [0.83625731 0.83625731 0.8245614  0.85964912 0.84795322 0.8245614
 0.85964912 0.84210526 0.85380117 0.88304094]

mean value: 0.8467836257309942

key: test_fscore
value: [0.66666667 0.8        0.94736842 0.53333333 0.70588235 0.9
 0.81818182 0.8        0.9        0.8       ]

mean value: 0.7871432592175627

key: train_fscore
value: [0.825      0.82716049 0.81481481 0.85365854 0.84146341 0.81012658
 0.85365854 0.83229814 0.84848485 0.88888889]

mean value: 0.8395554252745034

key: test_precision
value: [0.83333333 1.         0.9        0.66666667 0.75       0.9
 0.75       0.8        0.9        0.8       ]

mean value: 0.8300000000000001

key: train_precision
value: [0.89189189 0.88157895 0.86842105 0.8974359  0.88461538 0.87671233
 0.88607595 0.88157895 0.875      0.84210526]

mean value: 0.8785415662603702

key: test_recall
value: [0.55555556 0.66666667 1.         0.44444444 0.66666667 0.9
 0.9        0.8        0.9        0.8       ]

mean value: 0.7633333333333333

key: train_recall
value: [0.76744186 0.77906977 0.76744186 0.81395349 0.80232558 0.75294118
 0.82352941 0.78823529 0.82352941 0.94117647]

mean value: 0.8059644322845417

key: test_roc_auc
value: [0.72777778 0.83333333 0.95       0.62222222 0.73333333 0.89444444
 0.78333333 0.78888889 0.89444444 0.78888889]

mean value: 0.8016666666666666

key: train_roc_auc
value: [0.83666211 0.83659371 0.8248974  0.85991792 0.84822161 0.82414501
 0.85943912 0.84179207 0.85362517 0.88337893]

mean value: 0.8468673050615595

key: test_jcc
value: [0.5        0.66666667 0.9        0.36363636 0.54545455 0.81818182
 0.69230769 0.66666667 0.81818182 0.66666667]

mean value: 0.6637762237762238

key: train_jcc
value: [0.70212766 0.70526316 0.6875     0.74468085 0.72631579 0.68085106
 0.74468085 0.71276596 0.73684211 0.8       ]

mean value: 0.7241027435610302

MCC on Blind test: 0.27

Accuracy on Blind test: 0.79
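The test_* numbers above can be cross-checked by hand. For Gaussian NB fold 1, test_recall 0.55555556 (5/9) and test_precision 0.83333333 (5/6) in a 19-sample test fold imply TP=5, FN=4, FP=1, TN=9. These counts are inferred from the log and were not printed by the script. Recomputing every metric from them reproduces the logged fold-1 values. That includes test_roc_auc 0.72777778, which equals balanced accuracy (5/9 + 9/10)/2. The match is consistent with roc_auc being scored on hard class labels rather than probabilities. That is an inference and is not confirmed by the script itself.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score, f1_score,
                             jaccard_score, matthews_corrcoef, precision_score,
                             recall_score, roc_auc_score)

# Confusion counts inferred (not logged) from Gaussian NB, fold 1.
TP, FN, FP, TN = 5, 4, 1, 9
y_true = np.array([1] * (TP + FN) + [0] * (FP + TN))
y_pred = np.array([1] * TP + [0] * FN + [1] * FP + [0] * TN)

metrics = {
    "mcc": matthews_corrcoef(y_true, y_pred),
    "accuracy": accuracy_score(y_true, y_pred),
    "fscore": f1_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    # With 0/1 "scores" the ROC curve has a single interior point, so the
    # AUC reduces to (sensitivity + specificity) / 2.
    "roc_auc": roc_auc_score(y_true, y_pred),
    "jcc": jaccard_score(y_true, y_pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.8f}")
print(f"balanced accuracy: {balanced_accuracy_score(y_true, y_pred):.8f}")
```

Running it prints the same eight-decimal values as fold 1 of the Gaussian NB block (for example mcc 0.48934516, jcc 0.50000000).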

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00703979 0.006778   0.00680351 0.00683117 0.00685048 0.00680065
 0.00678611 0.006809   0.00679731 0.00678182]

mean value: 0.006827783584594726

key: score_time
value: [0.00774384 0.00769997 0.00771785 0.00773907 0.00769234 0.00770354
 0.00767231 0.00767827 0.00772476 0.00772119]

mean value: 0.007709312438964844

key: test_mcc
value: [0.57777778 0.68888889 0.78888889 0.03580574 0.41773368 0.36666667
 0.2857738  0.62994079 0.78888889 0.72456884]

mean value: 0.5304933960318973

key: train_mcc
value: [0.58646061 0.59367966 0.57166923 0.61721762 0.62711195 0.55822989
 0.65085813 0.56730506 0.58506018 0.61093648]

mean value: 0.596852879815003

key: test_accuracy
value: [0.78947368 0.84210526 0.89473684 0.52631579 0.68421053 0.68421053
 0.63157895 0.78947368 0.89473684 0.84210526]

mean value: 0.7578947368421053

key: train_accuracy
value: [0.78947368 0.79532164 0.78362573 0.80701754 0.8128655  0.77777778
 0.8245614  0.78362573 0.78947368 0.80116959]

mean value: 0.7964912280701755

key: test_fscore
value: [0.77777778 0.84210526 0.88888889 0.4        0.72727273 0.7
 0.72       0.83333333 0.9        0.82352941]

mean value: 0.7612907402195328

key: train_fscore
value: [0.80645161 0.80662983 0.79781421 0.81767956 0.82022472 0.78651685
 0.82954545 0.78362573 0.8021978  0.81521739]

mean value: 0.8065903164894157

key: test_precision
value: [0.77777778 0.8        0.88888889 0.5        0.61538462 0.7
 0.6        0.71428571 0.9        1.        ]

mean value: 0.7496336996336996

key: train_precision
value: [0.75       0.76842105 0.75257732 0.77894737 0.79347826 0.75268817
 0.8021978  0.77906977 0.75257732 0.75757576]

mean value: 0.7687532820355886

key: test_recall
value: [0.77777778 0.88888889 0.88888889 0.33333333 0.88888889 0.7
 0.9        1.         0.9        0.7       ]

mean value: 0.7977777777777778

key: train_recall
value: [0.87209302 0.84883721 0.84883721 0.86046512 0.84883721 0.82352941
 0.85882353 0.78823529 0.85882353 0.88235294]

mean value: 0.8490834473324214

key: test_roc_auc
value: [0.78888889 0.84444444 0.89444444 0.51666667 0.69444444 0.68333333
 0.61666667 0.77777778 0.89444444 0.85      ]

mean value: 0.7561111111111111

key: train_roc_auc
value: [0.78898769 0.79500684 0.78324213 0.80670315 0.8126539  0.77804378
 0.8247606  0.78365253 0.78987688 0.80164159]

mean value: 0.7964569083447333

key: test_jcc
value: [0.63636364 0.72727273 0.8        0.25       0.57142857 0.53846154
 0.5625     0.71428571 0.81818182 0.7       ]

mean value: 0.6318494005994006

key: train_jcc
value: [0.67567568 0.67592593 0.66363636 0.69158879 0.6952381  0.64814815
 0.70873786 0.64423077 0.66972477 0.68807339]

mean value: 0.6760979792116991

MCC on Blind test: 0.23

Accuracy on Blind test: 0.62
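Each model block prints a `Pipeline` repr (MinMaxScaler on numeric columns, OneHotEncoder on categorical ones, `remainder='passthrough'`). It then prints per-fold `key:`/`value:`/`mean value:` triples for fit_time, score_time and train/test mcc, accuracy, fscore, precision, recall, roc_auc and jcc. The per-fold test and train accuracies are multiples of 1/19 and 1/171. That fits 10 folds over 190 training rows. The sketch below shows one way to produce output in that layout with `cross_validate`. It rests on several assumptions. The scorer dictionary keys are inferred from the log keys, `StratifiedKFold(shuffle=True, random_state=42)` is a guess at the splitter, and the helpers `build_pipeline`/`run_cv` are hypothetical. The data are synthetic and use only three of the log's column names.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score,
                             make_scorer, matthews_corrcoef, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Scorer names chosen so cross_validate emits the same keys as the log
# (test_mcc, train_jcc, ...). Assumption: every scorer, roc_auc included,
# receives hard class predictions (make_scorer's default behaviour).
SCORING = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": make_scorer(jaccard_score),
}


def build_pipeline(num_cols, cat_cols, model):
    """MinMax-scale numeric columns, one-hot encode categorical ones, then fit model."""
    prep = ColumnTransformer(
        transformers=[("num", MinMaxScaler(), num_cols),
                      ("cat", OneHotEncoder(), cat_cols)],
        remainder="passthrough")
    return Pipeline(steps=[("prep", prep), ("model", model)])


def run_cv(X, y, model, num_cols, cat_cols, n_splits=10):
    """Return {key: (per-fold values, mean)} mirroring the log's key/value/mean layout."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    scores = cross_validate(build_pipeline(num_cols, cat_cols, model), X, y,
                            cv=cv, scoring=SCORING, return_train_score=True)
    return {key: (vals, float(np.mean(vals))) for key, vals in scores.items()}


# Synthetic rows under three of the log's column names (values are made up).
rng = np.random.default_rng(0)
y = np.array([0, 1] * 30)
X = pd.DataFrame({"ligand_distance": y + rng.normal(scale=0.5, size=60),
                  "rsa": rng.uniform(size=60),
                  "ss_class": ["helix", "sheet", "coil"] * 20})

results = run_cv(X, y, LogisticRegression(random_state=42),
                 num_cols=["ligand_distance", "rsa"], cat_cols=["ss_class"])
for key, (vals, mean) in results.items():
    print(f"key: {key}\nvalue: {np.round(vals, 8)}\n\nmean value: {mean}\n")
```

Passing any other estimator from the model list (for example `BernoulliNB()`) to `run_cv` reuses the same preprocessing. That matches the way every block in this log shares one `ColumnTransformer`.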

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00671601 0.00724053 0.00691652 0.00643086 0.00659227 0.00642991
 0.0070684  0.00696349 0.00667977 0.00645947]

mean value: 0.006749725341796875

key: score_time
value: [0.00925422 0.0137701  0.0091207  0.00863004 0.0085907  0.00899935
 0.00881934 0.0092144  0.00863957 0.008605  ]

mean value: 0.00936434268951416

key: test_mcc
value: [0.4719399  0.9        0.57777778 0.25844328 0.50604808 0.56694671
 0.25844328 0.36803496 0.9        0.41773368]

mean value: 0.5225367670594359

key: train_mcc
value: [0.68468598 0.59069767 0.69589603 0.63788154 0.66205542 0.65057205
 0.68455664 0.67315132 0.62630196 0.66082912]

mean value: 0.6566627720071807

key: test_accuracy
value: [0.73684211 0.94736842 0.78947368 0.63157895 0.73684211 0.73684211
 0.63157895 0.68421053 0.94736842 0.68421053]

mean value: 0.7526315789473684

key: train_accuracy
value: [0.84210526 0.79532164 0.84795322 0.81871345 0.83040936 0.8245614
 0.84210526 0.83625731 0.8128655  0.83040936]

mean value: 0.8280701754385965

key: test_fscore
value: [0.70588235 0.94736842 0.77777778 0.58823529 0.76190476 0.66666667
 0.66666667 0.72727273 0.94736842 0.625     ]

mean value: 0.7414143089452687

key: train_fscore
value: [0.84023669 0.79532164 0.84883721 0.81656805 0.82634731 0.81707317
 0.83832335 0.8313253  0.80722892 0.82840237]

mean value: 0.8249663993602754

key: test_precision
value: [0.75       0.9        0.77777778 0.625      0.66666667 1.
 0.63636364 0.66666667 1.         0.83333333]

mean value: 0.7855808080808081

key: train_precision
value: [0.85542169 0.8        0.84883721 0.8313253  0.85185185 0.84810127
 0.85365854 0.85185185 0.82716049 0.83333333]

mean value: 0.8401541530526481

key: test_recall
value: [0.66666667 1.         0.77777778 0.55555556 0.88888889 0.5
 0.7        0.8        0.9        0.5       ]

mean value: 0.7288888888888889

key: train_recall
value: [0.8255814  0.79069767 0.84883721 0.80232558 0.80232558 0.78823529
 0.82352941 0.81176471 0.78823529 0.82352941]

mean value: 0.8105061559507524

key: test_roc_auc
value: [0.73333333 0.95       0.78888889 0.62777778 0.74444444 0.75
 0.62777778 0.67777778 0.95       0.69444444]

mean value: 0.7544444444444445

key: train_roc_auc
value: [0.84220246 0.79534884 0.84794802 0.81880985 0.83057456 0.82435021
 0.84199726 0.83611491 0.8127223  0.83036936]

mean value: 0.8280437756497948

key: test_jcc
value: [0.54545455 0.9        0.63636364 0.41666667 0.61538462 0.5
 0.5        0.57142857 0.9        0.45454545]

mean value: 0.6039843489843489

key: train_jcc
value: [0.7244898  0.66019417 0.73737374 0.69       0.70408163 0.69072165
 0.72164948 0.71134021 0.67676768 0.70707071]

mean value: 0.7023689064747017

MCC on Blind test: 0.25

Accuracy on Blind test: 0.73
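Every block ends with `MCC on Blind test` and `Accuracy on Blind test`, both rounded to two decimals. Below is a minimal sketch of that step. It assumes the pipeline is refit on the full training split and scored once on a held-out set that never enters cross-validation. `blind_test_report` is a hypothetical helper. The data, split ratio and model are illustrative only, because the real blind-test split is not shown in this log.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


def blind_test_report(pipeline, X_train, y_train, X_blind, y_blind):
    """Refit on the whole training split, then score once on the untouched blind set."""
    pipeline.fit(X_train, y_train)
    y_pred = pipeline.predict(X_blind)
    mcc = round(matthews_corrcoef(y_blind, y_pred), 2)
    acc = round(accuracy_score(y_blind, y_pred), 2)
    print(f"MCC on Blind test: {mcc}\n")
    print(f"Accuracy on Blind test: {acc}\n")
    return mcc, acc


# Synthetic stand-in data (hypothetical; not the script's feature table).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

pipe = Pipeline(steps=[("prep", MinMaxScaler()),
                       ("model", LogisticRegression(random_state=42))])
blind_test_report(pipe, X_train, y_train, X_blind, y_blind)
```

Keeping the blind set out of `cross_validate` entirely is what gives the drop from cross-validated MCC (0.52 mean for K-Nearest Neighbors above) to blind-test MCC (0.25) its meaning.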

Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.0085094  0.00796008 0.00784731 0.00864291 0.0081656  0.00886369
 0.00788093 0.00836372 0.00793576 0.00788879]

mean value: 0.008205819129943847

key: score_time
value: [0.0081985  0.00789022 0.00786686 0.00799942 0.00821161 0.00802946
 0.00788641 0.00793266 0.00784802 0.00792909]

mean value: 0.007979226112365723

key: test_mcc
value: [0.59554321 0.80903983 0.80903983 0.47777778 0.50604808 0.89893315
 0.39056329 0.62994079 0.89893315 0.57777778]

mean value: 0.659359689118782

key: train_mcc
value: [0.68426013 0.65000183 0.66041977 0.73935782 0.70514081 0.70309379
 0.72670051 0.70309379 0.6820046  0.73981073]

mean value: 0.699388376917528

key: test_accuracy
value: [0.78947368 0.89473684 0.89473684 0.73684211 0.73684211 0.94736842
 0.68421053 0.78947368 0.94736842 0.78947368]

mean value: 0.8210526315789474

key: train_accuracy
value: [0.83625731 0.81871345 0.8245614  0.86549708 0.84795322 0.84795322
 0.85964912 0.84795322 0.83625731 0.86549708]

mean value: 0.8450292397660819

key: test_fscore
value: [0.8        0.9        0.9        0.73684211 0.76190476 0.95238095
 0.75       0.83333333 0.95238095 0.8       ]

mean value: 0.8386842105263158

key: train_fscore
value: [0.85106383 0.83597884 0.84042553 0.87567568 0.86021505 0.85714286
 0.86813187 0.85714286 0.84782609 0.87431694]

mean value: 0.8567919536384895

key: test_precision
value: [0.72727273 0.81818182 0.81818182 0.7        0.66666667 0.90909091
 0.64285714 0.71428571 0.90909091 0.8       ]

mean value: 0.7705627705627706

key: train_precision
value: [0.78431373 0.76699029 0.7745098  0.81818182 0.8        0.80412371
 0.81443299 0.80412371 0.78787879 0.81632653]

mean value: 0.7970881369717886

key: test_recall
value: [0.88888889 1.         1.         0.77777778 0.88888889 1.
 0.9        1.         1.         0.8       ]

mean value: 0.9255555555555556

key: train_recall
value: [0.93023256 0.91860465 0.91860465 0.94186047 0.93023256 0.91764706
 0.92941176 0.91764706 0.91764706 0.94117647]

mean value: 0.9263064295485636

key: test_roc_auc
value: [0.79444444 0.9        0.9        0.73888889 0.74444444 0.94444444
 0.67222222 0.77777778 0.94444444 0.78888889]

mean value: 0.8205555555555556

key: train_roc_auc
value: [0.83570451 0.81812585 0.82400821 0.86504788 0.84746922 0.84835841
 0.86005472 0.84835841 0.83673051 0.86593707]

mean value: 0.8449794801641587

key: test_jcc
value: [0.66666667 0.81818182 0.81818182 0.58333333 0.61538462 0.90909091
 0.6        0.71428571 0.90909091 0.66666667]

mean value: 0.7300882450882451

key: train_jcc
value: [0.74074074 0.71818182 0.72477064 0.77884615 0.75471698 0.75
 0.76699029 0.75       0.73584906 0.77669903]

mean value: 0.7496794713094747

MCC on Blind test: 0.17

Accuracy on Blind test: 0.54

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.50620532 0.52595353 0.51024294 0.6996367  0.51282358 0.5229423
 0.52637339 0.6919179  0.50890207 0.50792599]

mean value: 0.5512923717498779

key: score_time
value: [0.01093102 0.01325846 0.01348591 0.01509166 0.01097798 0.0216291
 0.01327252 0.01341844 0.01341486 0.01526189]

mean value: 0.014074182510375977

key: test_mcc
value: [0.58655573 0.57777778 0.68888889 0.36666667 0.89893315 0.57777778
 0.48934516 0.62994079 1.         0.68543653]

mean value: 0.6501322465616389

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.78947368 0.78947368 0.84210526 0.68421053 0.94736842 0.78947368
 0.73684211 0.78947368 1.         0.84210526]

mean value: 0.8210526315789474

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.75       0.77777778 0.84210526 0.66666667 0.94117647 0.8
 0.7826087  0.83333333 1.         0.85714286]

mean value: 0.8250811064318939

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.85714286 0.77777778 0.8        0.66666667 1.         0.8
 0.69230769 0.71428571 1.         0.81818182]

mean value: 0.8126362526362526

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.66666667 0.77777778 0.88888889 0.66666667 0.88888889 0.8
 0.9        1.         1.         0.9       ]

mean value: 0.8488888888888889

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.78333333 0.78888889 0.84444444 0.68333333 0.94444444 0.78888889
 0.72777778 0.77777778 1.         0.83888889]

mean value: 0.8177777777777777

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.6        0.63636364 0.72727273 0.5        0.88888889 0.66666667
 0.64285714 0.71428571 1.         0.75      ]

mean value: 0.7126334776334776

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.19

Accuracy on Blind test: 0.69
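The MLP block shows eleven identical ConvergenceWarnings. That is consistent with `MLPClassifier(max_iter=500)` hitting its iteration cap once per fit: ten CV folds plus, presumably, the blind-test refit. Its training metrics are also all 1.0. Below is a minimal sketch for counting those warnings programmatically instead of reading them off stderr. `fit_and_count_convergence_warnings` is a hypothetical helper. The tiny `max_iter=5` and the synthetic data are chosen only to force the warning.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier


def fit_and_count_convergence_warnings(model, X, y):
    """Fit model and return how many ConvergenceWarnings the fit raised."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" so a repeated warning from the same code location is not suppressed.
        warnings.simplefilter("always", ConvergenceWarning)
        model.fit(X, y)
    return sum(issubclass(w.category, ConvergenceWarning) for w in caught)


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] > 0).astype(int)

# max_iter deliberately tiny so the stochastic optimiser stops before converging.
mlp = MLPClassifier(max_iter=5, random_state=42)
n_warn = fit_and_count_convergence_warnings(mlp, X, y)
print(n_warn, mlp.n_iter_)  # n_iter_ equals max_iter when the cap was hit
```

Comparing `n_iter_` with `max_iter` after each fit gives the same signal without relying on warning output.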

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01248384 0.00804639 0.00774622 0.00760841 0.00760221 0.00763416
 0.00761247 0.00740051 0.00772691 0.00756049]

mean value: 0.00814216136932373

key: score_time
value: [0.01788831 0.00798225 0.00790691 0.00766706 0.00764203 0.00762773
 0.00762391 0.00771236 0.00768185 0.00769854]

mean value: 0.008743095397949218

key: test_mcc
value: [0.80903983 1.         0.80903983 0.80903983 0.80507649 0.68888889
 0.78888889 0.80507649 1.         0.68888889]

mean value: 0.8203939143333164

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.89473684 1.         0.89473684 0.89473684 0.89473684 0.84210526
 0.89473684 0.89473684 1.         0.84210526]

mean value: 0.9052631578947369

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.9        1.         0.9        0.9        0.875      0.84210526
 0.9        0.90909091 1.         0.84210526]

mean value: 0.9068301435406699

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.81818182 1.         0.81818182 0.81818182 1.         0.88888889
 0.9        0.83333333 1.         0.88888889]

mean value: 0.8965656565656566

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         0.77777778 0.8
 0.9        1.         1.         0.8       ]

mean value: 0.9277777777777778

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9        1.         0.9        0.9        0.88888889 0.84444444
 0.89444444 0.88888889 1.         0.84444444]

mean value: 0.9061111111111111

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.81818182 1.         0.81818182 0.81818182 0.77777778 0.72727273
 0.81818182 0.83333333 1.         0.72727273]

mean value: 0.8338383838383838
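For a binary positive class, the Jaccard index and F1 are linked by J = F1 / (2 - F1). This follows from J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN). The check below applies that identity to the Decision Tree test_fscore values copied from this log, and it recovers the printed test_jcc values. That suggests jcc is the positive-class Jaccard score. This is an arithmetic check only, not the script's own code.

```python
import numpy as np

# Per-fold Decision Tree values copied from the log above
test_fscore = np.array([0.9, 1.0, 0.9, 0.9, 0.875, 0.84210526,
                        0.9, 0.90909091, 1.0, 0.84210526])
test_jcc = np.array([0.81818182, 1.0, 0.81818182, 0.81818182, 0.77777778, 0.72727273,
                     0.81818182, 0.83333333, 1.0, 0.72727273])

# Jaccard from F1 for the positive class: J = F1 / (2 - F1)
jcc_from_f1 = test_fscore / (2 - test_fscore)
print(np.round(jcc_from_f1, 8))
print(np.allclose(jcc_from_f1, test_jcc, atol=1e-6))
```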

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
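The first Decision Tree test fold can be reconstructed from its printed metrics. Recall 1.0, precision 9/11 and accuracy 17/19 imply TP=9, FP=2, TN=8 and FN=0 on a 19-row fold. The sketch below rebuilds labels with those counts; they are inferred, not read from the script. It shows that scikit-learn's metric functions reproduce every first-fold test value. ROC AUC comes out at 0.9 because, for hard 0/1 scores, it equals the mean of sensitivity (1.0) and specificity (0.8). That is consistent with a fully grown tree whose predicted probabilities are only 0 or 1.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score, matthews_corrcoef,
                             precision_score, recall_score, roc_auc_score)

# Inferred counts for fold 1: 9 positives (all predicted 1), 10 negatives (2 predicted 1)
y_true = np.array([1] * 9 + [0] * 10)
y_pred = np.array([1] * 9 + [1] * 2 + [0] * 8)

metrics = {
    "mcc": matthews_corrcoef(y_true, y_pred),
    "accuracy": accuracy_score(y_true, y_pred),
    "fscore": f1_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "roc_auc": roc_auc_score(y_true, y_pred),  # hard 0/1 scores used as y_score
    "jcc": jaccard_score(y_true, y_pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.8f}")
```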

MCC on Blind test: 0.31

Accuracy on Blind test: 0.84
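Blind-test accuracy is 0.84 while blind-test MCC is 0.31. A gap like this is expected when the evaluated set is imbalanced, because accuracy rewards correct majority-class predictions while MCC uses all four confusion-matrix cells. Whether the blind set here is imbalanced is not shown in this section. The counts below are hypothetical, chosen only to illustrate the effect; they are not the script's blind-test data. The log appears to print these two values rounded to two decimals, so round() is used the same way.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, matthews_corrcoef

# Hypothetical imbalanced blind set: 80 negatives, 20 positives
y_true = np.array([0] * 80 + [1] * 20)
# Hypothetical predictions: 4 FP and 76 TN among negatives, 6 TP and 14 FN among positives
y_pred = np.array([1] * 4 + [0] * 76 + [1] * 6 + [0] * 14)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
acc = accuracy_score(y_true, y_pred)
mcc = matthews_corrcoef(y_true, y_pred)
print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("Accuracy on Blind test:", round(acc, 2))
print("MCC on Blind test:", round(mcc, 2))
```

By hand, MCC = (6*76 - 4*14) / sqrt(10*20*80*90) = 400/1200, about 0.33, while accuracy is 82/100.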

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.08216429 0.0823946  0.08215213 0.08401656 0.08392739 0.08516693
 0.08451629 0.08552861 0.08529258 0.08295751]

mean value: 0.08381168842315674

key: score_time
value: [0.01642704 0.01630354 0.01634693 0.0172317  0.01676369 0.01684332
 0.01709843 0.01667094 0.01670718 0.01638293]

mean value: 0.01667757034301758

key: test_mcc
value: [0.78888889 1.         0.9        0.80903983 0.89893315 1.
 0.39056329 0.62994079 0.89893315 0.80903983]

mean value: 0.8125338935827562

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.89473684 1.         0.94736842 0.89473684 0.94736842 1.
 0.68421053 0.78947368 0.94736842 0.89473684]

mean value: 0.9

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.88888889 1.         0.94736842 0.9        0.94117647 1.
 0.75       0.83333333 0.95238095 0.88888889]

mean value: 0.910203695513293

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.88888889 1.         0.9        0.81818182 1.         1.
 0.64285714 0.71428571 0.90909091 1.        ]

mean value: 0.8873304473304473

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.88888889 1.         1.         1.         0.88888889 1.
 0.9        1.         1.         0.8       ]

mean value: 0.9477777777777778

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.89444444 1.         0.95       0.9        0.94444444 1.
 0.67222222 0.77777778 0.94444444 0.9       ]

mean value: 0.8983333333333333

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.8        1.         0.9        0.81818182 0.88888889 1.
 0.6        0.71428571 0.90909091 0.8       ]

mean value: 0.843044733044733

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.3

Accuracy on Blind test: 0.76

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00726008 0.00728798 0.00723386 0.00722384 0.00689101 0.00721788
 0.00701737 0.00681758 0.00731874 0.00693321]

mean value: 0.007120156288146972

key: score_time
value: [0.00801849 0.00767374 0.00813246 0.00769758 0.00767255 0.00790119
 0.00813627 0.00808954 0.00791264 0.00815868]

mean value: 0.007939314842224121

key: test_mcc
value: [0.36803496 0.4719399  0.4719399  0.64450339 0.68888889 1.
 0.1495142  0.48934516 0.19096397 0.26666667]

mean value: 0.4741797049483811

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.68421053 0.73684211 0.73684211 0.78947368 0.84210526 1.
 0.57894737 0.73684211 0.57894737 0.63157895]

mean value: 0.731578947368421

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.625      0.70588235 0.70588235 0.81818182 0.84210526 1.
 0.63636364 0.7826087  0.5        0.63157895]

mean value: 0.7247603066606297

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.71428571 0.75       0.75       0.69230769 0.8        1.
 0.58333333 0.69230769 0.66666667 0.66666667]

mean value: 0.7315567765567765

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.55555556 0.66666667 0.66666667 1.         0.88888889 1.
 0.7        0.9        0.4        0.6       ]

mean value: 0.7377777777777778

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.67777778 0.73333333 0.73333333 0.8        0.84444444 1.
 0.57222222 0.72777778 0.58888889 0.63333333]

mean value: 0.731111111111111

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.45454545 0.54545455 0.54545455 0.69230769 0.72727273 1.
 0.46666667 0.64285714 0.33333333 0.46153846]

mean value: 0.5869430569430569

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.26

Accuracy on Blind test: 0.75

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.09578466 1.03582072 1.0463326  1.02560258 1.02915144 1.02786326
 1.02509475 1.03272462 1.03403187 1.02677202]

mean value: 1.0379178524017334

key: score_time
value: [0.08918476 0.08878589 0.09057307 0.08787775 0.08753514 0.08778787
 0.08985972 0.08711696 0.08690763 0.08730698]

mean value: 0.08829357624053955

key: test_mcc
value: [1.         1.         0.9        0.9        1.         0.9
 0.9        0.89893315 1.         0.80903983]

mean value: 0.930797298490688

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         1.         0.94736842 0.94736842 1.         0.94736842
 0.94736842 0.94736842 1.         0.89473684]

mean value: 0.9631578947368421

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         1.         0.94736842 0.94736842 1.         0.94736842
 0.94736842 0.95238095 1.         0.88888889]

mean value: 0.9630743525480367

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         0.9        0.9        1.         1.
 1.         0.90909091 1.         1.        ]

mean value: 0.9709090909090909

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.  1.  1.  1.  1.  0.9 0.9 1.  1.  0.8]

mean value: 0.96

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         1.         0.95       0.95       1.         0.95
 0.95       0.94444444 1.         0.9       ]

mean value: 0.9644444444444444

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         1.         0.9        0.9        1.         0.9
 0.9        0.90909091 1.         0.8       ]

mean value: 0.9309090909090909

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.3

Accuracy on Blind test: 0.83

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])
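The FutureWarning for Random Forest2 comes from constructing it with max_features='auto'. As the warning itself states, 'sqrt' keeps the previous behaviour for classifiers, and 'auto' is scheduled for removal in scikit-learn 1.3. Below is a minimal sketch of the replacement estimator with the other logged parameters unchanged. The toy data is hypothetical and exists only to show that fitting no longer emits that warning.

```python
import warnings

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Same settings as 'Random Forest2' in the log, with max_features='auto' replaced by 'sqrt'
rf2 = RandomForestClassifier(max_features="sqrt", min_samples_leaf=5,
                             n_estimators=1000, n_jobs=10, oob_score=True,
                             random_state=42)

# Hypothetical toy data, only to exercise fit()
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=120) > 0).astype(int)

# Record every warning raised during fit and keep only max_features deprecations
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    rf2.fit(X, y)

max_features_warnings = [w for w in caught
                         if issubclass(w.category, FutureWarning) and "max_features" in str(w.message)]
print("max_features FutureWarnings:", len(max_features_warnings))
print("OOB score:", round(rf2.oob_score_, 2))
```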

key: fit_time
value: [0.79450846 0.99196959 0.89163351 0.86481595 0.85897207 0.88148761
 0.85166144 0.85085249 0.98532867 0.87325144]

mean value: 0.8844481229782104

key: score_time
value: [0.228971   0.21244311 0.2221725  0.17969465 0.22778034 0.23846292
 0.18934894 0.2319572  0.19641733 0.20504928]

mean value: 0.21322972774505616

key: test_mcc
value: [0.9        1.         0.9        0.80903983 1.         0.9
 0.9        0.89893315 1.         0.80903983]

mean value: 0.9117012819862771

key: train_mcc
value: [0.94157888 0.95346936 0.95321477 0.95321477 0.95346936 0.94158687
 0.95348202 0.96497948 0.94158687 0.95348202]

mean value: 0.9510064395117223

key: test_accuracy
value: [0.94736842 1.         0.94736842 0.89473684 1.         0.94736842
 0.94736842 0.94736842 1.         0.89473684]

mean value: 0.9526315789473684

key: train_accuracy
value: [0.97076023 0.97660819 0.97660819 0.97660819 0.97660819 0.97076023
 0.97660819 0.98245614 0.97076023 0.97660819]

mean value: 0.975438596491228
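Unlike the earlier models, Random Forest2 does not fit its training folds perfectly. That is plausibly because min_samples_leaf=5 stops each tree from splitting down to single rows. The printed ratios also reveal the fold sizes. Converted to fractions, training accuracy gives 166/171 and test accuracy gives 18/19. That means 171 training rows and 19 test rows per fold, or 190 rows in total for 10-fold CV. A training recall of 83/85 means that fold's training part held 85 positives. This is an inference from the logged numbers, sketched with the standard-library fractions module.

```python
from fractions import Fraction

# Values copied from the Random Forest2 block of this log
logged = {
    "train_accuracy fold 1": 0.97076023,
    "train_accuracy fold 2": 0.97660819,
    "test_accuracy fold 3": 0.94736842,
    "train_recall fold 6": 0.97647059,
}

# Closest fraction with a small denominator. The log prints 8 decimals, and two
# distinct fractions with denominators <= 200 differ by far more than that rounding,
# so the recovered fraction is unambiguous.
for name, value in logged.items():
    frac = Fraction(value).limit_denominator(200)
    print(f"{name}: {value} ~ {frac.numerator}/{frac.denominator}")
```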

key: test_fscore
value: [0.94736842 1.         0.94736842 0.9        1.         0.94736842
 0.94736842 0.95238095 1.         0.88888889]

mean value: 0.9530743525480367

key: train_fscore
value: [0.97109827 0.97701149 0.97674419 0.97674419 0.97701149 0.97076023
 0.97674419 0.98245614 0.97076023 0.97674419]

mean value: 0.9756074606774882

key: test_precision
value: [0.9        1.         0.9        0.81818182 1.         1.
 1.         0.90909091 1.         1.        ]

mean value: 0.9527272727272728

key: train_precision
value: [0.96551724 0.96590909 0.97674419 0.97674419 0.96590909 0.96511628
 0.96551724 0.97674419 0.96511628 0.96551724]

mean value: 0.9688835022235183

key: test_recall
value: [1.  1.  1.  1.  1.  0.9 0.9 1.  1.  0.8]

mean value: 0.96

key: train_recall
value: [0.97674419 0.98837209 0.97674419 0.97674419 0.98837209 0.97647059
 0.98823529 0.98823529 0.97647059 0.98823529]

mean value: 0.9824623803009576

key: test_roc_auc
value: [0.95       1.         0.95       0.9        1.         0.95
 0.95       0.94444444 1.         0.9       ]

mean value: 0.9544444444444444

key: train_roc_auc
value: [0.97072503 0.97653899 0.97660739 0.97660739 0.97653899 0.97079343
 0.97667579 0.98248974 0.97079343 0.97667579]

mean value: 0.9754445964432285

key: test_jcc
value: [0.9        1.         0.9        0.81818182 1.         0.9
 0.9        0.90909091 1.         0.8       ]

mean value: 0.9127272727272727

key: train_jcc
value: [0.94382022 0.95505618 0.95454545 0.95454545 0.95505618 0.94318182
 0.95454545 0.96551724 0.94318182 0.95454545]

mean value: 0.9523995280194428

MCC on Blind test: 0.27

Accuracy on Blind test: 0.83

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])
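BernoulliNB() is used here with its default binarize=0.0, so every feature value strictly greater than 0 is treated as 1. After MinMaxScaler, each numerical column lies in [0, 1]. Only values at (or, in a test fold, below) the training minimum map to 0, so most scaled values collapse to 1. The one-hot columns are already 0/1 and are unaffected. The sketch below runs sklearn.preprocessing.binarize, which applies the same "greater than threshold" rule, on a hypothetical column. The 0.5 threshold is shown only as one possible alternative, not as a tuned value.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, binarize

# Hypothetical numerical feature column (e.g. a distance), one row per mutation
x = np.array([[2.0], [5.0], [8.0], [11.0]])

scaled = MinMaxScaler().fit_transform(x)        # approximately 0, 1/3, 2/3, 1
default_bits = binarize(scaled, threshold=0.0)  # same rule as BernoulliNB's default binarize=0.0
mid_bits = binarize(scaled, threshold=0.5)      # an alternative threshold, for comparison

print(scaled.ravel())
print(default_bits.ravel())
print(mid_bits.ravel())
```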

key: fit_time
value: [0.00803375 0.00765824 0.00742745 0.00700617 0.00725985 0.00694323
 0.00772119 0.00693536 0.00706053 0.00701404]

mean value: 0.0073059797286987305

key: score_time
value: [0.00847721 0.0100956  0.00787282 0.00814962 0.00778627 0.00804114
 0.00835919 0.00783134 0.00775075 0.00776839]

mean value: 0.008213233947753907

key: test_mcc
value: [0.57777778 0.68888889 0.78888889 0.03580574 0.41773368 0.36666667
 0.2857738  0.62994079 0.78888889 0.72456884]

mean value: 0.5304933960318973

key: train_mcc
value: [0.58646061 0.59367966 0.57166923 0.61721762 0.62711195 0.55822989
 0.65085813 0.56730506 0.58506018 0.61093648]

mean value: 0.596852879815003

key: test_accuracy
value: [0.78947368 0.84210526 0.89473684 0.52631579 0.68421053 0.68421053
 0.63157895 0.78947368 0.89473684 0.84210526]

mean value: 0.7578947368421053

key: train_accuracy
value: [0.78947368 0.79532164 0.78362573 0.80701754 0.8128655  0.77777778
 0.8245614  0.78362573 0.78947368 0.80116959]

mean value: 0.7964912280701755

key: test_fscore
value: [0.77777778 0.84210526 0.88888889 0.4        0.72727273 0.7
 0.72       0.83333333 0.9        0.82352941]

mean value: 0.7612907402195328

key: train_fscore
value: [0.80645161 0.80662983 0.79781421 0.81767956 0.82022472 0.78651685
 0.82954545 0.78362573 0.8021978  0.81521739]

mean value: 0.8065903164894157

key: test_precision
value: [0.77777778 0.8        0.88888889 0.5        0.61538462 0.7
 0.6        0.71428571 0.9        1.        ]

mean value: 0.7496336996336996

key: train_precision
value: [0.75       0.76842105 0.75257732 0.77894737 0.79347826 0.75268817
 0.8021978  0.77906977 0.75257732 0.75757576]

mean value: 0.7687532820355886

key: test_recall
value: [0.77777778 0.88888889 0.88888889 0.33333333 0.88888889 0.7
 0.9        1.         0.9        0.7       ]

mean value: 0.7977777777777778

key: train_recall
value: [0.87209302 0.84883721 0.84883721 0.86046512 0.84883721 0.82352941
 0.85882353 0.78823529 0.85882353 0.88235294]

mean value: 0.8490834473324214

key: test_roc_auc
value: [0.78888889 0.84444444 0.89444444 0.51666667 0.69444444 0.68333333
 0.61666667 0.77777778 0.89444444 0.85      ]

mean value: 0.7561111111111111

key: train_roc_auc
value: [0.78898769 0.79500684 0.78324213 0.80670315 0.8126539  0.77804378
 0.8247606  0.78365253 0.78987688 0.80164159]

mean value: 0.7964569083447333

key: test_jcc
value: [0.63636364 0.72727273 0.8        0.25       0.57142857 0.53846154
 0.5625     0.71428571 0.81818182 0.7       ]

mean value: 0.6318494005994006

key: train_jcc
value: [0.67567568 0.67592593 0.66363636 0.69158879 0.6952381  0.64814815
 0.70873786 0.64423077 0.66972477 0.68807339]

mean value: 0.6760979792116991
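The per-fold metrics above are tied together by the fold's confusion matrix. The first test fold of this BernoulliNB run reports accuracy 0.78947368 (15/19), precision 7/9 and recall 7/9. Those values are consistent with TP=7, FP=2, FN=2, TN=8. The counts are inferred here from the reported ratios and are not printed in the log. The sketch below plugs them into the standard formulas. It reproduces that fold's test_mcc, test_jcc and test_roc_auc. The ROC AUC match assumes the AUC was computed from hard 0/1 predictions, in which case it reduces to balanced accuracy.

```python
import math


def fold_metrics(tp, fp, fn, tn):
    """Binary classification metrics from confusion-matrix counts (positive class = 1)."""
    mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "mcc": mcc,
        "jcc": tp / (tp + fp + fn),  # Jaccard index of predicted vs true positives
        # AUC of a single operating point (hard predictions) = balanced accuracy
        "roc_auc": 0.5 * (tp / (tp + fn) + tn / (tn + fp)),
    }


m = fold_metrics(tp=7, fp=2, fn=2, tn=8)
for k, v in m.items():
    print(f"{k}: {v:.8f}")
# mcc 0.57777778, jcc 0.63636364, roc_auc 0.78888889, as in fold 1 above
```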

MCC on Blind test: 0.23

Accuracy on Blind test: 0.62
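The two blind-test lines report MCC and accuracy rounded to two decimals. The following is a minimal sketch of how such lines could be produced. It assumes that the blind set is a stratified hold-out never used during cross-validation, that the pipeline is refitted on the full training split, and that the values are rounded with round(..., 2). The log does not show how the split or the rounding was done, and the data here is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic, imbalanced toy data (placeholder for the real feature table).
X, y = make_classification(n_samples=400, n_features=8, weights=[0.8, 0.2], random_state=42)
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

pipe = Pipeline([("prep", MinMaxScaler()), ("model", LogisticRegression(random_state=42))])
pipe.fit(X_train, y_train)   # fit on the training split only
y_pred = pipe.predict(X_blind)

mcc_blind = round(matthews_corrcoef(y_blind, y_pred), 2)
acc_blind = round(accuracy_score(y_blind, y_pred), 2)
print("MCC on Blind test:", mcc_blind)
print("\nAccuracy on Blind test:", acc_blind)
```

On an imbalanced blind set, accuracy can stay high while MCC is low. That is why MCC is the more informative of the two lines.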

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
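The "Model_name / Model func / List of models" lines come from a loop over (name, estimator) pairs. In this first listing the XGBClassifier repr shows base_score=None, booster=None and so on. In the later listings it shows concrete values (base_score=0.5, booster='gbtree', max_depth=6, ...), so the listed instance appears to have changed between prints. The following is a minimal sketch of such a loop with three of the listed models. It clones each estimator before fitting, which is one way to keep the instances in the list untouched. Whether the original script clones or fits the listed objects directly is not visible in the log.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=190, n_features=8, random_state=42)

# (name, estimator) pairs shaped like the log's "List of models".
models = [
    ("Logistic Regression", LogisticRegression(random_state=42)),
    ("Naive Bayes", BernoulliNB()),
    ("LDA", LinearDiscriminantAnalysis()),
]

fitted = {}  # keyed by name: a duplicate name (the log lists 'Naive Bayes' twice) keeps only the last
for model_name, model_fn in models:
    print("Model_name:", model_name)
    print("Model func:", model_fn)
    # clone() returns an unfitted copy with the same parameters,
    # so the estimator stored in `models` is never mutated by fit().
    pipe = Pipeline(steps=[("prep", MinMaxScaler()), ("model", clone(model_fn))])
    pipe.fit(X, y)
    fitted[model_name] = pipe
```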
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.06458902 0.03820992 0.03959632 0.03878093 0.03928733 0.04292321
 0.04065013 0.04027557 0.04257679 0.03781724]

mean value: 0.04247064590454101

key: score_time
value: [0.00953722 0.00945091 0.01028013 0.01050878 0.01019955 0.01042247
 0.01043653 0.01029015 0.01024365 0.01017427]

mean value: 0.010154366493225098

key: test_mcc
value: [1.         1.         0.9        0.9        1.         0.9
 0.9        0.89893315 1.         0.80903983]

mean value: 0.930797298490688

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         1.         0.94736842 0.94736842 1.         0.94736842
 0.94736842 0.94736842 1.         0.89473684]

mean value: 0.9631578947368421

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         1.         0.94736842 0.94736842 1.         0.94736842
 0.94736842 0.95238095 1.         0.88888889]

mean value: 0.9630743525480367

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         0.9        0.9        1.         1.
 1.         0.90909091 1.         1.        ]

mean value: 0.9709090909090909

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.  1.  1.  1.  1.  0.9 0.9 1.  1.  0.8]

mean value: 0.96

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         1.         0.95       0.95       1.         0.95
 0.95       0.94444444 1.         0.9       ]

mean value: 0.9644444444444444

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         1.         0.9        0.9        1.         0.9
 0.9        0.90909091 1.         0.8       ]

mean value: 0.9309090909090909

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
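The key/value/mean value blocks match the dictionary returned by scikit-learn's cross_validate when scoring is a dict and return_train_score=True. It contains fit_time and score_time, then test_<name> and train_<name> for each scorer. The following is a minimal sketch. It assumes scorer names mcc, accuracy, fscore, precision, recall, roc_auc and jcc (inferred from the printed keys) and 10-fold stratified CV (ten values per key; the splitter itself is not shown). Every test accuracy in the log is a multiple of 1/19, so the toy set has 190 balanced rows, giving 19-row test folds. The log's test_roc_auc values coincide with balanced accuracy in the folds checked, so the sketch scores AUC from predict() output. This is an inference, not the confirmed setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# 190 rows, 95 per class (flip_y=0), so each of the 10 stratified test folds has 19 rows.
X, y = make_classification(n_samples=190, n_features=8, flip_y=0, random_state=42)

scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),  # assumption: AUC from hard predictions
    "jcc": make_scorer(jaccard_score),
}

pipe = Pipeline([("prep", MinMaxScaler()), ("model", LogisticRegression(random_state=42))])
cv_results = cross_validate(
    pipe, X, y,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=42),
    scoring=scoring,
    return_train_score=True,
)

# Print in the same key / value / mean value layout as the log.
for key, value in cv_results.items():
    print("key:", key)
    print("value:", value)
    print("\nmean value:", np.mean(value), "\n")
```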

MCC on Blind test: 0.21

Accuracy on Blind test: 0.79

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.01104665 0.03395987 0.03166819 0.03184342 0.02563572 0.027812
 0.03157783 0.03174877 0.03313327 0.02740669]

mean value: 0.028583240509033204

key: score_time
value: [0.00994396 0.02052045 0.01887202 0.01847482 0.01112342 0.02053523
 0.01040173 0.01528358 0.01049042 0.01989079]

mean value: 0.015553641319274902

key: test_mcc
value: [1.         0.80507649 0.68888889 0.9        0.89893315 0.89893315
 0.80903983 0.78888889 1.         0.80903983]

mean value: 0.8598800233490951

key: train_mcc
value: [0.92982216 0.94157888 0.94157888 0.94158687 0.92982216 0.92982216
 0.96497948 0.9649747  0.94158687 0.96497948]

mean value: 0.9450731645321953

key: test_accuracy
value: [1.         0.89473684 0.84210526 0.94736842 0.94736842 0.94736842
 0.89473684 0.89473684 1.         0.89473684]

mean value: 0.9263157894736842

key: train_accuracy
value: [0.96491228 0.97076023 0.97076023 0.97076023 0.96491228 0.96491228
 0.98245614 0.98245614 0.97076023 0.98245614]

mean value: 0.9725146198830409

key: test_fscore
value: [1.         0.875      0.84210526 0.94736842 0.94117647 0.95238095
 0.88888889 0.9        1.         0.88888889]

mean value: 0.9235808884957493

key: train_fscore
value: [0.96511628 0.97109827 0.97109827 0.97076023 0.96511628 0.96470588
 0.98245614 0.98224852 0.97076023 0.98245614]

mean value: 0.9725816241532454

key: test_precision
value: [1.         1.         0.8        0.9        1.         0.90909091
 1.         0.9        1.         1.        ]

mean value: 0.9509090909090909

key: train_precision
value: [0.96511628 0.96551724 0.96551724 0.97647059 0.96511628 0.96470588
 0.97674419 0.98809524 0.96511628 0.97674419]

mean value: 0.970914340074442

key: test_recall
value: [1.         0.77777778 0.88888889 1.         0.88888889 1.
 0.8        0.9        1.         0.8       ]

mean value: 0.9055555555555556

key: train_recall
value: [0.96511628 0.97674419 0.97674419 0.96511628 0.96511628 0.96470588
 0.98823529 0.97647059 0.97647059 0.98823529]

mean value: 0.9742954856361149

key: test_roc_auc
value: [1.         0.88888889 0.84444444 0.95       0.94444444 0.94444444
 0.9        0.89444444 1.         0.9       ]

mean value: 0.9266666666666666

key: train_roc_auc
value: [0.96491108 0.97072503 0.97072503 0.97079343 0.96491108 0.96491108
 0.98248974 0.98242134 0.97079343 0.98248974]

mean value: 0.9725170998632011

key: test_jcc
value: [1.         0.77777778 0.72727273 0.9        0.88888889 0.90909091
 0.8        0.81818182 1.         0.8       ]

mean value: 0.8621212121212122

key: train_jcc
value: [0.93258427 0.94382022 0.94382022 0.94318182 0.93258427 0.93181818
 0.96551724 0.96511628 0.94318182 0.96551724]

mean value: 0.9467141568774251

MCC on Blind test: 0.22

Accuracy on Blind test: 0.78

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.01662064 0.00703716 0.0070796  0.00686336 0.00670743 0.00671721
 0.00670624 0.0067091  0.00672913 0.00688386]

mean value: 0.007805371284484863

key: score_time
value: [0.00827503 0.00794244 0.00786853 0.00760841 0.00764394 0.0076046
 0.00760007 0.00758982 0.00767159 0.007622  ]

mean value: 0.007742643356323242

key: test_mcc
value: [0.47777778 0.78888889 0.80903983 0.4719399  0.47777778 0.9
 0.39056329 0.62994079 0.89893315 0.68888889]

mean value: 0.6533750299089396

key: train_mcc
value: [0.6682897  0.64075558 0.64460032 0.72230744 0.69197907 0.64522558
 0.72260902 0.69674175 0.66470432 0.67154946]

mean value: 0.6768762239015662

key: test_accuracy
value: [0.73684211 0.89473684 0.89473684 0.73684211 0.73684211 0.94736842
 0.68421053 0.78947368 0.94736842 0.84210526]

mean value: 0.8210526315789474

key: train_accuracy
value: [0.83040936 0.81871345 0.81871345 0.85964912 0.84210526 0.81871345
 0.85964912 0.84795322 0.83040936 0.83040936]

mean value: 0.835672514619883

key: test_fscore
value: [0.73684211 0.88888889 0.9        0.70588235 0.73684211 0.94736842
 0.75       0.83333333 0.95238095 0.84210526]

mean value: 0.8293643422281193

key: train_fscore
value: [0.84324324 0.82872928 0.83243243 0.86666667 0.85405405 0.83060109
 0.86516854 0.85057471 0.83798883 0.84324324]

mean value: 0.8452702093088933

key: test_precision
value: [0.7        0.88888889 0.81818182 0.75       0.7        1.
 0.64285714 0.71428571 0.90909091 0.88888889]

mean value: 0.8012193362193362

key: train_precision
value: [0.78787879 0.78947368 0.77777778 0.82978723 0.7979798  0.7755102
 0.82795699 0.83146067 0.79787234 0.78      ]

mean value: 0.7995697489801223

key: test_recall
value: [0.77777778 0.88888889 1.         0.66666667 0.77777778 0.9
 0.9        1.         1.         0.8       ]

mean value: 0.8711111111111112

key: train_recall
value: [0.90697674 0.87209302 0.89534884 0.90697674 0.91860465 0.89411765
 0.90588235 0.87058824 0.88235294 0.91764706]

mean value: 0.8970588235294118

key: test_roc_auc
value: [0.73888889 0.89444444 0.9        0.73333333 0.73888889 0.95
 0.67222222 0.77777778 0.94444444 0.84444444]

mean value: 0.8194444444444444

key: train_roc_auc
value: [0.82995896 0.81839945 0.81826265 0.85937073 0.84165527 0.81915185
 0.85991792 0.84808482 0.83071135 0.83091655]

mean value: 0.8356429548563611

key: test_jcc
value: [0.58333333 0.8        0.81818182 0.54545455 0.58333333 0.9
 0.6        0.71428571 0.90909091 0.72727273]

mean value: 0.7180952380952381

key: train_jcc
value: [0.72897196 0.70754717 0.71296296 0.76470588 0.74528302 0.71028037
 0.76237624 0.74       0.72115385 0.72897196]

mean value: 0.7322253416838178

MCC on Blind test: 0.21

Accuracy on Blind test: 0.63

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.0078938  0.01025677 0.01055956 0.01001406 0.01028895 0.01002979
 0.01060987 0.01027226 0.01006579 0.01109862]

mean value: 0.01010894775390625

key: score_time
value: [0.00768161 0.01027536 0.01019096 0.01026964 0.01025581 0.01024055
 0.01023459 0.01036429 0.01042557 0.01041293]

mean value: 0.010035133361816407

key: test_mcc
value: [0.78888889 0.80507649 0.78888889 0.9        0.9        0.80903983
 0.48934516 0.50604808 1.         0.68888889]

mean value: 0.7676176228299976

key: train_mcc
value: [0.89769958 0.90744828 0.75930915 0.91870817 0.93006714 0.91967295
 0.89779492 0.87613518 0.87279143 0.96497948]

mean value: 0.8944606272273846

key: test_accuracy
value: [0.89473684 0.89473684 0.89473684 0.94736842 0.94736842 0.89473684
 0.73684211 0.73684211 1.         0.84210526]

mean value: 0.8789473684210526

key: train_accuracy
value: [0.94736842 0.95321637 0.86549708 0.95906433 0.96491228 0.95906433
 0.94736842 0.93567251 0.93567251 0.98245614]

mean value: 0.9450292397660818

key: test_fscore
value: [0.88888889 0.875      0.88888889 0.94736842 0.94736842 0.88888889
 0.7826087  0.70588235 1.         0.84210526]

mean value: 0.8766999820523175

key: train_fscore
value: [0.94972067 0.95238095 0.84563758 0.95857988 0.96551724 0.95757576
 0.94915254 0.93167702 0.93333333 0.98245614]

mean value: 0.9426031121967136

key: test_precision
value: [0.88888889 1.         0.88888889 0.9        0.9        1.
 0.69230769 0.85714286 1.         0.88888889]

mean value: 0.9016117216117217

key: train_precision
value: [0.91397849 0.97560976 1.         0.97590361 0.95454545 0.9875
 0.91304348 0.98684211 0.9625     0.97674419]

mean value: 0.9646667089295042

key: test_recall
value: [0.88888889 0.77777778 0.88888889 1.         1.         0.8
 0.9        0.6        1.         0.8       ]

mean value: 0.8655555555555555

key: train_recall
value: [0.98837209 0.93023256 0.73255814 0.94186047 0.97674419 0.92941176
 0.98823529 0.88235294 0.90588235 0.98823529]

mean value: 0.9263885088919288

key: test_roc_auc
value: [0.89444444 0.88888889 0.89444444 0.95       0.95       0.9
 0.72777778 0.74444444 1.         0.84444444]

mean value: 0.8794444444444445

key: train_roc_auc
value: [0.94712722 0.95335157 0.86627907 0.95916553 0.96484268 0.95889193
 0.94760602 0.93536252 0.93549932 0.98248974]

mean value: 0.9450615595075239

key: test_jcc
value: [0.8        0.77777778 0.8        0.9        0.9        0.8
 0.64285714 0.54545455 1.         0.72727273]

mean value: 0.7893362193362193

key: train_jcc
value: [0.90425532 0.90909091 0.73255814 0.92045455 0.93333333 0.91860465
 0.90322581 0.87209302 0.875      0.96551724]

mean value: 0.8934132968812135

MCC on Blind test: 0.25

Accuracy on Blind test: 0.83

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01043749 0.01005483 0.01035213 0.01050234 0.00995088 0.00993705
 0.0102191  0.00978518 0.01020694 0.00993204]

mean value: 0.010137796401977539

key: score_time
value: [0.01046276 0.01042175 0.01033998 0.01044917 0.01030922 0.01031208
 0.0102849  0.01033711 0.01031756 0.01019359]

mean value: 0.010342812538146973

key: test_mcc
value: [0.72456884 0.78888889 0.80903983 0.9        1.         0.80507649
 0.68888889 0.71611487 0.71611487 0.68888889]

mean value: 0.7837581572910308

key: train_mcc
value: [0.6741192  0.88517311 0.72063365 0.94157888 0.94157888 0.86350542
 0.92982216 0.89630221 0.7838874  0.93006714]

mean value: 0.8566668051359216

key: test_accuracy
value: [0.84210526 0.89473684 0.89473684 0.94736842 1.         0.89473684
 0.84210526 0.84210526 0.84210526 0.84210526]

mean value: 0.8842105263157894

key: train_accuracy
value: [0.8128655  0.94152047 0.84210526 0.97076023 0.97076023 0.92982456
 0.96491228 0.94736842 0.88304094 0.96491228]

mean value: 0.9228070175438596

key: test_fscore
value: [0.85714286 0.88888889 0.9        0.94736842 1.         0.90909091
 0.84210526 0.86956522 0.86956522 0.84210526]

mean value: 0.8925832037273685

key: train_fscore
value: [0.84313725 0.94382022 0.86432161 0.97109827 0.97109827 0.93258427
 0.96470588 0.94857143 0.89361702 0.96428571]

mean value: 0.9297239935602771

key: test_precision
value: [0.75       0.88888889 0.81818182 0.9        1.         0.83333333
 0.88888889 0.76923077 0.76923077 0.88888889]

mean value: 0.8506643356643356

key: train_precision
value: [0.72881356 0.91304348 0.76106195 0.96551724 0.96551724 0.89247312
 0.96470588 0.92222222 0.81553398 0.97590361]

mean value: 0.8904792285139268

key: test_recall
value: [1.         0.88888889 1.         1.         1.         1.
 0.8        1.         1.         0.8       ]

mean value: 0.9488888888888889

key: train_recall
value: [1.         0.97674419 1.         0.97674419 0.97674419 0.97647059
 0.96470588 0.97647059 0.98823529 0.95294118]

mean value: 0.97890560875513

key: test_roc_auc
value: [0.85       0.89444444 0.9        0.95       1.         0.88888889
 0.84444444 0.83333333 0.83333333 0.84444444]

mean value: 0.8838888888888888

key: train_roc_auc
value: [0.81176471 0.94131327 0.84117647 0.97072503 0.97072503 0.93009576
 0.96491108 0.94753762 0.88365253 0.96484268]

mean value: 0.9226744186046512

key: test_jcc
value: [0.75       0.8        0.81818182 0.9        1.         0.83333333
 0.72727273 0.76923077 0.76923077 0.72727273]

mean value: 0.8094522144522145

key: train_jcc
value: [0.72881356 0.89361702 0.76106195 0.94382022 0.94382022 0.87368421
 0.93181818 0.90217391 0.80769231 0.93103448]

mean value: 0.8717536072778391

MCC on Blind test: 0.3

Accuracy on Blind test: 0.87

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])
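The pipeline repr above is a `ColumnTransformer` that applies `MinMaxScaler` to the numerical columns and `OneHotEncoder` to the categorical ones, with `remainder='passthrough'`, followed by the classifier step `model`. What follows is a minimal sketch of that shape. It assumes a hypothetical toy DataFrame with only a few of the logged column names and random values. It is not the script's actual code, which this log does not show.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data: a few column names taken from the log, random values.
rng = np.random.default_rng(42)
n = 60
df = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "deepddg": rng.normal(0, 1, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),
    "active_site": rng.choice([0, 1], n),
})
y = rng.integers(0, 2, n)

num_cols = ["ligand_distance", "deepddg", "rsa"]
cat_cols = ["ss_class", "active_site"]

# Scale numerical columns to [0, 1], one-hot encode categorical ones,
# and pass any column not listed in either group through unchanged.
prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ],
    remainder="passthrough",
)
pipe = Pipeline(steps=[("prep", prep), ("model", AdaBoostClassifier(random_state=42))])
pipe.fit(df, y)
print(pipe.predict(df.head(5)))
```

Swapping the `model` step for any other estimator in the list reproduces the per-model pipelines printed in this log.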

key: fit_time
value: [0.07930589 0.06574082 0.06760287 0.06530023 0.06639194 0.06540418
 0.06537628 0.06978536 0.06504154 0.06514907]

mean value: 0.0675098180770874

key: score_time
value: [0.01435232 0.01375031 0.01374173 0.0138588  0.01387048 0.01396298
 0.01380754 0.01467633 0.01412082 0.01378512]

mean value: 0.013992643356323243

key: test_mcc
value: [0.89893315 0.80507649 0.9        0.9        1.         0.9
 0.9        0.89893315 1.         0.68888889]

mean value: 0.8891831674690281

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.94736842 0.89473684 0.94736842 0.94736842 1.         0.94736842
 0.94736842 0.94736842 1.         0.84210526]

mean value: 0.9421052631578947

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94117647 0.875      0.94736842 0.94736842 1.         0.94736842
 0.94736842 0.95238095 1.         0.84210526]

mean value: 0.940013637033761

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         0.9        0.9        1.         1.
 1.         0.90909091 1.         0.88888889]

mean value: 0.9597979797979798

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.88888889 0.77777778 1.         1.         1.         0.9
 0.9        1.         1.         0.8       ]

mean value: 0.9266666666666666

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.94444444 0.88888889 0.95       0.95       1.         0.95
 0.95       0.94444444 1.         0.84444444]

mean value: 0.9422222222222222

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.88888889 0.77777778 0.9        0.9        1.         0.9
 0.9        0.90909091 1.         0.72727273]

mean value: 0.8903030303030303

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.2

Accuracy on Blind test: 0.8
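Each `key:`/`value:`/`mean value:` block matches the dictionary returned by scikit-learn's `cross_validate`. That function returns `fit_time`, `score_time`, and then `test_<name>`/`train_<name>` pairs when `return_train_score=True`. The ten values per key correspond to ten folds. The logged `test_roc_auc` values appear consistent with ROC AUC computed on hard class labels. For example, AdaBoost fold 1 has recall 0.889 and precision 1.0 (so no false positives), and (0.889 + 1.0) / 2 = 0.944 is its logged ROC AUC. For that reason the sketch uses `make_scorer(roc_auc_score)`, which scores `predict()` output.

The sketch below is a minimal one under stated assumptions. The data are synthetic, the scorer names are chosen only to reproduce the logged keys, and the blind-set split is hypothetical. The blind-test figures appear to be rounded to two decimals.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import (accuracy_score, jaccard_score, make_scorer,
                             matthews_corrcoef, roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split

# Synthetic, imbalanced stand-in data (about 80/20 classes); NOT the real dataset.
X, y = make_classification(n_samples=300, n_features=10, weights=[0.8, 0.2],
                           random_state=42)

# Hypothetical split into a training set and a held-out "blind" set.
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Scorer names chosen so the result keys read test_mcc, train_jcc, etc.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),  # scored on predicted labels
    "jcc": make_scorer(jaccard_score),
}

model = AdaBoostClassifier(random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
results = cross_validate(model, X_train, y_train, cv=cv,
                         scoring=scoring, return_train_score=True)

for key, value in results.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}\n")

# Refit on the full training set, then score the untouched blind set once.
model.fit(X_train, y_train)
y_pred = model.predict(X_blind)
blind_mcc = round(matthews_corrcoef(y_blind, y_pred), 2)
blind_acc = round(accuracy_score(y_blind, y_pred), 2)
print(f"MCC on Blind test: {blind_mcc}\n\nAccuracy on Blind test: {blind_acc}")
```

The wide gap between cross-validated scores (for example, AdaBoost mean test MCC of about 0.89) and blind-test MCC (0.2) is what this comparison is meant to expose.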

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.03435493 0.03850365 0.03137898 0.02753544 0.03600645 0.03681159
 0.03525805 0.0235014  0.02443385 0.03159785]

mean value: 0.03193821907043457

key: score_time
value: [0.02038193 0.028126   0.01892233 0.02161765 0.02731395 0.02152586
 0.0160749  0.01497722 0.01738143 0.03216863]

mean value: 0.02184898853302002

key: test_mcc
value: [1.         1.         0.9        0.9        1.         0.80903983
 0.9        0.89893315 1.         0.80903983]

mean value: 0.9217012819862771

key: train_mcc
value: [0.97687783 0.96497948 0.97660739 0.9655126  0.97687783 0.97687158
 0.98837051 0.96497948 0.98837051 1.        ]

mean value: 0.9779447211260681

key: test_accuracy
value: [1.         1.         0.94736842 0.94736842 1.         0.89473684
 0.94736842 0.94736842 1.         0.89473684]

mean value: 0.9578947368421052

key: train_accuracy
value: [0.98830409 0.98245614 0.98830409 0.98245614 0.98830409 0.98830409
 0.99415205 0.98245614 0.99415205 1.        ]

mean value: 0.9888888888888888

key: test_fscore
value: [1.         1.         0.94736842 0.94736842 1.         0.88888889
 0.94736842 0.95238095 1.         0.88888889]

mean value: 0.9572263993316625

key: train_fscore
value: [0.98823529 0.98245614 0.98837209 0.98224852 0.98823529 0.98809524
 0.99408284 0.98245614 0.99408284 1.        ]

mean value: 0.9888264401238974

key: test_precision
value: [1.         1.         0.9        0.9        1.         1.
 1.         0.90909091 1.         1.        ]

mean value: 0.9709090909090909

key: train_precision
value: [1.         0.98823529 0.98837209 1.         1.         1.
 1.         0.97674419 1.         1.        ]

mean value: 0.9953351573187414

key: test_recall
value: [1.  1.  1.  1.  1.  0.8 0.9 1.  1.  0.8]

mean value: 0.95

key: train_recall
value: [0.97674419 0.97674419 0.98837209 0.96511628 0.97674419 0.97647059
 0.98823529 0.98823529 0.98823529 1.        ]

mean value: 0.9824897400820793

key: test_roc_auc
value: [1.         1.         0.95       0.95       1.         0.9
 0.95       0.94444444 1.         0.9       ]

mean value: 0.9594444444444444

key: train_roc_auc
value: [0.98837209 0.98248974 0.98830369 0.98255814 0.98837209 0.98823529
 0.99411765 0.98248974 0.99411765 1.        ]

mean value: 0.98890560875513

key: test_jcc
value: [1.         1.         0.9        0.9        1.         0.8
 0.9        0.90909091 1.         0.8       ]

mean value: 0.9209090909090909

key: train_jcc
value: [0.97674419 0.96551724 0.97701149 0.96511628 0.97674419 0.97647059
 0.98823529 0.96551724 0.98823529 1.        ]

mean value: 0.9779591804644874

MCC on Blind test: 0.32

Accuracy on Blind test: 0.84

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.03849268 0.04577231 0.04582977 0.04652309 0.0430398  0.0437479
 0.04057932 0.04710174 0.04087782 0.04346371]

mean value: 0.04354281425476074

key: score_time
value: [0.01976991 0.01946115 0.01306677 0.01965237 0.02220845 0.02236199
 0.02163577 0.02175426 0.02835393 0.01248932]

mean value: 0.020075392723083497

key: test_mcc
value: [0.4719399  0.78888889 0.47777778 0.25844328 0.47777778 0.59554321
 0.26257545 0.62994079 0.89893315 0.47777778]

mean value: 0.5339598010514092

key: train_mcc
value: [0.9300862  0.88329458 0.89480164 0.90642955 0.90642955 0.91867501
 0.89526317 0.91867501 0.89526317 0.91818307]

mean value: 0.9067100950674556

key: test_accuracy
value: [0.73684211 0.89473684 0.73684211 0.63157895 0.73684211 0.78947368
 0.63157895 0.78947368 0.94736842 0.73684211]

mean value: 0.763157894736842

key: train_accuracy
value: [0.96491228 0.94152047 0.94736842 0.95321637 0.95321637 0.95906433
 0.94736842 0.95906433 0.94736842 0.95906433]

mean value: 0.9532163742690059

key: test_fscore
value: [0.70588235 0.88888889 0.73684211 0.58823529 0.73684211 0.77777778
 0.69565217 0.83333333 0.95238095 0.73684211]

mean value: 0.7652677089142292

key: train_fscore
value: [0.96470588 0.94117647 0.94736842 0.95348837 0.95348837 0.95808383
 0.94610778 0.95808383 0.94610778 0.95857988]

mean value: 0.9527190633369593

key: test_precision
value: [0.75       0.88888889 0.7        0.625      0.7        0.875
 0.61538462 0.71428571 0.90909091 0.77777778]

mean value: 0.7555427905427905

key: train_precision
value: [0.97619048 0.95238095 0.95294118 0.95348837 0.95348837 0.97560976
 0.96341463 0.97560976 0.96341463 0.96428571]

mean value: 0.9630823844001583

key: test_recall
value: [0.66666667 0.88888889 0.77777778 0.55555556 0.77777778 0.7
 0.8        1.         1.         0.7       ]

mean value: 0.7866666666666666

key: train_recall
value: [0.95348837 0.93023256 0.94186047 0.95348837 0.95348837 0.94117647
 0.92941176 0.94117647 0.92941176 0.95294118]

mean value: 0.9426675786593708

key: test_roc_auc
value: [0.73333333 0.89444444 0.73888889 0.62777778 0.73888889 0.79444444
 0.62222222 0.77777778 0.94444444 0.73888889]

mean value: 0.7611111111111111

key: train_roc_auc
value: [0.96497948 0.94158687 0.94740082 0.95321477 0.95321477 0.95896033
 0.94726402 0.95896033 0.94726402 0.95902873]

mean value: 0.953187414500684

key: test_jcc
value: [0.54545455 0.8        0.58333333 0.41666667 0.58333333 0.63636364
 0.53333333 0.71428571 0.90909091 0.58333333]

mean value: 0.6305194805194805

key: train_jcc
value: [0.93181818 0.88888889 0.9        0.91111111 0.91111111 0.91954023
 0.89772727 0.91954023 0.89772727 0.92045455]

mean value: 0.9097918843608499

MCC on Blind test: 0.25

Accuracy on Blind test: 0.67

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.12574768 0.10982108 0.10891128 0.11000681 0.10834146 0.1092
 0.10903215 0.11250973 0.11054492 0.10791087]

mean value: 0.11120259761810303

key: score_time
value: [0.00880623 0.00850511 0.00842857 0.00843477 0.00855279 0.00872946
 0.00836015 0.00840807 0.00873184 0.0089736 ]

mean value: 0.008593058586120606

key: test_mcc
value: [0.9        1.         0.9        0.9        1.         0.80903983
 0.68888889 0.80507649 1.         0.68888889]

mean value: 0.8691894098633082

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.94736842 1.         0.94736842 0.94736842 1.         0.89473684
 0.84210526 0.89473684 1.         0.84210526]

mean value: 0.9315789473684211

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94736842 1.         0.94736842 0.94736842 1.         0.88888889
 0.84210526 0.90909091 1.         0.84210526]

mean value: 0.9324295587453483

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.9        1.         0.9        0.9        1.         1.
 0.88888889 0.83333333 1.         0.88888889]

mean value: 0.9311111111111111

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.  1.  1.  1.  1.  0.8 0.8 1.  1.  0.8]

mean value: 0.9400000000000001

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.95       1.         0.95       0.95       1.         0.9
 0.84444444 0.88888889 1.         0.84444444]

mean value: 0.9327777777777778

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.9        1.         0.9        0.9        1.         0.8
 0.72727273 0.83333333 1.         0.72727273]

mean value: 0.8787878787878788

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.25

Accuracy on Blind test: 0.83

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
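The `UserWarning: Variables are collinear` comes from `QuadraticDiscriminantAnalysis.fit`. For each class, it centres the data and warns if the rank of that matrix is below the number of features. One likely source here, assumed rather than confirmed by this log, is the one-hot encoded categorical columns. Without `drop=`, the columns produced from one categorical feature sum to 1 in every row, so after centring they are linearly dependent.

The minimal sketch below uses a single hypothetical three-level feature and random data to show the rank deficiency and the resulting warning.

```python
import warnings

import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 100
# Hypothetical data: one 3-level categorical feature plus one numeric feature.
cat = rng.choice(["a", "b", "c"], size=(n, 1))
num = rng.normal(size=(n, 1))
y = rng.integers(0, 2, n)

onehot = OneHotEncoder().fit_transform(cat).toarray()  # 3 columns summing to 1 per row
X = np.hstack([num, onehot])

# Rank of the class-wise centred data, which is what QDA inspects per class.
for label in (0, 1):
    Xc = X[y == label] - X[y == label].mean(axis=0)
    print(label, X.shape[1], np.linalg.matrix_rank(Xc))  # rank < number of features

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    QuadraticDiscriminantAnalysis().fit(X, y)
print([str(w.message) for w in caught])
```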
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.00977683 0.01165676 0.01222324 0.01371932 0.01175737 0.01175189
 0.01182127 0.01176047 0.01254773 0.01222157]

mean value: 0.011923646926879883

key: score_time
value: [0.01100063 0.01107335 0.01077461 0.01081491 0.0108037  0.01076961
 0.01068234 0.01070929 0.01347923 0.01104093]

mean value: 0.011114859580993652

key: test_mcc
value: [0.59554321 0.45643546 0.80903983 0.54433105 0.38204659 0.56694671
 0.36666667 0.25844328 0.71611487 0.48934516]

mean value: 0.5184912848824843

key: train_mcc
value: [0.94158687 0.88403644 0.97687783 0.68754923 0.95321477 0.77792524
 0.82502766 0.89967314 0.77850962 0.76887959]

mean value: 0.8493280396728016

key: test_accuracy
value: [0.78947368 0.68421053 0.89473684 0.73684211 0.68421053 0.73684211
 0.68421053 0.63157895 0.84210526 0.73684211]

mean value: 0.7421052631578947

key: train_accuracy
value: [0.97076023 0.94152047 0.98830409 0.8245614  0.97660819 0.87719298
 0.9122807  0.94736842 0.87719298 0.87134503]

mean value: 0.9187134502923976

key: test_fscore
value: [0.8        0.5        0.9        0.61538462 0.7        0.66666667
 0.7        0.66666667 0.86956522 0.7826087 ]

mean value: 0.7200891861761427

key: train_fscore
value: [0.97076023 0.94047619 0.98823529 0.79166667 0.97674419 0.8590604
 0.91017964 0.94409938 0.89005236 0.88541667]

mean value: 0.9156691016197868

key: test_precision
value: [0.72727273 1.         0.81818182 1.         0.63636364 1.
 0.7        0.63636364 0.76923077 0.69230769]

mean value: 0.7979720279720279

key: train_precision
value: [0.97647059 0.96341463 1.         0.98275862 0.97674419 1.
 0.92682927 1.         0.80188679 0.79439252]

mean value: 0.9422496613227801

key: test_recall
value: [0.88888889 0.33333333 1.         0.44444444 0.77777778 0.5
 0.7        0.7        1.         0.9       ]

mean value: 0.7244444444444444

key: train_recall
value: [0.96511628 0.91860465 0.97674419 0.6627907  0.97674419 0.75294118
 0.89411765 0.89411765 1.         1.        ]

mean value: 0.9041176470588235

key: test_roc_auc
value: [0.79444444 0.66666667 0.9        0.72222222 0.68888889 0.75
 0.68333333 0.62777778 0.83333333 0.72777778]

mean value: 0.7394444444444445

key: train_roc_auc
value: [0.97079343 0.94165527 0.98837209 0.825513   0.97660739 0.87647059
 0.9121751  0.94705882 0.87790698 0.87209302]

mean value: 0.9188645690834473

key: test_jcc
value: [0.66666667 0.33333333 0.81818182 0.44444444 0.53846154 0.5
 0.53846154 0.5        0.76923077 0.64285714]

mean value: 0.5751637251637252

key: train_jcc
value: [0.94318182 0.88764045 0.97674419 0.65517241 0.95454545 0.75294118
 0.83516484 0.89411765 0.80188679 0.79439252]

mean value: 0.8495787296516654

MCC on Blind test: 0.23

Accuracy on Blind test: 0.77

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])
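The pipeline printed above has four parts. It scales the numerical columns with MinMaxScaler, one-hot encodes the categorical columns, passes any remaining columns through unchanged (remainder='passthrough'), and then fits the classifier. The sketch below reproduces that structure. The four-column toy DataFrame and its values are hypothetical stand-ins for the real feature table; only the column names come from the log.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import RidgeClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data: column names borrowed from the log, values made up.
X = pd.DataFrame({
    "ligand_distance": [5.1, 12.3, 8.7, 20.4, 3.2, 15.0],
    "rsa": [0.10, 0.55, 0.32, 0.80, 0.05, 0.61],
    "ss_class": ["helix", "strand", "coil", "helix", "coil", "strand"],
    "active_site": [1, 0, 0, 0, 1, 0],
})
y = [1, 0, 1, 0, 1, 0]

num_cols = ["ligand_distance", "rsa"]   # scaled to [0, 1]
cat_cols = ["ss_class", "active_site"]  # one-hot encoded

pipe = Pipeline(steps=[
    ("prep", ColumnTransformer(
        transformers=[
            ("num", MinMaxScaler(), num_cols),
            ("cat", OneHotEncoder(), cat_cols),
        ],
        remainder="passthrough",  # any unlisted columns are kept as-is
    )),
    ("model", RidgeClassifier(random_state=42)),
])

pipe.fit(X, y)
preds = pipe.predict(X)

# 2 scaled columns + 3 ss_class levels + 2 active_site levels = 7 features
Xt = pipe.named_steps["prep"].transform(X)
print(Xt.shape, preds)
```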

key: fit_time
value: [0.01883245 0.02752137 0.02750969 0.0275383  0.02753377 0.0268929
 0.02757883 0.02765822 0.0218935  0.0287528 ]

mean value: 0.026171183586120604

key: score_time
value: [0.01898217 0.01984358 0.01071167 0.01931334 0.0211885  0.02128792
 0.0107007  0.01979136 0.02170277 0.02104163]

mean value: 0.018456363677978517

key: test_mcc
value: [0.89893315 0.89893315 0.9        0.57777778 0.89893315 1.
 0.58655573 0.68543653 1.         0.72456884]

mean value: 0.8171138317218849

key: train_mcc
value: [0.91819425 0.92982216 0.91870817 0.91870817 0.91819425 0.90666492
 0.88303694 0.90739811 0.90666492 0.92982216]

mean value: 0.9137214046919647

key: test_accuracy
value: [0.94736842 0.94736842 0.94736842 0.78947368 0.94736842 1.
 0.78947368 0.84210526 1.         0.84210526]

mean value: 0.9052631578947368

key: train_accuracy
value: [0.95906433 0.96491228 0.95906433 0.95906433 0.95906433 0.95321637
 0.94152047 0.95321637 0.95321637 0.96491228]

mean value: 0.9567251461988304

key: test_fscore
value: [0.94117647 0.94117647 0.94736842 0.77777778 0.94117647 1.
 0.81818182 0.85714286 1.         0.82352941]

mean value: 0.9047529697684497

key: train_fscore
value: [0.95906433 0.96511628 0.95857988 0.95857988 0.95906433 0.95238095
 0.94117647 0.95180723 0.95238095 0.96470588]

mean value: 0.9562856183972881

key: test_precision
value: [1.         1.         0.9        0.77777778 1.         1.
 0.75       0.81818182 1.         1.        ]

mean value: 0.9245959595959596

key: train_precision
value: [0.96470588 0.96511628 0.97590361 0.97590361 0.96470588 0.96385542
 0.94117647 0.97530864 0.96385542 0.96470588]

mean value: 0.9655237110981292

key: test_recall
value: [0.88888889 0.88888889 1.         0.77777778 0.88888889 1.
 0.9        0.9        1.         0.7       ]

mean value: 0.8944444444444444

key: train_recall
value: [0.95348837 0.96511628 0.94186047 0.94186047 0.95348837 0.94117647
 0.94117647 0.92941176 0.94117647 0.96470588]

mean value: 0.9473461012311901

key: test_roc_auc
value: [0.94444444 0.94444444 0.95       0.78888889 0.94444444 1.
 0.78333333 0.83888889 1.         0.85      ]

mean value: 0.9044444444444445

key: train_roc_auc
value: [0.95909713 0.96491108 0.95916553 0.95916553 0.95909713 0.95314637
 0.94151847 0.95307798 0.95314637 0.96491108]

mean value: 0.9567236662106703

key: test_jcc
value: [0.88888889 0.88888889 0.9        0.63636364 0.88888889 1.
 0.69230769 0.75       1.         0.7       ]

mean value: 0.8345337995337995

key: train_jcc
value: [0.92134831 0.93258427 0.92045455 0.92045455 0.92134831 0.90909091
 0.88888889 0.90804598 0.90909091 0.93181818]

mean value: 0.9163124855685878
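Every per-fold test metric can be recomputed from a 2x2 confusion matrix. In fold 1 of the Ridge Classifier run, the logged accuracy (0.947), precision (1.0) and recall (0.889) on 19 test samples fit the counts TP = 8, FN = 1, FP = 0, TN = 10. These counts are inferred from the metrics and are not printed in the log. The check below recomputes the fold-1 values from them.

The logged test_roc_auc equals the mean of the true positive rate and the true negative rate. That is what ROC AUC reduces to when it is scored on hard 0/1 predictions rather than on decision scores. It is an assumption that the script scored it this way.

```python
import math

# Confusion-matrix counts inferred for fold 1 of the Ridge Classifier run.
TP, FN, FP, TN = 8, 1, 0, 10

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)                    # true positive rate
specificity = TN / (TN + FP)               # true negative rate
fscore = 2 * TP / (2 * TP + FP + FN)
jcc = TP / (TP + FP + FN)                  # Jaccard index of the positive class
roc_auc_hard = (recall + specificity) / 2  # ROC AUC of hard 0/1 predictions
mcc = (TP * TN - FP * FN) / math.sqrt(
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)
)

for name, value in [("mcc", mcc), ("accuracy", accuracy), ("fscore", fscore),
                    ("precision", precision), ("recall", recall),
                    ("roc_auc", roc_auc_hard), ("jcc", jcc)]:
    print(f"test_{name}: {value:.8f}")
```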

MCC on Blind test: 0.19

Accuracy on Blind test: 0.76
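Each key/value/mean block above matches an entry in the dictionary that scikit-learn's cross_validate returns with return_train_score=True. The dictionary lists fit_time and score_time first, then a test_ and a train_ entry for each scorer. The sketch below would produce the same keys, assuming the scorers were named mcc, accuracy, fscore, precision, recall, roc_auc and jcc. The synthetic data from make_classification and the scorer definitions are assumptions, not the script's own code.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import (jaccard_score, make_scorer, matthews_corrcoef,
                             roc_auc_score)
from sklearn.model_selection import cross_validate

# Synthetic binary data standing in for the real feature matrix.
X, y = make_classification(n_samples=190, n_features=20, random_state=42)

# Scorer names chosen so the result keys read test_mcc, train_mcc, ...
# make_scorer(roc_auc_score) scores hard predictions from predict().
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": make_scorer(jaccard_score),
}

scores = cross_validate(RidgeClassifier(random_state=42), X, y, cv=10,
                        scoring=scoring, return_train_score=True)

# Print in the same key / value / mean layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())
```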

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_config.py:183: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_config.py:186: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.19279718 0.17178965 0.18140745 0.1711936  0.17231464 0.17179489
 0.17664242 0.17488527 0.17687273 0.20818853]

mean value: 0.17978863716125487

key: score_time
value: [0.01818156 0.01183128 0.01928973 0.02106404 0.02174473 0.02162886
 0.01080251 0.02007103 0.02017498 0.02091074]

mean value: 0.0185699462890625

key: test_mcc
value: [1.         0.71611487 0.80903983 0.9        0.89893315 0.89893315
 0.80903983 0.78888889 1.         0.9       ]

mean value: 0.8720949732742082

key: train_mcc
value: [0.92982216 0.94157888 0.9300862  0.9300862  0.94157888 0.92982216
 0.94158687 0.9649747  0.94158687 0.94158687]

mean value: 0.9392709798490188

key: test_accuracy
value: [1.         0.84210526 0.89473684 0.94736842 0.94736842 0.94736842
 0.89473684 0.89473684 1.         0.94736842]

mean value: 0.9315789473684211

key: train_accuracy
value: [0.96491228 0.97076023 0.96491228 0.96491228 0.97076023 0.96491228
 0.97076023 0.98245614 0.97076023 0.97076023]

mean value: 0.9695906432748538

key: test_fscore
value: [1.         0.8        0.9        0.94736842 0.94117647 0.95238095
 0.88888889 0.9        1.         0.94736842]

mean value: 0.927718315396334

key: train_fscore
value: [0.96511628 0.97109827 0.96470588 0.96470588 0.97109827 0.96470588
 0.97076023 0.98224852 0.97076023 0.97076023]

mean value: 0.9695959680384943

key: test_precision
value: [1.         1.         0.81818182 0.9        1.         0.90909091
 1.         0.9        1.         1.        ]

mean value: 0.9527272727272728

key: train_precision
value: [0.96511628 0.96551724 0.97619048 0.97619048 0.96551724 0.96470588
 0.96511628 0.98809524 0.96511628 0.96511628]

mean value: 0.9696681671866823

key: test_recall
value: [1.         0.66666667 1.         1.         0.88888889 1.
 0.8        0.9        1.         0.9       ]

mean value: 0.9155555555555556

key: train_recall
value: [0.96511628 0.97674419 0.95348837 0.95348837 0.97674419 0.96470588
 0.97647059 0.97647059 0.97647059 0.97647059]

mean value: 0.9696169630642955

key: test_roc_auc
value: [1.         0.83333333 0.9        0.95       0.94444444 0.94444444
 0.9        0.89444444 1.         0.95      ]

mean value: 0.9316666666666666

key: train_roc_auc
value: [0.96491108 0.97072503 0.96497948 0.96497948 0.97072503 0.96491108
 0.97079343 0.98242134 0.97079343 0.97079343]

mean value: 0.9696032831737347

key: test_jcc
value: [1.         0.66666667 0.81818182 0.9        0.88888889 0.90909091
 0.8        0.81818182 1.         0.9       ]

mean value: 0.8701010101010102

key: train_jcc
value: [0.93258427 0.94382022 0.93181818 0.93181818 0.94382022 0.93181818
 0.94318182 0.96511628 0.94318182 0.94318182]

mean value: 0.9410340998170891
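For a binary positive class, the Jaccard index J and the F1 score F are linked by J = F / (2 - F). This follows from F = 2TP / (2TP + FP + FN) and J = TP / (TP + FP + FN). The test_fscore and test_jcc folds logged for Ridge ClassifierCV satisfy this identity. The check below confirms it using values copied from the log and also recomputes the reported mean of test_jcc.

```python
from statistics import mean

# Per-fold values copied from the Ridge ClassifierCV output above.
test_fscore = [1.0, 0.8, 0.9, 0.94736842, 0.94117647, 0.95238095,
               0.88888889, 0.9, 1.0, 0.94736842]
test_jcc = [1.0, 0.66666667, 0.81818182, 0.9, 0.88888889, 0.90909091,
            0.8, 0.81818182, 1.0, 0.9]

def jaccard_from_f1(f):
    """Convert a binary F1 score to the Jaccard index of the same class."""
    return f / (2 - f)

derived = [jaccard_from_f1(f) for f in test_fscore]
max_abs_diff = max(abs(d - j) for d, j in zip(derived, test_jcc))

print("largest fold-wise difference:", max_abs_diff)
print("mean test_jcc:", mean(test_jcc))
```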

MCC on Blind test: 0.2

Accuracy on Blind test: 0.78
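The SettingWithCopyWarning messages in this block come from embb_config.py. The script calls sort_values(..., inplace=True) on rus_CT and rus_BT, which pandas suspects are slices of a larger DataFrame. A common remedy is to take an explicit .copy() of the filtered rows, or to drop inplace=True and rebind the sorted result. The sketch below assumes rus_CT is built by filtering a results table. The results table, its resampling column and the filter are hypothetical. The test_mcc values are the rounded cross-validation means reported above for the two Ridge models and for Logistic Regression.

```python
import pandas as pd

# Hypothetical results table; only the test_mcc values come from the log.
results = pd.DataFrame({
    "model": ["Ridge Classifier", "Ridge ClassifierCV", "Logistic Regression"],
    "resampling": ["rus", "rus", "other"],  # hypothetical labels
    "test_mcc": [0.817, 0.872, 0.860],
})

# .copy() makes rus_CT an independent DataFrame, so the in-place sort
# below no longer operates on a possible view of `results`.
rus_CT = results[results["resampling"] == "rus"].copy()
rus_CT.sort_values(by=["test_mcc"], ascending=False, inplace=True)

# Equivalent without inplace: rebind the sorted result instead.
# rus_CT = results[results["resampling"] == "rus"].sort_values(by=["test_mcc"], ascending=False)

print(rus_CT)
```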

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.02805471 0.02848077 0.05038261 0.0256362  0.0284946  0.02617216
 0.02909684 0.03196239 0.02643657 0.03116679]

mean value: 0.030588364601135253

key: score_time
value: [0.01084852 0.01086903 0.01122189 0.01082182 0.01088285 0.01087236
 0.01082659 0.010993   0.01088881 0.01089334]

mean value: 0.010911822319030762

key: test_mcc
value: [0.83214239 0.91580648 0.85952381 0.86205133 0.9186708  0.77460317
 0.77269114 0.91465912 0.80829038 0.94440028]

mean value: 0.8602838910751969

key: train_mcc
value: [0.88350545 0.88047545 0.88677561 0.89921235 0.90558158 0.88987113
 0.87117688 0.88686262 0.88065992 0.86512643]

mean value: 0.8849247400940407

key: test_accuracy
value: [0.91549296 0.95774648 0.92957746 0.92957746 0.95774648 0.88732394
 0.88571429 0.95714286 0.9        0.97142857]

mean value: 0.9291750503018108

key: train_accuracy
value: [0.94173228 0.94015748 0.94330709 0.9496063  0.95275591 0.94488189
 0.93553459 0.94339623 0.94025157 0.93238994]

mean value: 0.9424013271925915

key: test_fscore
value: [0.91176471 0.95652174 0.92957746 0.93333333 0.96       0.88888889
 0.88235294 0.95774648 0.89230769 0.97058824]

mean value: 0.9283081479675263

key: train_fscore
value: [0.94154818 0.93968254 0.94285714 0.94952681 0.95238095 0.94435612
 0.93502377 0.94303797 0.93968254 0.93141946]

mean value: 0.9419515496773954

key: test_precision
value: [0.93939394 0.97058824 0.91666667 0.8974359  0.92307692 0.88888889
 0.90909091 0.94444444 0.96666667 1.        ]

mean value: 0.9356252570958453

key: train_precision
value: [0.94603175 0.94871795 0.95192308 0.94952681 0.95846645 0.95192308
 0.94249201 0.94904459 0.94871795 0.94498382]

mean value: 0.9491827482405085

key: test_recall
value: [0.88571429 0.94285714 0.94285714 0.97222222 1.         0.88888889
 0.85714286 0.97142857 0.82857143 0.94285714]

mean value: 0.9232539682539682

key: train_recall
value: [0.93710692 0.93081761 0.93396226 0.94952681 0.94637224 0.93690852
 0.92767296 0.93710692 0.93081761 0.91823899]

mean value: 0.934853084141817

key: test_roc_auc
value: [0.91507937 0.95753968 0.9297619  0.92896825 0.95714286 0.88730159
 0.88571429 0.95714286 0.9        0.97142857]

mean value: 0.9290079365079364

key: train_roc_auc
value: [0.94173958 0.94017221 0.94332183 0.94960617 0.95274587 0.94486935
 0.93553459 0.94339623 0.94025157 0.93238994]

mean value: 0.9424027339642481

key: test_jcc
value: [0.83783784 0.91666667 0.86842105 0.875      0.92307692 0.8
 0.78947368 0.91891892 0.80555556 0.94285714]

mean value: 0.867780778175515

key: train_jcc
value: [0.88955224 0.88622754 0.89189189 0.9039039  0.90909091 0.89457831
 0.87797619 0.89221557 0.88622754 0.87164179]

mean value: 0.8903305897149288

MCC on Blind test: 0.25

Accuracy on Blind test: 0.79
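The ConvergenceWarning printed for Logistic Regression means lbfgs stopped at its default limit of max_iter=100 iterations before meeting its tolerance. The pipeline already applies MinMaxScaler, so the remaining fixes the warning suggests are a larger iteration budget or another solver. The sketch below uses synthetic data, and max_iter=5000 is an illustrative choice, not a value from the script. It has not been checked whether a larger budget removes the warning on the real features.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the feature matrix.
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

# Scale the features, then give lbfgs more iterations than the default 100.
clf = make_pipeline(MinMaxScaler(),
                    LogisticRegression(max_iter=5000, random_state=42))

# Record any ConvergenceWarning raised during fitting instead of printing it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    clf.fit(X, y)

n_convergence = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print("ConvergenceWarnings raised:", n_convergence)
```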

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])
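The `prep` step above sends the numerical columns through `MinMaxScaler` and the categorical columns (`ss_class`, `active_site`, and so on) through `OneHotEncoder`. Any remaining columns pass through unchanged (`remainder='passthrough'`). The sketch below shows in pure Python what those two transformers compute with their default settings. The function names and the toy inputs (`"loop"`, `"helix"`) are illustrative assumptions, not values taken from the pipeline.

```python
def min_max_scale(column):
    """Map each value to (x - min) / (max - min), as MinMaxScaler does with feature_range=(0, 1)."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        # A constant column has no range; MinMaxScaler maps it to the lower bound, 0.
        return [0.0 for _ in column]
    return [(x - lo) / span for x in column]


def one_hot(column):
    """Return sorted categories and one 0/1 indicator row per value, like OneHotEncoder(categories='auto')."""
    categories = sorted(set(column))
    rows = [[1 if value == cat else 0 for cat in categories] for value in column]
    return categories, rows


print(min_max_scale([2.0, 4.0, 6.0]))      # [0.0, 0.5, 1.0]
print(one_hot(["loop", "helix", "loop"]))  # (['helix', 'loop'], [[0, 1], [1, 0], [0, 1]])
```

The `fit_time`/`test_*`/`train_*` keys that follow suggest the pipeline is evaluated with cross-validation. In that case the minimum, maximum and category list are learned from each training fold only, so scaled test-fold values can fall outside [0, 1].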

key: fit_time
value: [0.75583076 0.91039634 0.74130368 0.72916102 0.87464905 0.73867345
 0.78031874 0.86983085 0.73920441 0.86410165]

mean value: 0.8003469944000244

key: score_time
value: [0.0140388  0.01402926 0.01415515 0.01423001 0.01435184 0.01453185
 0.01459599 0.01456857 0.01456761 0.01473331]

mean value: 0.014380240440368652

key: test_mcc
value: [0.88880092 0.97220047 0.91885703 0.9186708  0.83240693 0.94365079
 0.94440028 0.91766294 0.94285714 0.97182532]

mean value: 0.9251332620272206
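Each `mean value` line is the arithmetic mean of the ten per-fold scores printed above it. The per-fold values are rounded to eight decimals in the log, so recomputing the mean from them reproduces the logged figure only to within that rounding. A minimal check on the `test_mcc` values above:

```python
from statistics import mean

# test_mcc for LogisticRegressionCV, copied from the log (rounded to 8 decimals).
test_mcc = [0.88880092, 0.97220047, 0.91885703, 0.9186708, 0.83240693,
            0.94365079, 0.94440028, 0.91766294, 0.94285714, 0.97182532]

logged_mean = 0.9251332620272206
recomputed = mean(test_mcc)
print(round(recomputed, 8))  # 0.92513326
```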

key: train_mcc
value: [0.96881022 0.95928679 0.96558776 0.96559014 0.96228175 0.96574383
 0.96564279 0.9625688  0.96243548 0.95935195]

mean value: 0.9637299511779713

key: test_accuracy
value: [0.94366197 0.98591549 0.95774648 0.95774648 0.91549296 0.97183099
 0.97142857 0.95714286 0.97142857 0.98571429]

mean value: 0.9618108651911469

key: train_accuracy
value: [0.98425197 0.97952756 0.98267717 0.98267717 0.98110236 0.98267717
 0.9827044  0.98113208 0.98113208 0.97955975]

mean value: 0.9817441687713564

key: test_fscore
value: [0.94444444 0.98550725 0.95890411 0.96       0.91428571 0.97222222
 0.97222222 0.95890411 0.97142857 0.98550725]

mean value: 0.962342588653488

key: train_fscore
value: [0.98447205 0.97978227 0.98289269 0.98283931 0.98119122 0.98289269
 0.98289269 0.98136646 0.98130841 0.97978227]

mean value: 0.981942006942752

key: test_precision
value: [0.91891892 1.         0.92105263 0.92307692 0.94117647 0.97222222
 0.94594595 0.92105263 0.97142857 1.        ]

mean value: 0.9514874315338712

key: train_precision
value: [0.97239264 0.96923077 0.97230769 0.97222222 0.97507788 0.96932515
 0.97230769 0.96932515 0.97222222 0.96923077]

mean value: 0.9713642193926582

key: test_recall
value: [0.97142857 0.97142857 1.         1.         0.88888889 0.97222222
 1.         1.         0.97142857 0.97142857]

mean value: 0.9746825396825397

key: train_recall
value: [0.99685535 0.99056604 0.99371069 0.99369085 0.9873817  0.99684543
 0.99371069 0.99371069 0.99056604 0.99056604]

mean value: 0.992760351566375

key: test_roc_auc
value: [0.94404762 0.98571429 0.95833333 0.95714286 0.91587302 0.9718254
 0.97142857 0.95714286 0.97142857 0.98571429]

mean value: 0.9618650793650794

key: train_roc_auc
value: [0.98423209 0.97951015 0.98265976 0.98269448 0.98111224 0.98269944
 0.9827044  0.98113208 0.98113208 0.97955975]

mean value: 0.981743646211535

key: test_jcc
value: [0.89473684 0.97142857 0.92105263 0.92307692 0.84210526 0.94594595
 0.94594595 0.92105263 0.94444444 0.97142857]

mean value: 0.9281217770691454

key: train_jcc
value: [0.96941896 0.96036585 0.96636086 0.96625767 0.96307692 0.96636086
 0.96636086 0.96341463 0.96330275 0.96036585]

mean value: 0.964528521459756
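The fold-1 test scores above fit together as a single confusion matrix. They are consistent with a 71-sample fold of 35 positives and 36 negatives, scored as 34 true positives, 1 false negative, 3 false positives and 33 true negatives. These counts are derived here from the logged recall (34/35), precision (34/37) and accuracy (67/71); the log itself does not print them. The sketch below recomputes every fold-1 metric from those counts. It assumes `test_roc_auc` was scored on hard class labels, in which case ROC AUC reduces to balanced accuracy. That assumption is an inference from the matching numbers, not something the log states.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Standard binary-classification scores computed from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),  # Jaccard index of the positive class
        "mcc": (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "roc_auc": (recall + specificity) / 2,  # equals balanced accuracy for hard labels
    }


# Fold-1 counts inferred from the logged recall, precision and accuracy.
fold1 = binary_metrics(tp=34, fp=3, fn=1, tn=33)

# Fold-1 test values as printed in the log above.
logged_fold1 = {"accuracy": 0.94366197, "precision": 0.91891892, "recall": 0.97142857,
                "fscore": 0.94444444, "jcc": 0.89473684, "mcc": 0.88880092,
                "roc_auc": 0.94404762}

for name, value in logged_fold1.items():
    print(f"{name:9s} logged={value:.8f} recomputed={fold1[name]:.8f}")
```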

MCC on Blind test: 0.25

Accuracy on Blind test: 0.81
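On the blind test, MCC (0.25) is far below the cross-validated MCC (about 0.93), while accuracy (0.81) stays comparatively high. When one class dominates, accuracy can remain high even though MCC is low. The original data summary earlier in this log shows such an imbalance (353 vs 95). The counts below are hypothetical and were chosen only to show the effect; the log does not report the blind-test confusion matrix.

```python
from math import sqrt


def accuracy_and_mcc(tp, fp, fn, tn):
    """Return (accuracy, Matthews correlation coefficient) for binary confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is 0 when any row or column of the confusion matrix is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical imbalanced test set: 20 positives, 80 negatives.
# The model finds only 5 of the 20 positives and raises 5 false alarms.
acc, mcc = accuracy_and_mcc(tp=5, fp=5, fn=15, tn=75)
print(acc, mcc)  # 0.8 0.25
```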

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01126242 0.01005054 0.00817871 0.00798202 0.00799775 0.00875211
 0.00785851 0.00802612 0.00777888 0.00775719]

mean value: 0.008564424514770509

key: score_time
value: [0.01088428 0.00858426 0.00835347 0.00822639 0.00815606 0.00880837
 0.00810361 0.00805187 0.00803757 0.00845098]

mean value: 0.008565688133239746

key: test_mcc
value: [0.83095238 0.77991323 0.88880092 0.83214239 0.78542356 0.63412698
 0.47304992 0.72501849 0.7581754  0.80032673]

mean value: 0.7507929990980453

key: train_mcc
value: [0.78353551 0.76086142 0.77667955 0.77992485 0.75779503 0.77726182
 0.66166919 0.76767526 0.77154353 0.76445717]

mean value: 0.7601403313212555

key: test_accuracy
value: [0.91549296 0.88732394 0.94366197 0.91549296 0.88732394 0.81690141
 0.72857143 0.85714286 0.87142857 0.9       ]

mean value: 0.8723340040241448

key: train_accuracy
value: [0.89133858 0.88031496 0.88818898 0.88976378 0.87874016 0.88818898
 0.82704403 0.8836478  0.88522013 0.88207547]

mean value: 0.8794522854454514

key: test_fscore
value: [0.91428571 0.89189189 0.94444444 0.91891892 0.8974359  0.81690141
 0.68852459 0.86842105 0.85714286 0.90140845]

mean value: 0.8699375226070167

key: train_fscore
value: [0.89400922 0.88198758 0.88992248 0.89130435 0.88024883 0.89060092
 0.81292517 0.88544892 0.88820827 0.88372093]

mean value: 0.8798376667002142

key: test_precision
value: [0.91428571 0.84615385 0.91891892 0.89473684 0.83333333 0.82857143
 0.80769231 0.80487805 0.96428571 0.88888889]

mean value: 0.8701745043015903

key: train_precision
value: [0.87387387 0.87116564 0.87767584 0.87767584 0.86809816 0.87048193
 0.88518519 0.87195122 0.86567164 0.87155963]

mean value: 0.8733338966738833

key: test_recall
value: [0.91428571 0.94285714 0.97142857 0.94444444 0.97222222 0.80555556
 0.6        0.94285714 0.77142857 0.91428571]

mean value: 0.8779365079365079

key: train_recall
value: [0.91509434 0.89308176 0.90251572 0.90536278 0.89274448 0.91167192
 0.75157233 0.89937107 0.91194969 0.89622642]

mean value: 0.8879590500565443

key: test_roc_auc
value: [0.91547619 0.88809524 0.94404762 0.91507937 0.88611111 0.81706349
 0.72857143 0.85714286 0.87142857 0.9       ]

mean value: 0.8723015873015872

key: train_roc_auc
value: [0.89130111 0.88029482 0.88816638 0.88978831 0.87876218 0.8882259
 0.82704403 0.8836478  0.88522013 0.88207547]

mean value: 0.8794526119477015

key: test_jcc
value: [0.84210526 0.80487805 0.89473684 0.85       0.81395349 0.69047619
 0.525      0.76744186 0.75       0.82051282]

mean value: 0.7759104513869866

key: train_jcc
value: [0.80833333 0.78888889 0.80167598 0.80392157 0.78611111 0.80277778
 0.68481375 0.79444444 0.79889807 0.79166667]

mean value: 0.786153159371031
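Each key/value/mean value group matches one entry of the dictionary that scikit-learn's cross_validate returns when called with return_train_score=True over 10 folds. The sketch below reproduces that layout on synthetic data. The scorer names (mcc, fscore, jcc, and so on) are inferred from the logged keys. The scorer definitions are assumptions, not the script's confirmed code, and that includes roc_auc being computed on predicted labels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# Assumed scorer dict; names chosen so the result keys match the log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),  # scored on predict() output, not probabilities
    "jcc": make_scorer(jaccard_score),
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(GaussianNB(), X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print()
    print("mean value:", np.mean(value))
    print()
```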

MCC on Blind test: 0.28

Accuracy on Blind test: 0.76

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00870681 0.00830054 0.00821733 0.00855494 0.00802279 0.00820494
 0.00809073 0.00801396 0.0081532  0.0081017 ]

mean value: 0.0082366943359375

key: score_time
value: [0.00878096 0.00862241 0.00818205 0.0089016  0.00812292 0.00837469
 0.00819492 0.00814867 0.00829911 0.00819397]

mean value: 0.008382129669189452

key: test_mcc
value: [0.69047619 0.63383658 0.69762232 0.60881948 0.53699395 0.52233453
 0.48891771 0.7581754  0.65821838 0.60901553]

mean value: 0.6204410069366437

key: train_mcc
value: [0.60949181 0.61718891 0.63374209 0.58769936 0.60523202 0.64225486
 0.62174197 0.61279592 0.60729861 0.63335019]

mean value: 0.6170795744074016

key: test_accuracy
value: [0.84507042 0.81690141 0.84507042 0.8028169  0.76056338 0.76056338
 0.74285714 0.87142857 0.82857143 0.8       ]

mean value: 0.8073843058350101

key: train_accuracy
value: [0.8015748  0.80787402 0.81574803 0.79212598 0.8015748  0.82047244
 0.80974843 0.80503145 0.80031447 0.81289308]

mean value: 0.8067357500123805

key: test_fscore
value: [0.84507042 0.8115942  0.85333333 0.81578947 0.79012346 0.77333333
 0.75675676 0.88311688 0.83333333 0.81578947]

mean value: 0.8178240669465947

key: train_fscore
value: [0.81524927 0.81458967 0.82352941 0.80239521 0.80909091 0.82568807
 0.81749623 0.81381381 0.81405564 0.82627737]

mean value: 0.8162185588580184

key: test_precision
value: [0.83333333 0.82352941 0.8        0.775      0.71111111 0.74358974
 0.71794872 0.80952381 0.81081081 0.75609756]

mean value: 0.7780944499057842

key: train_precision
value: [0.76373626 0.78823529 0.79130435 0.76353276 0.77842566 0.80118694
 0.78550725 0.77873563 0.76164384 0.77111717]

mean value: 0.7783425149199308

key: test_recall
value: [0.85714286 0.8        0.91428571 0.86111111 0.88888889 0.80555556
 0.8        0.97142857 0.85714286 0.88571429]

mean value: 0.8641269841269841

key: train_recall
value: [0.87421384 0.8427673  0.85849057 0.84542587 0.84227129 0.85173502
 0.85220126 0.85220126 0.87421384 0.88993711]

mean value: 0.8583457333888856

key: test_roc_auc
value: [0.8452381  0.81666667 0.84603175 0.80198413 0.75873016 0.75992063
 0.74285714 0.87142857 0.82857143 0.8       ]

mean value: 0.8071428571428572

key: train_roc_auc
value: [0.80146023 0.80781898 0.81568061 0.79220979 0.80163879 0.8205216
 0.80974843 0.80503145 0.80031447 0.81289308]

mean value: 0.8067317421582049

key: test_jcc
value: [0.73170732 0.68292683 0.74418605 0.68888889 0.65306122 0.63043478
 0.60869565 0.79069767 0.71428571 0.68888889]

mean value: 0.6933773018607593

key: train_jcc
value: [0.68811881 0.68717949 0.7        0.67       0.67938931 0.703125
 0.69132653 0.68607595 0.68641975 0.7039801 ]

mean value: 0.6895614944606016
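The test_jcc values follow test_fscore closely. For binary labels, the Jaccard index J = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN) satisfy J = F1 / (2 - F1). The check below applies that identity to the ten BernoulliNB folds logged above. The identity holds per fold, not between the two mean values.

```python
import numpy as np

# Per-fold test_fscore and test_jcc copied from the BernoulliNB block above.
test_fscore = np.array([0.84507042, 0.8115942, 0.85333333, 0.81578947, 0.79012346,
                        0.77333333, 0.75675676, 0.88311688, 0.83333333, 0.81578947])
test_jcc = np.array([0.73170732, 0.68292683, 0.74418605, 0.68888889, 0.65306122,
                     0.63043478, 0.60869565, 0.79069767, 0.71428571, 0.68888889])

# J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN) imply J = F1 / (2 - F1).
jcc_from_f1 = test_fscore / (2 - test_fscore)
print(np.allclose(jcc_from_f1, test_jcc, atol=1e-6))
```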

MCC on Blind test: 0.22

Accuracy on Blind test: 0.63
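Blind-test accuracy is much higher than blind-test MCC in these blocks: 0.63 vs 0.22 for BernoulliNB and 0.76 vs 0.28 for Gaussian NB. The hypothetical confusion matrix below shows how that can happen on an imbalanced set of 90 negatives and 10 positives. Accuracy is dominated by the majority class, while MCC uses all four cells. The counts are illustrative. That the blind set is imbalanced in a similar way is an assumption, although the original data is reported as Counter({0: 353, 1: 95}).

```python
import math

import numpy as np
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Hypothetical imbalanced blind set: 90 negatives, 10 positives.
tn, fp, fn, tp = 80, 10, 5, 5
y_true = np.array([0] * (tn + fp) + [1] * (fn + tp))
y_pred = np.array([0] * tn + [1] * fp + [0] * fn + [1] * tp)

acc = accuracy_score(y_true, y_pred)
mcc = matthews_corrcoef(y_true, y_pred)

# The same MCC from the confusion-matrix formula.
mcc_manual = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
print(f"accuracy={acc:.2f}  mcc={mcc:.2f}  mcc_manual={mcc_manual:.2f}")
```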

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00879741 0.00835109 0.00783396 0.00787377 0.00823879 0.00841093
 0.00836062 0.00857162 0.00833774 0.00834942]

mean value: 0.008312535285949708

key: score_time
value: [0.01278543 0.01182032 0.0121007  0.01211429 0.01177168 0.01202226
 0.01196551 0.01235938 0.01185727 0.01215744]

mean value: 0.012095427513122559

key: test_mcc
value: [0.7468254  0.81122596 0.78640246 0.56233478 0.72811105 0.69047619
 0.65714286 0.80829038 0.85749293 0.72501849]

mean value: 0.7373320483363375

key: train_mcc
value: [0.81913455 0.82781019 0.80765457 0.82304906 0.81230309 0.83247297
 0.84055828 0.80334707 0.80678833 0.83153352]

mean value: 0.82046516428445

key: test_accuracy
value: [0.87323944 0.90140845 0.88732394 0.77464789 0.85915493 0.84507042
 0.82857143 0.9        0.92857143 0.85714286]

mean value: 0.8655130784708249

key: train_accuracy
value: [0.90708661 0.91181102 0.9007874  0.91023622 0.90393701 0.91338583
 0.91981132 0.89937107 0.9009434  0.91352201]

mean value: 0.9080891893230327

key: test_fscore
value: [0.87323944 0.90666667 0.89473684 0.8        0.87179487 0.84507042
 0.82857143 0.90666667 0.92957746 0.86842105]

mean value: 0.8724744852380137

key: train_fscore
value: [0.91207154 0.91616766 0.90666667 0.91350531 0.90854573 0.91803279
 0.92165899 0.90447761 0.90611028 0.91778774]

mean value: 0.9125024315633475

key: test_precision
value: [0.86111111 0.85       0.82926829 0.72727273 0.80952381 0.85714286
 0.82857143 0.85       0.91666667 0.80487805]

mean value: 0.8334434941752015

key: train_precision
value: [0.86685552 0.87428571 0.85714286 0.88011696 0.86571429 0.8700565
 0.9009009  0.86079545 0.8611898  0.87464387]

mean value: 0.8711701869251592

key: test_recall
value: [0.88571429 0.97142857 0.97142857 0.88888889 0.94444444 0.83333333
 0.82857143 0.97142857 0.94285714 0.94285714]

mean value: 0.9180952380952381

key: train_recall
value: [0.96226415 0.96226415 0.96226415 0.94952681 0.95583596 0.97160883
 0.94339623 0.95283019 0.95597484 0.96540881]

mean value: 0.9581374124556078

key: test_roc_auc
value: [0.8734127  0.90238095 0.88849206 0.77301587 0.85793651 0.8452381
 0.82857143 0.9        0.92857143 0.85714286]

mean value: 0.8654761904761905

key: train_roc_auc
value: [0.90699958 0.91173144 0.90069044 0.910298   0.90401861 0.91347737
 0.91981132 0.89937107 0.9009434  0.91352201]

mean value: 0.9080863242267325
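In the K-Nearest Neighbors block, test_roc_auc equals test_accuracy in the last four folds (0.82857143, 0.9, 0.92857143, 0.85714286) and differs only slightly in the others. That pattern fits roc_auc being scored on hard predicted labels rather than on probabilities. This is an inference; the log does not state it. With 0/1 scores, the ROC curve has a single interior point, so the AUC reduces to (TPR + TNR) / 2, which is balanced accuracy. Balanced accuracy equals plain accuracy whenever a fold has equal class counts. The sketch below checks the first identity on random labels.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 200)
# Hypothetical hard predictions: flip roughly 20% of the true labels.
flip = rng.random(200) < 0.2
y_pred = np.where(flip, 1 - y_true, y_true)

auc_on_labels = roc_auc_score(y_true, y_pred)  # 0/1 "scores" give a one-threshold ROC curve
bal_acc = balanced_accuracy_score(y_true, y_pred)
print(auc_on_labels, bal_acc)
```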

key: test_jcc
value: [0.775      0.82926829 0.80952381 0.66666667 0.77272727 0.73170732
 0.70731707 0.82926829 0.86842105 0.76744186]

mean value: 0.77573416376242

key: train_jcc
value: [0.83835616 0.84530387 0.82926829 0.84078212 0.83241758 0.84848485
 0.85470085 0.82561308 0.82833787 0.8480663 ]

mean value: 0.8391330984999132

MCC on Blind test: 0.25

Accuracy on Blind test: 0.74

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.02019525 0.02194238 0.02207088 0.01722932 0.0174396  0.01698589
 0.0200386  0.01803017 0.01915526 0.01805472]

mean value: 0.019114208221435548

key: score_time
value: [0.00993657 0.0108788  0.01091743 0.00970459 0.00995064 0.00961328
 0.00987148 0.01081896 0.00993538 0.00960827]

mean value: 0.010123538970947265

key: test_mcc
value: [0.8594125  0.89315217 0.9451949  0.8594125  0.9186708  0.72329377
 0.77142857 0.860309   0.8340361  0.82992752]

mean value: 0.8494837832110852

key: train_mcc
value: [0.89928172 0.88663036 0.88976737 0.88987659 0.88663261 0.88661389
 0.874283   0.87739754 0.88368712 0.88695034]

mean value: 0.8861120546487431

key: test_accuracy
value: [0.92957746 0.94366197 0.97183099 0.92957746 0.95774648 0.85915493
 0.88571429 0.92857143 0.91428571 0.91428571]

mean value: 0.923440643863179

key: train_accuracy
value: [0.9496063  0.94330709 0.94488189 0.94488189 0.94330709 0.94330709
 0.93710692 0.93867925 0.9418239  0.94339623]

mean value: 0.9430297627890853

key: test_fscore
value: [0.92753623 0.94594595 0.97222222 0.93150685 0.96       0.85294118
 0.88571429 0.93150685 0.90909091 0.91666667]

mean value: 0.9233131136624813

key: train_fscore
value: [0.95       0.94357367 0.94505495 0.94522692 0.94339623 0.94321767
 0.9375     0.93838863 0.94209703 0.94392523]

mean value: 0.9432380307696029

key: test_precision
value: [0.94117647 0.8974359  0.94594595 0.91891892 0.92307692 0.90625
 0.88571429 0.89473684 0.96774194 0.89189189]

mean value: 0.9172889111161232

key: train_precision
value: [0.94409938 0.940625   0.94357367 0.9378882  0.94043887 0.94321767
 0.93167702 0.94285714 0.9376947  0.93518519]

mean value: 0.9397256833165559

key: test_recall
value: [0.91428571 1.         1.         0.94444444 1.         0.80555556
 0.88571429 0.97142857 0.85714286 0.94285714]

mean value: 0.9321428571428572

key: train_recall
value: [0.95597484 0.94654088 0.94654088 0.95268139 0.94637224 0.94321767
 0.94339623 0.93396226 0.94654088 0.95283019]

mean value: 0.9468057456897407

key: test_roc_auc
value: [0.92936508 0.94444444 0.97222222 0.92936508 0.95714286 0.85992063
 0.88571429 0.92857143 0.91428571 0.91428571]

mean value: 0.923531746031746

key: train_roc_auc
value: [0.94959625 0.94330199 0.94487927 0.94489415 0.94331191 0.94330695
 0.93710692 0.93867925 0.9418239  0.94339623]

mean value: 0.9430296807729699

key: test_jcc
value: [0.86486486 0.8974359  0.94594595 0.87179487 0.92307692 0.74358974
 0.79487179 0.87179487 0.83333333 0.84615385]

mean value: 0.8592862092862092

key: train_jcc
value: [0.9047619  0.89317507 0.89583333 0.89614243 0.89285714 0.89253731
 0.88235294 0.88392857 0.89053254 0.89380531]

mean value: 0.8925926568521868

MCC on Blind test: 0.32

Accuracy on Blind test: 0.8
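The "MCC on Blind test" and "Accuracy on Blind test" lines report a model scored on data held out from cross-validation. The sketch below shows that step. It assumes a hypothetical stratified 75/25 split of synthetic data (the script builds its blind set elsewhere) and rounds to two decimals as the log does. Python's round() returns 0.8 rather than 0.80, which matches the SVM line above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in data; not the log's dataset.
X, y = make_classification(n_samples=400, n_features=10, weights=[0.8, 0.2],
                           random_state=42)
# Hypothetical blind split; the real script defines its blind set separately.
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

model = SVC(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_blind)

bts_mcc = round(matthews_corrcoef(y_blind, y_pred), 2)
bts_acc = round(accuracy_score(y_blind, y_pred), 2)
print("MCC on Blind test:", bts_mcc)
print()
print("Accuracy on Blind test:", bts_acc)
```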

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])
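The ConvergenceWarning logged for this model means the stochastic optimizer reached max_iter=500 before meeting its tolerance while the pipeline was being fitted. The sketch below shows two things. First, it records the warning with Python's standard warnings module so that it does not interleave with printed output. Second, it shows one possible remedy, early_stopping=True with a larger max_iter; this remedy is an assumption and is not what the script does. Synthetic data and max_iter=5 are used only to trigger the warning quickly.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=42)

# Capture ConvergenceWarning instead of letting it print to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    MLPClassifier(max_iter=5, random_state=42).fit(X, y)
n_conv = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print("ConvergenceWarnings with max_iter=5:", n_conv)

# Assumed remedy: stop when the held-out validation score stops improving.
mlp = MLPClassifier(max_iter=2000, early_stopping=True, random_state=42).fit(X, y)
print("iterations used:", mlp.n_iter_)
```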

key: fit_time
value: [1.9707489  1.80206823 2.04756761 2.01823282 1.97800541 1.954983
 2.09475994 1.9617672  1.91035748 1.84747243]

mean value: 1.9585963010787963

key: score_time
value: [0.01395583 0.01141071 0.02199006 0.01435852 0.01450992 0.01351452
 0.01374483 0.01391625 0.02986383 0.01390171]

mean value: 0.016116619110107422

key: test_mcc
value: [0.88730159 0.97222222 0.91885703 0.91580648 0.94511009 0.97220047
 0.94440028 0.94440028 0.97182532 0.97182532]

mean value: 0.944394907660236

key: train_mcc
value: [0.99370077 0.99372043 0.99372043 0.99685535 0.99685535 0.99685535
 0.99686027 1.         0.99686027 0.99686027]

mean value: 0.9962288484844167

key: test_accuracy
value: [0.94366197 0.98591549 0.95774648 0.95774648 0.97183099 0.98591549
 0.97142857 0.97142857 0.98571429 0.98571429]

mean value: 0.9717102615694165

key: train_accuracy
value: [0.99685039 0.99685039 0.99685039 0.9984252  0.9984252  0.9984252
 0.99842767 1.         0.99842767 0.99842767]

mean value: 0.9981109790521467

key: test_fscore
value: [0.94285714 0.98591549 0.95890411 0.95890411 0.97297297 0.98630137
 0.97222222 0.97222222 0.98591549 0.98591549]

mean value: 0.9722130628188895

key: train_fscore
value: [0.99685535 0.9968652  0.9968652  0.9984252  0.9984252  0.9984252
 0.99843014 1.         0.99843014 0.99843014]

mean value: 0.9981151767848494

key: test_precision
value: [0.94285714 0.97222222 0.92105263 0.94594595 0.94736842 0.97297297
 0.94594595 0.94594595 0.97222222 0.97222222]

mean value: 0.9538755672966199

key: train_precision
value: [0.99685535 0.99375    0.99375    0.99685535 0.99685535 0.99685535
 0.9968652  1.         0.9968652  0.9968652 ]

mean value: 0.9965516994933066

key: test_recall
value: [0.94285714 1.         1.         0.97222222 1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9915079365079364

key: train_recall
value: [0.99685535 1.         1.         1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.999685534591195

key: test_roc_auc
value: [0.94365079 0.98611111 0.95833333 0.95753968 0.97142857 0.98571429
 0.97142857 0.97142857 0.98571429 0.98571429]

mean value: 0.9717063492063491

key: train_roc_auc
value: [0.99685039 0.99684543 0.99684543 0.99842767 0.99842767 0.99842767
 0.99842767 1.         0.99842767 0.99842767]

mean value: 0.9981107275360594

key: test_jcc
value: [0.89189189 0.97222222 0.92105263 0.92105263 0.94736842 0.97297297
 0.94594595 0.94594595 0.97222222 0.97222222]

mean value: 0.946289710763395

key: train_jcc
value: [0.99373041 0.99375    0.99375    0.99685535 0.99685535 0.99685535
 0.9968652  1.         0.9968652  0.9968652 ]

mean value: 0.9962392056544627

MCC on Blind test: 0.27

Accuracy on Blind test: 0.84

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.02684522 0.01258469 0.0121913  0.01171684 0.01236367 0.01212668
 0.01183128 0.01180315 0.01206851 0.01324224]

mean value: 0.013677358627319336

key: score_time
value: [0.00874949 0.00807166 0.00804973 0.00800586 0.0087328  0.00819206
 0.00814438 0.00796533 0.00803995 0.00842094]

mean value: 0.008237218856811524

key: test_mcc
value: [0.9451949  0.97220047 0.97220047 0.97220047 0.94511009 0.94511009
 0.94440028 0.89155583 0.97182532 0.97182532]

mean value: 0.9531623215602024

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97183099 0.98591549 0.98591549 0.98591549 0.97183099 0.97183099
 0.97142857 0.94285714 0.98571429 0.98571429]

mean value: 0.9758953722334004

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97222222 0.98550725 0.98550725 0.98630137 0.97297297 0.97297297
 0.97222222 0.94594595 0.98591549 0.98591549]

mean value: 0.9765483184868466

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.94594595 1.         1.         0.97297297 0.94736842 0.94736842
 0.94594595 0.8974359  0.97222222 0.97222222]

mean value: 0.9601482048850469

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.97142857 0.97142857 1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9942857142857143

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97222222 0.98571429 0.98571429 0.98571429 0.97142857 0.97142857
 0.97142857 0.94285714 0.98571429 0.98571429]

mean value: 0.9757936507936508

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.94594595 0.97142857 0.97142857 0.97297297 0.94736842 0.94736842
 0.94594595 0.8974359  0.97222222 0.97222222]

mean value: 0.9544339191707613

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
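
Each model block reports fit_time, score_time and paired test_/train_ metrics over 10 folds, each followed by its mean. That layout is consistent with scikit-learn's cross_validate called with a dict of named scorers and return_train_score=True. The sketch below assumes that setup (the scorer names, the StratifiedKFold settings and the synthetic data are guesses, not taken from the script) and prints the results in the same key / value / mean value format.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Assumed scoring dict: keys chosen so cross_validate returns the key names
# seen in the log (test_mcc, train_mcc, ..., test_jcc, train_jcc).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": make_scorer(jaccard_score),
}

# Synthetic stand-in data; the real feature matrix is not reproduced here.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)  # assumed CV scheme
scores = cross_validate(DecisionTreeClassifier(random_state=42), X, y,
                        cv=cv, scoring=scoring, return_train_score=True)

# Print in the same "key / value / mean value" layout as the log.
for key, value in scores.items():
    print(f"\nkey: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}")
```

An unpruned decision tree reproduces its training folds exactly, which is why every train_* metric for the tree models above is 1.0.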

MCC on Blind test: 0.23

Accuracy on Blind test: 0.86
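
After cross-validation each model is also scored once on a blind test set, with results reported to two decimal places (Python's round() would print 0.90 as "0.9", matching the Extra Trees line below). A minimal sketch, assuming the model is refitted on the training data and the blind set is a held-out split; here a stratified train_test_split of synthetic data stands in for it, since the script's actual blind set is not shown in this log.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (assumption, for illustration only).
X, y = make_classification(n_samples=300, n_features=10, weights=[0.8],
                           random_state=42)

# Hold out a "blind" set that is never used for fitting or cross-validation.
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

model = ExtraTreesClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_blind)

blind_mcc = round(matthews_corrcoef(y_blind, y_pred), 2)
blind_acc = round(accuracy_score(y_blind, y_pred), 2)
print("MCC on Blind test:", blind_mcc)
print("\nAccuracy on Blind test:", blind_acc)
```

MCC is reported alongside accuracy because, on an imbalanced blind set, accuracy can stay high while MCC drops, as seen in the 0.86 accuracy versus 0.23 MCC above.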

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.11004734 0.11200666 0.105335   0.10864019 0.1060133  0.10523963
 0.1030283  0.10427046 0.10395288 0.10813165]

mean value: 0.10666654109954835

key: score_time
value: [0.01880479 0.01825428 0.01717353 0.01854348 0.01716185 0.01714182
 0.01717067 0.01719475 0.01825452 0.01732445]

mean value: 0.017702412605285645

key: test_mcc
value: [0.94365079 1.         0.97222222 0.94365079 0.91580648 0.88880092
 0.97182532 0.97182532 0.97182532 1.        ]

mean value: 0.9579607157012351

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97183099 1.         0.98591549 0.97183099 0.95774648 0.94366197
 0.98571429 0.98571429 0.98571429 1.        ]

mean value: 0.9788128772635815

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97142857 1.         0.98591549 0.97222222 0.95890411 0.94285714
 0.98591549 0.98591549 0.98591549 1.        ]

mean value: 0.9789074017927963

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.97142857 1.         0.97222222 0.97222222 0.94594595 0.97058824
 0.97222222 0.97222222 0.97222222 1.        ]

mean value: 0.9749073863779746

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.97142857 1.         1.         0.97222222 0.97222222 0.91666667
 1.         1.         1.         1.        ]

mean value: 0.9832539682539683

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9718254  1.         0.98611111 0.9718254  0.95753968 0.94404762
 0.98571429 0.98571429 0.98571429 1.        ]

mean value: 0.9788492063492064

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.94444444 1.         0.97222222 0.94594595 0.92105263 0.89189189
 0.97222222 0.97222222 0.97222222 1.        ]

mean value: 0.9592223802750118

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.38

Accuracy on Blind test: 0.9

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00825858 0.00866151 0.00814795 0.0079174  0.00801253 0.00788665
 0.00870991 0.00868416 0.00885129 0.00809765]

mean value: 0.008322763442993163

key: score_time
value: [0.00829744 0.00852799 0.00861549 0.00824738 0.0080688  0.00800133
 0.0085113  0.00867772 0.00855446 0.00791121]

mean value: 0.008341312408447266

key: test_mcc
value: [0.94365079 0.8365327  0.91885703 0.86205133 0.81050059 0.91885703
 0.89155583 0.8871639  0.91766294 0.8340361 ]

mean value: 0.8820868251428485

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97183099 0.91549296 0.95774648 0.92957746 0.90140845 0.95774648
 0.94285714 0.94285714 0.95714286 0.91428571]

mean value: 0.9390945674044265

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97142857 0.91891892 0.95890411 0.93333333 0.90909091 0.95652174
 0.94594595 0.94117647 0.95890411 0.91891892]

mean value: 0.941314302653335

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.97142857 0.87179487 0.92105263 0.8974359  0.85365854 1.
 0.8974359  0.96969697 0.92105263 0.87179487]

mean value: 0.9175350879330341

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.97142857 0.97142857 1.         0.97222222 0.97222222 0.91666667
 1.         0.91428571 1.         0.97142857]

mean value: 0.9689682539682539

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9718254  0.91626984 0.95833333 0.92896825 0.90039683 0.95833333
 0.94285714 0.94285714 0.95714286 0.91428571]

mean value: 0.9391269841269841

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.94444444 0.85       0.92105263 0.875      0.83333333 0.91666667
 0.8974359  0.88888889 0.92105263 0.85      ]

mean value: 0.8897874493927125

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.21

Accuracy on Blind test: 0.81

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.382689   1.36185718 1.36501622 1.36442327 1.36342573 1.37483025
 1.44966221 1.39965343 1.40602112 1.3880384 ]

mean value: 1.3855616807937623

key: score_time
value: [0.09274054 0.09215426 0.09248495 0.0925467  0.09295416 0.09729934
 0.10089469 0.09895921 0.09574556 0.15336585]

mean value: 0.10091452598571778

key: test_mcc
value: [0.9451949  1.         0.9451949  0.97220047 0.94511009 0.97220047
 0.94440028 0.97182532 0.97182532 1.        ]

mean value: 0.9667951728615773

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97183099 1.         0.97183099 0.98591549 0.97183099 0.98591549
 0.97142857 0.98571429 0.98571429 1.        ]

mean value: 0.9830181086519115

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97222222 1.         0.97222222 0.98630137 0.97297297 0.98630137
 0.97222222 0.98591549 0.98591549 1.        ]

mean value: 0.983407336528116

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.94594595 1.         0.94594595 0.97297297 0.94736842 0.97297297
 0.94594595 0.97222222 0.97222222 1.        ]

mean value: 0.967559664928086

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97222222 1.         0.97222222 0.98571429 0.97142857 0.98571429
 0.97142857 0.98571429 0.98571429 1.        ]

mean value: 0.983015873015873

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.94594595 1.         0.94594595 0.97297297 0.94736842 0.97297297
 0.94594595 0.97222222 0.97222222 1.        ]

mean value: 0.967559664928086

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.29

Accuracy on Blind test: 0.86

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])
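
Random Forest2 is the only model here that triggers the max_features='auto' FutureWarning, which scikit-learn raises each time the estimator is fitted. As the warning itself states, max_features='sqrt' keeps the same behaviour for classifiers, and 'auto' is slated for removal in 1.3. A sketch of the same configuration with the explicit value, checked on synthetic data (the data are an assumption, not the script's):

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, for illustration only.
X, y = make_classification(n_samples=200, n_features=16, random_state=42)

# Same settings as 'Random Forest2' in the log, but with max_features="sqrt"
# instead of the deprecated "auto" (equivalent for classifiers).
rf2 = RandomForestClassifier(max_features="sqrt", min_samples_leaf=5,
                             n_estimators=1000, n_jobs=10, oob_score=True,
                             random_state=42)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning raised during fit
    rf2.fit(X, y)

print("OOB score:", round(rf2.oob_score_, 2))
```

With min_samples_leaf=5 the trees can no longer isolate single samples, which is consistent with Random Forest2 being the only forest above whose train_* metrics fall below 1.0.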

key: fit_time
value: [0.91464925 0.9240458  0.96008396 0.89609337 0.91591477 0.93456602
 0.91718268 0.93010211 0.98904967 0.94583178]

mean value: 0.9327519416809082

key: score_time
value: [0.24406385 0.26979637 0.23408437 0.26847029 0.24405241 0.2887404
 0.27261448 0.26284695 0.19351506 0.25960517]

mean value: 0.25377893447875977

key: test_mcc
value: [0.91587302 1.         0.9451949  0.97220047 0.94511009 0.94365079
 0.97182532 0.97182532 0.97182532 1.        ]

mean value: 0.9637505209774089

key: train_mcc
value: [0.96881022 0.96250874 0.96867592 0.96867777 0.97177468 0.96881268
 0.96564279 0.96872591 0.96564279 0.9625688 ]

mean value: 0.9671840283439759

key: test_accuracy
value: [0.95774648 1.         0.97183099 0.98591549 0.97183099 0.97183099
 0.98571429 0.98571429 0.98571429 1.        ]

mean value: 0.9816297786720323

key: train_accuracy
value: [0.98425197 0.98110236 0.98425197 0.98425197 0.98582677 0.98425197
 0.9827044  0.98427673 0.9827044  0.98113208]

mean value: 0.9834754617936908

key: test_fscore
value: [0.95774648 1.         0.97222222 0.98630137 0.97297297 0.97222222
 0.98591549 0.98591549 0.98591549 1.        ]

mean value: 0.981921174502691

key: train_fscore
value: [0.98447205 0.98136646 0.98442368 0.984375   0.98591549 0.98442368
 0.98289269 0.98442368 0.98289269 0.98136646]

mean value: 0.9836551870965667

key: test_precision
value: [0.94444444 1.         0.94594595 0.97297297 0.94736842 0.97222222
 0.97222222 0.97222222 0.97222222 1.        ]

mean value: 0.9699620673304884

key: train_precision
value: [0.97239264 0.96932515 0.97530864 0.9752322  0.97826087 0.97230769
 0.97230769 0.97530864 0.97230769 0.96932515]

mean value: 0.9732076373366603

key: test_recall
value: [0.97142857 1.         1.         1.         1.         0.97222222
 1.         1.         1.         1.        ]

mean value: 0.9943650793650793

key: train_recall
value: [0.99685535 0.99371069 0.99371069 0.99369085 0.99369085 0.99684543
 0.99371069 0.99371069 0.99371069 0.99371069]

mean value: 0.9943346626192886

key: test_roc_auc
value: [0.95793651 1.         0.97222222 0.98571429 0.97142857 0.9718254
 0.98571429 0.98571429 0.98571429 1.        ]

mean value: 0.9816269841269841

key: train_roc_auc
value: [0.98423209 0.98108248 0.98423705 0.98426681 0.98583914 0.98427177
 0.9827044  0.98427673 0.9827044  0.98113208]

mean value: 0.983474693966629

key: test_jcc
value: [0.91891892 1.         0.94594595 0.97297297 0.94736842 0.94594595
 0.97222222 0.97222222 0.97222222 1.        ]

mean value: 0.9647818871503082

key: train_jcc
value: [0.96941896 0.96341463 0.96932515 0.96923077 0.97222222 0.96932515
 0.96636086 0.96932515 0.96636086 0.96341463]

mean value: 0.9678398392651248

MCC on Blind test: 0.31

Accuracy on Blind test: 0.85

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.02012014 0.00854611 0.00811195 0.00855899 0.0088582  0.00802565
 0.00814772 0.00833344 0.00853729 0.00875807]

mean value: 0.009599757194519044

key: score_time
value: [0.01067162 0.00829363 0.00833917 0.00915647 0.00838065 0.00874257
 0.00873756 0.00849104 0.00841093 0.00848961]

mean value: 0.008771324157714843

key: test_mcc
value: [0.69047619 0.63383658 0.69762232 0.60881948 0.53699395 0.52233453
 0.48891771 0.7581754  0.65821838 0.60901553]

mean value: 0.6204410069366437

key: train_mcc
value: [0.60949181 0.61718891 0.63374209 0.58769936 0.60523202 0.64225486
 0.62174197 0.61279592 0.60729861 0.63335019]

mean value: 0.6170795744074016

key: test_accuracy
value: [0.84507042 0.81690141 0.84507042 0.8028169  0.76056338 0.76056338
 0.74285714 0.87142857 0.82857143 0.8       ]

mean value: 0.8073843058350101

key: train_accuracy
value: [0.8015748  0.80787402 0.81574803 0.79212598 0.8015748  0.82047244
 0.80974843 0.80503145 0.80031447 0.81289308]

mean value: 0.8067357500123805

key: test_fscore
value: [0.84507042 0.8115942  0.85333333 0.81578947 0.79012346 0.77333333
 0.75675676 0.88311688 0.83333333 0.81578947]

mean value: 0.8178240669465947

key: train_fscore
value: [0.81524927 0.81458967 0.82352941 0.80239521 0.80909091 0.82568807
 0.81749623 0.81381381 0.81405564 0.82627737]

mean value: 0.8162185588580184

key: test_precision
value: [0.83333333 0.82352941 0.8        0.775      0.71111111 0.74358974
 0.71794872 0.80952381 0.81081081 0.75609756]

mean value: 0.7780944499057842

key: train_precision
value: [0.76373626 0.78823529 0.79130435 0.76353276 0.77842566 0.80118694
 0.78550725 0.77873563 0.76164384 0.77111717]

mean value: 0.7783425149199308

key: test_recall
value: [0.85714286 0.8        0.91428571 0.86111111 0.88888889 0.80555556
 0.8        0.97142857 0.85714286 0.88571429]

mean value: 0.8641269841269841
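The per-fold scores reveal the fold sizes. For test_accuracy, 0.84507042 is 60/71 and 0.74285714 is 52/70. For test_recall, 0.85714286 is 30/35 and 0.86111111 is 31/36. This pattern is six test folds of 71 rows and four of 70, with 35 or 36 positives each. scikit-learn's StratifiedKFold produces exactly this pattern on 706 rows balanced 353/353, and `cross_validate` with an integer `cv` uses StratifiedKFold for classifiers. The 353/353 training set is an inference from these numbers, not something this section states. The count 353 does match the majority-class count printed earlier in the log. A sketch that reproduces the fold sizes under that assumption:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Assumption: a class-balanced training set of 353 + 353 rows,
# inferred from the fold-level scores rather than read from the log.
y = np.array([0] * 353 + [1] * 353)
X = np.zeros((len(y), 1))  # feature values do not affect the split sizes

skf = StratifiedKFold(n_splits=10)
fold_sizes = []
fold_positives = []
for _, test_idx in skf.split(X, y):
    fold_sizes.append(len(test_idx))
    fold_positives.append(int(y[test_idx].sum()))

print(fold_sizes)      # six folds of 71, then four of 70
print(fold_positives)  # 35, 35, 35, 36, 36, 36, 35, 35, 35, 35
```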

key: train_recall
value: [0.87421384 0.8427673  0.85849057 0.84542587 0.84227129 0.85173502
 0.85220126 0.85220126 0.87421384 0.88993711]

mean value: 0.8583457333888856

key: test_roc_auc
value: [0.8452381  0.81666667 0.84603175 0.80198413 0.75873016 0.75992063
 0.74285714 0.87142857 0.82857143 0.8       ]

mean value: 0.8071428571428572

key: train_roc_auc
value: [0.80146023 0.80781898 0.81568061 0.79220979 0.80163879 0.8205216
 0.80974843 0.80503145 0.80031447 0.81289308]

mean value: 0.8067317421582049

key: test_jcc
value: [0.73170732 0.68292683 0.74418605 0.68888889 0.65306122 0.63043478
 0.60869565 0.79069767 0.71428571 0.68888889]

mean value: 0.6933773018607593

key: train_jcc
value: [0.68811881 0.68717949 0.7        0.67       0.67938931 0.703125
 0.69132653 0.68607595 0.68641975 0.7039801 ]

mean value: 0.6895614944606016
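The result keys in this block (`fit_time`, `score_time`, then `test_*`/`train_*` pairs for mcc, accuracy, fscore, precision, recall, roc_auc and jcc) match the dictionary that `sklearn.model_selection.cross_validate` returns when given a named scoring dict and `return_train_score=True`. The roc_auc values look as though they were computed from hard 0/1 predictions. Take the first fold: recall 30/35 and specificity 30/36 average to 0.8452381, which is exactly the logged test_roc_auc. The scorer dict below is therefore an assumption reconstructed from the key names and that arithmetic. The data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score,
                             make_scorer, matthews_corrcoef, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB

# Assumed scorer dict; the key names reproduce the log's result keys.
# make_scorer scores predict() output, so roc_auc here is computed from
# hard 0/1 labels (equivalent to balanced accuracy for binary labels).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": make_scorer(jaccard_score),
}

# Synthetic stand-in data; the real feature matrix is not shown in the log.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

results = cross_validate(BernoulliNB(), X, y, cv=10,
                         scoring=scoring, return_train_score=True)

# Print in the same "key / value / mean value" layout as the log.
for key, value in results.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
    print()
```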

MCC on Blind test: 0.22

Accuracy on Blind test: 0.63

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.08229804 0.05557299 0.05145884 0.07025552 0.0688889  0.06463599
 0.06412101 0.06187201 0.06107402 0.06366992]

mean value: 0.06438472270965576

key: score_time
value: [0.01020217 0.01022315 0.00988221 0.01065993 0.01005268 0.01064777
 0.01069522 0.00998998 0.01045489 0.00996876]

mean value: 0.010277676582336425

key: test_mcc
value: [0.9451949  1.         0.91587302 0.97220047 0.94511009 0.97220047
 0.97182532 0.97182532 0.97182532 1.        ]

mean value: 0.9666054881651751

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97183099 1.         0.95774648 0.98591549 0.97183099 0.98591549
 0.98571429 0.98571429 0.98571429 1.        ]

mean value: 0.9830382293762576

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97222222 1.         0.95774648 0.98630137 0.97297297 0.98630137
 0.98591549 0.98591549 0.98591549 1.        ]

mean value: 0.9833290892667701

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.94594595 1.         0.94444444 0.97297297 0.94736842 0.97297297
 0.97222222 0.97222222 0.97222222 1.        ]

mean value: 0.9700371424055635

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         0.97142857 1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9971428571428571

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97222222 1.         0.95793651 0.98571429 0.97142857 0.98571429
 0.98571429 0.98571429 0.98571429 1.        ]

mean value: 0.983015873015873

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.94594595 1.         0.91891892 0.97297297 0.94736842 0.97297297
 0.97222222 0.97222222 0.97222222 1.        ]

mean value: 0.9674845898530109

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.27

Accuracy on Blind test: 0.85
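XGBoost scores 1.0 on every training fold and averages about 0.97 test MCC in cross-validation. On the blind test, however, its MCC is only 0.27, while its accuracy is 0.85. On an imbalanced set, accuracy can look strong even for a classifier that has learned nothing useful, which makes MCC the more informative of the two blind-test numbers. The sketch below uses made-up labels (85 negatives, 15 positives). The log does not report the class balance of the blind set for this run.

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Hypothetical imbalanced labels: 85 negatives, 15 positives.
y_true = [0] * 85 + [1] * 15

# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy_score(y_true, y_pred)
mcc = matthews_corrcoef(y_true, y_pred)
print(f"Accuracy: {acc:.2f}")  # 0.85
print(f"MCC: {mcc:.2f}")       # 0.00, no positive is ever predicted
```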

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
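The XGBoost entry in this list differs from the same entry printed just before the XGBoost run. There, the parameters were unfitted defaults (`max_depth=None`, `n_jobs=None`, `tree_method=None`). Here they are concrete values (`max_depth=6`, `n_jobs=12`, `tree_method='exact'`). This suggests, though the log cannot confirm it, that the estimator object stored in the model list was itself fitted during the XGBoost run, for example for the blind-test prediction. `cross_validate` clones estimators internally, but a direct `fit` call does not. Fitting a `sklearn.base.clone` instead keeps the list's entries unfitted. Below is a minimal sketch with scikit-learn models standing in for XGBoost. `models` is a hypothetical two-entry list mimicking the log's structure.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

# Hypothetical list in the same (name, estimator) shape as "List of models".
models = [
    ("Logistic Regression", LogisticRegression(random_state=42)),
    ("LDA", LinearDiscriminantAnalysis()),
]

X, y = make_classification(n_samples=100, n_features=5, random_state=42)

fitted = {}
for name, model in models:
    # clone() returns an unfitted copy with identical hyperparameters,
    # so fitting it leaves the template stored in `models` untouched.
    est = clone(model)
    est.fit(X, y)
    fitted[name] = est

# The templates were never fitted; the clones were.
print(hasattr(models[0][1], "coef_"))                   # False
print(hasattr(fitted["Logistic Regression"], "coef_"))  # True
```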
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.01726913 0.04849577 0.04532862 0.04679036 0.04754329 0.04615712
 0.04615998 0.01974487 0.06901121 0.04615664]

mean value: 0.043265700340270996

key: score_time
value: [0.01070952 0.01685834 0.02024412 0.01978326 0.017102   0.01800013
 0.01840425 0.01667547 0.01130104 0.0193522 ]

mean value: 0.016843032836914063

key: test_mcc
value: [0.91580648 0.94511009 0.88730159 0.88862624 0.77460317 0.8594125
 0.82857143 0.94440028 0.91465912 0.94440028]

mean value: 0.890289119006275

key: train_mcc
value: [0.9401617  0.93702568 0.95276028 0.93702693 0.9433251  0.92759921
 0.93083602 0.93718106 0.94025622 0.94976067]

mean value: 0.9395932881039084

key: test_accuracy
value: [0.95774648 0.97183099 0.94366197 0.94366197 0.88732394 0.92957746
 0.91428571 0.97142857 0.95714286 0.97142857]

mean value: 0.9448088531187122

key: train_accuracy
value: [0.97007874 0.96850394 0.97637795 0.96850394 0.97165354 0.96377953
 0.96540881 0.96855346 0.97012579 0.97484277]

mean value: 0.9697828455405338

key: test_fscore
value: [0.95652174 0.97058824 0.94285714 0.94594595 0.88888889 0.93150685
 0.91428571 0.97222222 0.95652174 0.97058824]

mean value: 0.9449926712364087

key: train_fscore
value: [0.97017268 0.96865204 0.97645212 0.96855346 0.97151899 0.96354992
 0.96529968 0.96875    0.97017268 0.975     ]

mean value: 0.9698121577608168

key: test_precision
value: [0.97058824 1.         0.94285714 0.92105263 0.88888889 0.91891892
 0.91428571 0.94594595 0.97058824 1.        ]

mean value: 0.9473125713063794

key: train_precision
value: [0.96865204 0.965625   0.97492163 0.96551724 0.97460317 0.96815287
 0.96835443 0.96273292 0.96865204 0.9689441 ]

mean value: 0.9686155436566964

key: test_recall
value: [0.94285714 0.94285714 0.94285714 0.97222222 0.88888889 0.94444444
 0.91428571 1.         0.94285714 0.94285714]

mean value: 0.9434126984126984

key: train_recall
value: [0.97169811 0.97169811 0.97798742 0.97160883 0.96845426 0.95899054
 0.96226415 0.97484277 0.97169811 0.98113208]

mean value: 0.9710374382477234

key: test_roc_auc
value: [0.95753968 0.97142857 0.94365079 0.94325397 0.88730159 0.92936508
 0.91428571 0.97142857 0.95714286 0.97142857]

mean value: 0.9446825396825397

key: train_roc_auc
value: [0.97007619 0.9684989  0.97637541 0.96850882 0.97164851 0.963772
 0.96540881 0.96855346 0.97012579 0.97484277]

mean value: 0.9697810646191696

key: test_jcc
value: [0.91666667 0.94285714 0.89189189 0.8974359  0.8        0.87179487
 0.84210526 0.94594595 0.91666667 0.94285714]

mean value: 0.8968221489274121

key: train_jcc
value: [0.94207317 0.93920973 0.95398773 0.93902439 0.94461538 0.92966361
 0.93292683 0.93939394 0.94207317 0.95121951]

mean value: 0.9414187462247865

MCC on Blind test: 0.26

Accuracy on Blind test: 0.8

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])
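MultinomialNB accepts only non-negative feature values at fit time. That holds in this pipeline for two reasons. MinMaxScaler maps each numerical training column into [0, 1], and OneHotEncoder emits 0/1 indicators. A centring scaler such as StandardScaler would produce negative values and make the fit fail. A small sketch on synthetic data (the exact error message can differ between scikit-learn versions):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(loc=0.0, scale=1.0, size=(50, 4))  # synthetic, includes negatives
y = np.array([0, 1] * 25)

# MinMaxScaler: training features land in [0, 1], so the fit succeeds.
ok_pipe = make_pipeline(MinMaxScaler(), MultinomialNB()).fit(X, y)

# StandardScaler: centred features are partly negative, so the fit raises.
try:
    make_pipeline(StandardScaler(), MultinomialNB()).fit(X, y)
    failed = False
except ValueError as err:
    failed = True
    print("StandardScaler + MultinomialNB failed:", err)
```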

key: fit_time
value: [0.01100755 0.01057196 0.00890303 0.00805974 0.00824904 0.00848842
 0.00868011 0.00886703 0.00807524 0.00789809]

mean value: 0.008880019187927246

key: score_time
value: [0.01157236 0.00949931 0.00915504 0.00873566 0.00878334 0.00882101
 0.00864768 0.00887275 0.00798082 0.00794458]

mean value: 0.00900125503540039

key: test_mcc
value: [0.69292162 0.71961897 0.75442414 0.70310369 0.72472613 0.61348603
 0.45883147 0.68572751 0.74560114 0.69985421]

mean value: 0.6798294910722227

key: train_mcc
value: [0.68941392 0.68469209 0.68590643 0.6803341  0.69349825 0.71023697
 0.66497357 0.69736279 0.69416896 0.69447562]

mean value: 0.6895062697027032

key: test_accuracy
value: [0.84507042 0.85915493 0.87323944 0.84507042 0.84507042 0.8028169
 0.72857143 0.82857143 0.87142857 0.84285714]

mean value: 0.8341851106639839

key: train_accuracy
value: [0.84094488 0.83937008 0.83937008 0.83622047 0.84409449 0.8519685
 0.83176101 0.84591195 0.8427673  0.84433962]

mean value: 0.8416748378150845

key: test_fscore
value: [0.84931507 0.86111111 0.88       0.86075949 0.86746988 0.82051282
 0.73972603 0.85       0.86567164 0.85714286]

mean value: 0.8451708899637203

key: train_fscore
value: [0.85212299 0.84955752 0.85043988 0.84750733 0.85289747 0.86094675
 0.83713851 0.85502959 0.85422741 0.85376662]

mean value: 0.8513634059429992

key: test_precision
value: [0.81578947 0.83783784 0.825      0.79069767 0.76595745 0.76190476
 0.71052632 0.75555556 0.90625    0.78571429]

mean value: 0.795523335171324

key: train_precision
value: [0.79726027 0.8        0.7967033  0.79178082 0.80617978 0.81058496
 0.81120944 0.80726257 0.79619565 0.80501393]

mean value: 0.8022190715202817

key: test_recall
value: [0.88571429 0.88571429 0.94285714 0.94444444 1.         0.88888889
 0.77142857 0.97142857 0.82857143 0.94285714]

mean value: 0.9061904761904762

key: train_recall
value: [0.91509434 0.90566038 0.91194969 0.91167192 0.90536278 0.91798107
 0.86477987 0.90880503 0.92138365 0.90880503]

mean value: 0.9071493760292046

key: test_roc_auc
value: [0.84563492 0.85952381 0.87420635 0.84365079 0.84285714 0.8015873
 0.72857143 0.82857143 0.87142857 0.84285714]

mean value: 0.8338888888888889

key: train_roc_auc
value: [0.84082793 0.83926552 0.8392556  0.83633911 0.84419082 0.8520723
 0.83176101 0.84591195 0.8427673  0.84433962]

mean value: 0.8416731146955538

key: test_jcc
value: [0.73809524 0.75609756 0.78571429 0.75555556 0.76595745 0.69565217
 0.58695652 0.73913043 0.76315789 0.75      ]

mean value: 0.7336317112320825

key: train_jcc
value: [0.74234694 0.73846154 0.73979592 0.73536896 0.74352332 0.75584416
 0.71989529 0.74677003 0.74554707 0.74484536]

mean value: 0.7412398572667729

MCC on Blind test: 0.27

Accuracy on Blind test: 0.73

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01256037 0.0129602  0.0124402  0.01418591 0.01519251 0.01379037
 0.0145843  0.01559997 0.01467609 0.01388359]

mean value: 0.013987350463867187

key: score_time
value: [0.00821686 0.01022291 0.01050091 0.01071692 0.01076841 0.01073503
 0.01092577 0.01097298 0.01072097 0.01067448]

mean value: 0.010445523262023925

key: test_mcc
value: [0.7380153  0.94511009 0.88730159 0.81839321 0.9186708  0.86205133
 0.78301997 0.91465912 0.91465912 0.97182532]

mean value: 0.8753705850345086

key: train_mcc
value: [0.86062704 0.94027246 0.90337782 0.93520499 0.91737981 0.92960161
 0.82836275 0.89394851 0.84815773 0.95036243]

mean value: 0.9007295140283917

key: test_accuracy
value: [0.85915493 0.97183099 0.94366197 0.90140845 0.95774648 0.92957746
 0.88571429 0.95714286 0.95714286 0.98571429]

mean value: 0.9349094567404427

key: train_accuracy
value: [0.92598425 0.97007874 0.9511811  0.96692913 0.95748031 0.96377953
 0.91037736 0.94654088 0.91981132 0.97484277]

mean value: 0.9487005397910167

key: test_fscore
value: [0.87179487 0.97058824 0.94285714 0.91139241 0.96       0.93333333
 0.875      0.95774648 0.95774648 0.98591549]

mean value: 0.9366374439046983

key: train_fscore
value: [0.93098385 0.97035881 0.95008052 0.96774194 0.95890411 0.9648855
 0.90387858 0.94533762 0.92511013 0.97530864]

mean value: 0.9492589696376544

key: test_precision
value: [0.79069767 1.         0.94285714 0.8372093  0.92307692 0.8974359
 0.96551724 0.94444444 0.94444444 0.97222222]

mean value: 0.9217905292604571

key: train_precision
value: [0.87327824 0.9628483  0.97359736 0.94311377 0.92647059 0.93491124
 0.97454545 0.96710526 0.8677686  0.95757576]

mean value: 0.938121456747856

key: test_recall
value: [0.97142857 0.94285714 0.94285714 1.         1.         0.97222222
 0.8        0.97142857 0.97142857 1.        ]

mean value: 0.9572222222222222

key: train_recall
value: [0.99685535 0.97798742 0.92767296 0.99369085 0.99369085 0.99684543
 0.8427673  0.9245283  0.99056604 0.99371069]

mean value: 0.9638315179652005

key: test_roc_auc
value: [0.86071429 0.97142857 0.94365079 0.9        0.95714286 0.92896825
 0.88571429 0.95714286 0.95714286 0.98571429]

mean value: 0.9347619047619048

key: train_roc_auc
value: [0.92587247 0.97006627 0.95121818 0.96697121 0.95753725 0.96383152
 0.91037736 0.94654088 0.91981132 0.97484277]

mean value: 0.9487069222070115

key: test_jcc
value: [0.77272727 0.94285714 0.89189189 0.8372093  0.92307692 0.875
 0.77777778 0.91891892 0.91891892 0.97222222]

mean value: 0.883060037071665

key: train_jcc
value: [0.87087912 0.94242424 0.90490798 0.9375     0.92105263 0.93215339
 0.82461538 0.89634146 0.86065574 0.95180723]

mean value: 0.9042337177323416

MCC on Blind test: 0.22

Accuracy on Blind test: 0.78

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01248407 0.01530766 0.01435423 0.01306081 0.01362562 0.01316786
 0.01642776 0.01362872 0.01625896 0.0138433 ]

mean value: 0.014215898513793946

key: score_time
value: [0.01122069 0.01088285 0.01063156 0.01069832 0.01098776 0.01070976
 0.01086879 0.01070619 0.01069164 0.01066446]

mean value: 0.01080620288848877

key: test_mcc
value: [0.88730159 0.91885703 0.85952381 0.79446135 0.83240693 0.7364297
 0.88571429 0.91766294 0.8871639  0.81649658]

mean value: 0.8536018118121076

key: train_mcc
value: [0.94338294 0.85426212 0.93386306 0.92923073 0.95276028 0.80610644
 0.93712545 0.95036243 0.94029342 0.80723238]

mean value: 0.9054619258434897
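The log reports only the mean of each metric, which hides how much the folds disagree. For Stochastic GDescent, for example, train_mcc ranges from about 0.81 to 0.95 across folds. Below is a sketch that adds the sample standard deviation and range, using the train_mcc values copied from the block above. `summarise` is a hypothetical helper.

```python
import numpy as np

# train_mcc for the Stochastic GDescent model, copied from the log.
train_mcc = np.array([0.94338294, 0.85426212, 0.93386306, 0.92923073, 0.95276028,
                      0.80610644, 0.93712545, 0.95036243, 0.94029342, 0.80723238])


def summarise(values):
    """Hypothetical helper: mean, sample standard deviation and range."""
    return {
        "mean": float(np.mean(values)),
        "std": float(np.std(values, ddof=1)),  # ddof=1: sample std across folds
        "min": float(np.min(values)),
        "max": float(np.max(values)),
    }


stats = summarise(train_mcc)
print(f"{stats['mean']:.3f} ± {stats['std']:.3f} "
      f"(range {stats['min']:.3f} to {stats['max']:.3f})")
```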

key: test_accuracy
value: [0.94366197 0.95774648 0.92957746 0.88732394 0.91549296 0.85915493
 0.94285714 0.95714286 0.94285714 0.9       ]

mean value: 0.9235814889336016

key: train_accuracy
value: [0.97165354 0.92283465 0.96692913 0.96377953 0.97637795 0.89448819
 0.96855346 0.97484277 0.97012579 0.89937107]

mean value: 0.950895607388699

key: test_fscore
value: [0.94285714 0.95890411 0.92957746 0.9        0.91428571 0.875
 0.94285714 0.95890411 0.94117647 0.88888889]

mean value: 0.9252451043443939

key: train_fscore
value: [0.97151899 0.92804699 0.96692913 0.96477795 0.97630332 0.90414878
 0.96845426 0.97530864 0.9699842  0.89152542]

mean value: 0.9516997686957204

key: test_precision
value: [0.94285714 0.92105263 0.91666667 0.81818182 0.94117647 0.79545455
 0.94285714 0.92105263 0.96969697 1.        ]

mean value: 0.9168996019460416

key: train_precision
value: [0.97770701 0.87052342 0.96845426 0.9375     0.9778481  0.82722513
 0.97151899 0.95757576 0.97460317 0.96691176]

mean value: 0.9429867597404928

key: test_recall
value: [0.94285714 1.         0.94285714 1.         0.88888889 0.97222222
 0.94285714 1.         0.91428571 0.8       ]

mean value: 0.9403968253968253

key: train_recall
value: [0.96540881 0.99371069 0.96540881 0.99369085 0.97476341 0.99684543
 0.96540881 0.99371069 0.96540881 0.82704403]

mean value: 0.9641400313473405

key: test_roc_auc
value: [0.94365079 0.95833333 0.9297619  0.88571429 0.91587302 0.85753968
 0.94285714 0.95714286 0.94285714 0.9       ]

mean value: 0.9233730158730159

key: train_roc_auc
value: [0.97166339 0.92272285 0.96693153 0.96382656 0.97637541 0.89464913
 0.96855346 0.97484277 0.97012579 0.89937107]

mean value: 0.9509061960597583

key: test_jcc
value: [0.89189189 0.92105263 0.86842105 0.81818182 0.84210526 0.77777778
 0.89189189 0.92105263 0.88888889 0.8       ]

mean value: 0.8621263847579637

key: train_jcc
value: [0.94461538 0.86575342 0.93597561 0.93195266 0.9537037  0.82506527
 0.93883792 0.95180723 0.94171779 0.80428135]

mean value: 0.9093710345987801

MCC on Blind test: 0.18

Accuracy on Blind test: 0.71
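
Each "Running model pipeline" entry prints the same two-step structure: a ColumnTransformer named `prep` that applies `MinMaxScaler` to the numerical columns and `OneHotEncoder` to the categorical ones (with `remainder='passthrough'`), followed by the classifier under the name `model`. The following is a minimal sketch of that structure, not the script's own code. It assumes a tiny synthetic DataFrame and uses a hypothetical three-column subset of the numerical features plus two of the seven categorical columns listed above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical subset of the real feature columns, for illustration only.
num_cols = ["ligand_distance", "duet_stability_change", "rsa"]
cat_cols = ["ss_class", "active_site"]

# Synthetic stand-in data (not the real mutation dataset).
rng = np.random.default_rng(0)
n = 60
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "duet_stability_change": rng.normal(0, 1, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["H", "E", "L"], n),
    "active_site": rng.integers(0, 2, n),
})
y = rng.integers(0, 2, n)

pipe = Pipeline(steps=[
    ("prep", ColumnTransformer(
        transformers=[
            ("num", MinMaxScaler(), num_cols),   # rescale each numeric column to [0, 1]
            ("cat", OneHotEncoder(), cat_cols),  # one indicator column per category level
        ],
        remainder="passthrough",                 # any column not listed is passed through unchanged
    )),
    ("model", SGDClassifier(random_state=42)),
])
pipe.fit(X, y)
print(pipe.named_steps["prep"].transform(X).shape)
```

Because the scaler and encoder sit inside the pipeline, they are refitted on the training portion of every fold, so no fold's scaling leaks information from its held-out rows.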

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.11450005 0.09942293 0.10065579 0.09938693 0.10254526 0.09947491
 0.10343981 0.10581398 0.10528588 0.10459161]

mean value: 0.10351171493530273

key: score_time
value: [0.01439619 0.01475668 0.0148623  0.01507521 0.01486588 0.01494408
 0.01576829 0.01550889 0.01577449 0.01586604]

mean value: 0.015181803703308105

key: test_mcc
value: [0.97222222 1.         0.9451949  0.94511009 0.94511009 0.97220047
 0.97182532 0.94440028 0.97182532 1.        ]

mean value: 0.9667888678525607

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.98591549 1.         0.97183099 0.97183099 0.97183099 0.98591549
 0.98571429 0.97142857 0.98571429 1.        ]

mean value: 0.9830181086519115

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.98591549 1.         0.97222222 0.97297297 0.97297297 0.98630137
 0.98591549 0.97222222 0.98591549 1.        ]

mean value: 0.9834438239126644

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.97222222 1.         0.94594595 0.94736842 0.94736842 0.97297297
 0.97222222 0.94594595 0.97222222 1.        ]

mean value: 0.9676268373636795

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.98611111 1.         0.97222222 0.97142857 0.97142857 0.98571429
 0.98571429 0.97142857 0.98571429 1.        ]

mean value: 0.9829761904761904

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.97222222 1.         0.94594595 0.94736842 0.94736842 0.97297297
 0.97222222 0.94594595 0.97222222 1.        ]

mean value: 0.9676268373636795

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.23

Accuracy on Blind test: 0.85
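
The `key:` / `value:` / `mean value:` records match the shape of the dictionary returned by scikit-learn's `cross_validate` with `return_train_score=True`: `fit_time` and `score_time` come first, then a `test_`/`train_` pair for each scorer. The sketch below would produce records in this format. It assumes 10 stratified folds (every block lists 10 values) and a scoring dict whose names are inferred from the keys. Wrapping each metric with `make_scorer` on predicted labels is also an assumption, although `test_roc_auc` tracking accuracy so closely is consistent with AUC computed from hard labels rather than from probabilities. Synthetic data stands in for the real features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import (f1_score, jaccard_score, make_scorer,
                             matthews_corrcoef, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate

# Scorer names chosen so the result keys read test_mcc, train_jcc, ... as in the log.
# Assumption: every metric is computed on predicted labels via make_scorer.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": make_scorer(jaccard_score),
}

X, y = make_classification(n_samples=300, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(AdaBoostClassifier(random_state=42), X, y,
                        cv=cv, scoring=scoring, return_train_score=True)

# Print in the same layout as the log: key, per-fold values, then the mean.
for key, value in scores.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}\n")
```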

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.04095316 0.0422883  0.04308581 0.03728151 0.03888655 0.03602791
 0.04428387 0.04721117 0.03666067 0.03684378]

mean value: 0.040352272987365725

key: score_time
value: [0.02391195 0.03111863 0.02407122 0.02295971 0.02479601 0.02285576
 0.02889371 0.01769805 0.01712298 0.01948595]

mean value: 0.023291397094726562

key: test_mcc
value: [0.9451949  1.         0.97222222 0.97220047 0.94511009 0.94511009
 0.94440028 0.91766294 0.97182532 0.97182532]

mean value: 0.9585551614007853

key: train_mcc
value: [1.         0.99685531 0.99059524 1.         1.         1.
 0.99686027 0.99686027 0.99373035 0.99686027]

mean value: 0.9971761724807292

key: test_accuracy
value: [0.97183099 1.         0.98591549 0.98591549 0.97183099 0.97183099
 0.97142857 0.95714286 0.98571429 0.98571429]

mean value: 0.9787323943661972

key: train_accuracy
value: [1.         0.9984252  0.99527559 1.         1.         1.
 0.99842767 0.99842767 0.99685535 0.99842767]

mean value: 0.9985839152181448

key: test_fscore
value: [0.97222222 1.         0.98591549 0.98630137 0.97297297 0.97297297
 0.97222222 0.95890411 0.98591549 0.98591549]

mean value: 0.9793342348715685

key: train_fscore
value: [1.         0.99843014 0.99530516 1.         1.         1.
 0.99843014 0.99843014 0.9968652  0.99843014]

mean value: 0.9985890933230142

key: test_precision
value: [0.94594595 1.         0.97222222 0.97297297 0.94736842 0.94736842
 0.94594595 0.92105263 0.97222222 0.97222222]

mean value: 0.9597321005215742

key: train_precision
value: [1.         0.9968652  0.99065421 1.         1.         1.
 0.9968652  0.9968652  0.99375    0.9968652 ]

mean value: 0.9971865020654499

key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97222222 1.         0.98611111 0.98571429 0.97142857 0.97142857
 0.97142857 0.95714286 0.98571429 0.98571429]

mean value: 0.9786904761904761

key: train_roc_auc
value: [1.         0.99842271 0.99526814 1.         1.         1.
 0.99842767 0.99842767 0.99685535 0.99842767]

mean value: 0.998582921651489

key: test_jcc
value: [0.94594595 1.         0.97222222 0.97297297 0.94736842 0.94736842
 0.94594595 0.92105263 0.97222222 0.97222222]

mean value: 0.9597321005215742

key: train_jcc
value: [1.         0.9968652  0.99065421 1.         1.         1.
 0.9968652  0.9968652  0.99375    0.9968652 ]

mean value: 0.9971865020654499

MCC on Blind test: 0.24

Accuracy on Blind test: 0.85
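
The repeated `UserWarning`/`RuntimeWarning` pair for the Bagging Classifier is consistent with `BaggingClassifier` keeping its default of 10 estimators while `oob_score=True`. A row is left out of one bootstrap sample of size n with probability (1 - 1/n)^n ≈ 0.368, so it lands in all 10 bootstraps with probability about 0.632^10 ≈ 0.01. Such a row gets no out-of-bag vote, and its OOB probability becomes 0/0, which is the NaN division the warning reports. The train scores such as 0.9984252 (= 634/635) suggest roughly 635 rows per training fold, which gives about 6.5 such rows in expectation. That fold size is an inference, not a logged value. The sketch below checks the reasoning on synthetic data of that size; the helper names are hypothetical.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier


def prob_never_out_of_bag(n_samples: int, n_estimators: int) -> float:
    """Probability that a given row is drawn into every bootstrap sample,
    i.e. it is never out of bag and receives no OOB prediction."""
    p_in_one_bootstrap = 1.0 - (1.0 - 1.0 / n_samples) ** n_samples
    return p_in_one_bootstrap ** n_estimators


def oob_nan_rows(n_estimators: int, X, y) -> int:
    """Fit a bagging model and count rows whose OOB decision function is NaN."""
    clf = BaggingClassifier(n_estimators=n_estimators, oob_score=True,
                            random_state=42)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # silence the UserWarning/RuntimeWarning pair
        clf.fit(X, y)
    return int(np.isnan(clf.oob_decision_function_).any(axis=1).sum())


# Assumed fold size of ~635 rows; synthetic features stand in for the real ones.
X, y = make_classification(n_samples=635, n_features=20, random_state=0)
n = len(y)
for k in (10, 100):
    expected = n * prob_never_out_of_bag(n, k)
    print(f"n_estimators={k}: expected rows without OOB ~ {expected:.2f}, "
          f"observed {oob_nan_rows(k, X, y)}")
```

Raising `n_estimators` (for example to 100) drives the expected count to effectively zero, which would be one way to make the OOB estimate usable if it is needed.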

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.28364706 0.2668128  0.25616741 0.16965199 0.18673849 0.24170947
 0.24973774 0.27796173 0.29994655 0.25314331]

mean value: 0.24855165481567382

key: score_time
value: [0.02206802 0.02942705 0.0219636  0.02204132 0.01348066 0.02202201
 0.02197385 0.02503347 0.02198958 0.02210307]

mean value: 0.02221026420593262

key: test_mcc
value: [0.83214239 0.84343471 0.9451949  0.70310369 0.81050059 0.69047619
 0.71545476 0.84102145 0.8871639  0.74560114]

mean value: 0.801409370970335

key: train_mcc
value: [0.90553048 0.89928572 0.90247629 0.90236595 0.90558158 0.90866524
 0.91509886 0.90582163 0.90566038 0.90573203]

mean value: 0.9056218165156902

key: test_accuracy
value: [0.91549296 0.91549296 0.97183099 0.84507042 0.90140845 0.84507042
 0.85714286 0.91428571 0.94285714 0.87142857]

mean value: 0.8980080482897385

key: train_accuracy
value: [0.95275591 0.9496063  0.9511811  0.9511811  0.95275591 0.95433071
 0.95754717 0.95283019 0.95283019 0.95283019]

mean value: 0.9527848759471104

key: test_fscore
value: [0.91176471 0.92105263 0.97222222 0.86075949 0.90909091 0.84507042
 0.86111111 0.92105263 0.94117647 0.87671233]

mean value: 0.9020012927025947

key: train_fscore
value: [0.95268139 0.94936709 0.95087163 0.95102686 0.95238095 0.95418641
 0.95748031 0.95238095 0.95283019 0.95253165]

mean value: 0.9525737433063429

key: test_precision
value: [0.93939394 0.85365854 0.94594595 0.79069767 0.85365854 0.85714286
 0.83783784 0.85365854 0.96969697 0.84210526]

mean value: 0.8743796097350147

key: train_precision
value: [0.9556962  0.95541401 0.95846645 0.95253165 0.95846645 0.9556962
 0.95899054 0.96153846 0.95283019 0.95859873]

mean value: 0.9568228883329967

key: test_recall
value: [0.88571429 1.         1.         0.94444444 0.97222222 0.83333333
 0.88571429 1.         0.91428571 0.91428571]

mean value: 0.9349999999999999

key: train_recall
value: [0.94968553 0.94339623 0.94339623 0.94952681 0.94637224 0.95268139
 0.95597484 0.94339623 0.95283019 0.94654088]

mean value: 0.9483800567426542

key: test_roc_auc
value: [0.91507937 0.91666667 0.97222222 0.84365079 0.90039683 0.8452381
 0.85714286 0.91428571 0.94285714 0.87142857]

mean value: 0.8978968253968254

key: train_roc_auc
value: [0.95276075 0.94961609 0.95119338 0.9511785  0.95274587 0.95432812
 0.95754717 0.95283019 0.95283019 0.95283019]

mean value: 0.9527860444814793

key: test_jcc
value: [0.83783784 0.85365854 0.94594595 0.75555556 0.83333333 0.73170732
 0.75609756 0.85365854 0.88888889 0.7804878 ]

mean value: 0.8237171317659122

key: train_jcc
value: [0.90963855 0.90361446 0.90634441 0.90662651 0.90909091 0.91238671
 0.918429   0.90909091 0.90990991 0.90936556]

mean value: 0.9094496925922325

MCC on Blind test: 0.29

Accuracy on Blind test: 0.8
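
Every block ends with MCC and accuracy on the blind test set, and the two can diverge sharply. The Gaussian Process, for example, reaches 0.8 accuracy but an MCC of only 0.29. MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) rewards correct calls on both classes, so on an imbalanced set a model can score high accuracy largely by predicting the majority class. The counts below are hypothetical and do not come from the actual blind test. They show accuracy and MCC diverging on an imbalanced set, and they check the closed-form MCC against scikit-learn's `matthews_corrcoef`. Rounding to two decimals mirrors the log's output format.

```python
import numpy as np
from sklearn.metrics import accuracy_score, matthews_corrcoef


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return 0.0 if den == 0 else num / den


def labels_from_counts(tp: int, tn: int, fp: int, fn: int):
    """Build y_true / y_pred arrays that reproduce the given counts."""
    y_true = np.array([1] * tp + [0] * tn + [0] * fp + [1] * fn)
    y_pred = np.array([1] * tp + [0] * tn + [1] * fp + [0] * fn)
    return y_true, y_pred


# Hypothetical imbalanced blind set: 10 positives, 90 negatives.
tp, tn, fp, fn = 3, 82, 8, 7
y_true, y_pred = labels_from_counts(tp, tn, fp, fn)
print("Accuracy on Blind test:", round(accuracy_score(y_true, y_pred), 2))
print("MCC on Blind test:", round(matthews_corrcoef(y_true, y_pred), 2))
```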

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.28502631 0.27786803 0.2669673  0.26025629 0.26016879 0.26106143
 0.26042795 0.26144886 0.26163244 0.25996804]

mean value: 0.2654825448989868

key: score_time
value: [0.00984454 0.00959039 0.00850868 0.00844693 0.00874805 0.00843954
 0.00860071 0.00844836 0.00860429 0.00847673]

mean value: 0.00877082347869873

key: test_mcc
value: [0.9451949  1.         0.9451949  0.97220047 0.94511009 0.94365079
 0.94440028 0.97182532 0.97182532 0.97182532]

mean value: 0.9611227372545662

key: train_mcc
value: [1.         1.         1.         1.         1.         0.99685531
 1.         1.         1.         1.        ]

mean value: 0.9996855314765581

key: test_accuracy
value: [0.97183099 1.         0.97183099 0.98591549 0.97183099 0.97183099
 0.97142857 0.98571429 0.98571429 0.98571429]

mean value: 0.9801810865191147

key: train_accuracy
value: [1.        1.        1.        1.        1.        0.9984252 1.
 1.        1.        1.       ]

mean value: 0.9998425196850393

key: test_fscore
value: [0.97222222 1.         0.97222222 0.98630137 0.97297297 0.97222222
 0.97222222 0.98591549 0.98591549 0.98591549]

mean value: 0.9805909710598115

key: train_fscore
value: [1.         1.         1.         1.         1.         0.99842022
 1.         1.         1.         1.        ]

mean value: 0.9998420221169037

key: test_precision
value: [0.94594595 1.         0.94594595 0.97297297 0.94736842 0.97222222
 0.94594595 0.97222222 0.97222222 0.97222222]

mean value: 0.9647068120752331

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         1.         0.97222222
 1.         1.         1.         1.        ]

mean value: 0.9972222222222222

key: train_recall
value: [1.         1.         1.         1.         1.         0.99684543
 1.         1.         1.         1.        ]

mean value: 0.9996845425867508

key: test_roc_auc
value: [0.97222222 1.         0.97222222 0.98571429 0.97142857 0.9718254
 0.97142857 0.98571429 0.98571429 0.98571429]

mean value: 0.9801984126984127

key: train_roc_auc
value: [1.         1.         1.         1.         1.         0.99842271
 1.         1.         1.         1.        ]

mean value: 0.9998422712933754

key: test_jcc
value: [0.94594595 1.         0.94594595 0.97297297 0.94736842 0.94594595
 0.94594595 0.97222222 0.97222222 0.97222222]

mean value: 0.9620791844476055

key: train_jcc
value: [1.         1.         1.         1.         1.         0.99684543
 1.         1.         1.         1.        ]

mean value: 0.9996845425867508

MCC on Blind test: 0.25

Accuracy on Blind test: 0.86

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
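The printed model list contains `('Naive Bayes', BernoulliNB())` twice, so a loop over it would presumably fit and report that model twice. A small check like the one below (names copied from the log with their logged spelling; the de-duplication step is a suggestion, not what the script does) catches this before any fitting starts.

```python
from collections import Counter

# Model names copied from the printed list (spelling kept exactly as logged).
model_names = [
    "Logistic Regression", "Logistic RegressionCV", "Gaussian NB", "Naive Bayes",
    "K-Nearest Neighbors", "SVM", "MLP", "Decision Tree", "Extra Trees",
    "Extra Tree", "Random Forest", "Random Forest2", "Naive Bayes", "XGBoost",
    "LDA", "Multinomial", "Passive Aggresive", "Stochastic GDescent",
    "AdaBoost Classifier", "Bagging Classifier", "Gaussian Process",
    "Gradient Boosting", "QDA", "Ridge Classifier", "Ridge ClassifierCV",
]

# Names that occur more than once.
counts = Counter(model_names)
duplicates = [name for name, n in counts.items() if n > 1]
print(duplicates)  # ['Naive Bayes']

# Keep the first occurrence of each name, preserving the original order.
unique_names = list(dict.fromkeys(model_names))
print(len(model_names), len(unique_names))  # 25 24
```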
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])
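The "Variables are collinear" warnings earlier in this section appear to come from QuadraticDiscriminantAnalysis, the model in the pipeline printed above. As far as I understand scikit-learn's QDA, it warns when the centred training data for a class is rank-deficient. One likely contributor here is `OneHotEncoder()` with its default `drop=None`: the indicator columns for each categorical feature always sum to 1, so after centring they are linearly dependent. The sketch below assumes that rank check and flags which classes would trigger it; `collinear_classes` and the `tol=1e-4` default are illustrative assumptions, not part of the pipeline.

```python
import numpy as np


def collinear_classes(X, y, tol=1e-4):
    """Return class labels whose centred feature matrix is rank-deficient.

    Hypothetical helper: it mirrors (as an assumption) the per-class rank
    check QuadraticDiscriminantAnalysis performs before warning
    "Variables are collinear".
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    flagged = []
    for label in np.unique(y):
        Xg = X[y == label]
        Xgc = Xg - Xg.mean(axis=0)               # centre within the class
        s = np.linalg.svd(Xgc, compute_uv=False)  # singular values only
        rank = int(np.sum(s > tol))               # count non-negligible directions
        if rank < X.shape[1]:
            flagged.append(label.item())
    return flagged


# Toy example: two one-hot columns always sum to 1, so after centring
# they are linearly dependent in every class.
rng = np.random.default_rng(0)
flag = rng.integers(0, 2, size=20)
X = np.column_stack([rng.normal(size=20), flag, 1 - flag])
y = np.array([0] * 10 + [1] * 10)
print(collinear_classes(X, y))  # [0, 1]
```

Passing `OneHotEncoder(drop="first")` removes this particular dependency, though other collinear numeric features could still trigger the warning.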

key: fit_time
value: [0.01253176 0.01580763 0.01565361 0.01494575 0.01510954 0.01479077
 0.01480365 0.01490068 0.014853   0.01492882]

mean value: 0.014832520484924316

key: score_time
value: [0.01115751 0.01106358 0.01114917 0.01116085 0.01098204 0.01356149
 0.01323414 0.01438808 0.01097846 0.01121426]

mean value: 0.011888957023620606

key: test_mcc
value: [0.60561605 0.74766718 0.65726707 0.44129696 0.5532359  0.58548477
 0.66212219 0.72501849 0.78301997 0.5923057 ]

mean value: 0.6353034273077338

key: train_mcc
value: [0.76004007 0.72245201 0.66278625 0.65560022 0.57404094 0.83304874
 0.80325449 0.83067537 0.79045363 0.75380319]

mean value: 0.7386154903774159

key: test_accuracy
value: [0.78873239 0.85915493 0.8028169  0.67605634 0.73239437 0.77464789
 0.81428571 0.85714286 0.88571429 0.77142857]

mean value: 0.7962374245472836

key: train_accuracy
value: [0.86771654 0.8503937  0.80629921 0.80314961 0.7496063  0.91338583
 0.89465409 0.91037736 0.88836478 0.86477987]

mean value: 0.8548727281731293

key: test_fscore
value: [0.74576271 0.83333333 0.75       0.54901961 0.64150943 0.73333333
 0.77966102 0.84375    0.875      0.71428571]

mean value: 0.7465655151571342

key: train_fscore
value: [0.84892086 0.83005367 0.76116505 0.75633528 0.66666667 0.90756303
 0.88388215 0.90289608 0.87694974 0.84532374]

mean value: 0.8279756265504205

key: test_precision
value: [0.91666667 1.         1.         0.93333333 1.         0.91666667
 0.95833333 0.93103448 0.96551724 0.95238095]

mean value: 0.9573932676518884

key: train_precision
value: [0.99159664 0.9626556  0.99492386 0.98979592 0.99375    0.97122302
 0.98455598 0.98513011 0.97683398 0.98739496]

mean value: 0.9837860069030633

key: test_recall
value: [0.62857143 0.71428571 0.6        0.38888889 0.47222222 0.61111111
 0.65714286 0.77142857 0.8        0.57142857]

mean value: 0.6215079365079366

key: train_recall
value: [0.74213836 0.72955975 0.6163522  0.61198738 0.50157729 0.85173502
 0.80188679 0.83333333 0.79559748 0.73899371]

mean value: 0.7223161319762712

key: test_roc_auc
value: [0.78650794 0.85714286 0.8        0.68015873 0.73611111 0.77698413
 0.81428571 0.85714286 0.88571429 0.77142857]

mean value: 0.7965476190476191

key: train_roc_auc
value: [0.86791461 0.85058429 0.80659881 0.80284904 0.74921632 0.91328889
 0.89465409 0.91037736 0.88836478 0.86477987]

mean value: 0.8548628057853699
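The logged test_roc_auc values sit very close to, but not exactly at, test_accuracy. That pattern is consistent with AUC being computed from hard 0/1 predictions rather than from scores. With binary predictions the ROC curve has a single interior point (FPR, TPR), so the trapezoidal area is (TPR + 1 - FPR)/2 = (TPR + TNR)/2, which is balanced accuracy. The check below uses a confusion matrix that I reconstructed from the QDA fold-1 test metrics (accuracy 56/71, recall 22/35, precision 22/24); those counts are an inference, not something the log prints.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Confusion-matrix counts reconstructed (my inference) from the logged
# QDA fold-1 test metrics: accuracy 56/71, recall 22/35, precision 22/24.
tp, fn, fp, tn = 22, 13, 2, 34
y_true = np.array([1] * (tp + fn) + [0] * (fp + tn))
y_pred = np.array([1] * tp + [0] * fn + [1] * fp + [0] * tn)

auc_hard = roc_auc_score(y_true, y_pred)           # AUC of 0/1 "scores"
bal_acc = balanced_accuracy_score(y_true, y_pred)  # (TPR + TNR) / 2
closed_form = (tp / (tp + fn) + tn / (tn + fp)) / 2

print(round(auc_hard, 8), round(bal_acc, 8), round(closed_form, 8))
# 0.78650794 0.78650794 0.78650794, matching the logged fold-1 test_roc_auc
```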

key: test_jcc
value: [0.59459459 0.71428571 0.6        0.37837838 0.47222222 0.57894737
 0.63888889 0.72972973 0.77777778 0.55555556]

mean value: 0.6040380229853914

key: train_jcc
value: [0.7375     0.70948012 0.61442006 0.60815047 0.5        0.83076923
 0.79192547 0.82298137 0.7808642  0.73208723]

mean value: 0.7128178143252082
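The per-fold arrays above (ten values each, interleaved test_/train_ keys, plus fit_time and score_time) have the shape of scikit-learn's `cross_validate` output with `return_train_score=True`. The sketch below reproduces that structure under several assumptions: 10-fold stratified CV, a scorer dictionary whose names are chosen to match the logged keys, `make_scorer(roc_auc_score)` scoring hard predictions, and a synthetic dataset in place of the real features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score,
                             make_scorer, matthews_corrcoef, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate

# Scorer names chosen so the output keys match the log
# (test_mcc, train_mcc, test_fscore, ...); this mapping is an assumption.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # uses predict(), i.e. hard labels
    "jcc": make_scorer(jaccard_score),
}

# Synthetic stand-in data; the real feature matrix is not shown in the log.
X, y = make_classification(n_samples=200, n_features=8, random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)  # assumed CV scheme

results = cross_validate(LogisticRegression(random_state=42), X, y,
                         cv=cv, scoring=scoring, return_train_score=True)

# Print in the same "key / value / mean value" layout as the log.
for key, value in results.items():
    print(f"\nkey: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}")
```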

MCC on Blind test: 0.29

Accuracy on Blind test: 0.92
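A blind-test accuracy of 0.92 alongside an MCC of 0.29 is the kind of gap one would expect if the blind set is dominated by one class; how the blind set was built is not visible in this log. The toy example below uses made-up labels to show how the two metrics diverge under imbalance, and prints them in the log's two-decimal format. `report_blind` is a hypothetical helper, not a function from the script.

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef


def report_blind(y_blind, y_pred):
    """Print blind-test MCC and accuracy in the log's format (hypothetical helper)."""
    mcc = round(matthews_corrcoef(y_blind, y_pred), 2)
    acc = round(accuracy_score(y_blind, y_pred), 2)
    print(f"MCC on Blind test: {mcc}\n")
    print(f"Accuracy on Blind test: {acc}")
    return mcc, acc


# Illustrative, made-up labels: 90 negatives, 10 positives, and a model that
# finds only 2 of the positives (no false positives).
y_blind = [0] * 90 + [1] * 10
y_pred = [0] * 90 + [1] * 2 + [0] * 8
report_blind(y_blind, y_pred)  # MCC 0.43, accuracy 0.92
```

Here 92% accuracy is achieved while 8 of the 10 positives are missed, which MCC penalises and accuracy largely hides.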

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])
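Each pipeline printed in this log has the same two steps: a `ColumnTransformer` that min-max scales the numeric columns and one-hot encodes the categorical ones (passing any other column through), followed by the model. The sketch below rebuilds that structure; `build_pipeline` is a hypothetical name, and the toy DataFrame uses only a few column names from the log with invented values.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import RidgeClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder


def build_pipeline(model, num_cols, cat_cols):
    """Scale numeric columns to [0, 1], one-hot encode categoricals, then fit `model`."""
    prep = ColumnTransformer(
        transformers=[
            ("num", MinMaxScaler(), num_cols),
            ("cat", OneHotEncoder(), cat_cols),
        ],
        remainder="passthrough",  # columns not listed above are passed through unchanged
    )
    return Pipeline(steps=[("prep", prep), ("model", model)])


# Toy data using a few column names from the log; all values are made up.
df = pd.DataFrame({
    "ligand_distance": [3.1, 12.4, 7.8, 20.0, 5.5, 15.2],
    "rsa": [0.1, 0.8, 0.4, 0.9, 0.2, 0.7],
    "active_site": ["yes", "no", "yes", "no", "yes", "no"],
})
y = [1, 0, 1, 0, 1, 0]

pipe = build_pipeline(RidgeClassifier(random_state=42),
                      ["ligand_distance", "rsa"], ["active_site"])
pipe.fit(df, y)
print(pipe.predict(df))
```

Fitting the scaler and encoder inside the pipeline means that, under cross-validation, they are refit on each training fold only, which avoids leaking test-fold statistics into preprocessing.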

key: fit_time
value: [0.02688622 0.03091717 0.02439308 0.03099203 0.03109002 0.03095508
 0.03092408 0.01178432 0.01177502 0.011724  ]

mean value: 0.0241441011428833

key: score_time
value: [0.01939631 0.01934052 0.01894784 0.01915073 0.01910853 0.01867843
 0.01939607 0.01078653 0.01074886 0.01071095]

mean value: 0.016626477241516113

key: test_mcc
value: [0.91580648 0.94511009 0.88730159 0.94365079 0.8031746  0.85952381
 0.85749293 0.94440028 0.8871639  0.94440028]

mean value: 0.8988024758003245

key: train_mcc
value: [0.93078373 0.92442835 0.93397556 0.9059564  0.9433251  0.92457213
 0.92469291 0.94654556 0.92469291 0.92771424]

mean value: 0.9286686898188696

key: test_accuracy
value: [0.95774648 0.97183099 0.94366197 0.97183099 0.90140845 0.92957746
 0.92857143 0.97142857 0.94285714 0.97142857]

mean value: 0.9490342052313884

key: train_accuracy
value: [0.96535433 0.96220472 0.96692913 0.95275591 0.97165354 0.96220472
 0.96226415 0.97327044 0.96226415 0.96383648]

mean value: 0.9642737582330511

key: test_fscore
value: [0.95652174 0.97058824 0.94285714 0.97222222 0.90140845 0.92957746
 0.92753623 0.97222222 0.94117647 0.97058824]

mean value: 0.9484698414985508

key: train_fscore
value: [0.96518987 0.96214511 0.96671949 0.95192308 0.97151899 0.96178344
 0.96190476 0.9733124  0.96190476 0.96366509]

mean value: 0.9640066993032763

key: test_precision
value: [0.97058824 1.         0.94285714 0.97222222 0.91428571 0.94285714
 0.94117647 0.94594595 0.96969697 1.        ]

mean value: 0.959962984374749

key: train_precision
value: [0.97133758 0.96518987 0.97444089 0.96742671 0.97460317 0.97106109
 0.97115385 0.97178683 0.97115385 0.96825397]

mean value: 0.9706407819970189

key: test_recall
value: [0.94285714 0.94285714 0.94285714 0.97222222 0.88888889 0.91666667
 0.91428571 1.         0.91428571 0.94285714]

mean value: 0.9377777777777777

key: train_recall
value: [0.9591195  0.9591195  0.9591195  0.93690852 0.96845426 0.95268139
 0.95283019 0.97484277 0.95283019 0.9591195 ]

mean value: 0.9575025296113326

key: test_roc_auc
value: [0.95753968 0.97142857 0.94365079 0.9718254  0.9015873  0.9297619
 0.92857143 0.97142857 0.94285714 0.97142857]

mean value: 0.9490079365079365

key: train_roc_auc
value: [0.96536416 0.96220959 0.96694145 0.95273099 0.97164851 0.96218975
 0.96226415 0.97327044 0.96226415 0.96383648]

mean value: 0.9642719679384164

key: test_jcc
value: [0.91666667 0.94285714 0.89189189 0.94594595 0.82051282 0.86842105
 0.86486486 0.94594595 0.88888889 0.94285714]

mean value: 0.9028852363062889

key: train_jcc
value: [0.93272171 0.92705167 0.93558282 0.90825688 0.94461538 0.92638037
 0.9266055  0.94801223 0.9266055  0.92987805]

mean value: 0.930571013017483
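The per-fold test_jcc values are fully determined by test_fscore. With TP true positives and E = FP + FN errors, F1 = 2TP/(2TP + E) and J = TP/(TP + E); eliminating E gives J = F1/(2 - F1). The check below applies that identity to the Ridge Classifier fold values copied from the log.

```python
import numpy as np

# Ridge Classifier per-fold test F1 and test Jaccard, copied from the log.
test_fscore = np.array([0.95652174, 0.97058824, 0.94285714, 0.97222222, 0.90140845,
                        0.92957746, 0.92753623, 0.97222222, 0.94117647, 0.97058824])
test_jcc = np.array([0.91666667, 0.94285714, 0.89189189, 0.94594595, 0.82051282,
                     0.86842105, 0.86486486, 0.94594595, 0.88888889, 0.94285714])

# For a single binary confusion matrix, J = F1 / (2 - F1).
predicted_jcc = test_fscore / (2 - test_fscore)
print(np.allclose(predicted_jcc, test_jcc, atol=1e-6))  # True
```

Because the mapping is monotonic, Jaccard ranks folds and models exactly as F1 does, so it adds no independent information here.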

MCC on Blind test: 0.26

Accuracy on Blind test: 0.81

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_config.py:203: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_config.py:206: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
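The two SettingWithCopyWarning messages point at `rouC_CT.sort_values(..., inplace = True)` and `rouC_BT.sort_values(..., inplace = True)` in embb_config.py. How those frames are built is not visible here; assuming they are slices of a larger results table, the usual remedies are to take an explicit `.copy()` before sorting in place, or to assign the non-inplace result. The sketch uses the three models' mean CV MCC and blind-test MCC from this log section; the DataFrame and variable names are hypothetical stand-ins for the script's own.

```python
import pandas as pd

# Mean CV test MCC and blind-test MCC for the three models in this log section.
scores = pd.DataFrame({
    "model_name": ["QDA", "Ridge Classifier", "Ridge ClassifierCV"],
    "test_mcc": [0.6353034273077338, 0.8988024758003245, 0.9044048212742926],
    "bts_mcc": [0.29, 0.26, 0.25],
})

# Option 1: take an explicit copy of the slice, so the in-place sort acts on
# an independent DataFrame rather than a possible view of `scores`.
cv_table = scores[["model_name", "test_mcc"]].copy()
cv_table.sort_values(by=["test_mcc"], ascending=False, inplace=True)

# Option 2: skip inplace and assign the sorted result to a new name.
bt_table = scores[["model_name", "bts_mcc"]].sort_values(by=["bts_mcc"], ascending=False)

print(cv_table["model_name"].tolist())  # ['Ridge ClassifierCV', 'Ridge Classifier', 'QDA']
print(bt_table["model_name"].tolist())  # ['QDA', 'Ridge Classifier', 'Ridge ClassifierCV']
```

For these three models the CV ranking and the blind-test ranking are exactly reversed, which is worth keeping in mind when choosing a model from the CV table alone.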
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.19827652 0.19935083 0.11779952 0.14097071 0.20005989 0.23936892
 0.23391032 0.19965649 0.19985843 0.20182014]

mean value: 0.19310717582702636

key: score_time
value: [0.02154684 0.02093148 0.01095414 0.0209043  0.01941705 0.01915574
 0.01094842 0.02140784 0.02034569 0.02129364]

mean value: 0.01869051456451416

key: test_mcc
value: [0.91580648 0.94511009 0.88730159 0.91580648 0.77460317 0.88730159
 0.91465912 0.94440028 0.91465912 0.94440028]

mean value: 0.9044048212742926

key: train_mcc
value: [0.94330695 0.93702568 0.94646152 0.94330695 0.9433251  0.93078099
 0.95287259 0.94341489 0.94339623 0.95287259]

mean value: 0.9436763480331438

key: test_accuracy
value: [0.95774648 0.97183099 0.94366197 0.95774648 0.88732394 0.94366197
 0.95714286 0.97142857 0.95714286 0.97142857]

mean value: 0.9519114688128772

key: train_accuracy
value: [0.97165354 0.96850394 0.97322835 0.97165354 0.97165354 0.96535433
 0.97641509 0.97169811 0.97169811 0.97641509]

mean value: 0.9718273659188827

key: test_fscore
value: [0.95652174 0.97058824 0.94285714 0.95890411 0.88888889 0.94444444
 0.95652174 0.97222222 0.95652174 0.97058824]

mean value: 0.9518058495981279

key: train_fscore
value: [0.97169811 0.96865204 0.97322835 0.97160883 0.97151899 0.96507937
 0.97652582 0.97178683 0.97169811 0.97652582]

mean value: 0.9718322272766338

key: test_precision
value: [0.97058824 1.         0.94285714 0.94594595 0.88888889 0.94444444
 0.97058824 0.94594595 0.97058824 1.        ]

mean value: 0.9579847073964721

key: train_precision
value: [0.97169811 0.965625   0.97476341 0.97160883 0.97460317 0.97124601
 0.97196262 0.96875    0.97169811 0.97196262]

mean value: 0.9713917880800539

key: test_recall
value: [0.94285714 0.94285714 0.94285714 0.97222222 0.88888889 0.94444444
 0.94285714 1.         0.94285714 0.94285714]

mean value: 0.9462698412698413

key: train_recall
value: [0.97169811 0.97169811 0.97169811 0.97160883 0.96845426 0.95899054
 0.98113208 0.97484277 0.97169811 0.98113208]

mean value: 0.9722952998829435

key: test_roc_auc
value: [0.95753968 0.97142857 0.94365079 0.95753968 0.88730159 0.94365079
 0.95714286 0.97142857 0.95714286 0.97142857]

mean value: 0.9518253968253968

key: train_roc_auc
value: [0.97165347 0.9684989  0.97323076 0.97165347 0.97164851 0.96534432
 0.97641509 0.97169811 0.97169811 0.97641509]

mean value: 0.9718255857786243

key: test_jcc
value: [0.91666667 0.94285714 0.89189189 0.92105263 0.8        0.89473684
 0.91666667 0.94594595 0.91666667 0.94285714]

mean value: 0.9089341597236333

key: train_jcc
value: [0.94495413 0.93920973 0.94785276 0.94478528 0.94461538 0.93251534
 0.95412844 0.94512195 0.94495413 0.95412844]

mean value: 0.9452265574126474

MCC on Blind test: 0.25

Accuracy on Blind test: 0.81
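With `cv=10`, RidgeClassifierCV selects its regularisation strength by 10-fold cross-validation over a grid of `alphas`, refits on all the training data, and exposes the chosen value as `alpha_`. In this log it edges out the plain RidgeClassifier on mean CV MCC (0.904 vs 0.899) but not on the blind test (0.25 vs 0.26). The sketch below shows how to inspect the selected alpha; the data are synthetic, and the grid `(0.1, 1.0, 10.0)` is passed explicitly as an assumption about what the script's default-configured estimator searched.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifierCV

# Synthetic stand-in data; the real features are not shown in the log.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# cv=10: choose alpha by 10-fold CV over the grid, then refit on all of X, y.
clf = RidgeClassifierCV(alphas=(0.1, 1.0, 10.0), cv=10).fit(X, y)
print(clf.alpha_)  # the selected regularisation strength
```

Logging `alpha_` alongside the scores would show whether the internal search is actually choosing something other than RidgeClassifier's default `alpha=1.0`.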