LSHTM_analysis/scripts/ml/log_gid_config.txt

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data.py:550: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
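
Note on the SettingWithCopyWarning above: pandas raises it when an in-place operation runs on an object that may be a slice of another DataFrame. The sketch below shows one common way to avoid it, by taking an explicit copy and reassigning the sorted result. The data frame contents, the `mutationinformation` values and the `< 15` filter are hypothetical; only `mask_check` and `ligand_distance` come from the log, and the actual code at ml_data.py:550 may differ.

```python
import pandas as pd

# Hypothetical stand-in for the data frame used in ml_data.py; only the
# 'ligand_distance' column name and the 'mask_check' name come from the log.
df = pd.DataFrame({
    "mutationinformation": ["A1B", "C2D", "E3F", "G4H"],
    "ligand_distance": [12.5, 3.1, 8.7, 20.0],
})

# An explicit .copy() makes the subset an independent DataFrame, so later
# changes are never applied to a slice of df.
mask_check = df[df["ligand_distance"] < 15].copy()

# Reassigning the sorted result avoids inplace=True altogether.
mask_check = mask_check.sort_values(by=["ligand_distance"], ascending=True)

print(mask_check)
```

The original `df` keeps its row order, and `mask_check` holds the sorted subset.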
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
1.22.4
1.4.1

aaindex_df contains non-numerical data

Total no. of non-numerical columns: 2

Selecting numerical data only

PASS: successfully selected numerical columns only for aaindex_df

Now checking for NA in the remaining aaindex_cols

Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127

Revised df ncols: 123

Checking NA in revised df...

PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df

PASS: ncols match
Expected ncols: 123
Got: 123

Total no. of columns in clean aa_df: 123
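
The cleaning steps logged above (keep numerical columns, drop columns containing NA, then check the column count) could be reproduced roughly as follows. This is a minimal sketch on a toy frame: the column names and values are made up, and the real ml_data.py may implement the checks differently.

```python
import numpy as np
import pandas as pd

# Toy stand-in for aaindex_df: two non-numerical columns, one column with NA.
aaindex_df = pd.DataFrame({
    "mutationinformation": ["A1B", "C2D", "E3F"],  # non-numerical (hypothetical)
    "wild_type": ["A", "C", "E"],                  # non-numerical (hypothetical)
    "feat_1": [0.1, 0.2, 0.3],
    "feat_2": [1.0, np.nan, 3.0],                  # contains NA
    "feat_3": [5, 6, 7],
})

# Step 1: keep numerical columns only.
numeric_df = aaindex_df.select_dtypes(include="number")
n_non_numeric = aaindex_df.shape[1] - numeric_df.shape[1]
print("Total no. of non-numerical columns:", n_non_numeric)

# Step 2: find and drop the columns that contain any NA.
na_cols = numeric_df.columns[numeric_df.isna().any()]
clean_df = numeric_df.drop(columns=na_cols)

# Step 3: sanity check on the expected number of columns.
expected_ncols = numeric_df.shape[1] - len(na_cols)
if clean_df.shape[1] == expected_ncols and not clean_df.isna().any().any():
    print("PASS: ncols match, no NA left:", clean_df.shape[1])
```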

Proceeding to merge, expected nrows in merged_df: 531

PASS: my_features_df and aa_df successfully combined
nrows: 531
ncols: 286
count of NULL values before imputation

or_mychisq          263
log10_or_mychisq    263
dtype: int64
count of NULL values AFTER imputation

mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64

PASS: OR values imputed, data ready for ML

No. of numerical features: 44
No. of categorical features: 7

index: 0
ind: 1

Mask count check: True

index: 1
ind: 2

Mask count check: True
Original Data
 Counter({0: 76, 1: 43}) Data dim: (119, 51)

-------------------------------------------------------------
Successfully split data: UQ [no aa_index but active site included] training
actual values: training set
imputed values: blind test set
Train data size: (119, 51)
Test data size: (412, 51)
y_train numbers: Counter({0: 76, 1: 43})
y_train ratio: 1.7674418604651163

y_test_numbers: Counter({0: 409, 1: 3})
y_test ratio: 136.33333333333334
-------------------------------------------------------------
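
The class ratios reported above appear to be the majority count divided by the minority count. A small worked check, assuming that definition (the helper name `class_ratio` is hypothetical):

```python
from collections import Counter

# Label vectors rebuilt from the counts printed in the log.
y_train = [0] * 76 + [1] * 43
y_test = [0] * 409 + [1] * 3


def class_ratio(labels):
    """Return count(class 0) / count(class 1)."""
    counts = Counter(labels)
    return counts[0] / counts[1]


print("y_train ratio:", class_ratio(y_train))
print("y_test ratio:", class_ratio(y_test))
```

With only 3 positive cases among 412 test samples, test-set metrics for class 1 rest on very few observations.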
Simple Random OverSampling
 Counter({0: 76, 1: 76})
(152, 51)
Simple Random UnderSampling
 Counter({0: 43, 1: 43})
(86, 51)
Simple Combined Over and UnderSampling
 Counter({0: 76, 1: 76})
(152, 51)
SMOTE_NC OverSampling
 Counter({0: 76, 1: 76})
(152, 51)
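
The resampled class counts above can be understood with a plain-Python sketch of random over- and under-sampling. This is an illustration of the idea only, not the resampling library the script actually calls; the function names are hypothetical, and SMOTE-NC (which synthesises new samples rather than duplicating rows) is not covered.

```python
import random
from collections import Counter


def random_oversample(labels, seed=42):
    """Return row indices in which every class is grown to the majority size
    by drawing extra rows with replacement."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    indices = list(range(len(labels)))
    for cls, n in counts.items():
        cls_idx = [i for i, y in enumerate(labels) if y == cls]
        indices.extend(rng.choices(cls_idx, k=target - n))
    return indices


def random_undersample(labels, seed=42):
    """Return row indices in which every class is cut to the minority size
    by sampling rows without replacement."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    indices = []
    for cls in counts:
        cls_idx = [i for i, y in enumerate(labels) if y == cls]
        indices.extend(rng.sample(cls_idx, k=target))
    return indices


y_train = [0] * 76 + [1] * 43
over = [y_train[i] for i in random_oversample(y_train)]
under = [y_train[i] for i in random_undersample(y_train)]
print(Counter(over), len(over))
print(Counter(under), len(under))
```

Over-sampling yields 76 rows per class (152 in total) and under-sampling yields 43 per class (86 in total), matching the shapes in the log.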

#####################################################################

Running ML analysis: UQ [without AA  index but with active site annotations]
Gene name: gid
Drug name: streptomycin

Output directory: /home/tanu/git/Data/streptomycin/output/ml/uq_v1/

Sanity checks:
Total input features: 51

Training data size: (119, 51)
Test data size: (412, 51)

Target feature numbers (training data): Counter({0: 76, 1: 43})
Target features ratio (training data): 1.7674418604651163

Target feature numbers (test data): Counter({0: 409, 1: 3})
Target features ratio (test data): 136.33333333333334

#####################################################################


================================================================

Structural features (n): 35
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================

Evolutionary features (n): 3
These are:
 ['consurf_score', 'snap2_score', 'provean_score']
================================================================

Genomic features (n): 6
These are:
 ['maf', 'logorI']
 ['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================

Categorical features (n): 7
These are:
 ['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================


Pass: No. of features match
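
The feature-count check above can be verified by hand from the group sizes printed in the log. A short worked check (the dictionary name is hypothetical):

```python
# Group sizes as printed in the log.
feature_groups = {
    "structural": 35,    # 9 common stability + 23 FoldX + 3 other
    "evolutionary": 3,
    "genomic": 6,        # 2 + 4
    "categorical": 7,
}

total_features = sum(feature_groups.values())
numerical_features = total_features - feature_groups["categorical"]

print("Total input features:", total_features)
print("No. of numerical features:", numerical_features)
```

Both values agree with the log: 51 input features, of which 44 are numerical.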

#####################################################################


Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])
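
The pipeline repr above (MinMaxScaler on numerical columns, OneHotEncoder on categorical columns, then the model) can be assembled as in the sketch below. Only a few column names are reused from the log; the data values, the category levels and the labels are synthetic and purely illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

rng = np.random.default_rng(42)
n = 40

# Synthetic data: column names from the log, values made up.
X = pd.DataFrame({
    "ligand_distance": rng.uniform(2, 30, n),
    "consurf_score": rng.normal(0, 1, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),  # hypothetical levels
    "active_site": rng.choice(["yes", "no"], n),            # hypothetical levels
})
y = np.array([0, 1] * (n // 2))  # synthetic, balanced labels

num_cols = ["ligand_distance", "consurf_score"]
cat_cols = ["ss_class", "active_site"]

# Scale numerical columns to [0, 1]; one-hot encode categorical columns.
prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ],
    remainder="passthrough",
)

pipe = Pipeline(steps=[
    ("prep", prep),
    ("model", LogisticRegression(random_state=42)),
])
pipe.fit(X, y)
preds = pipe.predict(X)
print(preds[:10])
```

Keeping the preprocessing inside the pipeline means the scaler and encoder are refitted on each training fold during cross-validation, so no information from the held-out fold leaks into them.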

key: fit_time
value: [0.01348901 0.01226354 0.01228833 0.01419163 0.01196408 0.01235008
 0.01226306 0.01198006 0.01196694 0.01303458]

mean value: 0.012579131126403808

key: score_time
value: [0.00877213 0.00875831 0.0089767  0.00837636 0.00834727 0.00831747
 0.00833845 0.00829291 0.00832725 0.00867438]

mean value: 0.008518123626708984

key: test_mcc
value: [0.42640143 0.40824829 0.         0.625      0.63245553 0.70710678
 0.68313005 0.83666003 0.31428571 0.62360956]

mean value: 0.5256897392741394

key: train_mcc
value: [0.73433335 0.80052092 0.81774488 0.71490799 0.77603911 0.73433335
 0.75414636 0.75414636 0.79379397 0.7364483 ]

mean value: 0.7616414584886299
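
The "mean value" lines are the arithmetic mean over the 10 cross-validation folds. The check below recomputes them from the rounded fold scores printed above and also shows the train/test gap; the variable names are hypothetical.

```python
from statistics import mean

# Fold-wise MCC values copied from the log (rounded to 8 decimals there).
test_mcc = [0.42640143, 0.40824829, 0.0, 0.625, 0.63245553,
            0.70710678, 0.68313005, 0.83666003, 0.31428571, 0.62360956]
train_mcc = [0.73433335, 0.80052092, 0.81774488, 0.71490799, 0.77603911,
             0.73433335, 0.75414636, 0.75414636, 0.79379397, 0.7364483]

mean_test = mean(test_mcc)
mean_train = mean(train_mcc)
gap = mean_train - mean_test

print("mean test_mcc :", round(mean_test, 4))
print("mean train_mcc:", round(mean_train, 4))
print("train - test  :", round(gap, 4))
```

The test-fold MCC ranges from 0 to about 0.84, so with a training set of 119 samples the per-fold estimates are noisy and the mean should be read with that spread in mind.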

key: test_accuracy
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
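
The ConvergenceWarning above suggests raising `max_iter`. The log does not identify which estimator emitted it (`_logistic.py` is shared by `LogisticRegression` and `LogisticRegressionCV`), so the sketch below only illustrates the general remedy on synthetic data. The value `max_iter=1000` is an assumption, not a setting taken from the script.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

# Synthetic, noisy binary problem (illustrative only).
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 5))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=120) > 0).astype(int)

# A larger iteration budget than the default of 100 (assumed value).
model = LogisticRegression(max_iter=1000, random_state=42)

# Turn ConvergenceWarning into an error so a failure to converge cannot
# pass unnoticed in a long log.
with warnings.catch_warnings():
    warnings.simplefilter("error", category=ConvergenceWarning)
    model.fit(X, y)

print("iterations used:", model.n_iter_[0])
```

If the fit still stops at the limit, the warning's other suggestions apply: check feature scaling or try a different solver.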
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[0.75       0.75       0.5        0.83333333 0.83333333 0.83333333
 0.83333333 0.91666667 0.66666667 0.81818182]

mean value: 0.7734848484848484

key: train_accuracy
value: [0.87850467 0.90654206 0.91588785 0.86915888 0.89719626 0.87850467
 0.88785047 0.88785047 0.90654206 0.87962963]

mean value: 0.89076670128072

key: test_fscore
value: [0.4        0.57142857 0.4        0.75       0.66666667 0.8
 0.75       0.88888889 0.6        0.66666667]

mean value: 0.6493650793650794

key: train_fscore
value: [0.82191781 0.85714286 0.87671233 0.8        0.84931507 0.82191781
 0.82352941 0.82352941 0.86111111 0.81690141]

mean value: 0.8352077213932715

key: test_precision
value: [1.         0.66666667 0.33333333 0.75       1.         0.66666667
 1.         1.         0.6        1.        ]

mean value: 0.8016666666666666

key: train_precision
value: [0.88235294 0.96774194 0.94117647 0.90322581 0.91176471 0.88235294
 0.93333333 0.93333333 0.91176471 0.90625   ]

mean value: 0.9173296173308033

key: test_recall
value: [0.25 0.5  0.5  0.75 0.5  1.   0.6  0.8  0.6  0.5 ]

mean value: 0.6

key: train_recall
value: [0.76923077 0.76923077 0.82051282 0.71794872 0.79487179 0.76923077
 0.73684211 0.73684211 0.81578947 0.74358974]

mean value: 0.7674089068825911

key: test_roc_auc
value: [0.625      0.6875     0.5        0.8125     0.75       0.875
 0.8        0.9        0.65714286 0.75      ]

mean value: 0.7357142857142858

key: train_roc_auc
value: [0.85520362 0.87726244 0.89555053 0.83691554 0.87537707 0.85520362
 0.8539283  0.8539283  0.88615561 0.85005574]

mean value: 0.8639580766297014

key: test_jcc
value: [0.25       0.4        0.25       0.6        0.5        0.66666667
 0.6        0.8        0.42857143 0.5       ]

mean value: 0.49952380952380954

key: train_jcc
value: [0.69767442 0.75       0.7804878  0.66666667 0.73809524 0.69767442
 0.7        0.7        0.75609756 0.69047619]

mean value: 0.7177172298301056

MCC on Blind test: 0.15

Accuracy on Blind test: 0.77
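
Note: for binary labels the Jaccard index (`test_jcc`) and the F-score (`test_fscore`) are tied by J = F / (2 - F), so the two per-fold arrays above carry the same information. The sketch below checks this against the ten fold values copied from the block above; the variable names are illustrative only.

```python
# Per-fold values copied from the test_fscore and test_jcc entries above.
test_fscore = [0.4, 0.57142857, 0.4, 0.75, 0.66666667,
               0.8, 0.75, 0.88888889, 0.6, 0.66666667]
test_jcc = [0.25, 0.4, 0.25, 0.6, 0.5,
            0.66666667, 0.6, 0.8, 0.42857143, 0.5]

# Jaccard index recovered from the F-score: J = F / (2 - F).
jcc_from_f = [f / (2.0 - f) for f in test_fscore]

# Largest disagreement between the logged and the derived Jaccard values.
max_abs_diff = max(abs(a - b) for a, b in zip(jcc_from_f, test_jcc))
mean_jcc = sum(test_jcc) / len(test_jcc)

print("max |derived - logged|:", max_abs_diff)
print("mean test_jcc:", mean_jcc)
```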

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.38598299 0.37587595 0.37197232 0.36586213 0.3655436  0.3787601
 0.37384486 0.3567059  0.36220002 0.34969902]

mean value: 0.3686446905136108

key: score_time
value: [0.00918126 0.00917006 0.00951552 0.00891447 0.00908375 0.00929761
 0.00941563 0.00886798 0.00938153 0.00919795]

mean value: 0.00920257568359375

key: test_mcc
value: [1.         0.625      0.35355339 0.83666003 0.625      0.70710678
 0.83666003 0.83666003 0.50709255 0.60714286]

mean value: 0.6934875661362015

key: train_mcc
value: [0.89876312 1.         0.9600061  0.85805669 0.95965309 0.95965309
 0.93862091 0.85625561 1.         0.81859189]

mean value: 0.9249600511154796

key: test_accuracy
value: [1.         0.83333333 0.66666667 0.91666667 0.83333333 0.83333333
 0.91666667 0.91666667 0.75       0.81818182]

mean value: 0.8484848484848485

key: train_accuracy
value: [0.95327103 1.         0.98130841 0.93457944 0.98130841 0.98130841
 0.97196262 0.93457944 1.         0.91666667]

mean value: 0.9654984423676012

key: test_fscore
value: [1.         0.75       0.6        0.88888889 0.75       0.8
 0.88888889 0.88888889 0.72727273 0.75      ]

mean value: 0.8043939393939394

key: train_fscore
value: [0.93506494 1.         0.97368421 0.90666667 0.97435897 0.97435897
 0.96       0.90410959 1.         0.87671233]

mean value: 0.9504955678784085

key: test_precision
value: [1.         0.75       0.5        0.8        0.75       0.66666667
 1.         1.         0.66666667 0.75      ]

mean value: 0.7883333333333333

key: train_precision
value: [0.94736842 1.         1.         0.94444444 0.97435897 0.97435897
 0.97297297 0.94285714 1.         0.94117647]

mean value: 0.9697537400633376

key: test_recall
value: [1.   0.75 0.75 1.   0.75 1.   0.8  0.8  0.8  0.75]

mean value: 0.84

key: train_recall
value: [0.92307692 1.         0.94871795 0.87179487 0.97435897 0.97435897
 0.94736842 0.86842105 1.         0.82051282]

mean value: 0.9328609986504723

key: test_roc_auc
value: [1.         0.8125     0.6875     0.9375     0.8125     0.875
 0.9        0.9        0.75714286 0.80357143]

mean value: 0.8485714285714285

key: train_roc_auc
value: [0.94683258 1.         0.97435897 0.92119155 0.97982655 0.97982655
 0.96643783 0.91971777 1.         0.89576366]

mean value: 0.9583955462135567

key: test_jcc
value: [1.         0.6        0.42857143 0.8        0.6        0.66666667
 0.8        0.8        0.57142857 0.6       ]

mean value: 0.6866666666666666

key: train_jcc
value: [0.87804878 1.         0.94871795 0.82926829 0.95       0.95
 0.92307692 0.825      1.         0.7804878 ]

mean value: 0.9084599749843653

MCC on Blind test: 0.01

Accuracy on Blind test: 0.7
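
Note: a blind-test accuracy near 0.7 alongside an MCC near 0 is typical when the classes are imbalanced and the model mostly predicts the majority class. The sketch below uses a hypothetical confusion matrix (the counts are invented for illustration and are not taken from this run) to show how the two metrics can diverge.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when a row or column of the confusion matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts: 20 positives, 80 negatives, mostly predicted negative.
tp, tn, fp, fn = 3, 68, 12, 17

print("accuracy:", accuracy(tp, tn, fp, fn))
print("MCC:", mcc(tp, tn, fp, fn))
```

With these counts tp * tn equals fp * fn, so the MCC numerator vanishes even though 71 of 100 samples are classified correctly.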

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.00942707 0.00902557 0.00695324 0.00660396 0.00666595 0.00662398
 0.00659394 0.00675702 0.00663805 0.00661373]

mean value: 0.007190251350402832

key: score_time
value: [0.01058674 0.01051068 0.00814915 0.00790286 0.00790191 0.0078671
 0.00790071 0.00779772 0.00793719 0.0078752 ]

mean value: 0.008442926406860351

key: test_mcc
value: [0.81649658 0.47809144 0.5        0.23904572 0.35355339 0.47809144
 0.16903085 0.50709255 0.16903085 0.35634832]

mean value: 0.4066781158133809

key: train_mcc
value: [0.63375685 0.67693504 0.66003337 0.51450646 0.70701192 0.58648859
 0.69614472 0.60558322 0.65590587 0.6700827 ]

mean value: 0.6406448743407447

key: test_accuracy
value: [0.91666667 0.75       0.66666667 0.58333333 0.66666667 0.75
 0.58333333 0.75       0.58333333 0.54545455]

mean value: 0.6795454545454546

key: train_accuracy
value: [0.80373832 0.8317757  0.8317757  0.71962617 0.85046729 0.81308411
 0.8317757  0.82242991 0.80373832 0.83333333]

mean value: 0.8141744548286605

key: test_fscore
value: [0.85714286 0.66666667 0.66666667 0.54545455 0.6        0.66666667
 0.54545455 0.72727273 0.54545455 0.61538462]

mean value: 0.6436163836163835

key: train_fscore
value: [0.77419355 0.8        0.79069767 0.70588235 0.81818182 0.71428571
 0.80434783 0.6984127  0.77894737 0.79545455]

mean value: 0.7680403546589664

key: test_precision
value: [1.         0.6        0.5        0.42857143 0.5        0.6
 0.5        0.66666667 0.5        0.44444444]

mean value: 0.5739682539682539

key: train_precision
value: [0.66666667 0.70588235 0.72340426 0.57142857 0.73469388 0.80645161
 0.68518519 0.88       0.64912281 0.71428571]

mean value: 0.7137121043298253

key: test_recall
value: [0.75 0.75 1.   0.75 0.75 0.75 0.6  0.8  0.6  1.  ]

mean value: 0.775

key: train_recall
value: [0.92307692 0.92307692 0.87179487 0.92307692 0.92307692 0.64102564
 0.97368421 0.57894737 0.97368421 0.8974359 ]

mean value: 0.8628879892037787

key: test_roc_auc
value: [0.875      0.75       0.75       0.625      0.6875     0.75
 0.58571429 0.75714286 0.58571429 0.64285714]

mean value: 0.7008928571428571

key: train_roc_auc
value: [0.82918552 0.85124434 0.8403092  0.76300905 0.86595023 0.77639517
 0.8636537  0.76773455 0.84191457 0.84726867]

mean value: 0.8246665009957512

key: test_jcc
value: [0.75       0.5        0.5        0.375      0.42857143 0.5
 0.375      0.57142857 0.375      0.44444444]

mean value: 0.48194444444444445

key: train_jcc
value: [0.63157895 0.66666667 0.65384615 0.54545455 0.69230769 0.55555556
 0.67272727 0.53658537 0.63793103 0.66037736]

mean value: 0.625303059275329

MCC on Blind test: 0.03

Accuracy on Blind test: 0.49

Model_name: Naive Bayes
Model func: BernoulliNB()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
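
Note: the warning above fires when a fold has no predicted positives, so precision would be 0 / 0. The pure-Python sketch below illustrates what a `zero_division` setting controls; the function is a hypothetical helper written for this note, not scikit-learn's implementation.

```python
def precision(tp, fp, zero_division=0.0):
    """Precision = tp / (tp + fp), with a fallback for the 0 / 0 case."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        # No sample was predicted positive: the ratio is undefined,
        # so return the caller-chosen fallback instead of dividing.
        return zero_division
    return tp / predicted_positive


# A fold in which the model predicts the negative class for every sample.
print(precision(tp=0, fp=0))                     # fallback value, 0.0
print(precision(tp=0, fp=0, zero_division=1.0))  # fallback value, 1.0
print(precision(tp=3, fp=1))                     # ordinary case, 0.75
```

A fold scored this way contributes the fallback value to the mean, which is worth keeping in mind when reading the `test_precision` averages.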

Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00705171 0.00684381 0.00682545 0.00677752 0.0068419  0.0067687
 0.00679421 0.00680947 0.0066936  0.00685263]

mean value: 0.0068259000778198246

key: score_time
value: [0.00836945 0.0079124  0.0078671  0.00792718 0.00797105 0.00790572
 0.00787711 0.00786543 0.00802302 0.00789714]

mean value: 0.007961559295654296

key: test_mcc
value: [ 0.          0.25       -0.23904572  0.47809144  0.40824829  0.
 -0.09759001  0.52915026  0.31428571  0.38575837]

mean value: 0.20288983564397506

key: train_mcc
value: [0.4754902  0.50673892 0.4653488  0.44239297 0.48817818 0.50337256
 0.39242808 0.39534618 0.48161946 0.37522992]

mean value: 0.4526145268371993

key: test_accuracy
value: [0.66666667 0.66666667 0.41666667 0.75       0.75       0.41666667
 0.5        0.75       0.66666667 0.72727273]

mean value: 0.6310606060606061

key: train_accuracy
value: [0.75700935 0.77570093 0.75700935 0.74766355 0.76635514 0.77570093
 0.72897196 0.71962617 0.76635514 0.72222222]

mean value: 0.7516614745586708

key: test_fscore
value: [0.         0.5        0.22222222 0.66666667 0.57142857 0.46153846
 0.25       0.57142857 0.6        0.57142857]

mean value: 0.4414713064713065

key: train_fscore
value: [0.66666667 0.67567568 0.64864865 0.63013699 0.66666667 0.66666667
 0.5915493  0.61538462 0.65753425 0.57142857]

mean value: 0.6390358039788872

key: test_precision
value: [0.         0.5        0.2        0.6        0.66666667 0.33333333
 0.33333333 1.         0.6        0.66666667]

mean value: 0.49

key: train_precision
value: [0.66666667 0.71428571 0.68571429 0.67647059 0.69444444 0.72727273
 0.63636364 0.6        0.68571429 0.64516129]

mean value: 0.6732093639019635

key: test_recall
value: [0.   0.5  0.25 0.75 0.5  0.75 0.2  0.4  0.6  0.5 ]

mean value: 0.445

key: train_recall
value: [0.66666667 0.64102564 0.61538462 0.58974359 0.64102564 0.61538462
 0.55263158 0.63157895 0.63157895 0.51282051]

mean value: 0.6097840755735493

key: test_roc_auc
value: [0.5        0.625      0.375      0.75       0.6875     0.5
 0.45714286 0.7        0.65714286 0.67857143]

mean value: 0.5930357142857143

key: train_roc_auc
value: [0.7377451  0.74698341 0.72680995 0.71398944 0.73963047 0.74151584
 0.68935927 0.69984744 0.73607933 0.67670011]

mean value: 0.7208660360817448

key: test_jcc
value: [0.         0.33333333 0.125      0.5        0.4        0.3
 0.14285714 0.4        0.42857143 0.4       ]

mean value: 0.30297619047619045

key: train_jcc
value: [0.5        0.51020408 0.48       0.46       0.5        0.5
 0.42       0.44444444 0.48979592 0.4       ]

mean value: 0.47044444444444444

MCC on Blind test: 0.14

Accuracy on Blind test: 0.73

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00671077 0.00906372 0.00673389 0.00643206 0.00652814 0.00724578
 0.00731564 0.00711012 0.00714564 0.00714374]

mean value: 0.007142949104309082

key: score_time
value: [0.04456663 0.02610064 0.00889969 0.00866151 0.00879526 0.00940728
 0.00941706 0.00944591 0.00944233 0.00941896]

mean value: 0.014415526390075683

key: test_mcc
value: [ 0.          0.          0.47809144  0.625       0.15811388  0.47809144
  0.07559289  0.29277002  0.11952286 -0.03857584]

mean value: 0.21886067104052556

key: train_mcc
value: [0.47836451 0.54358024 0.65128682 0.48080439 0.47687292 0.38417516
 0.55925621 0.60298802 0.55802654 0.50141804]

mean value: 0.5236772842107089

key: test_accuracy
value: [0.66666667 0.58333333 0.75       0.83333333 0.66666667 0.75
 0.58333333 0.66666667 0.58333333 0.54545455]

mean value: 0.6628787878787878

key: train_accuracy
value: [0.76635514 0.79439252 0.8411215  0.76635514 0.76635514 0.72897196
 0.80373832 0.82242991 0.80373832 0.77777778]

mean value: 0.7871235721703012

key: test_fscore
value: [0.         0.28571429 0.66666667 0.75       0.33333333 0.66666667
 0.28571429 0.5        0.44444444 0.28571429]

mean value: 0.4218253968253968

key: train_fscore
value: [0.63768116 0.66666667 0.75362319 0.64788732 0.62686567 0.53968254
 0.69565217 0.70769231 0.67692308 0.64705882]

mean value: 0.6599732931818586

key: test_precision
value: [0.         0.33333333 0.6        0.75       0.5        0.6
 0.5        0.66666667 0.5        0.33333333]

mean value: 0.47833333333333333

key: train_precision
value: [0.73333333 0.81481481 0.86666667 0.71875    0.75       0.70833333
 0.77419355 0.85185185 0.81481481 0.75862069]

mean value: 0.7791379052857084

key: test_recall
value: [0.   0.25 0.75 0.75 0.25 0.75 0.2  0.4  0.4  0.25]

mean value: 0.4

key: train_recall
value: [0.56410256 0.56410256 0.66666667 0.58974359 0.53846154 0.43589744
 0.63157895 0.60526316 0.57894737 0.56410256]

mean value: 0.5738866396761133

key: test_roc_auc
value: [0.5        0.5        0.75       0.8125     0.5625     0.75
 0.52857143 0.62857143 0.55714286 0.48214286]

mean value: 0.6071428571428571

key: train_roc_auc
value: [0.72322775 0.74528658 0.80392157 0.72869532 0.71776018 0.66647813
 0.76506484 0.77364607 0.7532418  0.73132664]

mean value: 0.7408648884655077

key: test_jcc
value: [0.         0.16666667 0.5        0.6        0.2        0.5
 0.16666667 0.33333333 0.28571429 0.16666667]

mean value: 0.2919047619047619

key: train_jcc
value: [0.46808511 0.5        0.60465116 0.47916667 0.45652174 0.36956522
 0.53333333 0.54761905 0.51162791 0.47826087]

mean value: 0.4948831049856425

MCC on Blind test: 0.04

Accuracy on Blind test: 0.82
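
Note on reading the per-fold values: the sketch below recomputes the logged metrics from confusion-matrix counts. The counts for fold 4 (tp=3, fp=1, fn=1, tn=7) and fold 1 (tp=0, fp=0, fn=4, tn=8) are an assumption inferred from the K-Nearest Neighbors test values above, not something the log prints. `fold_metrics` is a hypothetical helper, not part of the pipeline. The logged roc_auc values are consistent with AUC computed on hard class predictions, which equals the mean of recall and specificity; that is also an inference.

```python
import math

def fold_metrics(tp, fp, fn, tn):
    """Binary classification metrics from confusion-matrix counts (hypothetical helper)."""
    total = tp + fp + fn + tn
    # Undefined ratios fall back to 0.0, mirroring sklearn's zero_division=0 behaviour.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    pr_sum = precision + recall
    fscore = 2 * precision * recall / pr_sum if pr_sum else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    jcc = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {
        "mcc": mcc,
        "accuracy": (tp + tn) / total,
        "fscore": fscore,
        "precision": precision,
        "recall": recall,
        # AUC on hard predictions reduces to balanced accuracy.
        "roc_auc": (recall + specificity) / 2,
        "jcc": jcc,
    }

# Assumed counts for KNN test fold 4 (12 samples, 4 positives).
fold4 = fold_metrics(tp=3, fp=1, fn=1, tn=7)
# Assumed counts for KNN test fold 1: no positive predictions at all.
fold1 = fold_metrics(tp=0, fp=0, fn=4, tn=8)

for name, value in fold4.items():
    print(f"fold4 {name}: {value:.8f}")
```

With no predicted positives (fold 1), precision, recall, fscore, jcc and mcc all collapse to 0 while accuracy stays at the majority-class rate, which is the situation the UndefinedMetricWarning elsewhere in this log refers to.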

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.00767875 0.00732279 0.00758123 0.00753927 0.00771689 0.00793982
 0.00819755 0.00749803 0.00751305 0.00764227]

mean value: 0.0076629638671875

key: score_time
value: [0.00804496 0.00836205 0.00849819 0.0081315  0.00821066 0.00882912
 0.0087676  0.00863767 0.00823522 0.00829029]

mean value: 0.008400726318359374

key: test_mcc
value: [0.42640143 0.40824829 0.11952286 0.81649658 0.63245553 0.83666003
 0.35675303 0.52915026 0.11952286 0.41833001]

mean value: 0.4663540894023734

key: train_mcc
value: [0.71777084 0.72240602 0.71777084 0.69776211 0.73774797 0.67769958
 0.672375   0.71336904 0.78283392 0.67891024]

mean value: 0.7118645559720945

key: test_accuracy
value: [0.75       0.75       0.58333333 0.91666667 0.83333333 0.91666667
 0.66666667 0.75       0.58333333 0.72727273]

mean value: 0.7477272727272727

key: train_accuracy
value: [0.86915888 0.86915888 0.86915888 0.85981308 0.87850467 0.85046729
 0.85046729 0.86915888 0.89719626 0.85185185]

mean value: 0.8664935964001385

key: test_fscore
value: [0.4        0.57142857 0.44444444 0.85714286 0.66666667 0.88888889
 0.33333333 0.57142857 0.44444444 0.4       ]

mean value: 0.5577777777777778

key: train_fscore
value: [0.79411765 0.78787879 0.79411765 0.7761194  0.8115942  0.75757576
 0.75       0.78787879 0.83076923 0.75757576]

mean value: 0.7847627221679594

key: test_precision
value: [1.         0.66666667 0.4        1.         1.         0.8
 1.         1.         0.5        1.        ]

mean value: 0.8366666666666667

key: train_precision
value: [0.93103448 0.96296296 0.93103448 0.92857143 0.93333333 0.92592593
 0.92307692 0.92857143 1.         0.92592593]

mean value: 0.939043689388517

key: test_recall
value: [0.25 0.5  0.5  0.75 0.5  1.   0.2  0.4  0.4  0.25]

mean value: 0.475

key: train_recall
value: [0.69230769 0.66666667 0.69230769 0.66666667 0.71794872 0.64102564
 0.63157895 0.68421053 0.71052632 0.64102564]

mean value: 0.6744264507422402

key: test_roc_auc
value: [0.625      0.6875     0.5625     0.875      0.75       0.9375
 0.6        0.7        0.55714286 0.625     ]

mean value: 0.6919642857142857

key: train_roc_auc
value: [0.83144796 0.82598039 0.83144796 0.81862745 0.84426848 0.80580694
 0.80129672 0.82761251 0.85526316 0.80602007]

mean value: 0.8247771639900459

key: test_jcc
value: [0.25       0.4        0.28571429 0.75       0.5        0.8
 0.2        0.4        0.28571429 0.25      ]

mean value: 0.41214285714285714

key: train_jcc
value: [0.65853659 0.65       0.65853659 0.63414634 0.68292683 0.6097561
 0.6        0.65       0.71052632 0.6097561 ]

mean value: 0.6464184852374839

MCC on Blind test: 0.16

Accuracy on Blind test: 0.79
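
Note on "mean value": each mean in this log appears to be the plain arithmetic mean of the ten per-fold values printed above it. A minimal check, using the SVM test_recall values copied from this section (the variable names are illustrative only):

```python
from statistics import fmean

# SVM test_recall per fold, copied from the log above.
svm_test_recall = [0.25, 0.5, 0.5, 0.75, 0.5, 1.0, 0.2, 0.4, 0.4, 0.25]

mean_recall = fmean(svm_test_recall)
print(f"mean value: {mean_recall}")
```

Because each test fold holds only 11 or 12 samples, a single misclassified sample moves a fold's recall by 0.2 or 0.25, so the per-fold spread is worth reading alongside the mean.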

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.44463158 0.42616534 0.55910635 0.43400979 0.4541378  0.42967033
 0.43932247 0.51193166 0.42841673 0.43269181]

mean value: 0.4560083866119385

key: score_time
value: [0.01107335 0.01112819 0.01112199 0.0153079  0.01128125 0.0111084
 0.02174282 0.01112676 0.01113582 0.01421332]

mean value: 0.012923979759216308

key: test_mcc
value: [0.81649658 0.83666003 0.         0.70710678 0.15811388 0.47809144
 0.07559289 0.47809144 0.31428571 0.38575837]

mean value: 0.42501971429170726

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91666667 0.91666667 0.5        0.83333333 0.66666667 0.75
 0.58333333 0.75       0.66666667 0.72727273]

mean value: 0.7310606060606061

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.85714286 0.88888889 0.4        0.8        0.33333333 0.66666667
 0.28571429 0.66666667 0.6        0.57142857]

mean value: 0.606984126984127

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.8        0.33333333 0.66666667 0.5        0.6
 0.5        0.75       0.6        0.66666667]

mean value: 0.6416666666666666

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75 1.   0.5  1.   0.25 0.75 0.2  0.6  0.6  0.5 ]

mean value: 0.615

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.875      0.9375     0.5        0.875      0.5625     0.75
 0.52857143 0.72857143 0.65714286 0.67857143]

mean value: 0.7092857142857143

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.75       0.8        0.25       0.66666667 0.2        0.5
 0.16666667 0.5        0.42857143 0.4       ]

mean value: 0.4661904761904762

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.07

Accuracy on Blind test: 0.69

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01969337 0.00755858 0.00811434 0.00742817 0.0073216  0.00782657
 0.00776482 0.00792885 0.00732517 0.00794721]

mean value: 0.008890867233276367

key: score_time
value: [0.01085663 0.00857472 0.00874376 0.0083375  0.00824308 0.00869703
 0.00888395 0.00872183 0.00871086 0.00867748]

mean value: 0.008844685554504395

key: test_mcc
value: [0.83666003 0.625      0.81649658 0.81649658 0.83666003 1.
 0.50709255 0.84515425 0.65714286 0.81009259]

mean value: 0.7750795466933069

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91666667 0.83333333 0.91666667 0.91666667 0.91666667 1.
 0.75       0.91666667 0.83333333 0.90909091]

mean value: 0.8909090909090909

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.88888889 0.75       0.85714286 0.85714286 0.88888889 1.
 0.72727273 0.90909091 0.8        0.85714286]

mean value: 0.8535569985569985

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.8        0.75       1.         1.         0.8        1.
 0.66666667 0.83333333 0.8        1.        ]

mean value: 0.865

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.   0.75 0.75 0.75 1.   1.   0.8  1.   0.8  0.75]

mean value: 0.86

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.8125     0.875      0.875      0.9375     1.
 0.75714286 0.92857143 0.82857143 0.875     ]

mean value: 0.8826785714285714

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.8        0.6        0.75       0.75       0.8        1.
 0.57142857 0.83333333 0.66666667 0.75      ]

mean value: 0.7521428571428571

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.11

Accuracy on Blind test: 0.83
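
Note on blind-test accuracy versus MCC: several models in this log report high blind-test accuracy alongside an MCC close to 0. The sketch below shows how that combination can arise on an imbalanced set. All counts are hypothetical (100 samples, 15 positives) and are not taken from the blind test set, whose class balance this section does not print. `accuracy_and_mcc` is an illustrative helper.

```python
import math

def accuracy_and_mcc(tp, fp, fn, tn):
    """Return (accuracy, MCC) from confusion-matrix counts (illustrative helper)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc

# Hypothetical: always predict the majority (negative) class.
majority_acc, majority_mcc = accuracy_and_mcc(tp=0, fp=0, fn=15, tn=85)

# Hypothetical: a model that recovers only 2 of the 15 positives.
weak_acc, weak_mcc = accuracy_and_mcc(tp=2, fp=3, fn=13, tn=82)

print(f"majority baseline: accuracy={majority_acc:.2f}, mcc={majority_mcc:.2f}")
print(f"weak model:        accuracy={weak_acc:.2f}, mcc={weak_mcc:.2f}")
```

Under these assumed counts, accuracy mostly reflects the share of negatives, while MCC stays low because few positives are recovered. This is one reason to weigh MCC more heavily than accuracy when comparing the models listed here.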

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.0870738  0.08674216 0.08633971 0.0796845  0.08702898 0.08723402
 0.0800159  0.08267999 0.08336306 0.0805757 ]

mean value: 0.08407378196716309

key: score_time
value: [0.01838231 0.01821375 0.01814437 0.01772857 0.01825523 0.01835394
 0.01846385 0.01686049 0.01691628 0.0185349 ]

mean value: 0.01798536777496338

key: test_mcc
value: [0.63245553 0.40824829 0.625      1.         0.40824829 0.83666003
 0.35675303 0.68313005 0.50709255 0.        ]

mean value: 0.5457587777402898

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.83333333 0.75       0.83333333 1.         0.75       0.91666667
 0.66666667 0.83333333 0.75       0.63636364]

mean value: 0.796969696969697

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.66666667 0.57142857 0.75       1.         0.57142857 0.88888889
 0.33333333 0.75       0.72727273 0.        ]

mean value: 0.625901875901876

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.66666667 0.75       1.         0.66666667 0.8
 1.         1.         0.66666667 0.        ]

mean value: 0.755

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.5  0.5  0.75 1.   0.5  1.   0.2  0.6  0.8  0.  ]

mean value: 0.585

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.75       0.6875     0.8125     1.         0.6875     0.9375
 0.6        0.8        0.75714286 0.5       ]

mean value: 0.7532142857142857

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.5        0.4        0.6        1.         0.4        0.8
 0.2        0.6        0.57142857 0.        ]

mean value: 0.5071428571428571

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.16

Accuracy on Blind test: 0.77

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00703287 0.00692582 0.00732732 0.00697088 0.00684643 0.00690222
 0.00689054 0.00692654 0.00703335 0.00678849]

mean value: 0.006964445114135742

key: score_time
value: [0.00811267 0.00805044 0.00872636 0.00804448 0.00804257 0.00807214
 0.0084269  0.00811172 0.00793386 0.00806904]

mean value: 0.008159017562866211

key: test_mcc
value: [ 0.63245553  0.63245553  0.25        0.          0.625       0.15811388
  0.47809144  0.29277002 -0.23904572  0.38575837]

mean value: 0.3215599065732439

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.83333333 0.83333333 0.66666667 0.58333333 0.83333333 0.66666667
 0.75       0.66666667 0.41666667 0.72727273]

mean value: 0.6977272727272728

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.66666667 0.66666667 0.5        0.28571429 0.75       0.33333333
 0.66666667 0.5        0.22222222 0.57142857]

mean value: 0.5162698412698412

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         0.5        0.33333333 0.75       0.5
 0.75       0.66666667 0.25       0.66666667]

mean value: 0.6416666666666666

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.5  0.5  0.5  0.25 0.75 0.25 0.6  0.4  0.2  0.5 ]

mean value: 0.445

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.75       0.75       0.625      0.5        0.8125     0.5625
 0.72857143 0.62857143 0.38571429 0.67857143]

mean value: 0.6421428571428571

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.5        0.5        0.33333333 0.16666667 0.6        0.2
 0.5        0.33333333 0.125      0.4       ]

mean value: 0.36583333333333334

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.05

Accuracy on Blind test: 0.6

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [0.9999063  0.96824765 0.96530747 1.01568055 1.00144243 1.0073843
 1.0299902  0.9734323  0.96623063 0.96925235]

mean value: 0.9896874189376831

key: score_time
value: [0.08913732 0.08848977 0.09147906 0.0936265  0.09639764 0.09607625
 0.08916879 0.08925462 0.08908725 0.08973861]

mean value: 0.09124557971954346

key: test_mcc
value: [1.         0.625      0.625      1.         0.40824829 1.
 0.83666003 0.65714286 0.65714286 0.81009259]

mean value: 0.7619286618584635

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.83333333 0.83333333 1.         0.75       1.
 0.91666667 0.83333333 0.83333333 0.90909091]

mean value: 0.8909090909090909

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.75       0.75       1.         0.57142857 1.
 0.88888889 0.8        0.8        0.85714286]

mean value: 0.8417460317460318

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.75       0.75       1.         0.66666667 1.
 1.         0.8        0.8        1.        ]

mean value: 0.8766666666666667

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.   0.75 0.75 1.   0.5  1.   0.8  0.8  0.8  0.75]

mean value: 0.8150000000000001

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.8125     0.8125     1.         0.6875     1.
 0.9        0.82857143 0.82857143 0.875     ]

mean value: 0.8744642857142857

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.6        0.6        1.         0.4        1.
 0.8        0.66666667 0.66666667 0.75      ]

mean value: 0.7483333333333333

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.13

Accuracy on Blind test: 0.86

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))

key: fit_time
value: [1.706635   0.86073232 0.898561   0.83530664 0.94017434 0.93120837
 0.82625723 0.85147214 0.83630848 0.80121708]

mean value: 0.948787260055542

key: score_time
value: [0.23586416 0.21526957 0.2305944  0.21968794 0.23709798 0.14258313
 0.17830396 0.22208166 0.23773837 0.23715353]

mean value: 0.215637469291687

key: test_mcc
value: [0.81649658 0.625      0.625      0.81649658 0.63245553 1.
 0.52915026 0.47809144 0.68313005 0.81009259]

mean value: 0.701591303820076

key: train_mcc
value: [0.94025192 0.9600061  0.94025192 0.9600061  0.9600061  0.9600061
 0.95952175 0.95952175 0.97968078 0.94053994]

mean value: 0.9559792483395588

key: test_accuracy
value: [0.91666667 0.83333333 0.83333333 0.91666667 0.83333333 1.
 0.75       0.75       0.83333333 0.90909091]

mean value: 0.8575757575757575

key: train_accuracy
value: [0.97196262 0.98130841 0.97196262 0.98130841 0.98130841 0.98130841
 0.98130841 0.98130841 0.99065421 0.97222222]

mean value: 0.9794652128764278

key: test_fscore
value: [0.85714286 0.75       0.75       0.85714286 0.66666667 1.
 0.57142857 0.66666667 0.75       0.85714286]

mean value: 0.7726190476190475

key: train_fscore
value: [0.96       0.97368421 0.96       0.97368421 0.97368421 0.97368421
 0.97297297 0.97297297 0.98666667 0.96      ]

mean value: 0.9707349454717876

key: test_precision
value: [1.   0.75 0.75 1.   1.   1.   1.   0.75 1.   1.  ]

mean value: 0.925

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75 0.75 0.75 0.75 0.5  1.   0.4  0.6  0.6  0.75]

mean value: 0.685

key: train_recall
value: [0.92307692 0.94871795 0.92307692 0.94871795 0.94871795 0.94871795
 0.94736842 0.94736842 0.97368421 0.92307692]

mean value: 0.9432523616734143

key: test_roc_auc
value: [0.875      0.8125     0.8125     0.875      0.75       1.
 0.7        0.72857143 0.8        0.875     ]

mean value: 0.8228571428571428

key: train_roc_auc
value: [0.96153846 0.97435897 0.96153846 0.97435897 0.97435897 0.97435897
 0.97368421 0.97368421 0.98684211 0.96153846]

mean value: 0.9716261808367072

key: test_jcc
value: [0.75 0.6  0.6  0.75 0.5  1.   0.4  0.5  0.6  0.75]

mean value: 0.645

key: train_jcc
value: [0.92307692 0.94871795 0.92307692 0.94871795 0.94871795 0.94871795
 0.94736842 0.94736842 0.97368421 0.92307692]

mean value: 0.9432523616734143

MCC on Blind test: 0.14

Accuracy on Blind test: 0.87

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01675677 0.00714898 0.0082562  0.0074234  0.00729084 0.00770926
 0.00744081 0.00784683 0.00782228 0.00789189]

mean value: 0.00855872631072998

key: score_time
value: [0.01316333 0.00809813 0.0098176  0.00799656 0.00866699 0.0089457
 0.00823903 0.00895739 0.00873828 0.0089376 ]

mean value: 0.009156060218811036

key: test_mcc
value: [ 0.          0.25       -0.23904572  0.47809144  0.40824829  0.
 -0.09759001  0.52915026  0.31428571  0.38575837]

mean value: 0.20288983564397506

key: train_mcc
value: [0.4754902  0.50673892 0.4653488  0.44239297 0.48817818 0.50337256
 0.39242808 0.39534618 0.48161946 0.37522992]

mean value: 0.4526145268371993

key: test_accuracy
value: [0.66666667 0.66666667 0.41666667 0.75       0.75       0.41666667
 0.5        0.75       0.66666667 0.72727273]

mean value: 0.6310606060606061

key: train_accuracy
value: [0.75700935 0.77570093 0.75700935 0.74766355 0.76635514 0.77570093
 0.72897196 0.71962617 0.76635514 0.72222222]

mean value: 0.7516614745586708

key: test_fscore
value: [0.         0.5        0.22222222 0.66666667 0.57142857 0.46153846
 0.25       0.57142857 0.6        0.57142857]

mean value: 0.4414713064713065

key: train_fscore
value: [0.66666667 0.67567568 0.64864865 0.63013699 0.66666667 0.66666667
 0.5915493  0.61538462 0.65753425 0.57142857]

mean value: 0.6390358039788872

key: test_precision
value: [0.         0.5        0.2        0.6        0.66666667 0.33333333
 0.33333333 1.         0.6        0.66666667]

mean value: 0.49

key: train_precision
value: [0.66666667 0.71428571 0.68571429 0.67647059 0.69444444 0.72727273
 0.63636364 0.6        0.68571429 0.64516129]

mean value: 0.6732093639019635

key: test_recall
value: [0.   0.5  0.25 0.75 0.5  0.75 0.2  0.4  0.6  0.5 ]

mean value: 0.445

key: train_recall
value: [0.66666667 0.64102564 0.61538462 0.58974359 0.64102564 0.61538462
 0.55263158 0.63157895 0.63157895 0.51282051]

mean value: 0.6097840755735493

key: test_roc_auc
value: [0.5        0.625      0.375      0.75       0.6875     0.5
 0.45714286 0.7        0.65714286 0.67857143]

mean value: 0.5930357142857143

key: train_roc_auc
value: [0.7377451  0.74698341 0.72680995 0.71398944 0.73963047 0.74151584
 0.68935927 0.69984744 0.73607933 0.67670011]

mean value: 0.7208660360817448

key: test_jcc
value: [0.         0.33333333 0.125      0.5        0.4        0.3
 0.14285714 0.4        0.42857143 0.4       ]

mean value: 0.30297619047619045

key: train_jcc
value: [0.5        0.51020408 0.48       0.46       0.5        0.5
 0.42       0.44444444 0.48979592 0.4       ]

mean value: 0.47044444444444444

MCC on Blind test: 0.14

Accuracy on Blind test: 0.73

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.07072282 0.03617072 0.03579974 0.03728676 0.03555346 0.03524041
 0.03362727 0.03646469 0.05544496 0.09033132]

mean value: 0.04666421413421631

key: score_time
value: [0.01112819 0.0112555  0.01102757 0.01083326 0.01092458 0.01118302
 0.01061773 0.01085544 0.00988364 0.0103364 ]

mean value: 0.010804533958435059

key: test_mcc
value: [1.         0.625      0.81649658 0.81649658 0.83666003 1.
 0.65714286 1.         0.65714286 0.81009259]

mean value: 0.8219031489976224

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.83333333 0.91666667 0.91666667 0.91666667 1.
 0.83333333 1.         0.83333333 0.90909091]

mean value: 0.9159090909090909

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.75       0.85714286 0.85714286 0.88888889 1.
 0.8        1.         0.8        0.85714286]

mean value: 0.881031746031746

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.   0.75 1.   1.   0.8  1.   0.8  1.   0.8  1.  ]

mean value: 0.915

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.   0.75 0.75 0.75 1.   1.   0.8  1.   0.8  0.75]

mean value: 0.86

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.8125     0.875      0.875      0.9375     1.
 0.82857143 1.         0.82857143 0.875     ]

mean value: 0.9032142857142857

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.6        0.75       0.75       0.8        1.
 0.66666667 1.         0.66666667 0.75      ]

mean value: 0.7983333333333333

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.12

Accuracy on Blind test: 0.84

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.01759219 0.01122999 0.01159906 0.01160884 0.011446   0.0116322
 0.01226497 0.01147413 0.01333094 0.01733923]

mean value: 0.01295175552368164

key: score_time
value: [0.01068926 0.01071072 0.01065826 0.01099992 0.01074338 0.01079488
 0.01092148 0.01075888 0.01078367 0.01083136]

mean value: 0.010789179801940918

key: test_mcc
value: [ 0.625       0.25        0.35355339  0.83666003  0.83666003  0.83666003
  0.65714286  1.          0.71428571 -0.17857143]

mean value: 0.5931390613052643

key: train_mcc
value: [0.90236159 0.96085507 0.96085507 0.90236159 0.93999796 0.92091277
 0.92008523 0.92008523 0.92008523 0.96106604]

mean value: 0.9308665771065557

key: test_accuracy
value: [0.83333333 0.66666667 0.66666667 0.91666667 0.91666667 0.91666667
 0.83333333 1.         0.83333333 0.45454545]

mean value: 0.8037878787878788

key: train_accuracy
value: [0.95327103 0.98130841 0.98130841 0.95327103 0.97196262 0.96261682
 0.96261682 0.96261682 0.96261682 0.98148148]

mean value: 0.9673070266528211

key: test_fscore
value: [0.75       0.5        0.6        0.88888889 0.88888889 0.88888889
 0.8        1.         0.83333333 0.25      ]

mean value: 0.74

key: train_fscore
value: [0.9382716  0.975      0.975      0.9382716  0.96202532 0.95
 0.94871795 0.94871795 0.94871795 0.975     ]

mean value: 0.9559722372486086

key: test_precision
value: [0.75       0.5        0.5        0.8        0.8        0.8
 0.8        1.         0.71428571 0.25      ]

mean value: 0.6914285714285715

key: train_precision
value: [0.9047619  0.95121951 0.95121951 0.9047619  0.95       0.92682927
 0.925      0.925      0.925      0.95121951]

mean value: 0.9315011614401858

key: test_recall
value: [0.75 0.5  0.75 1.   1.   1.   0.8  1.   1.   0.25]

mean value: 0.805

key: train_recall
value: [0.97435897 1.         1.         0.97435897 0.97435897 0.97435897
 0.97368421 0.97368421 0.97368421 1.        ]

mean value: 0.9818488529014845

key: test_roc_auc
value: [0.8125     0.625      0.6875     0.9375     0.9375     0.9375
 0.82857143 1.         0.85714286 0.41071429]

mean value: 0.8033928571428571

key: train_roc_auc
value: [0.95776772 0.98529412 0.98529412 0.95776772 0.9724736  0.96512066
 0.96510297 0.96510297 0.96510297 0.98550725]

mean value: 0.9704534119579886

key: test_jcc
value: [0.6        0.33333333 0.42857143 0.8        0.8        0.8
 0.66666667 1.         0.71428571 0.14285714]

mean value: 0.6285714285714286

key: train_jcc
value: [0.88372093 0.95121951 0.95121951 0.88372093 0.92682927 0.9047619
 0.90243902 0.90243902 0.90243902 0.95121951]

mean value: 0.9160008643275801

MCC on Blind test: 0.07

Accuracy on Blind test: 0.69
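
Note on reading the two blind-test lines above: accuracy and MCC can disagree sharply, because MCC uses all four cells of the confusion matrix while accuracy only counts correct predictions. The sketch below is a minimal illustration using hypothetical, imbalanced counts. The counts and the helper names `mcc` and `accuracy` are assumptions for illustration and are not taken from this run, and the log does not state the class balance of the blind test set.

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from binary confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when a row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def accuracy(tp, fp, fn, tn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)

# Hypothetical, imbalanced counts (NOT taken from this log).
counts = dict(tp=1, fp=5, fn=9, tn=85)
acc = accuracy(**counts)
score = mcc(**counts)
print(f"accuracy={acc:.2f} mcc={score:.2f}")
```

With these counts most samples are negatives that are easy to get right, so accuracy stays high while MCC stays close to zero.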

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.02494788 0.01585054 0.00776625 0.00726771 0.00712013 0.00683975
 0.00691533 0.00710177 0.00688958 0.00703335]

mean value: 0.00977323055267334

key: score_time
value: [0.01840568 0.0093627  0.00875354 0.00817943 0.00809598 0.00807095
 0.00832725 0.00807333 0.00806046 0.00866556]

mean value: 0.009399485588073731

key: test_mcc
value: [0.42640143 0.40824829 0.11952286 0.47809144 0.15811388 0.35355339
 0.35675303 0.29277002 0.47809144 0.41833001]

mean value: 0.3489875814335667

key: train_mcc
value: [0.45416735 0.52159509 0.45416735 0.49964579 0.52383566 0.43117964
 0.47315489 0.49023798 0.44470372 0.45631672]

mean value: 0.4749004183145091

key: test_accuracy
value: [0.75       0.75       0.58333333 0.75       0.66666667 0.66666667
 0.66666667 0.66666667 0.75       0.72727273]

mean value: 0.6977272727272728

key: train_accuracy
value: [0.75700935 0.78504673 0.75700935 0.77570093 0.78504673 0.74766355
 0.76635514 0.77570093 0.75700935 0.75925926]

mean value: 0.7665801315334025

key: test_fscore
value: [0.4        0.57142857 0.44444444 0.66666667 0.33333333 0.6
 0.33333333 0.5        0.66666667 0.4       ]

mean value: 0.4915873015873016

key: train_fscore
value: [0.60606061 0.66666667 0.60606061 0.625      0.63492063 0.58461538
 0.63768116 0.625      0.59375    0.60606061]

mean value: 0.6185815663804795

key: test_precision
value: [1.         0.66666667 0.4        0.6        0.5        0.5
 1.         0.66666667 0.75       1.        ]

mean value: 0.7083333333333334

key: train_precision
value: [0.74074074 0.76666667 0.74074074 0.8        0.83333333 0.73076923
 0.70967742 0.76923077 0.73076923 0.74074074]

mean value: 0.7562668872346292

key: test_recall
value: [0.25 0.5  0.5  0.75 0.25 0.75 0.2  0.4  0.6  0.25]

mean value: 0.445

key: train_recall
value: [0.51282051 0.58974359 0.51282051 0.51282051 0.51282051 0.48717949
 0.57894737 0.52631579 0.5        0.51282051]

mean value: 0.5246288798920378

key: test_roc_auc
value: [0.625      0.6875     0.5625     0.75       0.5625     0.6875
 0.6        0.62857143 0.72857143 0.625     ]

mean value: 0.6457142857142857

key: train_roc_auc
value: [0.70493967 0.74340121 0.70493967 0.71964555 0.72699849 0.69211916
 0.72425629 0.71967963 0.69927536 0.70568562]

mean value: 0.7140940648394546

key: test_jcc
value: [0.25       0.4        0.28571429 0.5        0.2        0.42857143
 0.2        0.33333333 0.5        0.25      ]

mean value: 0.33476190476190476

key: train_jcc
value: [0.43478261 0.5        0.43478261 0.45454545 0.46511628 0.41304348
 0.46808511 0.45454545 0.42222222 0.43478261]

mean value: 0.44819058211137036

MCC on Blind test: 0.14

Accuracy on Blind test: 0.75
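
Note on test_fscore and test_jcc: for a binary problem the Jaccard index follows directly from F1 via J = F / (2 - F), so the two keys carry the same information per fold. The sketch below checks that identity against the per-fold Multinomial values printed above. The helper name `jaccard_from_f1` is an assumption for illustration.

```python
# Per-fold values copied from the Multinomial block above.
test_fscore = [0.4, 0.57142857, 0.44444444, 0.66666667, 0.33333333,
               0.6, 0.33333333, 0.5, 0.66666667, 0.4]
test_jcc = [0.25, 0.4, 0.28571429, 0.5, 0.2,
            0.42857143, 0.2, 0.33333333, 0.5, 0.25]

def jaccard_from_f1(f1):
    """Binary Jaccard index implied by an F1 score: J = F / (2 - F)."""
    return f1 / (2.0 - f1)

derived = [jaccard_from_f1(f) for f in test_fscore]
# Largest absolute difference between derived and logged Jaccard values.
max_diff = max(abs(d - j) for d, j in zip(derived, test_jcc))
print(f"max |derived - logged| = {max_diff:.1e}")
```

The remaining differences come only from the 8-decimal rounding in the log.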

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.00787282 0.00709176 0.00747252 0.00727296 0.00753498 0.0074842
 0.00761414 0.00750947 0.00763249 0.00774121]

mean value: 0.00752265453338623

key: score_time
value: [0.00790644 0.00816011 0.00783634 0.00811148 0.00789452 0.00806546
 0.00857043 0.00817347 0.00801802 0.00889587]

mean value: 0.008163213729858398

key: test_mcc
value: [1.         0.625      0.11952286 0.70710678 0.47809144 0.70710678
 0.37142857 0.84515425 0.29277002 0.60714286]

mean value: 0.5753323572224797

key: train_mcc
value: [0.8165399  0.85945065 0.82420912 0.82726738 0.76153359 0.83287099
 0.79235477 0.84830731 0.84110073 0.8789655 ]

mean value: 0.8282599941345357

key: test_accuracy
value: [1.         0.83333333 0.58333333 0.83333333 0.75       0.83333333
 0.66666667 0.91666667 0.66666667 0.81818182]

mean value: 0.7901515151515152

key: train_accuracy
value: [0.90654206 0.93457944 0.91588785 0.91588785 0.87850467 0.91588785
 0.88785047 0.92523364 0.92523364 0.94444444]

mean value: 0.9150051921079958

key: test_fscore
value: [1.         0.75       0.44444444 0.8        0.66666667 0.8
 0.66666667 0.90909091 0.5        0.75      ]

mean value: 0.7286868686868687

key: train_fscore
value: [0.88372093 0.90410959 0.86956522 0.89156627 0.85057471 0.89411765
 0.86363636 0.90243902 0.88235294 0.92105263]

mean value: 0.8863135322209726

key: test_precision
value: [1.         0.75       0.4        0.66666667 0.6        0.66666667
 0.57142857 0.83333333 0.66666667 0.75      ]

mean value: 0.6904761904761905

key: train_precision
value: [0.80851064 0.97058824 1.         0.84090909 0.77083333 0.82608696
 0.76       0.84090909 1.         0.94594595]

mean value: 0.876378329121119

key: test_recall
value: [1.   0.75 0.5  1.   0.75 1.   0.8  1.   0.4  0.75]

mean value: 0.795

key: train_recall
value: [0.97435897 0.84615385 0.76923077 0.94871795 0.94871795 0.97435897
 1.         0.97368421 0.78947368 0.8974359 ]

mean value: 0.9122132253711202

key: test_roc_auc
value: [1.         0.8125     0.5625     0.875      0.75       0.875
 0.68571429 0.92857143 0.62857143 0.80357143]

mean value: 0.7921428571428571

key: train_roc_auc
value: [0.92100302 0.91572398 0.88461538 0.92288839 0.89347662 0.92835596
 0.91304348 0.93611747 0.89473684 0.9342252 ]

mean value: 0.9144186331459181

key: test_jcc
value: [1.         0.6        0.28571429 0.66666667 0.5        0.66666667
 0.5        0.83333333 0.33333333 0.6       ]

mean value: 0.5985714285714285

key: train_jcc
value: [0.79166667 0.825      0.76923077 0.80434783 0.74       0.80851064
 0.76       0.82222222 0.78947368 0.85365854]

mean value: 0.7964110343300379

MCC on Blind test: 0.04

Accuracy on Blind test: 0.82
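
Note on the fold sizes behind the per-fold values: each key has 10 values, and the last test value is always a multiple of 1/11 while the others are multiples of 1/12. That is consistent with 119 rows (the data dimension reported at the start of this log) split into 10 folds. The sketch below assumes the usual KFold sizing rule, where the first n % k folds get one extra sample. The splitter actually used is not shown in this section, and `kfold_test_sizes` is a hypothetical helper.

```python
def kfold_test_sizes(n_samples, n_splits):
    """Test-fold sizes when the first n_samples % n_splits folds get one extra sample."""
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]

n_samples, n_splits = 119, 10  # 119 rows in the log header, 10 values per key
test_sizes = kfold_test_sizes(n_samples, n_splits)
train_sizes = [n_samples - s for s in test_sizes]

# Last-fold accuracies copied from the Passive Aggresive block above.
last_test_correct = 0.81818182 * test_sizes[-1]    # implied correct test predictions
last_train_correct = 0.94444444 * train_sizes[-1]  # implied correct train predictions
print(test_sizes, train_sizes[-1], round(last_test_correct), round(last_train_correct))
```

Both products land on whole numbers of samples, which is what one would expect if the assumed fold sizes match the run.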

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.00992465 0.00924659 0.00704741 0.00705194 0.00760007 0.00777459
 0.00699854 0.00773525 0.00748038 0.00777173]

mean value: 0.007863116264343262

key: score_time
value: [0.01014447 0.00912404 0.00799847 0.00816655 0.00813746 0.00799036
 0.00796533 0.00817347 0.00830007 0.00831962]

mean value: 0.00843198299407959

key: test_mcc
value: [1.         0.40824829 0.40824829 0.625      0.81649658 0.625
 0.83666003 0.65714286 0.23904572 0.41833001]

mean value: 0.6034171780666301

key: train_mcc
value: [0.8720951  0.89986237 0.74811148 0.77945561 0.86259524 0.6717753
 0.93862091 0.88019137 0.69504805 0.78691217]

mean value: 0.8134667600062208

key: test_accuracy
value: [1.         0.75       0.75       0.83333333 0.91666667 0.83333333
 0.91666667 0.83333333 0.58333333 0.72727273]

mean value: 0.8143939393939394

key: train_accuracy
value: [0.93457944 0.95327103 0.87850467 0.89719626 0.93457944 0.8411215
 0.97196262 0.94392523 0.82242991 0.89814815]

mean value: 0.9075718241606092

key: test_fscore
value: [1.         0.57142857 0.57142857 0.75       0.85714286 0.75
 0.88888889 0.8        0.61538462 0.4       ]

mean value: 0.7204273504273505

key: train_fscore
value: [0.91764706 0.93670886 0.8        0.86075949 0.91358025 0.72131148
 0.96       0.91428571 0.8        0.8358209 ]

mean value: 0.8660113745385427

key: test_precision
value: [1.         0.66666667 0.66666667 0.75       1.         0.75
 1.         0.8        0.5        1.        ]

mean value: 0.8133333333333334

key: train_precision
value: [0.84782609 0.925      1.         0.85       0.88095238 1.
 0.97297297 1.         0.66666667 1.        ]

mean value: 0.9143418107548542

key: test_recall
value: [1.   0.5  0.5  0.75 0.75 0.75 0.8  0.8  0.8  0.25]

mean value: 0.6900000000000001

key: train_recall
value: [1.         0.94871795 0.66666667 0.87179487 0.94871795 0.56410256
 0.94736842 0.84210526 1.         0.71794872]

mean value: 0.8507422402159244

key: test_roc_auc
value: [1.         0.6875     0.6875     0.8125     0.875      0.8125
 0.9        0.82857143 0.61428571 0.625     ]

mean value: 0.7842857142857143

key: train_roc_auc
value: [0.94852941 0.95230015 0.83333333 0.89177979 0.93759427 0.78205128
 0.96643783 0.92105263 0.86231884 0.85897436]

mean value: 0.8954371900141855

key: test_jcc
value: [1.         0.4        0.4        0.6        0.75       0.6
 0.8        0.66666667 0.44444444 0.25      ]

mean value: 0.5911111111111111

key: train_jcc
value: [0.84782609 0.88095238 0.66666667 0.75555556 0.84090909 0.56410256
 0.92307692 0.84210526 0.66666667 0.71794872]

mean value: 0.7705809915992983

MCC on Blind test: 0.07

Accuracy on Blind test: 0.91
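
Note on test_roc_auc: when ROC AUC is computed from hard class labels instead of scores, the curve has a single operating point and the area reduces to (recall + specificity) / 2. The logged SGD values look consistent with that reading. This is an inference from the numbers, not something the log states, and `implied_specificity` is a hypothetical helper.

```python
# Per-fold values copied from the Stochastic GDescent block above.
test_recall = [1.0, 0.5, 0.5, 0.75, 0.75, 0.75, 0.8, 0.8, 0.8, 0.25]
test_roc_auc = [1.0, 0.6875, 0.6875, 0.8125, 0.875, 0.8125,
                0.9, 0.82857143, 0.61428571, 0.625]

def implied_specificity(recall, auc):
    """Specificity implied by a single-threshold AUC = (recall + specificity) / 2."""
    return 2.0 * auc - recall

specificity = [implied_specificity(r, a)
               for r, a in zip(test_recall, test_roc_auc)]
for fold, spec in enumerate(specificity, start=1):
    print(f"fold {fold}: implied specificity = {spec:.4f}")
```

Every implied specificity falls in [0, 1] and matches a simple fraction of 7 or 8 negatives, which supports the label-based interpretation.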

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.07426667 0.06154084 0.06283951 0.06345463 0.06271338 0.06455684
 0.06105089 0.06551576 0.06561875 0.06309128]

mean value: 0.0644648551940918

key: score_time
value: [0.01463723 0.01418447 0.01481771 0.01499844 0.01489115 0.01537299
 0.01565957 0.01581383 0.01552248 0.01574159]

mean value: 0.015163946151733398

key: test_mcc
value: [1.         0.625      0.625      0.625      0.83666003 1.
 0.52915026 1.         0.65714286 0.81009259]

mean value: 0.7708045733190834

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.83333333 0.83333333 0.83333333 0.91666667 1.
 0.75       1.         0.83333333 0.90909091]

mean value: 0.8909090909090909

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.75       0.75       0.75       0.88888889 1.
 0.57142857 1.         0.8        0.85714286]

mean value: 0.8367460317460318

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.   0.75 0.75 0.75 0.8  1.   1.   1.   0.8  1.  ]

mean value: 0.885

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.   0.75 0.75 0.75 1.   1.   0.4  1.   0.8  0.75]

mean value: 0.8200000000000001

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.8125     0.8125     0.8125     0.9375     1.
 0.7        1.         0.82857143 0.875     ]

mean value: 0.8778571428571429

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.6        0.6        0.6        0.8        1.
 0.4        1.         0.66666667 0.75      ]

mean value: 0.7416666666666667

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.09

Accuracy on Blind test: 0.78

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])
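
Note on the UndefinedMetricWarning logged alongside this pipeline: precision is TP / (TP + FP), so it is undefined when a model predicts no positive samples at all. The warning text says the value is then set to 0.0 and that a `zero_division` parameter controls this. The sketch below is a plain-Python stand-in for that behaviour, not scikit-learn's implementation. The labels are hypothetical, and the log does not show which prediction set triggered the warning.

```python
def precision(y_true, y_pred, zero_division=0.0):
    """Precision for the positive class (label 1).

    `zero_division` is returned when no sample is predicted positive,
    mirroring the role of the scikit-learn parameter of the same name.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    predicted_pos = sum(1 for p in y_pred if p == 1)
    if predicted_pos == 0:
        return zero_division
    return tp / predicted_pos

y_true = [0, 0, 1, 1]        # hypothetical labels
all_negative = [0, 0, 0, 0]  # a model that never predicts the positive class
mixed = [0, 1, 1, 0]         # one true positive, one false positive

print(precision(y_true, all_negative))  # falls back to zero_division
print(precision(y_true, mixed))
```

A fold scored this way contributes 0.0 to the mean precision, so a mean can be pulled down without any false positives being predicted.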

key: fit_time
value: [0.02740383 0.02774906 0.03296709 0.04285073 0.033988   0.02595377
 0.04620218 0.03556585 0.02733755 0.03265309]

mean value: 0.03326711654663086

key: score_time
value: [0.02362061 0.02275753 0.03781056 0.03313112 0.02986407 0.02139044
 0.03054595 0.02594328 0.02176881 0.02311182]

mean value: 0.02699441909790039

key: test_mcc
value: [0.83666003 0.625      0.81649658 1.         0.83666003 1.
 0.83666003 1.         0.65714286 0.81009259]

mean value: 0.8418712104973792

key: train_mcc
value: [1.         0.97991726 1.         0.97991726 0.97991726 1.
 1.         1.         1.         0.98002018]

mean value: 0.9919771953521386

key: test_accuracy
value: [0.91666667 0.83333333 0.91666667 1.         0.91666667 1.
 0.91666667 1.         0.83333333 0.90909091]

mean value: 0.9242424242424242

key: train_accuracy
value: [1.         0.99065421 1.         0.99065421 0.99065421 1.
 1.         1.         1.         0.99074074]

mean value: 0.9962703357563171

key: test_fscore
value: [0.88888889 0.75       0.85714286 1.         0.88888889 1.
 0.88888889 1.         0.8        0.85714286]

mean value: 0.8930952380952382

key: train_fscore
value: [1.         0.98701299 1.         0.98701299 0.98701299 1.
 1.         1.         1.         0.98701299]

mean value: 0.9948051948051948

key: test_precision
value: [0.8  0.75 1.   1.   0.8  1.   1.   1.   0.8  1.  ]

mean value: 0.915

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.   0.75 0.75 1.   1.   1.   0.8  1.   0.8  0.75]

mean value: 0.885

key: train_recall
value: [1.         0.97435897 1.         0.97435897 0.97435897 1.
 1.         1.         1.         0.97435897]

mean value: 0.9897435897435898

key: test_roc_auc
value: [0.9375     0.8125     0.875      1.         0.9375     1.
 0.9        1.         0.82857143 0.875     ]

mean value: 0.9166071428571428

key: train_roc_auc
value: [1.         0.98717949 1.         0.98717949 0.98717949 1.
 1.         1.         1.         0.98717949]

mean value: 0.9948717948717949

key: test_jcc
value: [0.8        0.6        0.75       1.         0.8        1.
 0.8        1.         0.66666667 0.75      ]

mean value: 0.8166666666666667

key: train_jcc
value: [1.         0.97435897 1.         0.97435897 0.97435897 1.
 1.         1.         1.         0.97435897]

mean value: 0.9897435897435898

MCC on Blind test: 0.13

Accuracy on Blind test: 0.86
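
Note on comparing the blocks above: every model in this part of the log scores far lower on the blind test than in cross-validation. The sketch below tabulates that drop from the logged values, with mean CV test_mcc rounded to four decimals. It is only a reading aid. The `results` list is hand-copied from the log and the ranking is not a model-selection recommendation.

```python
# (model name as logged, mean CV test_mcc, blind-test MCC), copied from the blocks above.
results = [
    ("LDA", 0.5931, 0.07),
    ("Multinomial", 0.3490, 0.14),
    ("Passive Aggresive", 0.5753, 0.04),
    ("Stochastic GDescent", 0.6034, 0.07),
    ("AdaBoost Classifier", 0.7708, 0.09),
    ("Bagging Classifier", 0.8419, 0.13),
]

# Sort by the size of the drop from cross-validation to the blind test.
drops = sorted(((cv - blind, name) for name, cv, blind in results), reverse=True)
for drop, name in drops:
    print(f"{name:<20} CV-to-blind MCC drop: {drop:.2f}")
```

The models with the highest cross-validated MCC also show the largest drops, so the CV ranking alone says little about blind-test behaviour here.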

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.02972174 0.03583384 0.03609109 0.03561258 0.0359695  0.03587818
 0.03609157 0.03597355 0.03254533 0.03658676]

mean value: 0.0350304126739502

key: score_time
value: [0.02094769 0.02031541 0.02000165 0.02015448 0.01970243 0.01941609
 0.01103115 0.02209592 0.0211103  0.02550483]

mean value: 0.020027995109558105

key: test_mcc
value: [0.42640143 0.15811388 0.40824829 0.63245553 0.15811388 0.40824829
 0.35675303 0.07559289 0.11952286 0.        ]

mean value: 0.27434501012310836

key: train_mcc
value: [0.94025192 0.94025192 0.97991726 0.92064018 0.92064018 0.92064018
 0.93950808 0.93950808 0.93950808 0.94053994]

mean value: 0.9381405840047681

key: test_accuracy
value: [0.75       0.66666667 0.75       0.83333333 0.66666667 0.75
 0.66666667 0.58333333 0.58333333 0.63636364]

mean value: 0.6886363636363636

key: train_accuracy
value: [0.97196262 0.97196262 0.99065421 0.96261682 0.96261682 0.96261682
 0.97196262 0.97196262 0.97196262 0.97222222]

mean value: 0.9710539979231568

key: test_fscore
value: [0.4        0.33333333 0.57142857 0.66666667 0.33333333 0.57142857
 0.33333333 0.28571429 0.44444444 0.        ]

mean value: 0.39396825396825397

key: train_fscore
value: [0.96       0.96       0.98701299 0.94594595 0.94594595 0.94594595
 0.95890411 0.95890411 0.95890411 0.96      ]

mean value: 0.9581563153617948

key: test_precision
value: [1.         0.5        0.66666667 1.         0.5        0.66666667
 1.         0.5        0.5        0.        ]

mean value: 0.6333333333333333

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.25 0.25 0.5  0.5  0.25 0.5  0.2  0.2  0.4  0.  ]

mean value: 0.305

key: train_recall
value: [0.92307692 0.92307692 0.97435897 0.8974359  0.8974359  0.8974359
 0.92105263 0.92105263 0.92105263 0.92307692]

mean value: 0.9199055330634278

key: test_roc_auc
value: [0.625      0.5625     0.6875     0.75       0.5625     0.6875
 0.6        0.52857143 0.55714286 0.5       ]

mean value: 0.6060714285714286

key: train_roc_auc
value: [0.96153846 0.96153846 0.98717949 0.94871795 0.94871795 0.94871795
 0.96052632 0.96052632 0.96052632 0.96153846]

mean value: 0.9599527665317139

key: test_jcc
value: [0.25       0.2        0.4        0.5        0.2        0.4
 0.2        0.16666667 0.28571429 0.        ]

mean value: 0.26023809523809527

key: train_jcc
value: [0.92307692 0.92307692 0.97435897 0.8974359  0.8974359  0.8974359
 0.92105263 0.92105263 0.92105263 0.92307692]

mean value: 0.9199055330634278

MCC on Blind test: 0.05

Accuracy on Blind test: 0.85
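
The "mean value" lines appear to be the arithmetic mean of the ten per-fold scores printed directly above them. As a minimal check, assuming that reading is correct, the sketch below recomputes the Gaussian Process test_mcc and train_mcc means from the fold values in this log and reports the gap between them. The variable and function names are illustrative and are not taken from the original script.

```python
# Fold scores copied from the Gaussian Process block of this log.
test_mcc = [0.42640143, 0.15811388, 0.40824829, 0.63245553, 0.15811388,
            0.40824829, 0.35675303, 0.07559289, 0.11952286, 0.0]
train_mcc = [0.94025192, 0.94025192, 0.97991726, 0.92064018, 0.92064018,
             0.92064018, 0.93950808, 0.93950808, 0.93950808, 0.94053994]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


mean_test = mean(test_mcc)
mean_train = mean(train_mcc)

# Difference between training and cross-validated MCC.
gap = mean_train - mean_test

print(f"mean test_mcc:  {mean_test:.4f}")
print(f"mean train_mcc: {mean_train:.4f}")
print(f"train - test:   {gap:.4f}")
```

A large gap between training and cross-validated MCC, as here, is commonly read as a sign of overfitting.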

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.08681178 0.08099294 0.0833962  0.08435941 0.08145213 0.08759737
 0.08979297 0.08592725 0.08576846 0.07664156]

mean value: 0.08427400588989258

key: score_time
value: [0.00886655 0.00919652 0.00933671 0.00866079 0.00926757 0.00908661
 0.00926757 0.0088954  0.00944066 0.0094223 ]

mean value: 0.009144067764282227

key: test_mcc
value: [0.83666003 0.625      0.81649658 0.81649658 0.83666003 0.83666003
 0.65714286 0.84515425 0.65714286 0.81009259]

mean value: 0.7737505797772892

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91666667 0.83333333 0.91666667 0.91666667 0.91666667 0.91666667
 0.83333333 0.91666667 0.83333333 0.90909091]

mean value: 0.8909090909090909

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.88888889 0.75       0.85714286 0.85714286 0.88888889 0.88888889
 0.8        0.90909091 0.8        0.85714286]

mean value: 0.8497186147186148

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.8        0.75       1.         1.         0.8        0.8
 0.8        0.83333333 0.8        1.        ]

mean value: 0.8583333333333334

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.   0.75 0.75 0.75 1.   1.   0.8  1.   0.8  0.75]

mean value: 0.86

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.8125     0.875      0.875      0.9375     0.9375
 0.82857143 0.92857143 0.82857143 0.875     ]

mean value: 0.8835714285714286

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.8        0.6        0.75       0.75       0.8        0.8
 0.66666667 0.83333333 0.66666667 0.75      ]

mean value: 0.7416666666666667

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.11

Accuracy on Blind test: 0.82

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.00962162 0.01063228 0.01091313 0.01079535 0.01196051 0.0170722
 0.01151824 0.01096225 0.02570939 0.01203942]

mean value: 0.01312243938446045

key: score_time
value: [0.01107264 0.01098132 0.01062155 0.01118159 0.01129007 0.01117349
 0.01122856 0.01094556 0.01141524 0.01125026]

mean value: 0.01111602783203125

key: test_mcc
value: [0.         0.         0.         0.         0.         0.
 0.         0.07559289 0.         0.        ]

mean value: 0.007559289460184544

key: train_mcc
value: [0.32183783 0.32183783 0.32183783 0.18223949 0.26021572 0.26021572
 0.32843368 0.32843368 0.29834424 0.29306141]

mean value: 0.2916457442021758

key: test_accuracy
value: [0.66666667 0.66666667 0.66666667 0.66666667 0.66666667 0.66666667
 0.58333333 0.58333333 0.58333333 0.63636364]

mean value: 0.6386363636363637

key: train_accuracy
value: [0.69158879 0.69158879 0.69158879 0.65420561 0.6728972  0.6728972
 0.70093458 0.70093458 0.69158879 0.68518519]

mean value: 0.6853409484250605

key: test_fscore
value: [0.         0.         0.         0.         0.         0.
 0.         0.28571429 0.         0.        ]

mean value: 0.028571428571428574

key: train_fscore
value: [0.26666667 0.26666667 0.26666667 0.09756098 0.18604651 0.18604651
 0.27272727 0.27272727 0.23255814 0.22727273]

mean value: 0.22749394111277266

key: test_precision
value: [0.  0.  0.  0.  0.  0.  0.  0.5 0.  0. ]

mean value: 0.05

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.  0.  0.  0.  0.  0.  0.  0.2 0.  0. ]

mean value: 0.02

key: train_recall
value: [0.15384615 0.15384615 0.15384615 0.05128205 0.1025641  0.1025641
 0.15789474 0.15789474 0.13157895 0.12820513]

mean value: 0.12935222672064778

key: test_roc_auc
value: [0.5        0.5        0.5        0.5        0.5        0.5
 0.5        0.52857143 0.5        0.5       ]

mean value: 0.5028571428571429

key: train_roc_auc
value: [0.57692308 0.57692308 0.57692308 0.52564103 0.55128205 0.55128205
 0.57894737 0.57894737 0.56578947 0.56410256]

mean value: 0.5646761133603239

key: test_jcc
value: [0.         0.         0.         0.         0.         0.
 0.         0.16666667 0.         0.        ]

mean value: 0.016666666666666666

key: train_jcc
value: [0.15384615 0.15384615 0.15384615 0.05128205 0.1025641  0.1025641
 0.15789474 0.15789474 0.13157895 0.12820513]

mean value: 0.12935222672064778

MCC on Blind test: -0.02

Accuracy on Blind test: 0.95
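
The QDA block pairs a blind-test accuracy of 0.95 with an MCC of -0.02, and the UndefinedMetricWarning logged for this model reports folds with no predicted positive samples. The sketch below shows how such a combination can arise on imbalanced data. The confusion-matrix counts are hypothetical, chosen only for illustration; they are not the blind-test counts from this run, and the function name is an assumption. Precision is set to 0.0 when nothing is predicted positive, mirroring the default behaviour described in the warning, and MCC is set to 0.0 when its denominator is zero, which is a common convention.

```python
from math import sqrt


def summarise(tp, fp, fn, tn):
    """Accuracy, precision and MCC from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total

    # Precision is undefined with no predicted positives; fall back to 0.0.
    predicted_pos = tp + fp
    precision = tp / predicted_pos if predicted_pos else 0.0

    # MCC is undefined when any marginal total is zero; fall back to 0.0.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "precision": precision, "mcc": mcc}


# Hypothetical imbalanced test set: 95 negatives, 5 positives.
# Case 1: the classifier predicts every sample as negative.
all_negative = summarise(tp=0, fp=0, fn=5, tn=95)
# Case 2: as above, but with a single false positive.
one_false_alarm = summarise(tp=0, fp=1, fn=5, tn=94)

for name, result in [("all negative", all_negative),
                     ("one false alarm", one_false_alarm)]:
    print(name, {k: round(v, 2) for k, v in result.items()})
```

In both hypothetical cases accuracy stays high because the majority class dominates, while MCC stays at or just below zero because no positive sample is recovered.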

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.0105691  0.01015902 0.00814056 0.00781894 0.00773811 0.00766277
 0.00809789 0.0085144  0.00822377 0.008322  ]

mean value: 0.008524656295776367

key: score_time
value: [0.01082993 0.00936484 0.00863528 0.00822663 0.00833321 0.00819302
 0.0086484  0.00855327 0.00863814 0.00825906]

mean value: 0.008768177032470703

key: test_mcc
value: [0.63245553 0.40824829 0.35355339 1.         0.625      0.70710678
 0.68313005 0.83666003 0.31428571 0.69006556]

mean value: 0.6250505345503478

key: train_mcc
value: [0.79826546 0.8375252  0.89876312 0.85818605 0.85972678 0.87895928
 0.81760898 0.83676583 0.89756105 0.83946488]

mean value: 0.8522826622791292

key: test_accuracy
value: [0.83333333 0.75       0.66666667 1.         0.83333333 0.83333333
 0.83333333 0.91666667 0.66666667 0.81818182]

mean value: 0.8151515151515152

key: train_accuracy
value: [0.90654206 0.92523364 0.95327103 0.93457944 0.93457944 0.94392523
 0.91588785 0.92523364 0.95327103 0.92592593]

mean value: 0.9318449290411908

key: test_fscore
value: [0.66666667 0.57142857 0.6        1.         0.75       0.8
 0.75       0.88888889 0.6        0.8       ]

mean value: 0.7426984126984127

key: train_fscore
value: [0.87179487 0.89473684 0.93506494 0.90909091 0.91139241 0.92307692
 0.88311688 0.89473684 0.93333333 0.8974359 ]

mean value: 0.905377984218757

key: test_precision
value: [1.         0.66666667 0.5        1.         0.75       0.66666667
 1.         1.         0.6        0.66666667]

mean value: 0.785

key: train_precision
value: [0.87179487 0.91891892 0.94736842 0.92105263 0.9        0.92307692
 0.87179487 0.89473684 0.94594595 0.8974359 ]

mean value: 0.9092125323704271

key: test_recall
value: [0.5  0.5  0.75 1.   0.75 1.   0.6  0.8  0.6  1.  ]

mean value: 0.75

key: train_recall
value: [0.87179487 0.87179487 0.92307692 0.8974359  0.92307692 0.92307692
 0.89473684 0.89473684 0.92105263 0.8974359 ]

mean value: 0.9018218623481782

key: test_roc_auc
value: [0.75       0.6875     0.6875     1.         0.8125     0.875
 0.8        0.9        0.65714286 0.85714286]

mean value: 0.8026785714285715

key: train_roc_auc
value: [0.89913273 0.91383861 0.94683258 0.92665913 0.9321267  0.93947964
 0.91113654 0.91838291 0.94603356 0.91973244]

mean value: 0.9253354836037566

key: test_jcc
value: [0.5        0.4        0.42857143 1.         0.6        0.66666667
 0.6        0.8        0.42857143 0.66666667]

mean value: 0.6090476190476191

key: train_jcc
value: [0.77272727 0.80952381 0.87804878 0.83333333 0.8372093  0.85714286
 0.79069767 0.80952381 0.875      0.81395349]

mean value: 0.8277160327855166

MCC on Blind test: 0.08

Accuracy on Blind test: 0.74

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.07413816 0.06058931 0.06105781 0.06061363 0.0620904  0.06173635
 0.06109071 0.06248355 0.06057048 0.06052804]

mean value: 0.06248984336853027

key: score_time
value: [0.00838947 0.00880098 0.00829411 0.0082829  0.00848746 0.00843048
 0.00848484 0.00820589 0.00828862 0.00822687]

mean value: 0.008389163017272949

key: test_mcc
value: [0.63245553 0.40824829 0.35355339 1.         0.625      0.70710678
 0.68313005 0.83666003 0.31428571 0.69006556]

mean value: 0.6250505345503478

key: train_mcc
value: [0.79826546 0.8375252  0.89876312 0.85818605 0.85972678 0.87895928
 0.81760898 0.83676583 0.89756105 0.83946488]

mean value: 0.8522826622791292

key: test_accuracy
value: [0.83333333 0.75       0.66666667 1.         0.83333333 0.83333333
 0.83333333 0.91666667 0.66666667 0.81818182]

mean value: 0.8151515151515152

key: train_accuracy
value: [0.90654206 0.92523364 0.95327103 0.93457944 0.93457944 0.94392523
 0.91588785 0.92523364 0.95327103 0.92592593]

mean value: 0.9318449290411908

/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:122: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:125: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)

key: test_fscore
value: [0.66666667 0.57142857 0.6        1.         0.75       0.8
 0.75       0.88888889 0.6        0.8       ]

mean value: 0.7426984126984127
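
The SettingWithCopyWarning messages reported for gid_config.py (lines 122 and 125) come from calling sort_values(..., inplace = True) on a DataFrame that pandas regards as a slice of another one. A minimal sketch of one way to avoid the warning is shown below. It assumes baseline_CT is obtained by filtering a larger results table; the `scores` frame, its `source` column and all values in it are hypothetical and not taken from the original script.

```python
import pandas as pd

# Hypothetical results table standing in for the real scores DataFrame.
scores = pd.DataFrame(
    {
        "model": ["a", "b", "c"],
        "test_mcc": [0.2, 0.7, 0.5],
        "source": ["CT", "CT", "BT"],
    }
)

# Take an explicit copy of the subset so later operations act on an
# independent DataFrame rather than on a slice of `scores`.
baseline_CT = scores[scores["source"] == "CT"].copy()

# Assign the sorted result instead of sorting in place.
baseline_CT = baseline_CT.sort_values(by=["test_mcc"], ascending=False)

print(baseline_CT)
```

Either change on its own (the explicit .copy() or the reassignment) is usually enough to silence the warning; using both keeps the intent obvious.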

key: train_fscore
value: [0.87179487 0.89473684 0.93506494 0.90909091 0.91139241 0.92307692
 0.88311688 0.89473684 0.93333333 0.8974359 ]

mean value: 0.905377984218757

key: test_precision
value: [1.         0.66666667 0.5        1.         0.75       0.66666667
 1.         1.         0.6        0.66666667]

mean value: 0.785

key: train_precision
value: [0.87179487 0.91891892 0.94736842 0.92105263 0.9        0.92307692
 0.87179487 0.89473684 0.94594595 0.8974359 ]

mean value: 0.9092125323704271

key: test_recall
value: [0.5  0.5  0.75 1.   0.75 1.   0.6  0.8  0.6  1.  ]

mean value: 0.75

key: train_recall
value: [0.87179487 0.87179487 0.92307692 0.8974359  0.92307692 0.92307692
 0.89473684 0.89473684 0.92105263 0.8974359 ]

mean value: 0.9018218623481782

key: test_roc_auc
value: [0.75       0.6875     0.6875     1.         0.8125     0.875
 0.8        0.9        0.65714286 0.85714286]

mean value: 0.8026785714285715

key: train_roc_auc
value: [0.89913273 0.91383861 0.94683258 0.92665913 0.9321267  0.93947964
 0.91113654 0.91838291 0.94603356 0.91973244]

mean value: 0.9253354836037566

key: test_jcc
value: [0.5        0.4        0.42857143 1.         0.6        0.66666667
 0.6        0.8        0.42857143 0.66666667]

mean value: 0.6090476190476191

key: train_jcc
value: [0.77272727 0.80952381 0.87804878 0.83333333 0.8372093  0.85714286
 0.79069767 0.80952381 0.875      0.81395349]

mean value: 0.8277160327855166

MCC on Blind test: 0.08

Accuracy on Blind test: 0.74
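
A blind-test accuracy of 0.74 alongside an MCC of 0.08 can arise when most predictions fall in the majority class: accuracy rewards the many correct negatives, while MCC also penalises the missed positives. The sketch below computes both metrics from a confusion matrix in plain Python. The counts are hypothetical, chosen only to give an accuracy of 0.74; they are not the confusion matrix of this run.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 if the denominator is zero."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts (NOT from this log): 20 positives, 80 negatives,
# with the classifier predicting "negative" for most samples.
tp, tn, fp, fn = 3, 71, 9, 17

acc = accuracy(tp, tn, fp, fn)
score = mcc(tp, tn, fp, fn)
print(f"accuracy = {acc:.2f}, MCC = {score:.2f}")
```

With these illustrative counts the accuracy is 0.74 while the MCC stays close to zero, which is why the log reports MCC next to accuracy for the blind test.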

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.01721478 0.01214457 0.01244617 0.01377439 0.01370144 0.01362348
 0.01294899 0.01230645 0.01298475 0.0133431 ]

mean value: 0.013448810577392578

key: score_time
value: [0.01065063 0.00836754 0.00845647 0.00828004 0.00818396 0.00835776
 0.00841212 0.0085628  0.00873351 0.00875807]

mean value: 0.008676290512084961

key: test_mcc
value: [0.8819171  0.5        0.37796447 0.875      1.         0.60714286
 0.76376262 1.         0.64465837 0.60714286]

mean value: 0.7257588278029415

key: train_mcc
value: [0.79411765 0.85331034 0.79599234 0.76678748 0.81031543 0.82480818
 0.81031543 0.82480818 0.79688349 0.85400682]

mean value: 0.8131345350406455

key: test_accuracy
value: [0.9375     0.75       0.66666667 0.93333333 1.         0.8
 0.86666667 1.         0.8        0.8       ]

mean value: 0.8554166666666667

key: train_accuracy
value: [0.89705882 0.92647059 0.89781022 0.88321168 0.90510949 0.91240876
 0.90510949 0.91240876 0.89781022 0.9270073 ]

mean value: 0.9064405324173466

key: test_fscore
value: [0.93333333 0.75       0.70588235 0.93333333 1.         0.8
 0.85714286 1.         0.84210526 0.8       ]

mean value: 0.8621797139908595

key: train_fscore
value: [0.89705882 0.92537313 0.89705882 0.88235294 0.90510949 0.91304348
 0.90510949 0.91176471 0.89393939 0.92647059]

mean value: 0.9057280866983752

key: test_precision
value: [1.         0.75       0.6        0.875      1.         0.75
 1.         1.         0.72727273 0.85714286]

mean value: 0.8559415584415584

key: train_precision
value: [0.89705882 0.93939394 0.91044776 0.89552239 0.91176471 0.91304348
 0.89855072 0.91176471 0.921875   0.92647059]

mean value: 0.9125892115075633

key: test_recall
value: [0.875      0.75       0.85714286 1.         1.         0.85714286
 0.75       1.         1.         0.75      ]

mean value: 0.8839285714285714

key: train_recall
value: [0.89705882 0.91176471 0.88405797 0.86956522 0.89855072 0.91304348
 0.91176471 0.91176471 0.86764706 0.92647059]

mean value: 0.8991687979539642

key: test_roc_auc
value: [0.9375     0.75       0.67857143 0.9375     1.         0.80357143
 0.875      1.         0.78571429 0.80357143]

mean value: 0.8571428571428571

key: train_roc_auc
value: [0.89705882 0.92647059 0.89791134 0.88331202 0.90515772 0.91240409
 0.90515772 0.91240409 0.89759165 0.92700341]

mean value: 0.9064471440750212

key: test_jcc
value: [0.875      0.6        0.54545455 0.875      1.         0.66666667
 0.75       1.         0.72727273 0.66666667]

mean value: 0.7706060606060606

key: train_jcc
value: [0.81333333 0.86111111 0.81333333 0.78947368 0.82666667 0.84
 0.82666667 0.83783784 0.80821918 0.8630137 ]

mean value: 0.8279655509871804

MCC on Blind test: 0.11

Accuracy on Blind test: 0.65

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
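
The ConvergenceWarning above suggests either scaling the data or raising max_iter. The pipelines logged earlier already apply MinMaxScaler to the numerical columns, so raising the iteration cap is the more direct option. The following is a minimal sketch on synthetic data; make_classification, the sample size and max_iter=5000 are illustrative assumptions, not settings from the original scripts.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real feature matrix (assumption).
X, y = make_classification(n_samples=120, n_features=20, random_state=42)

pipe = Pipeline(
    steps=[
        ("prep", MinMaxScaler()),
        # A higher max_iter gives lbfgs more iterations before it stops.
        ("model", LogisticRegressionCV(max_iter=5000, random_state=42)),
    ]
)
pipe.fit(X, y)

print(pipe.named_steps["model"].max_iter)
```

If the warning persists with a larger cap, the scikit-learn documentation linked in the warning lists alternative solvers that can be selected through the solver parameter.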
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.39265418 0.37149286 0.37123227 0.37943006 0.37867284 0.38540936
 0.36591649 0.36693406 0.37429166 0.37283516]

mean value: 0.3758868932723999

key: score_time
value: [0.00942802 0.0091536  0.0086658  0.00927925 0.00937963 0.00886846
 0.00882983 0.00927758 0.009166   0.00868988]

mean value: 0.009073805809020997

key: test_mcc
value: [0.8819171  0.62994079 0.49099025 0.76376262 0.73214286 0.60714286
 0.6000992  1.         0.64465837 0.33928571]

mean value: 0.6689939758834577

key: train_mcc
value: [0.91215932 0.92657079 0.94201665 0.8978896  0.97080136 0.97122151
 0.94201665 0.88320546 1.         0.95630861]

mean value: 0.9402189943658086

key: test_accuracy
value: [0.9375     0.8125     0.73333333 0.86666667 0.86666667 0.8
 0.8        1.         0.8        0.66666667]

mean value: 0.8283333333333334

key: train_accuracy
value: [0.95588235 0.96323529 0.97080292 0.94890511 0.98540146 0.98540146
 0.97080292 0.94160584 1.         0.97810219]

mean value: 0.9700139544869043

key: test_fscore
value: [0.93333333 0.82352941 0.75       0.875      0.85714286 0.8
 0.82352941 1.         0.84210526 0.66666667]

mean value: 0.8371306943830163

key: train_fscore
value: [0.95652174 0.96350365 0.97058824 0.94964029 0.98550725 0.98529412
 0.97101449 0.94117647 1.         0.97810219]

mean value: 0.9701348428976124

key: test_precision
value: [1.         0.77777778 0.66666667 0.77777778 0.85714286 0.75
 0.77777778 1.         0.72727273 0.71428571]

mean value: 0.8048701298701298

key: train_precision
value: [0.94285714 0.95652174 0.98507463 0.94285714 0.98550725 1.
 0.95714286 0.94117647 1.         0.97101449]

mean value: 0.968215171857192

key: test_recall
value: [0.875      0.875      0.85714286 1.         0.85714286 0.85714286
 0.875      1.         1.         0.625     ]

mean value: 0.8821428571428571

key: train_recall
value: [0.97058824 0.97058824 0.95652174 0.95652174 0.98550725 0.97101449
 0.98529412 0.94117647 1.         0.98529412]

mean value: 0.9722506393861893

key: test_roc_auc
value: [0.9375     0.8125     0.74107143 0.875      0.86607143 0.80357143
 0.79464286 1.         0.78571429 0.66964286]

mean value: 0.8285714285714286

key: train_roc_auc
value: [0.95588235 0.96323529 0.97090793 0.9488491  0.98540068 0.98550725
 0.97090793 0.94160273 1.         0.97815431]

mean value: 0.9700447570332481

key: test_jcc
value: [0.875      0.7        0.6        0.77777778 0.75       0.66666667
 0.7        1.         0.72727273 0.5       ]

mean value: 0.7296717171717172

key: train_jcc
value: [0.91666667 0.92957746 0.94285714 0.90410959 0.97142857 0.97101449
 0.94366197 0.88888889 1.         0.95714286]

mean value: 0.9425347645398564

MCC on Blind test: 0.07

Accuracy on Blind test: 0.72
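
The blind-test lines above pair an accuracy of 0.72 with an MCC of 0.07. As a sketch of how that can happen, the snippet below computes both from binary confusion-matrix counts. The counts are hypothetical and chosen only to illustrate the effect. The log does not print the blind-test confusion matrix, and this is not necessarily how the original script computes its scores.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts: most negatives are correct, most positives are missed.
tp, tn, fp, fn = 2, 70, 6, 22
print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.2f}")
```

Accuracy rewards the many correct majority-class predictions, while MCC uses all four cells and stays near zero when the minority class is mostly missed.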

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.00969172 0.00904489 0.00695467 0.00679207 0.00658846 0.00662065
 0.00658751 0.00682616 0.00665498 0.00697279]

mean value: 0.007273387908935547

key: score_time
value: [0.01047778 0.01015592 0.00812674 0.0078907  0.00783539 0.00783157
 0.00781465 0.00791001 0.0078311  0.00797391]

mean value: 0.008384776115417481

key: test_mcc
value: [0.8819171  0.5        0.33928571 0.56407607 0.49099025 0.60714286
 0.46428571 0.73214286 0.64465837 0.07142857]

mean value: 0.5295927517042964

key: train_mcc
value: [0.61098829 0.74337629 0.6462903  0.59999905 0.55137884 0.71313464
 0.65613085 0.71021843 0.63063055 0.63867147]

mean value: 0.6500818694571209

key: test_accuracy
value: [0.9375     0.75       0.66666667 0.73333333 0.73333333 0.8
 0.73333333 0.86666667 0.8        0.53333333]

mean value: 0.7554166666666666

key: train_accuracy
value: [0.80147059 0.86764706 0.81751825 0.79562044 0.76642336 0.8540146
 0.81751825 0.84671533 0.81021898 0.81021898]

mean value: 0.8187365822241305

key: test_fscore
value: [0.94117647 0.75       0.66666667 0.77777778 0.75       0.8
 0.75       0.875      0.84210526 0.53333333]

mean value: 0.7686059511523908

key: train_fscore
value: [0.81632653 0.87671233 0.83443709 0.81333333 0.79487179 0.84615385
 0.83660131 0.82644628 0.82432432 0.82894737]

mean value: 0.8298154200757712

key: test_precision
value: [0.88888889 0.75       0.625      0.63636364 0.66666667 0.75
 0.75       0.875      0.72727273 0.57142857]

mean value: 0.7240620490620491

key: train_precision
value: [0.75949367 0.82051282 0.76829268 0.75308642 0.71264368 0.90163934
 0.75294118 0.94339623 0.7625     0.75      ]

mean value: 0.7924506019387709

key: test_recall
value: [1.         0.75       0.71428571 1.         0.85714286 0.85714286
 0.75       0.875      1.         0.5       ]

mean value: 0.8303571428571428

key: train_recall
value: [0.88235294 0.94117647 0.91304348 0.88405797 0.89855072 0.79710145
 0.94117647 0.73529412 0.89705882 0.92647059]

mean value: 0.8816283034953112

key: test_roc_auc
value: [0.9375     0.75       0.66964286 0.75       0.74107143 0.80357143
 0.73214286 0.86607143 0.78571429 0.53571429]

mean value: 0.7571428571428571

key: train_roc_auc
value: [0.80147059 0.86764706 0.81681586 0.79497016 0.76545183 0.85443308
 0.81841432 0.84590793 0.81084825 0.81106138]

mean value: 0.8187020460358057

key: test_jcc
value: [0.88888889 0.6        0.5        0.63636364 0.6        0.66666667
 0.6        0.77777778 0.72727273 0.36363636]

mean value: 0.636060606060606

key: train_jcc
value: [0.68965517 0.7804878  0.71590909 0.68539326 0.65957447 0.73333333
 0.71910112 0.70422535 0.70114943 0.70786517]

mean value: 0.7096694197581203

MCC on Blind test: 0.03

Accuracy on Blind test: 0.49
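
Each model section above prints a `Pipeline` with a `ColumnTransformer` (`MinMaxScaler` on numeric columns, `OneHotEncoder` on categorical ones), followed by per-fold `fit_time`, `score_time`, `test_*` and `train_*` arrays. Output of that shape can be produced with `sklearn.model_selection.cross_validate`. The sketch below assumes that approach. The toy data frame, its column names, the scorer choices, and the fold settings are all hypothetical, and the original script may differ.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

rng = np.random.default_rng(42)
n = 120

# Hypothetical toy frame: two numeric columns and one categorical column.
X = pd.DataFrame({
    "num_a": rng.normal(size=n),
    "num_b": rng.uniform(0, 100, size=n),
    "cat_a": rng.choice(["x", "y", "z"], size=n),
})
y = (X["num_a"] + rng.normal(scale=0.5, size=n) > 0).astype(int)

prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), ["num_a", "num_b"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["cat_a"]),
    ],
)
pipe = Pipeline(steps=[
    ("prep", prep),
    ("model", LogisticRegression(max_iter=1000, random_state=42)),
])

# Scorer names are chosen to mirror the keys seen in the log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())
```

Keeping the scaler and encoder inside the pipeline means they are refit on the training part of every fold, so no information from the held-out fold reaches the preprocessing step.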

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00759959 0.00745654 0.0069356  0.00687218 0.00700045 0.00688291
 0.0068655  0.00682497 0.00688171 0.00696039]

mean value: 0.007027983665466309

key: score_time
value: [0.00797582 0.00790715 0.00785089 0.0078764  0.00810456 0.00792956
 0.0079031  0.00789905 0.00793099 0.00804949]

mean value: 0.007942700386047363

key: test_mcc
value: [0.37796447 0.25819889 0.07142857 0.49099025 0.47245559 0.13363062
 0.46428571 0.73214286 0.33928571 0.32732684]

mean value: 0.36677095205019633

key: train_mcc
value: [0.5008673  0.53311399 0.52059257 0.45151662 0.49006025 0.5360985
 0.52559229 0.51215762 0.49197671 0.53517487]

mean value: 0.5097150730382196

key: test_accuracy
value: [0.6875     0.625      0.53333333 0.73333333 0.73333333 0.53333333
 0.73333333 0.86666667 0.66666667 0.66666667]

mean value: 0.6779166666666666

key: train_accuracy
value: [0.75       0.76470588 0.75912409 0.72262774 0.74452555 0.76642336
 0.75912409 0.75182482 0.74452555 0.76642336]

mean value: 0.7529304422498926

key: test_fscore
value: [0.70588235 0.57142857 0.53333333 0.75       0.66666667 0.63157895
 0.75       0.875      0.66666667 0.70588235]

mean value: 0.6856438891346012

key: train_fscore
value: [0.75714286 0.77777778 0.77241379 0.74666667 0.75524476 0.78082192
 0.7755102  0.77027027 0.75524476 0.77464789]

mean value: 0.7665740884664326

key: test_precision
value: [0.66666667 0.66666667 0.5        0.66666667 0.8        0.5
 0.75       0.875      0.71428571 0.66666667]

mean value: 0.680595238095238

key: train_precision
value: [0.73611111 0.73684211 0.73684211 0.69135802 0.72972973 0.74025974
 0.72151899 0.7125     0.72       0.74324324]

mean value: 0.726840504690327

key: test_recall
value: [0.75       0.5        0.57142857 0.85714286 0.57142857 0.85714286
 0.75       0.875      0.625      0.75      ]

mean value: 0.7107142857142857

key: train_recall
value: [0.77941176 0.82352941 0.8115942  0.8115942  0.7826087  0.82608696
 0.83823529 0.83823529 0.79411765 0.80882353]

mean value: 0.8114236999147485

key: test_roc_auc
value: [0.6875     0.625      0.53571429 0.74107143 0.72321429 0.55357143
 0.73214286 0.86607143 0.66964286 0.66071429]

mean value: 0.6794642857142857

key: train_roc_auc
value: [0.75       0.76470588 0.75873828 0.72197357 0.74424552 0.76598465
 0.75969736 0.75245098 0.74488491 0.76673061]

mean value: 0.7529411764705882

key: test_jcc
value: [0.54545455 0.4        0.36363636 0.6        0.5        0.46153846
 0.6        0.77777778 0.5        0.54545455]

mean value: 0.5293861693861693

key: train_jcc
value: [0.6091954  0.63636364 0.62921348 0.59574468 0.60674157 0.64044944
 0.63333333 0.62637363 0.60674157 0.63218391]

mean value: 0.6216340654682218

MCC on Blind test: 0.1

Accuracy on Blind test: 0.6
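
For the three models reported so far in this section, the mean cross-validated `test_mcc` is much higher than the blind-test MCC. The short sketch below tabulates that gap using only numbers copied from the log above. It is a reading aid, not part of the original script.

```python
# (model name, mean CV test_mcc, blind-test MCC), copied from the log above.
results = [
    ("Logistic RegressionCV", 0.6689939758834577, 0.07),
    ("Gaussian NB", 0.5295927517042964, 0.03),
    ("Naive Bayes", 0.36677095205019633, 0.10),
]

# Difference between cross-validated and blind-test MCC for each model.
gaps = {name: cv_mcc - blind_mcc for name, cv_mcc, blind_mcc in results}

# Print the models from largest to smallest gap.
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24} CV minus blind MCC: {gap:.2f}")
```

A large positive gap suggests the cross-validation estimate is optimistic for the blind set, although the log alone does not show why.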

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00746417 0.00665522 0.00723457 0.00721335 0.00727034 0.00734687
 0.00666738 0.00743985 0.00722885 0.00735736]

mean value: 0.007187795639038086

key: score_time
value: [0.009269   0.00891018 0.00945759 0.00945568 0.00962353 0.01024604
 0.00979686 0.00947714 0.00947309 0.00945425]

mean value: 0.009516334533691407

key: test_mcc
value: [0.51639778 0.25819889 0.33928571 0.66143783 0.76376262 0.60714286
 0.37796447 0.75592895 0.64465837 0.47245559]

mean value: 0.5397233065771696

key: train_mcc
value: [0.63242133 0.69486799 0.73721228 0.640228   0.69398264 0.64981886
 0.69976319 0.63512361 0.69352089 0.63574336]

mean value: 0.6712682142948946

key: test_accuracy
value: [0.75       0.625      0.66666667 0.8        0.86666667 0.8
 0.66666667 0.86666667 0.8        0.73333333]

mean value: 0.7575000000000001

key: train_accuracy
value: [0.81617647 0.84558824 0.86861314 0.81751825 0.84671533 0.82481752
 0.84671533 0.81751825 0.84671533 0.81751825]

mean value: 0.8347896092743666

key: test_fscore
value: [0.77777778 0.57142857 0.66666667 0.82352941 0.875      0.8
 0.61538462 0.88888889 0.84210526 0.77777778]

mean value: 0.7638558972846898

key: train_fscore
value: [0.81751825 0.85314685 0.86956522 0.82993197 0.85106383 0.82857143
 0.85517241 0.81751825 0.84671533 0.82014388]

mean value: 0.8389347425188644

key: test_precision
value: [0.7        0.66666667 0.625      0.7        0.77777778 0.75
 0.8        0.8        0.72727273 0.7       ]

mean value: 0.7246717171717172

key: train_precision
value: [0.8115942  0.81333333 0.86956522 0.78205128 0.83333333 0.81690141
 0.80519481 0.8115942  0.84057971 0.8028169 ]

mean value: 0.8186964397105242

key: test_recall
value: [0.875      0.5        0.71428571 1.         1.         0.85714286
 0.5        1.         1.         0.875     ]

mean value: 0.8321428571428572

key: train_recall
value: [0.82352941 0.89705882 0.86956522 0.88405797 0.86956522 0.84057971
 0.91176471 0.82352941 0.85294118 0.83823529]

mean value: 0.8610826939471441

key: test_roc_auc
value: [0.75       0.625      0.66964286 0.8125     0.875      0.80357143
 0.67857143 0.85714286 0.78571429 0.72321429]

mean value: 0.7580357142857143

key: train_roc_auc
value: [0.81617647 0.84558824 0.86860614 0.81702899 0.84654731 0.82470162
 0.8471867  0.81756181 0.84676044 0.81766837]

mean value: 0.8347826086956521

key: test_jcc
value: [0.63636364 0.4        0.5        0.7        0.77777778 0.66666667
 0.44444444 0.8        0.72727273 0.63636364]

mean value: 0.6288888888888888

key: train_jcc
value: [0.69135802 0.74390244 0.76923077 0.70930233 0.74074074 0.70731707
 0.74698795 0.69135802 0.73417722 0.69512195]

mean value: 0.7229496515347358

MCC on Blind test: 0.05

Accuracy on Blind test: 0.61

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.0097928  0.00793552 0.0076406  0.00769567 0.00771952 0.00770187
 0.00762033 0.00775814 0.00766444 0.00767016]

mean value: 0.007919907569885254

key: score_time
value: [0.00912237 0.00797772 0.00790691 0.00800848 0.00794578 0.00804377
 0.00801921 0.00795007 0.00798464 0.00796342]

mean value: 0.008092236518859864

key: test_mcc
value: [0.75       0.5        0.19642857 0.76376262 0.73214286 0.73214286
 0.66143783 1.         0.64465837 0.60714286]

mean value: 0.6587715957669568

key: train_mcc
value: [0.79446135 0.76470588 0.78182997 0.82480818 0.79590547 0.79590547
 0.79560955 0.79560955 0.78298457 0.78107015]

mean value: 0.7912890152297882

key: test_accuracy
value: [0.875      0.75       0.6        0.86666667 0.86666667 0.86666667
 0.8        1.         0.8        0.8       ]

mean value: 0.8225

key: train_accuracy
value: [0.89705882 0.88235294 0.89051095 0.91240876 0.89781022 0.89781022
 0.89781022 0.89781022 0.89051095 0.89051095]

mean value: 0.8954594246457708

key: test_fscore
value: [0.875      0.75       0.57142857 0.875      0.85714286 0.85714286
 0.76923077 1.         0.84210526 0.8       ]

mean value: 0.819705031810295

key: train_fscore
value: [0.89552239 0.88235294 0.88888889 0.91304348 0.9        0.9
 0.89705882 0.89705882 0.88549618 0.88888889]

mean value: 0.894831041553975

key: test_precision
value: [0.875      0.75       0.57142857 0.77777778 0.85714286 0.85714286
 1.         1.         0.72727273 0.85714286]

mean value: 0.8272907647907648

key: train_precision
value: [0.90909091 0.88235294 0.90909091 0.91304348 0.88732394 0.88732394
 0.89705882 0.89705882 0.92063492 0.89552239]

mean value: 0.8998501080696548

key: test_recall
value: [0.875      0.75       0.57142857 1.         0.85714286 0.85714286
 0.625      1.         1.         0.75      ]

mean value: 0.8285714285714285

key: train_recall
value: [0.88235294 0.88235294 0.86956522 0.91304348 0.91304348 0.91304348
 0.89705882 0.89705882 0.85294118 0.88235294]

mean value: 0.8902813299232737

key: test_roc_auc
value: [0.875      0.75       0.59821429 0.875      0.86607143 0.86607143
 0.8125     1.         0.78571429 0.80357143]

mean value: 0.8232142857142857

key: train_roc_auc
value: [0.89705882 0.88235294 0.89066496 0.91240409 0.89769821 0.89769821
 0.89780477 0.89780477 0.8902387  0.89045183]

mean value: 0.8954177323103154

key: test_jcc
value: [0.77777778 0.6        0.4        0.77777778 0.75       0.75
 0.625      1.         0.72727273 0.66666667]

mean value: 0.7074494949494949

key: train_jcc
value: [0.81081081 0.78947368 0.8        0.84       0.81818182 0.81818182
 0.81333333 0.81333333 0.79452055 0.8       ]

mean value: 0.8097835345996846

MCC on Blind test: 0.11

Accuracy on Blind test: 0.65

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.61664748 0.47898817 0.45502543 0.50637126 0.53669834 0.53992033
 0.47287774 0.47633958 0.47858143 0.62103176]

mean value: 0.5182481527328491

key: score_time
value: [0.01329303 0.01312971 0.01097465 0.01341534 0.01492548 0.01332402
 0.01094341 0.01338291 0.01885128 0.01098609]

mean value: 0.013322591781616211

key: test_mcc
value: [0.8819171  0.51639778 0.37796447 1.         0.60714286 0.60714286
 0.46428571 0.87287156 0.64465837 0.60714286]

mean value: 0.6579523574070305

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     0.75       0.66666667 1.         0.8        0.8
 0.73333333 0.93333333 0.8        0.8       ]

mean value: 0.8220833333333334

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.93333333 0.71428571 0.70588235 1.         0.8        0.8
 0.75       0.94117647 0.84210526 0.8       ]

mean value: 0.8286783134306354

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.83333333 0.6        1.         0.75       0.75
 0.75       0.88888889 0.72727273 0.85714286]

mean value: 0.8156637806637806

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.875      0.625      0.85714286 1.         0.85714286 0.85714286
 0.75       1.         1.         0.75      ]

mean value: 0.8571428571428571

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.75       0.67857143 1.         0.80357143 0.80357143
 0.73214286 0.92857143 0.78571429 0.80357143]

mean value: 0.8223214285714285

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.875      0.55555556 0.54545455 1.         0.66666667 0.66666667
 0.6        0.88888889 0.72727273 0.66666667]

mean value: 0.7192171717171717

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.06

Accuracy on Blind test: 0.68

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.02302074 0.00763321 0.00718451 0.00732636 0.00716519 0.00725317
 0.00720024 0.00724506 0.00735903 0.00749445]

mean value: 0.00888819694519043

key: score_time
value: [0.01008129 0.00808263 0.00788021 0.00784898 0.00779343 0.00776839
 0.00773787 0.00773025 0.00829577 0.00780368]

mean value: 0.00810225009918213

key: test_mcc
value: [0.8819171  1.         1.         1.         0.6000992  0.73214286
 0.87287156 0.75592895 0.73214286 0.56407607]

mean value: 0.8139178597903081

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     1.         1.         1.         0.8        0.86666667
 0.93333333 0.86666667 0.86666667 0.73333333]

mean value: 0.9004166666666666

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94117647 1.         1.         1.         0.76923077 0.85714286
 0.94117647 0.88888889 0.875      0.66666667]

mean value: 0.8939282123105652

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.88888889 1.         1.         1.         0.83333333 0.85714286
 0.88888889 0.8        0.875      1.        ]

mean value: 0.9143253968253968

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         0.71428571 0.85714286
 1.         1.         0.875      0.5       ]

mean value: 0.8946428571428572

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     1.         1.         1.         0.79464286 0.86607143
 0.92857143 0.85714286 0.86607143 0.75      ]

mean value: 0.9

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.88888889 1.         1.         1.         0.625      0.75
 0.88888889 0.8        0.77777778 0.5       ]

mean value: 0.8230555555555555

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.13

Accuracy on Blind test: 0.86

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.07873535 0.07909012 0.07848859 0.07919955 0.07896852 0.07857132
 0.08103371 0.08165836 0.07918024 0.08250403]

mean value: 0.07974298000335693

key: score_time
value: [0.01622057 0.01643443 0.01677704 0.01642728 0.01640582 0.01631761
 0.01749635 0.01630569 0.01675391 0.01715064]

mean value: 0.01662893295288086

key: test_mcc
value: [0.8819171  0.51639778 0.49099025 1.         0.875      0.73214286
 0.76376262 1.         0.75592895 0.875     ]

mean value: 0.7891139555200787

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     0.75       0.73333333 1.         0.93333333 0.86666667
 0.86666667 1.         0.86666667 0.93333333]

mean value: 0.88875

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.93333333 0.71428571 0.75       1.         0.93333333 0.85714286
 0.85714286 1.         0.88888889 0.93333333]

mean value: 0.8867460317460317

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.83333333 0.66666667 1.         0.875      0.85714286
 1.         1.         0.8        1.        ]

mean value: 0.9032142857142857

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.875      0.625      0.85714286 1.         1.         0.85714286
 0.75       1.         1.         0.875     ]

mean value: 0.8839285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.75       0.74107143 1.         0.9375     0.86607143
 0.875      1.         0.85714286 0.9375    ]

mean value: 0.8901785714285715

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.875      0.55555556 0.6        1.         0.875      0.75
 0.75       1.         0.8        0.875     ]

mean value: 0.8080555555555555

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.06

Accuracy on Blind test: 0.68

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00679111 0.00661206 0.00676751 0.00668883 0.00663257 0.00666237
 0.00663257 0.00670314 0.00696945 0.00672388]

mean value: 0.006718349456787109

key: score_time
value: [0.00769448 0.00768995 0.00776768 0.00772476 0.00775385 0.00774527
 0.00774169 0.00774693 0.00784659 0.00774169]

mean value: 0.007745289802551269

key: test_mcc
value: [0.40451992 0.40451992 0.32732684 1.         0.76376262 0.46428571
 0.13363062 0.87287156 0.73214286 0.21821789]

mean value: 0.5321277929700597

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.6875     0.6875     0.66666667 1.         0.86666667 0.73333333
 0.53333333 0.93333333 0.86666667 0.6       ]

mean value: 0.7575

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.73684211 0.61538462 0.61538462 1.         0.875      0.71428571
 0.36363636 0.94117647 0.875      0.57142857]

mean value: 0.7308138455971274

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.63636364 0.8        0.66666667 1.         0.77777778 0.71428571
 0.66666667 0.88888889 0.875      0.66666667]

mean value: 0.7692316017316018

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.875      0.5        0.57142857 1.         1.         0.71428571
 0.25       1.         0.875      0.5       ]

mean value: 0.7285714285714285

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.6875     0.6875     0.66071429 1.         0.875      0.73214286
 0.55357143 0.92857143 0.86607143 0.60714286]

mean value: 0.7598214285714286

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.58333333 0.44444444 0.44444444 1.         0.77777778 0.55555556
 0.22222222 0.88888889 0.77777778 0.4       ]

mean value: 0.6094444444444445

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.06

Accuracy on Blind test: 0.68

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [0.98909354 0.98514724 1.04687738 0.98289633 0.98306084 0.98102474
 0.9808023  0.98257184 0.98120975 0.97967005]

mean value: 0.9892354011535645

key: score_time
value: [0.09175563 0.08826041 0.08760238 0.08777761 0.08745551 0.08774495
 0.08790946 0.08762598 0.08742118 0.08845329]

mean value: 0.08820064067840576

key: test_mcc
value: [0.8819171  0.75       0.76376262 1.         0.875      0.73214286
 0.60714286 0.87287156 0.87287156 0.76376262]

mean value: 0.8119471171513797

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     0.875      0.86666667 1.         0.93333333 0.86666667
 0.8        0.93333333 0.93333333 0.86666667]

mean value: 0.90125

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.93333333 0.875      0.875      1.         0.93333333 0.85714286
 0.8        0.94117647 0.94117647 0.85714286]

mean value: 0.9013305322128852

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.875      0.77777778 1.         0.875      0.85714286
 0.85714286 0.88888889 0.88888889 1.        ]

mean value: 0.901984126984127

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.875      0.875      1.         1.         1.         0.85714286
 0.75       1.         1.         0.75      ]

mean value: 0.9107142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.875      0.875      1.         0.9375     0.86607143
 0.80357143 0.92857143 0.92857143 0.875     ]

mean value: 0.9026785714285714

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.875      0.77777778 0.77777778 1.         0.875      0.75
 0.66666667 0.88888889 0.88888889 0.75      ]

mean value: 0.825

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
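
The per-fold `test_jcc` values in the Random Forest block above appear to be Jaccard scores for the positive class. That reading is an assumption based on the key name, since the scoring code is not shown in this log. For a binary task the Jaccard score J and the F1 score F are related by J = F / (2 - F), and the sketch below checks that relation against the values printed above. The helper name `jaccard_from_f1` is hypothetical.

```python
import math

# Per-fold values copied from the Random Forest block of this log.
test_fscore = [0.93333333, 0.875, 0.875, 1.0, 0.93333333,
               0.85714286, 0.8, 0.94117647, 0.94117647, 0.85714286]
test_jcc = [0.875, 0.77777778, 0.77777778, 1.0, 0.875,
            0.75, 0.66666667, 0.88888889, 0.88888889, 0.75]


def jaccard_from_f1(f1):
    """Convert a binary F1 score to the matching Jaccard score."""
    return f1 / (2.0 - f1)


# Derive Jaccard from each fold's F1 and compare with the logged value.
derived = [jaccard_from_f1(f) for f in test_fscore]
all_match = all(
    math.isclose(d, j, abs_tol=1e-6) for d, j in zip(derived, test_jcc)
)

# The "mean value" line is the plain arithmetic mean over the 10 folds.
mean_jcc = sum(test_jcc) / len(test_jcc)

print(all_match)
print(round(mean_jcc, 3))
```

If the relation holds on every fold, the two metrics carry the same ranking information and only one of them needs to be compared across models.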

MCC on Blind test: 0.12

Accuracy on Blind test: 0.84

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])
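
The FutureWarning above says that `max_features='auto'` is deprecated and that `'sqrt'` keeps the past behaviour for this classifier. A minimal sketch of that substitution on the "Random Forest2" hyperparameters printed in this log follows. The dict layout and the helper `replace_deprecated_auto` are hypothetical and are not taken from the project's scripts.

```python
# Hyperparameters as printed for "Random Forest2" in this log.
rf2_params = {
    "max_features": "auto",
    "min_samples_leaf": 5,
    "n_estimators": 1000,
    "n_jobs": 10,
    "oob_score": True,
    "random_state": 42,
}


def replace_deprecated_auto(params):
    """Return a copy of params with max_features='auto' swapped for 'sqrt'."""
    updated = dict(params)  # copy so the original dict is left untouched
    if updated.get("max_features") == "auto":
        updated["max_features"] = "sqrt"
    return updated


rf2_params_fixed = replace_deprecated_auto(rf2_params)
print(rf2_params_fixed["max_features"])
```

The resulting dict could then be passed on as `RandomForestClassifier(**rf2_params_fixed)`. Per the warning text, dropping the `max_features` key altogether should have the same effect, because `'sqrt'` is the default for this estimator.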

key: fit_time
value: [0.8277626  0.8267982  0.83943486 0.95936847 0.89719224 0.9292078
 0.87691498 0.90619445 0.85252666 0.84288502]

mean value: 0.8758285284042359

key: score_time
value: [0.23116565 0.20367575 0.20599627 0.15598726 0.19488597 0.1595974
 0.24725604 0.22806668 0.2303443  0.21640897]

mean value: 0.20733842849731446

key: test_mcc
value: [0.8819171  0.75       0.76376262 1.         0.875      0.73214286
 0.60714286 0.87287156 0.87287156 0.66143783]

mean value: 0.8017146383453971

key: train_mcc
value: [0.98540068 0.98540068 0.95630861 0.98550418 0.98550418 0.98550418
 0.98550418 0.97080136 0.97080136 0.98550418]

mean value: 0.9796233587390223

key: test_accuracy
value: [0.9375     0.875      0.86666667 1.         0.93333333 0.86666667
 0.8        0.93333333 0.93333333 0.8       ]

mean value: 0.8945833333333334

key: train_accuracy
value: [0.99264706 0.99264706 0.97810219 0.99270073 0.99270073 0.99270073
 0.99270073 0.98540146 0.98540146 0.99270073]

mean value: 0.9897702876771146

key: test_fscore
value: [0.93333333 0.875      0.875      1.         0.93333333 0.85714286
 0.8        0.94117647 0.94117647 0.76923077]

mean value: 0.8925393234216764

key: train_fscore
value: [0.99259259 0.99259259 0.97810219 0.99280576 0.99280576 0.99280576
 0.99259259 0.98529412 0.98529412 0.99259259]

mean value: 0.9897478061632561

key: test_precision
value: [1.         0.875      0.77777778 1.         0.875      0.85714286
 0.85714286 0.88888889 0.88888889 1.        ]

mean value: 0.901984126984127

key: train_precision
value: [1.         1.         0.98529412 0.98571429 0.98571429 0.98571429
 1.         0.98529412 0.98529412 1.        ]

mean value: 0.9913025210084034

key: test_recall
value: [0.875      0.875      1.         1.         1.         0.85714286
 0.75       1.         1.         0.625     ]

mean value: 0.8982142857142857

key: train_recall
value: [0.98529412 0.98529412 0.97101449 1.         1.         1.
 0.98529412 0.98529412 0.98529412 0.98529412]

mean value: 0.9882779198635976

key: test_roc_auc
value: [0.9375     0.875      0.875      1.         0.9375     0.86607143
 0.80357143 0.92857143 0.92857143 0.8125    ]

mean value: 0.8964285714285715

key: train_roc_auc
value: [0.99264706 0.99264706 0.97815431 0.99264706 0.99264706 0.99264706
 0.99264706 0.98540068 0.98540068 0.99264706]

mean value: 0.9897485080988918

key: test_jcc
value: [0.875      0.77777778 0.77777778 1.         0.875      0.75
 0.66666667 0.88888889 0.88888889 0.625     ]

mean value: 0.8125

key: train_jcc
value: [0.98529412 0.98529412 0.95714286 0.98571429 0.98571429 0.98571429
 0.98529412 0.97101449 0.97101449 0.98529412]

mean value: 0.9797491170381196

MCC on Blind test: 0.11

Accuracy on Blind test: 0.83

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01690888 0.00677323 0.00677204 0.00677943 0.0067265  0.00685477
 0.00684571 0.00681353 0.00681353 0.00683665]

mean value: 0.00781242847442627

key: score_time
value: [0.01041579 0.00778389 0.00794005 0.00778031 0.00778699 0.00781918
 0.00782156 0.00780797 0.00778556 0.0078373 ]

mean value: 0.008077859878540039

key: test_mcc
value: [0.37796447 0.25819889 0.07142857 0.49099025 0.47245559 0.13363062
 0.46428571 0.73214286 0.33928571 0.32732684]

mean value: 0.36677095205019633

key: train_mcc
value: [0.5008673  0.53311399 0.52059257 0.45151662 0.49006025 0.5360985
 0.52559229 0.51215762 0.49197671 0.53517487]

mean value: 0.5097150730382196

key: test_accuracy
value: [0.6875     0.625      0.53333333 0.73333333 0.73333333 0.53333333
 0.73333333 0.86666667 0.66666667 0.66666667]

mean value: 0.6779166666666666

key: train_accuracy
value: [0.75       0.76470588 0.75912409 0.72262774 0.74452555 0.76642336
 0.75912409 0.75182482 0.74452555 0.76642336]

mean value: 0.7529304422498926

key: test_fscore
value: [0.70588235 0.57142857 0.53333333 0.75       0.66666667 0.63157895
 0.75       0.875      0.66666667 0.70588235]

mean value: 0.6856438891346012

key: train_fscore
value: [0.75714286 0.77777778 0.77241379 0.74666667 0.75524476 0.78082192
 0.7755102  0.77027027 0.75524476 0.77464789]

mean value: 0.7665740884664326

key: test_precision
value: [0.66666667 0.66666667 0.5        0.66666667 0.8        0.5
 0.75       0.875      0.71428571 0.66666667]

mean value: 0.680595238095238

key: train_precision
value: [0.73611111 0.73684211 0.73684211 0.69135802 0.72972973 0.74025974
 0.72151899 0.7125     0.72       0.74324324]

mean value: 0.726840504690327

key: test_recall
value: [0.75       0.5        0.57142857 0.85714286 0.57142857 0.85714286
 0.75       0.875      0.625      0.75      ]

mean value: 0.7107142857142857

key: train_recall
value: [0.77941176 0.82352941 0.8115942  0.8115942  0.7826087  0.82608696
 0.83823529 0.83823529 0.79411765 0.80882353]

mean value: 0.8114236999147485

key: test_roc_auc
value: [0.6875     0.625      0.53571429 0.74107143 0.72321429 0.55357143
 0.73214286 0.86607143 0.66964286 0.66071429]

mean value: 0.6794642857142857

key: train_roc_auc
value: [0.75       0.76470588 0.75873828 0.72197357 0.74424552 0.76598465
 0.75969736 0.75245098 0.74488491 0.76673061]

mean value: 0.7529411764705882

key: test_jcc
value: [0.54545455 0.4        0.36363636 0.6        0.5        0.46153846
 0.6        0.77777778 0.5        0.54545455]

mean value: 0.5293861693861693

key: train_jcc
value: [0.6091954  0.63636364 0.62921348 0.59574468 0.60674157 0.64044944
 0.63333333 0.62637363 0.60674157 0.63218391]

mean value: 0.6216340654682218

MCC on Blind test: 0.1

Accuracy on Blind test: 0.6

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.09977555 0.03077435 0.03092337 0.03185725 0.03266478 0.20152545
 0.03012586 0.03033113 0.03158212 0.03259635]

mean value: 0.05521562099456787

key: score_time
value: [0.01020741 0.00965858 0.00987267 0.0099175  0.01043272 0.01017642
 0.00950527 0.0099225  0.00961161 0.00984406]

mean value: 0.009914875030517578

key: test_mcc
value: [1.         0.75       1.         1.         0.73214286 1.
 0.87287156 1.         0.87287156 0.76376262]

mean value: 0.8991648594856769

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.875      1.         1.         0.86666667 1.
 0.93333333 1.         0.93333333 0.86666667]

mean value: 0.9475

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.875      1.         1.         0.85714286 1.
 0.94117647 1.         0.94117647 0.85714286]

mean value: 0.9471638655462185

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.875      1.         1.         0.85714286 1.
 0.88888889 1.         0.88888889 1.        ]

mean value: 0.9509920634920634

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.875      1.         1.         0.85714286 1.
 1.         1.         1.         0.75      ]

mean value: 0.9482142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.875      1.         1.         0.86607143 1.
 0.92857143 1.         0.92857143 0.875     ]

mean value: 0.9473214285714285

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.77777778 1.         1.         0.75       1.
 0.88888889 1.         0.88888889 0.75      ]

mean value: 0.9055555555555556

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.12

Accuracy on Blind test: 0.84
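
The blind-test lines above pair a low MCC with a fairly high accuracy. A minimal sketch of how that combination can arise on an imbalanced test set follows. The confusion-matrix counts are hand-picked for illustration and are not taken from this run, and the helper names `accuracy` and `mcc` are hypothetical.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical, hand-picked counts (NOT from this log): 10 positives, 90 negatives.
tp, fn, fp, tn = 2, 8, 8, 82

acc = accuracy(tp, tn, fp, fn)
score = mcc(tp, tn, fp, fn)
print(round(acc, 2), round(score, 2))
```

With these counts most of the accuracy comes from the majority class, while MCC stays near zero because the minority class is mostly missed. Whether this explains the figures in this log depends on the actual blind-test class balance, which is not shown here.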

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.00941396 0.01151013 0.01147294 0.01190257 0.0120573  0.01343966
 0.01201916 0.01195407 0.01195812 0.01198363]

mean value: 0.011771154403686524

key: score_time
value: [0.01016879 0.00986719 0.01031709 0.01051497 0.01036811 0.0106349
 0.01084495 0.01056862 0.01060319 0.01060867]

mean value: 0.010449647903442383

key: test_mcc
value: [1.         0.62994079 0.49099025 1.         0.875      0.73214286
 0.87287156 1.         0.75592895 0.75592895]

mean value: 0.811280335150343

key: train_mcc
value: [0.91215932 0.95681396 0.92944673 0.88466669 0.89863497 0.94199209
 0.90025835 0.9139999  0.91281179 0.87099729]

mean value: 0.9121781087453906

key: test_accuracy
value: [1.         0.8125     0.73333333 1.         0.93333333 0.86666667
 0.93333333 1.         0.86666667 0.86666667]

mean value: 0.90125

key: train_accuracy
value: [0.95588235 0.97794118 0.96350365 0.94160584 0.94890511 0.97080292
 0.94890511 0.95620438 0.95620438 0.93430657]

mean value: 0.9554261485616145

key: test_fscore
value: [1.         0.82352941 0.75       1.         0.93333333 0.85714286
 0.94117647 1.         0.88888889 0.88888889]

mean value: 0.908295985060691

key: train_fscore
value: [0.95652174 0.97841727 0.96503497 0.94366197 0.95035461 0.97142857
 0.95035461 0.95714286 0.95652174 0.93617021]

mean value: 0.9565608542509413

key: test_precision
value: [1.         0.77777778 0.66666667 1.         0.875      0.85714286
 0.88888889 1.         0.8        0.8       ]

mean value: 0.866547619047619

key: train_precision
value: [0.94285714 0.95774648 0.93243243 0.91780822 0.93055556 0.95774648
 0.91780822 0.93055556 0.94285714 0.90410959]

mean value: 0.9334476814401569

key: test_recall
value: [1.         0.875      0.85714286 1.         1.         0.85714286
 1.         1.         1.         1.        ]

mean value: 0.9589285714285715

key: train_recall
value: [0.97058824 1.         1.         0.97101449 0.97101449 0.98550725
 0.98529412 0.98529412 0.97058824 0.97058824]

mean value: 0.9809889173060529

key: test_roc_auc
value: [1.         0.8125     0.74107143 1.         0.9375     0.86607143
 0.92857143 1.         0.85714286 0.85714286]

mean value: 0.9

key: train_roc_auc
value: [0.95588235 0.97794118 0.96323529 0.9413896  0.94874254 0.9706948
 0.9491688  0.95641517 0.95630861 0.93456948]

mean value: 0.9554347826086956

key: test_jcc
value: [1.         0.7        0.6        1.         0.875      0.75
 0.88888889 1.         0.8        0.8       ]

mean value: 0.8413888888888889

key: train_jcc
value: [0.91666667 0.95774648 0.93243243 0.89333333 0.90540541 0.94444444
 0.90540541 0.91780822 0.91666667 0.88      ]

mean value: 0.9169909052405676

MCC on Blind test: 0.06

Accuracy on Blind test: 0.65

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.02651024 0.0071528  0.00675678 0.00662589 0.00688267 0.00663829
 0.00680137 0.00680256 0.00683355 0.00683928]

mean value: 0.008784341812133788

key: score_time
value: [0.01571369 0.00825262 0.00792074 0.0078783  0.00784111 0.00788522
 0.00770473 0.00788617 0.00793123 0.00775051]

mean value: 0.008676433563232422

key: test_mcc
value: [0.62994079 0.37796447 0.21821789 0.60714286 0.73214286 0.26189246
 0.66143783 0.87287156 0.46428571 0.46428571]

mean value: 0.529018214646944

key: train_mcc
value: [0.55979287 0.57408838 0.62076318 0.57703846 0.54864511 0.60584099
 0.57730871 0.51887407 0.56235346 0.56235346]

mean value: 0.5707058671664582

key: test_accuracy
value: [0.8125     0.6875     0.6        0.8        0.86666667 0.6
 0.8        0.93333333 0.73333333 0.73333333]

mean value: 0.7566666666666667

key: train_accuracy
value: [0.77941176 0.78676471 0.81021898 0.78832117 0.77372263 0.80291971
 0.78832117 0.75912409 0.7810219  0.7810219 ]

mean value: 0.785084800343495

key: test_fscore
value: [0.8        0.66666667 0.625      0.8        0.85714286 0.66666667
 0.76923077 0.94117647 0.75       0.75      ]

mean value: 0.7625883430295195

key: train_fscore
value: [0.78571429 0.79136691 0.80882353 0.79432624 0.78321678 0.8057554
 0.79136691 0.76258993 0.7826087  0.7826087 ]

mean value: 0.7888377367472581

key: test_precision
value: [0.85714286 0.71428571 0.55555556 0.75       0.85714286 0.54545455
 1.         0.88888889 0.75       0.75      ]

mean value: 0.7668470418470419

key: train_precision
value: [0.76388889 0.77464789 0.82089552 0.77777778 0.75675676 0.8
 0.77464789 0.74647887 0.77142857 0.77142857]

mean value: 0.775795073655595

key: test_recall
value: [0.75       0.625      0.71428571 0.85714286 0.85714286 0.85714286
 0.625      1.         0.75       0.75      ]

mean value: 0.7785714285714286

key: train_recall
value: [0.80882353 0.80882353 0.79710145 0.8115942  0.8115942  0.8115942
 0.80882353 0.77941176 0.79411765 0.79411765]

mean value: 0.8026001705029838

key: test_roc_auc
value: [0.8125     0.6875     0.60714286 0.80357143 0.86607143 0.61607143
 0.8125     0.92857143 0.73214286 0.73214286]

mean value: 0.7598214285714285

key: train_roc_auc
value: [0.77941176 0.78676471 0.81031543 0.78815004 0.77344416 0.80285592
 0.78846974 0.7592711  0.78111679 0.78111679]

mean value: 0.7850916453537937

key: test_jcc
value: [0.66666667 0.5        0.45454545 0.66666667 0.75       0.5
 0.625      0.88888889 0.6        0.6       ]

mean value: 0.6251767676767677

key: train_jcc
value: [0.64705882 0.6547619  0.67901235 0.65882353 0.64367816 0.6746988
 0.6547619  0.61627907 0.64285714 0.64285714]

mean value: 0.651478881972599

MCC on Blind test: 0.11

Accuracy on Blind test: 0.62

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.00804782 0.00781918 0.00786948 0.00782323 0.00760174 0.00795841
 0.00802374 0.00730324 0.00732517 0.00728822]

mean value: 0.0077060222625732425

key: score_time
value: [0.00777936 0.00796533 0.00840831 0.00785279 0.00842953 0.00844431
 0.00777602 0.00784135 0.00779104 0.00782919]

mean value: 0.008011722564697265

key: test_mcc
value: [0.8819171  0.62994079 0.49099025 1.         0.73214286 0.60714286
 0.6000992  1.         0.64465837 0.6000992 ]

mean value: 0.7186990626871869

key: train_mcc
value: [0.89949371 0.91215932 0.92791659 0.88466669 0.94199209 0.94160273
 0.88938138 0.8687127  0.84688958 0.86000692]

mean value: 0.8972821710057162

key: test_accuracy
value: [0.9375     0.8125     0.73333333 1.         0.86666667 0.8
 0.8        1.         0.8        0.8       ]

mean value: 0.855

key: train_accuracy
value: [0.94852941 0.95588235 0.96350365 0.94160584 0.97080292 0.97080292
 0.94160584 0.93430657 0.91970803 0.9270073 ]

mean value: 0.9473754830399312

key: test_fscore
value: [0.93333333 0.82352941 0.75       1.         0.85714286 0.8
 0.82352941 1.         0.84210526 0.82352941]

mean value: 0.8653169688928203

key: train_fscore
value: [0.94656489 0.95652174 0.96296296 0.94366197 0.97142857 0.97101449
 0.94444444 0.93430657 0.92413793 0.93055556]

mean value: 0.9485599123980311

key: test_precision
value: [1.         0.77777778 0.66666667 1.         0.85714286 0.75
 0.77777778 1.         0.72727273 0.77777778]

mean value: 0.8334415584415584

key: train_precision
value: [0.98412698 0.94285714 0.98484848 0.91780822 0.95774648 0.97101449
 0.89473684 0.92753623 0.87012987 0.88157895]

mean value: 0.9332383694125169

key: test_recall
value: [0.875      0.875      0.85714286 1.         0.85714286 0.85714286
 0.875      1.         1.         0.875     ]

mean value: 0.9071428571428571

key: train_recall
value: [0.91176471 0.97058824 0.94202899 0.97101449 0.98550725 0.97101449
 1.         0.94117647 0.98529412 0.98529412]

mean value: 0.9663682864450128

key: test_roc_auc
value: [0.9375     0.8125     0.74107143 1.         0.86607143 0.80357143
 0.79464286 1.         0.78571429 0.79464286]

mean value: 0.8535714285714285

key: train_roc_auc
value: [0.94852941 0.95588235 0.96366155 0.9413896  0.9706948  0.97080136
 0.94202899 0.93435635 0.92018329 0.92742967]

mean value: 0.9474957374254049

key: test_jcc
value: [0.875      0.7        0.6        1.         0.75       0.66666667
 0.7        1.         0.72727273 0.7       ]

mean value: 0.7718939393939394

key: train_jcc
value: [0.89855072 0.91666667 0.92857143 0.89333333 0.94444444 0.94366197
 0.89473684 0.87671233 0.85897436 0.87012987]

mean value: 0.9025781969461155

MCC on Blind test: 0.07

Accuracy on Blind test: 0.69

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.00999784 0.0094552  0.00723362 0.00716352 0.00696826 0.00690985
 0.00689554 0.00771546 0.0079     0.00781608]

mean value: 0.007805538177490234

key: score_time
value: [0.01038742 0.00955176 0.00792933 0.00781178 0.00785041 0.00781822
 0.00781894 0.00777292 0.00842071 0.00786495]

mean value: 0.008322644233703613

key: test_mcc
value: [0.8819171  0.62994079 0.49099025 0.875      0.76376262 0.60714286
 0.46428571 0.53452248 0.46428571 0.47245559]

mean value: 0.6184303121694533

key: train_mcc
value: [0.88580789 0.81600218 0.92791659 0.9001543  0.80787444 0.80014442
 0.8437116  0.64876322 0.87609014 0.86339318]

mean value: 0.836985797579123

key: test_accuracy
value: [0.9375     0.8125     0.73333333 0.93333333 0.86666667 0.8
 0.73333333 0.73333333 0.73333333 0.73333333]

mean value: 0.8016666666666666

key: train_accuracy
value: [0.94117647 0.90441176 0.96350365 0.94890511 0.89781022 0.89051095
 0.91970803 0.79562044 0.93430657 0.9270073 ]

mean value: 0.912296049806784

key: test_fscore
value: [0.93333333 0.82352941 0.75       0.93333333 0.875      0.8
 0.75       0.8        0.75       0.77777778]

mean value: 0.819297385620915

key: train_fscore
value: [0.93846154 0.91034483 0.96296296 0.95104895 0.90666667 0.90196078
 0.91472868 0.82926829 0.92913386 0.93150685]

mean value: 0.9176083413476306

key: test_precision
value: [1.         0.77777778 0.66666667 0.875      0.77777778 0.75
 0.75       0.66666667 0.75       0.7       ]

mean value: 0.7713888888888889

key: train_precision
value: [0.98387097 0.85714286 0.98484848 0.91891892 0.83950617 0.82142857
 0.96721311 0.70833333 1.         0.87179487]

mean value: 0.8953057292802578

key: test_recall
value: [0.875      0.875      0.85714286 1.         1.         0.85714286
 0.75       1.         0.75       0.875     ]

mean value: 0.8839285714285714

key: train_recall
value: [0.89705882 0.97058824 0.94202899 0.98550725 0.98550725 1.
 0.86764706 1.         0.86764706 1.        ]

mean value: 0.9515984654731457

key: test_roc_auc
value: [0.9375     0.8125     0.74107143 0.9375     0.875      0.80357143
 0.73214286 0.71428571 0.73214286 0.72321429]

mean value: 0.8008928571428571

key: train_roc_auc
value: [0.94117647 0.90441176 0.96366155 0.94863598 0.89716539 0.88970588
 0.91933078 0.79710145 0.93382353 0.92753623]

mean value: 0.9122549019607843

key: test_jcc
value: [0.875      0.7        0.6        0.875      0.77777778 0.66666667
 0.6        0.66666667 0.6        0.63636364]

mean value: 0.6997474747474748

key: train_jcc
value: [0.88405797 0.83544304 0.92857143 0.90666667 0.82926829 0.82142857
 0.84285714 0.70833333 0.86764706 0.87179487]

mean value: 0.8496068375147647

MCC on Blind test: 0.06

Accuracy on Blind test: 0.66

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.07770419 0.06228852 0.0625062  0.06266785 0.06289601 0.06246185
 0.06297612 0.06292748 0.06235862 0.06280899]

mean value: 0.06415958404541015

key: score_time
value: [0.01418233 0.01393175 0.01422071 0.01399136 0.01391673 0.01394653
 0.01503801 0.01420355 0.01413107 0.01432395]

mean value: 0.014188599586486817

key: test_mcc
value: [0.8819171  0.75       0.875      0.875      0.73214286 0.87287156
 0.87287156 1.         0.75592895 0.76376262]

mean value: 0.8379494644563421

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     0.875      0.93333333 0.93333333 0.86666667 0.93333333
 0.93333333 1.         0.86666667 0.86666667]

mean value: 0.9145833333333333

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.93333333 0.875      0.93333333 0.93333333 0.85714286 0.92307692
 0.94117647 1.         0.88888889 0.85714286]

mean value: 0.9142427996839761

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.875      0.875      0.875      0.85714286 1.
 0.88888889 1.         0.8        1.        ]

mean value: 0.9171031746031746

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.875      0.875      1.         1.         0.85714286 0.85714286
 1.         1.         1.         0.75      ]

mean value: 0.9214285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.875      0.9375     0.9375     0.86607143 0.92857143
 0.92857143 1.         0.85714286 0.875     ]

mean value: 0.9142857142857143

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.875      0.77777778 0.875      0.875      0.75       0.85714286
 0.88888889 1.         0.8        0.75      ]

mean value: 0.8448809523809524

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.08

Accuracy on Blind test: 0.75

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.02706838 0.02781153 0.04628038 0.03850269 0.0461607  0.04655218
 0.04729891 0.04126883 0.03603816 0.0402298 ]

mean value: 0.03972115516662598

key: score_time
value: [0.02073336 0.02294326 0.03598142 0.040658   0.03594398 0.03722
 0.03625917 0.02713251 0.02583647 0.03715944]

mean value: 0.03198676109313965

key: test_mcc
value: [0.8819171  0.8819171  1.         1.         0.73214286 0.73214286
 0.87287156 0.87287156 0.73214286 1.        ]

mean value: 0.8706005900692904

key: train_mcc
value: [0.98540068 1.         1.         1.         1.         1.
 1.         0.98550725 1.         1.        ]

mean value: 0.9970907922626642

key: test_accuracy
value: [0.9375     0.9375     1.         1.         0.86666667 0.86666667
 0.93333333 0.93333333 0.86666667 1.        ]

mean value: 0.9341666666666667

key: train_accuracy
value: [0.99264706 1.         1.         1.         1.         1.
 1.         0.99270073 1.         1.        ]

mean value: 0.9985347788750537

key: test_fscore
value: [0.94117647 0.94117647 1.         1.         0.85714286 0.85714286
 0.94117647 0.94117647 0.875      1.        ]

mean value: 0.9353991596638656

key: train_fscore
value: [0.99259259 1.         1.         1.         1.         1.
 1.         0.99270073 1.         1.        ]

mean value: 0.99852933225196

key: test_precision
value: [0.88888889 0.88888889 1.         1.         0.85714286 0.85714286
 0.88888889 0.88888889 0.875      1.        ]

mean value: 0.914484126984127

key: train_precision
value: [1.         1.         1.         1.         1.         1.
 1.         0.98550725 1.         1.        ]

mean value: 0.9985507246376811

key: test_recall
value: [1.         1.         1.         1.         0.85714286 0.85714286
 1.         1.         0.875      1.        ]

mean value: 0.9589285714285715

key: train_recall
value: [0.98529412 1.         1.         1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9985294117647059

key: test_roc_auc
value: [0.9375     0.9375     1.         1.         0.86607143 0.86607143
 0.92857143 0.92857143 0.86607143 1.        ]

mean value: 0.9330357142857143

key: train_roc_auc
value: [0.99264706 1.         1.         1.         1.         1.
 1.         0.99275362 1.         1.        ]

mean value: 0.9985400682011936

key: test_jcc
value: [0.88888889 0.88888889 1.         1.         0.75       0.75
 0.88888889 0.88888889 0.77777778 1.        ]

mean value: 0.8833333333333333

key: train_jcc
value: [0.98529412 1.         1.         1.         1.         1.
 1.         0.98550725 1.         1.        ]

mean value: 0.997080136402387

MCC on Blind test: 0.12

Accuracy on Blind test: 0.85

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.03373861 0.03912592 0.04229856 0.04023004 0.04611397 0.04011154
 0.03928065 0.04038763 0.04050183 0.04010868]

mean value: 0.04018974304199219

key: score_time
value: [0.0198133  0.01117086 0.01123762 0.02080536 0.02091765 0.01118398
 0.02124166 0.02203465 0.01984    0.02217436]

mean value: 0.01804194450378418

key: test_mcc
value: [0.77459667 0.37796447 0.33928571 0.56407607 0.76376262 0.73214286
 0.37796447 0.87287156 0.64465837 0.46428571]

mean value: 0.5911608523782237

key: train_mcc
value: [0.94117647 0.95598573 0.98550418 0.95630861 0.94160273 0.97080136
 0.97080136 0.97080136 0.97080136 0.94201665]

mean value: 0.9605799824099576

key: test_accuracy
value: [0.875      0.6875     0.66666667 0.73333333 0.86666667 0.86666667
 0.66666667 0.93333333 0.8        0.73333333]

mean value: 0.7829166666666667

key: train_accuracy
value: [0.97058824 0.97794118 0.99270073 0.97810219 0.97080292 0.98540146
 0.98540146 0.98540146 0.98540146 0.97080292]

mean value: 0.9802544010304852

key: test_fscore
value: [0.88888889 0.66666667 0.66666667 0.77777778 0.875      0.85714286
 0.61538462 0.94117647 0.84210526 0.75      ]

mean value: 0.7880809206273602

key: train_fscore
value: [0.97058824 0.97810219 0.99280576 0.97810219 0.97101449 0.98550725
 0.98529412 0.98529412 0.98529412 0.97101449]

mean value: 0.980301695507708

key: test_precision
value: [0.8        0.71428571 0.625      0.63636364 0.77777778 0.85714286
 0.8        0.88888889 0.72727273 0.75      ]

mean value: 0.7576731601731602

key: train_precision
value: [0.97058824 0.97101449 0.98571429 0.98529412 0.97101449 0.98550725
 0.98529412 0.98529412 0.98529412 0.95714286]

mean value: 0.9782158080623554

key: test_recall
value: [1.         0.625      0.71428571 1.         1.         0.85714286
 0.5        1.         1.         0.75      ]

mean value: 0.8446428571428571

key: train_recall
value: [0.97058824 0.98529412 1.         0.97101449 0.97101449 0.98550725
 0.98529412 0.98529412 0.98529412 0.98529412]

mean value: 0.982459505541347

key: test_roc_auc
value: [0.875      0.6875     0.66964286 0.75       0.875      0.86607143
 0.67857143 0.92857143 0.78571429 0.73214286]

mean value: 0.7848214285714286

key: train_roc_auc
value: [0.97058824 0.97794118 0.99264706 0.97815431 0.97080136 0.98540068
 0.98540068 0.98540068 0.98540068 0.97090793]

mean value: 0.9802642796248935

key: test_jcc
value: [0.8        0.5        0.5        0.63636364 0.77777778 0.75
 0.44444444 0.88888889 0.72727273 0.6       ]

mean value: 0.6624747474747474

key: train_jcc
value: [0.94285714 0.95714286 0.98571429 0.95714286 0.94366197 0.97142857
 0.97101449 0.97101449 0.97101449 0.94366197]

mean value: 0.9614653136208555

MCC on Blind test: 0.05

Accuracy on Blind test: 0.62

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.09781337 0.10118818 0.09096408 0.09063625 0.08830929 0.08863807
 0.1010282  0.0922606  0.0915482  0.09096527]

mean value: 0.09333515167236328

key: score_time
value: [0.00950933 0.00844288 0.00881338 0.00852418 0.00897932 0.00888801
 0.00875974 0.00871754 0.00904465 0.00866079]

mean value: 0.008833980560302735

key: test_mcc
value: [0.8819171  0.8819171  1.         1.         0.73214286 0.73214286
 0.87287156 0.87287156 0.73214286 1.        ]

mean value: 0.8706005900692904

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     0.9375     1.         1.         0.86666667 0.86666667
 0.93333333 0.93333333 0.86666667 1.        ]

mean value: 0.9341666666666667

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94117647 0.94117647 1.         1.         0.85714286 0.85714286
 0.94117647 0.94117647 0.875      1.        ]

mean value: 0.9353991596638656

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.88888889 0.88888889 1.         1.         0.85714286 0.85714286
 0.88888889 0.88888889 0.875      1.        ]

mean value: 0.914484126984127

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         0.85714286 0.85714286
 1.         1.         0.875      1.        ]

mean value: 0.9589285714285715

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.9375     1.         1.         0.86607143 0.86607143
 0.92857143 0.92857143 0.86607143 1.        ]

mean value: 0.9330357142857143

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.88888889 0.88888889 1.         1.         0.75       0.75
 0.88888889 0.88888889 0.77777778 1.        ]

mean value: 0.8833333333333333

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.11

Accuracy on Blind test: 0.82

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.00983596 0.01095533 0.01153588 0.01127434 0.01293206 0.01323128
 0.01175475 0.01181364 0.01138139 0.01201797]

mean value: 0.011673259735107421

key: score_time
value: [0.01050639 0.01042032 0.01051211 0.01094747 0.01169777 0.01332498
 0.01089931 0.01096082 0.01095772 0.01388907]

mean value: 0.011411595344543456

key: test_mcc
value: [0.75       0.62994079 0.64465837 0.64465837 0.6000992  0.34247476
 0.46770717 0.49099025 0.33928571 0.66143783]

mean value: 0.5571252457078674

key: train_mcc
value: [0.84051051 0.92737353 0.90259957 0.80073303 0.88938138 0.71739374
 0.94318882 0.82498207 0.90246052 0.92944673]

mean value: 0.8678069912939567

key: test_accuracy
value: [0.875      0.8125     0.8        0.8        0.8        0.66666667
 0.66666667 0.73333333 0.66666667 0.8       ]

mean value: 0.7620833333333333

key: train_accuracy
value: [0.91911765 0.96323529 0.94890511 0.89051095 0.94160584 0.83941606
 0.97080292 0.90510949 0.94890511 0.96350365]

mean value: 0.9291112065264062

key: test_fscore
value: [0.875      0.82352941 0.72727273 0.72727273 0.76923077 0.54545455
 0.54545455 0.71428571 0.66666667 0.76923077]

mean value: 0.7163397876633171

key: train_fscore
value: [0.91603053 0.96240602 0.94656489 0.87804878 0.93846154 0.81034483
 0.96969697 0.89430894 0.94573643 0.96183206]

mean value: 0.9223430989384103

key: test_precision
value: [0.875      0.77777778 1.         1.         0.83333333 0.75
 1.         0.83333333 0.71428571 1.        ]

mean value: 0.8783730158730159

key: train_precision
value: [0.95238095 0.98461538 1.         1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9936996336996337

key: test_recall
value: [0.875      0.875      0.57142857 0.57142857 0.71428571 0.42857143
 0.375      0.625      0.625      0.625     ]

mean value: 0.6285714285714286

key: train_recall
value: [0.88235294 0.94117647 0.89855072 0.7826087  0.88405797 0.68115942
 0.94117647 0.80882353 0.89705882 0.92647059]

mean value: 0.8643435635123615

key: test_roc_auc
value: [0.875      0.8125     0.78571429 0.78571429 0.79464286 0.65178571
 0.6875     0.74107143 0.66964286 0.8125    ]

mean value: 0.7616071428571428

key: train_roc_auc
value: [0.91911765 0.96323529 0.94927536 0.89130435 0.94202899 0.84057971
 0.97058824 0.90441176 0.94852941 0.96323529]

mean value: 0.9292306052855925

key: test_jcc
value: [0.77777778 0.7        0.57142857 0.57142857 0.625      0.375
 0.375      0.55555556 0.5        0.625     ]

mean value: 0.5676190476190476

key: train_jcc
value: [0.84507042 0.92753623 0.89855072 0.7826087  0.88405797 0.68115942
 0.94117647 0.80882353 0.89705882 0.92647059]

mean value: 0.8592512877778178

MCC on Blind test: 0.13

Accuracy on Blind test: 0.85

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.01431322 0.01028609 0.0085125  0.00834036 0.00857472 0.00834465
 0.00830102 0.00752926 0.0077374  0.00805783]

mean value: 0.00899970531463623

key: score_time
value: [0.01112556 0.00929952 0.00890088 0.00855279 0.0085485  0.00859904
 0.00831628 0.00797892 0.00823665 0.00807309]

mean value: 0.00876312255859375

key: test_mcc
value: [0.8819171  0.62994079 0.66143783 1.         0.875      0.73214286
 0.6000992  1.         0.75592895 0.6000992 ]

mean value: 0.7736565919262326

key: train_mcc
value: [0.86849267 0.89715584 0.89791134 0.88355744 0.88355744 0.89863497
 0.85440207 0.85440207 0.89791134 0.86948194]

mean value: 0.8805507116446566

key: test_accuracy
value: [0.9375     0.8125     0.8        1.         0.93333333 0.86666667
 0.8        1.         0.86666667 0.8       ]

mean value: 0.8816666666666667

key: train_accuracy
value: [0.93382353 0.94852941 0.94890511 0.94160584 0.94160584 0.94890511
 0.9270073  0.9270073  0.94890511 0.93430657]

mean value: 0.9400601116358952

key: test_fscore
value: [0.93333333 0.82352941 0.82352941 1.         0.93333333 0.85714286
 0.82352941 1.         0.88888889 0.82352941]

mean value: 0.8906816059757237

key: train_fscore
value: [0.9352518  0.94890511 0.94890511 0.94285714 0.94285714 0.95035461
 0.92753623 0.92753623 0.94890511 0.9352518 ]

mean value: 0.9408360285000935

key: test_precision
value: [1.         0.77777778 0.7        1.         0.875      0.85714286
 0.77777778 1.         0.8        0.77777778]

mean value: 0.856547619047619

key: train_precision
value: [0.91549296 0.94202899 0.95588235 0.92957746 0.92957746 0.93055556
 0.91428571 0.91428571 0.94202899 0.91549296]

mean value: 0.9289208153153076

key: test_recall
value: [0.875      0.875      1.         1.         1.         0.85714286
 0.875      1.         1.         0.875     ]

mean value: 0.9357142857142857

key: train_recall
value: [0.95588235 0.95588235 0.94202899 0.95652174 0.95652174 0.97101449
 0.94117647 0.94117647 0.95588235 0.95588235]

mean value: 0.9531969309462915

key: test_roc_auc
value: [0.9375     0.8125     0.8125     1.         0.9375     0.86607143
 0.79464286 1.         0.85714286 0.79464286]

mean value: 0.88125

key: train_roc_auc
value: [0.93382353 0.94852941 0.94895567 0.94149616 0.94149616 0.94874254
 0.92710997 0.92710997 0.94895567 0.93446292]

mean value: 0.940068201193521

key: test_jcc
value: [0.875 0.7   0.7   1.    0.875 0.75  0.7   1.    0.8   0.7  ]

mean value: 0.8099999999999999

key: train_jcc
value: [0.87837838 0.90277778 0.90277778 0.89189189 0.89189189 0.90540541
 0.86486486 0.86486486 0.90277778 0.87837838]

mean value: 0.888400900900901

MCC on Blind test: 0.06

Accuracy on Blind test: 0.67

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:143: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:146: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.07313299 0.06227994 0.06231952 0.06033921 0.06083584 0.06107545
 0.06096387 0.06110859 0.06186771 0.06140947]

mean value: 0.06253325939178467

key: score_time
value: [0.00833368 0.00824118 0.00828338 0.00820613 0.00824714 0.00827527
 0.00827289 0.00825977 0.00888276 0.00831437]

mean value: 0.008331656455993652

key: test_mcc
value: [0.8819171  0.62994079 0.66143783 1.         0.875      0.73214286
 0.75592895 1.         0.75592895 0.6000992 ]

mean value: 0.7892395667131802

key: train_mcc
value: [0.86849267 0.89715584 0.8978896  0.89863497 0.88355744 0.92709446
 0.89869927 0.85440207 0.92710997 0.87099729]

mean value: 0.8924033569902855

key: test_accuracy
value: [0.9375     0.8125     0.8        1.         0.93333333 0.86666667
 0.86666667 1.         0.86666667 0.8       ]

mean value: 0.8883333333333333

key: train_accuracy
value: [0.93382353 0.94852941 0.94890511 0.94890511 0.94160584 0.96350365
 0.94890511 0.9270073  0.96350365 0.93430657]

mean value: 0.9458995276942894

key: test_fscore
value: [0.93333333 0.82352941 0.82352941 1.         0.93333333 0.85714286
 0.88888889 1.         0.88888889 0.82352941]

mean value: 0.8972175536881419

key: train_fscore
value: [0.9352518  0.94890511 0.94964029 0.95035461 0.94285714 0.96402878
 0.94964029 0.92753623 0.96350365 0.93617021]

mean value: 0.946788810763946

key: test_precision
value: [1.         0.77777778 0.7        1.         0.875      0.85714286
 0.8        1.         0.8        0.77777778]

mean value: 0.8587698412698412

key: train_precision
value: [0.91549296 0.94202899 0.94285714 0.93055556 0.92957746 0.95714286
 0.92957746 0.91428571 0.95652174 0.90410959]

mean value: 0.932214947084399

key: test_recall
value: [0.875      0.875      1.         1.         1.         0.85714286
 1.         1.         1.         0.875     ]

mean value: 0.9482142857142857

key: train_recall
value: [0.95588235 0.95588235 0.95652174 0.97101449 0.95652174 0.97101449
 0.97058824 0.94117647 0.97058824 0.97058824]

mean value: 0.9619778346121057

key: test_roc_auc
value: [0.9375     0.8125     0.8125     1.         0.9375     0.86607143
 0.85714286 1.         0.85714286 0.79464286]

mean value: 0.8875000000000001

key: train_roc_auc
value: [0.93382353 0.94852941 0.9488491  0.94874254 0.94149616 0.96344842
 0.94906223 0.92710997 0.96355499 0.93456948]

mean value: 0.9459185848252345

key: test_jcc
value: [0.875 0.7   0.7   1.    0.875 0.75  0.8   1.    0.8   0.7  ]

mean value: 0.82

key: train_jcc
value: [0.87837838 0.90277778 0.90410959 0.90540541 0.89189189 0.93055556
 0.90410959 0.86486486 0.92957746 0.88      ]

mean value: 0.8991670516744799

MCC on Blind test: 0.06

Accuracy on Blind test: 0.67

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.01616359 0.01377511 0.01260805 0.01199722 0.01317811 0.01211739
 0.01303506 0.01296759 0.01235151 0.01292706]

mean value: 0.013112068176269531

key: score_time
value: [0.01072264 0.00871634 0.00817013 0.00809884 0.00805712 0.0079968
 0.00806427 0.00803137 0.00803447 0.00811362]

mean value: 0.008400559425354004

key: test_mcc
value: [0.8819171  0.5        0.37796447 0.73214286 0.87287156 0.60714286
 0.60714286 0.60714286 0.64465837 0.6000992 ]

mean value: 0.6431082135582106

key: train_mcc
value: [0.77949606 0.80961181 0.82629176 0.78182997 0.81031543 0.82480818
 0.75186529 0.81092683 0.82614456 0.79560955]

mean value: 0.8016899442942331

key: test_accuracy
value: [0.9375     0.75       0.66666667 0.86666667 0.93333333 0.8
 0.8        0.8        0.8        0.8       ]

mean value: 0.8154166666666667

key: train_accuracy
value: [0.88970588 0.90441176 0.91240876 0.89051095 0.90510949 0.91240876
 0.87591241 0.90510949 0.91240876 0.89781022]

mean value: 0.9005796479175612

key: test_fscore
value: [0.93333333 0.75       0.70588235 0.85714286 0.92307692 0.8
 0.8        0.8        0.84210526 0.82352941]

mean value: 0.823507014141689

key: train_fscore
value: [0.88888889 0.90225564 0.91044776 0.88888889 0.90510949 0.91304348
 0.87407407 0.90225564 0.90909091 0.89705882]

mean value: 0.8991113591173656

key: test_precision
value: [1.         0.75       0.6        0.85714286 1.         0.75
 0.85714286 0.85714286 0.72727273 0.77777778]

mean value: 0.8176479076479076

key: train_precision
value: [0.89552239 0.92307692 0.93846154 0.90909091 0.91176471 0.91304348
 0.88059701 0.92307692 0.9375     0.89705882]

mean value: 0.9129192704364003

key: test_recall
value: [0.875      0.75       0.85714286 0.85714286 0.85714286 0.85714286
 0.75       0.75       1.         0.875     ]

mean value: 0.8428571428571429

key: train_recall
value: [0.88235294 0.88235294 0.88405797 0.86956522 0.89855072 0.91304348
 0.86764706 0.88235294 0.88235294 0.89705882]

mean value: 0.8859335038363171

key: test_roc_auc
value: [0.9375     0.75       0.67857143 0.86607143 0.92857143 0.80357143
 0.80357143 0.80357143 0.78571429 0.79464286]

mean value: 0.8151785714285714

key: train_roc_auc
value: [0.88970588 0.90441176 0.91261722 0.89066496 0.90515772 0.91240409
 0.87585251 0.90494459 0.91219096 0.89780477]

mean value: 0.9005754475703325

key: test_jcc
value: [0.875      0.6        0.54545455 0.75       0.85714286 0.66666667
 0.66666667 0.66666667 0.72727273 0.7       ]

mean value: 0.705487012987013

key: train_jcc
value: [0.8        0.82191781 0.83561644 0.8        0.82666667 0.84
 0.77631579 0.82191781 0.83333333 0.81333333]

mean value: 0.8169101177601538

MCC on Blind test: 0.12

Accuracy on Blind test: 0.66

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
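
The warning above suggests either raising `max_iter` or scaling the data. The pipeline printed earlier in this log already applies `MinMaxScaler` to the numerical columns, so a minimal sketch of the other option follows. It assumes the default `max_iter=100` of `LogisticRegressionCV` is the limit being hit. The value 5000 and the synthetic data are assumptions for illustration, not settings taken from the script.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption); the real features come from ml_data.py.
X, y = make_classification(n_samples=119, n_features=20, random_state=42)

# Same two-step layout as the logged pipeline, with a larger iteration budget.
# max_iter=5000 is an illustrative value, not one taken from the log.
pipe = Pipeline(steps=[
    ("prep", MinMaxScaler()),
    ("model", LogisticRegressionCV(max_iter=5000, random_state=42)),
])
pipe.fit(X, y)
preds = pipe.predict(X)
```

If the warning persists with a larger budget, the scikit-learn pages linked in the warning list alternative solvers.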
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.37280297 0.37843227 0.38014102 0.37847543 0.37922144 0.38759017
 0.37933087 0.39223146 0.38670659 0.38647294]

mean value: 0.38214051723480225

key: score_time
value: [0.0084753  0.00828695 0.00884271 0.00918055 0.00932026 0.00898337
 0.00927162 0.00885415 0.00936317 0.00934863]

mean value: 0.008992671966552734

key: test_mcc
value: [1.         0.77459667 0.66143783 0.76376262 0.73214286 0.60714286
 0.75592895 0.87287156 0.75592895 0.6000992 ]

mean value: 0.7523911478249176

key: train_mcc
value: [0.94158382 1.         0.95629932 0.94199209 0.95629932 0.98550418
 0.95713391 1.         1.         1.        ]

mean value: 0.9738812635764046

key: test_accuracy
value: [1.         0.875      0.8        0.86666667 0.86666667 0.8
 0.86666667 0.93333333 0.86666667 0.8       ]

mean value: 0.8675

key: train_accuracy
value: [0.97058824 1.         0.97810219 0.97080292 0.97810219 0.99270073
 0.97810219 1.         1.         1.        ]

mean value: 0.986839845427222

key: test_fscore
value: [1.         0.88888889 0.82352941 0.875      0.85714286 0.8
 0.88888889 0.94117647 0.88888889 0.82352941]

mean value: 0.8787044817927171

key: train_fscore
value: [0.97101449 1.         0.97841727 0.97142857 0.97841727 0.99280576
 0.97841727 1.         1.         1.        ]

mean value: 0.9870500618139029

key: test_precision
value: [1.         0.8        0.7        0.77777778 0.85714286 0.75
 0.8        0.88888889 0.8        0.77777778]

mean value: 0.8151587301587302

key: train_precision
value: [0.95714286 1.         0.97142857 0.95774648 0.97142857 0.98571429
 0.95774648 1.         1.         1.        ]

mean value: 0.9801207243460764

key: test_recall
value: [1.         1.         1.         1.         0.85714286 0.85714286
 1.         1.         1.         0.875     ]

mean value: 0.9589285714285715

key: train_recall
value: [0.98529412 1.         0.98550725 0.98550725 0.98550725 1.
 1.         1.         1.         1.        ]

mean value: 0.9941815856777494

key: test_roc_auc
value: [1.         0.875      0.8125     0.875      0.86607143 0.80357143
 0.85714286 0.92857143 0.85714286 0.79464286]

mean value: 0.8669642857142857

key: train_roc_auc
value: [0.97058824 1.         0.97804774 0.9706948  0.97804774 0.99264706
 0.97826087 1.         1.         1.        ]

mean value: 0.9868286445012788

key: test_jcc
value: [1.         0.8        0.7        0.77777778 0.75       0.66666667
 0.8        0.88888889 0.8        0.7       ]

mean value: 0.7883333333333333

key: train_jcc
value: [0.94366197 1.         0.95774648 0.94444444 0.95774648 0.98571429
 0.95774648 1.         1.         1.        ]

mean value: 0.9747060138609435

MCC on Blind test: 0.0

Accuracy on Blind test: 0.68
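
Note on reading the two blind-test lines above together: an MCC of 0.0 alongside an accuracy of 0.68 is what a classifier produces when, for example, it assigns every sample to the majority class. The sketch below illustrates that arithmetic in plain Python. The label vectors are hypothetical (17 negatives, 8 positives) and are not the blind test set from this run, and the helper names are made up for the illustration. Returning 0.0 when the MCC denominator is zero is an assumed convention here; it is believed to match what scikit-learn's matthews_corrcoef does, but check the installed version.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels.

    Assumed convention: return 0.0 when the denominator is zero,
    i.e. when one row or column of the confusion matrix is empty.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical data: 17 negatives, 8 positives, and a predictor
# that always outputs the majority class (0).
y_true = [0] * 17 + [1] * 8
y_pred = [0] * 25

print(round(accuracy(y_true, y_pred), 2))  # 0.68
print(mcc(y_true, y_pred))                 # 0.0
```

This is only one way such a pair of numbers can arise; the log does not record the blind-test confusion matrix, so the actual cause for this model cannot be read off from the output alone.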

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.00959182 0.00908065 0.00727654 0.0070405  0.0074923  0.00699615
 0.00736642 0.00702    0.00749421 0.0074172 ]

mean value: 0.0076775789260864254

key: score_time
value: [0.01065612 0.01025677 0.00826311 0.0082767  0.00856185 0.00839043
 0.00823283 0.00838685 0.00851226 0.00863767]

mean value: 0.008817458152770996

key: test_mcc
value: [0.77459667 0.37796447 0.49099025 0.37796447 0.21821789 0.49099025
 0.18898224 0.46428571 0.64465837 0.20044593]

mean value: 0.42290962650028463

key: train_mcc
value: [0.57208135 0.54899485 0.52400868 0.47754676 0.56162481 0.60455208
 0.60096088 0.6802431  0.57604541 0.66161034]

mean value: 0.5807668254236807

key: test_accuracy
value: [0.875      0.6875     0.73333333 0.66666667 0.6        0.73333333
 0.6        0.73333333 0.8        0.6       ]

mean value: 0.7029166666666666

key: train_accuracy
value: [0.77205882 0.76470588 0.74452555 0.72992701 0.76642336 0.79562044
 0.78832117 0.83211679 0.77372263 0.81751825]

mean value: 0.7784939888364105

key: test_fscore
value: [0.88888889 0.66666667 0.75       0.70588235 0.625      0.75
 0.66666667 0.75       0.84210526 0.7       ]

mean value: 0.7345209838321294

key: train_fscore
value: [0.80254777 0.79220779 0.78527607 0.76433121 0.8        0.77419355
 0.81290323 0.80991736 0.80254777 0.83870968]

mean value: 0.7982634424404584

key: test_precision
value: [0.8        0.71428571 0.66666667 0.6        0.55555556 0.66666667
 0.6        0.75       0.72727273 0.58333333]

mean value: 0.6663780663780664

key: train_precision
value: [0.70786517 0.70930233 0.68085106 0.68181818 0.7032967  0.87272727
 0.72413793 0.9245283  0.70786517 0.74712644]

mean value: 0.7459518554034876

key: test_recall
value: [1.         0.625      0.85714286 0.85714286 0.71428571 0.85714286
 0.75       0.75       1.         0.875     ]

mean value: 0.8285714285714285

key: train_recall
value: [0.92647059 0.89705882 0.92753623 0.86956522 0.92753623 0.69565217
 0.92647059 0.72058824 0.92647059 0.95588235]

mean value: 0.8773231031543052

key: test_roc_auc
value: [0.875      0.6875     0.74107143 0.67857143 0.60714286 0.74107143
 0.58928571 0.73214286 0.78571429 0.58035714]

mean value: 0.7017857142857142

key: train_roc_auc
value: [0.77205882 0.76470588 0.74317988 0.72890026 0.7652387  0.7963555
 0.78932225 0.83130861 0.7748295  0.81852089]

mean value: 0.7784420289855073

key: test_jcc
value: [0.8        0.5        0.6        0.54545455 0.45454545 0.6
 0.5        0.6        0.72727273 0.53846154]

mean value: 0.5865734265734266

key: train_jcc
value: [0.67021277 0.65591398 0.64646465 0.6185567  0.66666667 0.63157895
 0.68478261 0.68055556 0.67021277 0.72222222]

mean value: 0.6647166858413609

MCC on Blind test: 0.02

Accuracy on Blind test: 0.47

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00780797 0.00748897 0.00779843 0.00788808 0.007725   0.00771189
 0.00735712 0.00735378 0.0075953  0.00707674]

mean value: 0.007580327987670899

key: score_time
value: [0.00868344 0.00823379 0.00862813 0.00852036 0.00870013 0.00808978
 0.00818658 0.0080483  0.00845337 0.0080812 ]

mean value: 0.008362507820129395

key: test_mcc
value: [0.25       0.25819889 0.07142857 0.33928571 0.46428571 0.13363062
 0.33928571 0.46428571 0.33928571 0.49099025]

mean value: 0.3150676906591499

key: train_mcc
value: [0.48788604 0.49441323 0.48933032 0.47900717 0.52059257 0.46076782
 0.4312221  0.41698711 0.44522592 0.43208129]

mean value: 0.46575135687893415

key: test_accuracy
value: [0.625      0.625      0.53333333 0.66666667 0.73333333 0.53333333
 0.66666667 0.73333333 0.66666667 0.73333333]

mean value: 0.6516666666666666

key: train_accuracy
value: [0.74264706 0.74264706 0.74452555 0.73722628 0.75912409 0.72992701
 0.71532847 0.7080292  0.72262774 0.71532847]

mean value: 0.7317410905968227

key: test_fscore
value: [0.625      0.57142857 0.53333333 0.66666667 0.71428571 0.63157895
 0.66666667 0.75       0.66666667 0.71428571]

mean value: 0.6539912280701754

key: train_fscore
value: [0.75524476 0.76510067 0.75177305 0.75675676 0.77241379 0.74125874
 0.71942446 0.71428571 0.72058824 0.72340426]

mean value: 0.7420250432480666

key: test_precision
value: [0.625      0.66666667 0.5        0.625      0.71428571 0.5
 0.71428571 0.75       0.71428571 0.83333333]

mean value: 0.6642857142857143

key: train_precision
value: [0.72       0.7037037  0.73611111 0.70886076 0.73684211 0.71621622
 0.70422535 0.69444444 0.72058824 0.69863014]

mean value: 0.7139622064625399

key: test_recall
value: [0.625      0.5        0.57142857 0.71428571 0.71428571 0.85714286
 0.625      0.75       0.625      0.625     ]

mean value: 0.6607142857142857

key: train_recall
value: [0.79411765 0.83823529 0.76811594 0.8115942  0.8115942  0.76811594
 0.73529412 0.73529412 0.72058824 0.75      ]

mean value: 0.7732949701619778

key: test_roc_auc
value: [0.625      0.625      0.53571429 0.66964286 0.73214286 0.55357143
 0.66964286 0.73214286 0.66964286 0.74107143]

mean value: 0.6553571428571429

key: train_roc_auc
value: [0.74264706 0.74264706 0.74435209 0.73667945 0.75873828 0.72964621
 0.71547315 0.70822677 0.72261296 0.71557971]

mean value: 0.7316602728047741

key: test_jcc
value: [0.45454545 0.4        0.36363636 0.5        0.55555556 0.46153846
 0.5        0.6        0.5        0.55555556]

mean value: 0.4890831390831391

key: train_jcc
value: [0.60674157 0.61956522 0.60227273 0.60869565 0.62921348 0.58888889
 0.56179775 0.55555556 0.56321839 0.56666667]

mean value: 0.5902615907742418

MCC on Blind test: 0.1

Accuracy on Blind test: 0.58

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00691152 0.00645924 0.00683975 0.00714922 0.00726128 0.00734305
 0.00735426 0.00752354 0.00730586 0.00730991]

mean value: 0.0071457624435424805

key: score_time
value: [0.00938153 0.00884461 0.00903273 0.0093987  0.00931835 0.00955343
 0.00963545 0.0096395  0.00957155 0.00971961]

mean value: 0.009409546852111816

key: test_mcc
value: [ 0.62994079  0.5         0.49099025  0.6000992   0.49099025  0.32732684
 -0.02620712  0.46428571  0.32732684  0.32732684]

mean value: 0.4132079591989289

key: train_mcc
value: [0.69731096 0.6918501  0.75815907 0.66971076 0.69510727 0.70910029
 0.6523446  0.71313464 0.68163703 0.66616982]

mean value: 0.6934524542628495

key: test_accuracy
value: [0.8125     0.75       0.73333333 0.8        0.73333333 0.66666667
 0.46666667 0.73333333 0.66666667 0.66666667]

mean value: 0.7029166666666666

key: train_accuracy
value: [0.84558824 0.84558824 0.87591241 0.83211679 0.84671533 0.8540146
 0.82481752 0.8540146  0.83941606 0.83211679]

mean value: 0.8450300558179475

key: test_fscore
value: [0.82352941 0.75       0.75       0.76923077 0.75       0.61538462
 0.2        0.75       0.70588235 0.70588235]

mean value: 0.6819909502262443

key: train_fscore
value: [0.85517241 0.84892086 0.88435374 0.84353741 0.85314685 0.85915493
 0.83098592 0.86111111 0.84507042 0.83687943]

mean value: 0.8518333098052753

key: test_precision
value: [0.77777778 0.75       0.66666667 0.83333333 0.66666667 0.66666667
 0.5        0.75       0.66666667 0.66666667]

mean value: 0.6944444444444444

key: train_precision
value: [0.80519481 0.83098592 0.83333333 0.79487179 0.82432432 0.83561644
 0.7972973  0.81578947 0.81081081 0.80821918]

mean value: 0.815644337144789

key: test_recall
value: [0.875      0.75       0.85714286 0.71428571 0.85714286 0.57142857
 0.125      0.75       0.75       0.75      ]

mean value: 0.7

key: train_recall
value: [0.91176471 0.86764706 0.94202899 0.89855072 0.88405797 0.88405797
 0.86764706 0.91176471 0.88235294 0.86764706]

mean value: 0.8917519181585678

key: test_roc_auc
value: [0.8125     0.75       0.74107143 0.79464286 0.74107143 0.66071429
 0.49107143 0.73214286 0.66071429 0.66071429]

mean value: 0.7044642857142858

key: train_roc_auc
value: [0.84558824 0.84558824 0.87542626 0.8316283  0.84644075 0.85379369
 0.82512788 0.85443308 0.8397272  0.83237425]

mean value: 0.8450127877237852

key: test_jcc
value: [0.7        0.6        0.6        0.625      0.6        0.44444444
 0.11111111 0.6        0.54545455 0.54545455]

mean value: 0.5371464646464646

key: train_jcc
value: [0.74698795 0.7375     0.79268293 0.72941176 0.74390244 0.75308642
 0.71084337 0.75609756 0.73170732 0.7195122 ]

mean value: 0.7421731948784563

MCC on Blind test: 0.06

Accuracy on Blind test: 0.68

Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.00869989 0.00870991 0.00899339 0.00884175 0.00880337 0.00869131
 0.00819468 0.00880289 0.00880551 0.00904202]

mean value: 0.008758473396301269

key: score_time
value: [0.00901675 0.00891495 0.00872993 0.00879502 0.00886464 0.00874734
 0.00867009 0.00894928 0.00889111 0.00890279]

mean value: 0.008848190307617188

key: test_mcc
value: [0.75       0.62994079 0.49099025 0.76376262 0.60714286 0.60714286
 0.76376262 0.60714286 0.75592895 0.73214286]

mean value: 0.6707956647621525

key: train_mcc
value: [0.85294118 0.88235294 0.86868474 0.81027501 0.8687127  0.85434012
 0.89869927 0.89869927 0.89863497 0.85440207]

mean value: 0.8687742253690149

key: test_accuracy
value: [0.875      0.8125     0.73333333 0.86666667 0.8        0.8
 0.86666667 0.8        0.86666667 0.86666667]

mean value: 0.82875

key: train_accuracy
value: [0.92647059 0.94117647 0.93430657 0.90510949 0.93430657 0.9270073
 0.94890511 0.94890511 0.94890511 0.9270073 ]

mean value: 0.9342099613568055

key: test_fscore
value: [0.875      0.8        0.75       0.875      0.8        0.8
 0.85714286 0.8        0.88888889 0.875     ]

mean value: 0.8321031746031746

key: train_fscore
value: [0.92647059 0.94117647 0.9352518  0.90647482 0.93430657 0.92857143
 0.94964029 0.94964029 0.94736842 0.92753623]

mean value: 0.9346436903919317

key: test_precision
value: [0.875      0.85714286 0.66666667 0.77777778 0.75       0.75
 1.         0.85714286 0.8        0.875     ]

mean value: 0.8208730158730159

key: train_precision
value: [0.92647059 0.94117647 0.92857143 0.9        0.94117647 0.91549296
 0.92957746 0.92957746 0.96923077 0.91428571]

mean value: 0.929555932882362

key: test_recall
value: [0.875      0.75       0.85714286 1.         0.85714286 0.85714286
 0.75       0.75       1.         0.875     ]

mean value: 0.8571428571428571

key: train_recall
value: [0.92647059 0.94117647 0.94202899 0.91304348 0.92753623 0.94202899
 0.97058824 0.97058824 0.92647059 0.94117647]

mean value: 0.9401108269394715

key: test_roc_auc
value: [0.875      0.8125     0.74107143 0.875      0.80357143 0.80357143
 0.875      0.80357143 0.85714286 0.86607143]

mean value: 0.83125

key: train_roc_auc
value: [0.92647059 0.94117647 0.93424979 0.90505115 0.93435635 0.92689685
 0.94906223 0.94906223 0.94874254 0.92710997]

mean value: 0.9342178175618073

key: test_jcc
value: [0.77777778 0.66666667 0.6        0.77777778 0.66666667 0.66666667
 0.75       0.66666667 0.8        0.77777778]

mean value: 0.715

key: train_jcc
value: [0.8630137  0.88888889 0.87837838 0.82894737 0.87671233 0.86666667
 0.90410959 0.90410959 0.9        0.86486486]

mean value: 0.8775691372699304

MCC on Blind test: 0.13

Accuracy on Blind test: 0.69
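
Note on reading the per-fold values: the metrics logged for each key can all be derived from one confusion matrix per fold. The sketch below is an illustration only, not the script's own code. The counts TP=7, FP=1, FN=1, TN=7 are an assumption inferred from the first SVM fold above (they are not logged), and `binary_metrics` is a hypothetical helper name. The `roc_auc` line assumes the score is computed from hard 0/1 predictions, which is consistent with the logged values but is not confirmed by the log. The last lines check that "mean value" is the plain arithmetic mean of the ten fold values.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute the metrics named in the log from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Matthews correlation coefficient for a 2x2 confusion matrix
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
        "accuracy": (tp + tn) / total,
        "fscore": 2 * precision * recall / (precision + recall),
        "precision": precision,
        "recall": recall,
        # Assumption: AUC of hard 0/1 predictions, i.e. balanced accuracy
        "roc_auc": (recall + specificity) / 2,
        # Jaccard index of the positive class
        "jcc": tp / (tp + fp + fn),
    }


# Assumed counts for a 16-sample fold (inferred, not taken from the log)
fold = binary_metrics(tp=7, fp=1, fn=1, tn=7)
for key, value in fold.items():
    print(f"test_{key}: {value:.8f}")

# test_mcc values copied from the SVM block above
svm_test_mcc = [0.75, 0.62994079, 0.49099025, 0.76376262, 0.60714286,
                0.60714286, 0.76376262, 0.60714286, 0.75592895, 0.73214286]
mean_mcc = sum(svm_test_mcc) / len(svm_test_mcc)
print(f"mean value: {mean_mcc}")
```

Because each test fold holds only 15 or 16 samples, a single misclassified sample shifts accuracy by about 0.06, which is worth keeping in mind when comparing fold-to-fold spread between models.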

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.47774863 0.47219229 0.59577179 0.48569918 0.47681928 0.47980237
 0.52681231 0.59629416 0.46796393 0.48238492]

mean value: 0.506148886680603

key: score_time
value: [0.01098704 0.01345611 0.01107907 0.01333833 0.01326942 0.01334047
 0.01134348 0.01388001 0.01111221 0.01353312]

mean value: 0.012533926963806152

key: test_mcc
value: [1.         0.77459667 0.37796447 0.60714286 0.76376262 0.60714286
 0.46428571 0.60714286 0.75592895 0.73214286]

mean value: 0.6690109846952281

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.875      0.66666667 0.8        0.86666667 0.8
 0.73333333 0.8        0.86666667 0.86666667]

mean value: 0.8275

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.88888889 0.70588235 0.8        0.875      0.8
 0.75       0.8        0.88888889 0.875     ]

mean value: 0.8383660130718954

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.8        0.6        0.75       0.77777778 0.75
 0.75       0.85714286 0.8        0.875     ]

mean value: 0.7959920634920635

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         0.85714286 0.85714286 1.         0.85714286
 0.75       0.75       1.         0.875     ]

mean value: 0.8946428571428571

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.875      0.67857143 0.80357143 0.875      0.80357143
 0.73214286 0.80357143 0.85714286 0.86607143]

mean value: 0.8294642857142858

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.8        0.54545455 0.66666667 0.77777778 0.66666667
 0.6        0.66666667 0.8        0.77777778]

mean value: 0.7301010101010101

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.06

Accuracy on Blind test: 0.69

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01036239 0.00926948 0.00805068 0.00823069 0.00814795 0.00783706
 0.00812101 0.00809956 0.00801969 0.00834656]

mean value: 0.008448505401611328

key: score_time
value: [0.01827431 0.00897479 0.00929928 0.00881314 0.00856185 0.00861549
 0.00854754 0.00869775 0.00881147 0.00856495]

mean value: 0.009716057777404785

key: test_mcc
value: [1.         0.8819171  1.         1.         0.875      0.87287156
 0.87287156 0.75592895 0.87287156 0.875     ]

mean value: 0.9006460732538559

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.9375     1.         1.         0.93333333 0.93333333
 0.93333333 0.86666667 0.93333333 0.93333333]

mean value: 0.9470833333333334

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.94117647 1.         1.         0.93333333 0.92307692
 0.94117647 0.88888889 0.94117647 0.93333333]

mean value: 0.9502161890397185

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.88888889 1.         1.         0.875      1.
 0.88888889 0.8        0.88888889 1.        ]

mean value: 0.9341666666666667

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         1.         0.85714286
 1.         1.         1.         0.875     ]

mean value: 0.9732142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.9375     1.         1.         0.9375     0.92857143
 0.92857143 0.85714286 0.92857143 0.9375    ]

mean value: 0.9455357142857143

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.88888889 1.         1.         0.875      0.85714286
 0.88888889 0.8        0.88888889 0.875     ]

mean value: 0.9073809523809524

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.13

Accuracy on Blind test: 0.85

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.08768535 0.08900571 0.08186841 0.087538   0.08398795 0.08419561
 0.083009   0.08226275 0.08182836 0.0807085 ]

mean value: 0.08420896530151367

key: score_time
value: [0.01827073 0.01797938 0.01798153 0.01794004 0.01733184 0.01720476
 0.01783872 0.01791549 0.017555   0.01719594]

mean value: 0.017721343040466308

key: test_mcc
value: [1.         0.75       0.73214286 1.         0.875      0.73214286
 0.60714286 0.76376262 0.87287156 0.76376262]

mean value: 0.8096825364024487

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.875      0.86666667 1.         0.93333333 0.86666667
 0.8        0.86666667 0.93333333 0.86666667]

mean value: 0.9008333333333334

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.875      0.85714286 1.         0.93333333 0.85714286
 0.8        0.85714286 0.94117647 0.85714286]

mean value: 0.8978081232492997

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.875      0.85714286 1.         0.875      0.85714286
 0.85714286 1.         0.88888889 1.        ]

mean value: 0.921031746031746

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.875      0.85714286 1.         1.         0.85714286
 0.75       0.75       1.         0.75      ]

mean value: 0.8839285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.875      0.86607143 1.         0.9375     0.86607143
 0.80357143 0.875      0.92857143 0.875     ]

mean value: 0.9026785714285714

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.77777778 0.75       1.         0.875      0.75
 0.66666667 0.75       0.88888889 0.75      ]

mean value: 0.8208333333333333

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.1

Accuracy on Blind test: 0.81

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00700188 0.00755453 0.00768661 0.00771928 0.00732183 0.00702024
 0.00708175 0.00722766 0.00722528 0.00710702]

mean value: 0.007294607162475586

key: score_time
value: [0.00804567 0.00840735 0.0089283  0.0083468  0.0084908  0.00807238
 0.00812387 0.0082643  0.00818753 0.00800323]

mean value: 0.00828702449798584

key: test_mcc
value: [0.8819171  0.8819171  0.73214286 1.         0.76376262 0.46428571
 0.49099025 0.60714286 0.875      0.87287156]

mean value: 0.7570030065748747

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     0.9375     0.86666667 1.         0.86666667 0.73333333
 0.73333333 0.8        0.93333333 0.93333333]

mean value: 0.8741666666666666

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94117647 0.93333333 0.85714286 1.         0.875      0.71428571
 0.71428571 0.8        0.93333333 0.94117647]

mean value: 0.8709733893557423

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.88888889 1.         0.85714286 1.         0.77777778 0.71428571
 0.83333333 0.85714286 1.         0.88888889]

mean value: 0.8817460317460317

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.875      0.85714286 1.         1.         0.71428571
 0.625      0.75       0.875      1.        ]

mean value: 0.8696428571428572

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.9375     0.86607143 1.         0.875      0.73214286
 0.74107143 0.80357143 0.9375     0.92857143]

mean value: 0.8758928571428571

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.88888889 0.875      0.75       1.         0.77777778 0.55555556
 0.55555556 0.66666667 0.875      0.88888889]

mean value: 0.7833333333333333

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.08

Accuracy on Blind test: 0.73

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.0097611  1.00532484 1.01266861 1.00928307 1.0162096  1.00374866
 1.01397824 1.01551723 1.02490425 1.04691529]

mean value: 1.0158310890197755

key: score_time
value: [0.15017748 0.09301543 0.09229612 0.09591055 0.09012294 0.09085989
 0.09002423 0.09411788 0.09723639 0.09498525]

mean value: 0.09887461662292481

key: test_mcc
value: [1.         0.8819171  0.76376262 1.         0.875      0.73214286
 0.60714286 0.73214286 0.87287156 0.875     ]

mean value: 0.8339979851886711

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.9375     0.86666667 1.         0.93333333 0.86666667
 0.8        0.86666667 0.93333333 0.93333333]

mean value: 0.9137500000000001

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.94117647 0.875      1.         0.93333333 0.85714286
 0.8        0.875      0.94117647 0.93333333]

mean value: 0.9156162464985994

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.88888889 0.77777778 1.         0.875      0.85714286
 0.85714286 0.875      0.88888889 1.        ]

mean value: 0.901984126984127

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         1.         0.85714286
 0.75       0.875      1.         0.875     ]

mean value: 0.9357142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.9375     0.875      1.         0.9375     0.86607143
 0.80357143 0.86607143 0.92857143 0.9375    ]

mean value: 0.9151785714285714

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.88888889 0.77777778 1.         0.875      0.75
 0.66666667 0.77777778 0.88888889 0.875     ]

mean value: 0.85

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.11

Accuracy on Blind test: 0.83
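
Note on the "mean value" lines: each one appears to be the arithmetic mean of the ten scores printed in the "value" array directly above it. The following is a minimal sketch of that relationship, assuming NumPy is available. It uses the test_mcc values copied from the Random Forest run above. The variable names are illustrative and are not taken from the original script.

```python
import numpy as np

# Per-split test_mcc scores, copied as printed in the log (rounded to 8 decimals)
test_mcc = np.array([1.0, 0.8819171, 0.76376262, 1.0, 0.875, 0.73214286,
                     0.60714286, 0.73214286, 0.87287156, 0.875])

# Arithmetic mean over the ten scores
mean_test_mcc = test_mcc.mean()
print(f"mean value: {mean_test_mcc:.6f}")
```

Because the array is printed with rounding, the recomputed mean should agree with the logged mean only to within that rounding.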

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.84915781 0.96325994 0.88296103 0.89262009 0.85590529 0.8617866
 0.86219358 0.88421559 0.87171268 0.90677834]

mean value: 0.8830590963363647

key: score_time
value: [0.23183656 0.20871425 0.23059011 0.22569108 0.22029448 0.24542952
 0.22994447 0.24367285 0.24643469 0.23650432]

mean value: 0.23191123008728026

key: test_mcc
value: [1.         0.75       0.76376262 1.         0.73214286 0.60714286
 0.60714286 0.73214286 0.87287156 0.875     ]

mean value: 0.794020560534137

key: train_mcc
value: [0.98540068 0.94117647 0.98550418 0.97120941 0.94160273 0.98550418
 0.98550725 0.98550725 0.97122151 0.97122151]

mean value: 0.9723855158091337

key: test_accuracy
value: [1.         0.875      0.86666667 1.         0.86666667 0.8
 0.8        0.86666667 0.93333333 0.93333333]

mean value: 0.8941666666666667

key: train_accuracy
value: [0.99264706 0.97058824 0.99270073 0.98540146 0.97080292 0.99270073
 0.99270073 0.99270073 0.98540146 0.98540146]

mean value: 0.986104551309575

key: test_fscore
value: [1.         0.875      0.875      1.         0.85714286 0.8
 0.8        0.875      0.94117647 0.93333333]

mean value: 0.8956652661064426

key: train_fscore
value: [0.99270073 0.97058824 0.99280576 0.98571429 0.97101449 0.99280576
 0.99270073 0.99270073 0.98550725 0.98550725]

mean value: 0.9862045207088039

key: test_precision
value: [1.         0.875      0.77777778 1.         0.85714286 0.75
 0.85714286 0.875      0.88888889 1.        ]

mean value: 0.888095238095238

key: train_precision
value: [0.98550725 0.97058824 0.98571429 0.97183099 0.97101449 0.98571429
 0.98550725 0.98550725 0.97142857 0.97142857]

mean value: 0.9784241167379383

key: test_recall
value: [1.         0.875      1.         1.         0.85714286 0.85714286
 0.75       0.875      1.         0.875     ]

mean value: 0.9089285714285714

key: train_recall
value: [1.         0.97058824 1.         1.         0.97101449 1.
 1.         1.         1.         1.        ]

mean value: 0.994160272804774

key: test_roc_auc
value: [1.         0.875      0.875      1.         0.86607143 0.80357143
 0.80357143 0.86607143 0.92857143 0.9375    ]

mean value: 0.8955357142857143

key: train_roc_auc
value: [0.99264706 0.97058824 0.99264706 0.98529412 0.97080136 0.99264706
 0.99275362 0.99275362 0.98550725 0.98550725]

mean value: 0.986114663256607

key: test_jcc
value: [1.         0.77777778 0.77777778 1.         0.75       0.66666667
 0.66666667 0.77777778 0.88888889 0.875     ]

mean value: 0.8180555555555555

key: train_jcc
value: [0.98550725 0.94285714 0.98571429 0.97183099 0.94366197 0.98571429
 0.98550725 0.98550725 0.97142857 0.97142857]

mean value: 0.9729157554019771

MCC on Blind test: 0.1

Accuracy on Blind test: 0.8
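
Note on the FutureWarning above: it is raised because the Random Forest2 model is defined with max_features='auto'. According to the warning text itself, setting max_features='sqrt' explicitly (or omitting the parameter) keeps the past behaviour for RandomForestClassifier. The following is a minimal sketch of such a definition, assuming scikit-learn is installed. The variable name rf2 is hypothetical, and the other parameters are copied from the "Model func" line of this run.

```python
from sklearn.ensemble import RandomForestClassifier

# Same settings as the logged "Random Forest2" model, but with the
# deprecated max_features='auto' replaced by the explicit 'sqrt'
# named in the warning message.
rf2 = RandomForestClassifier(
    max_features='sqrt',
    min_samples_leaf=5,
    n_estimators=1000,
    n_jobs=10,
    oob_score=True,
    random_state=42,
)
print(rf2.get_params()['max_features'])
```

Whether the scores change at all depends on the installed scikit-learn version. The warning states that 'sqrt' is also the default for this estimator.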

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01659274 0.00693059 0.00675678 0.0067389  0.00669909 0.00674319
 0.00680494 0.00672269 0.00675416 0.00671124]

mean value: 0.007745432853698731

key: score_time
value: [0.01080561 0.00839496 0.00837088 0.00777602 0.00775194 0.00776267
 0.00777245 0.00776482 0.00776577 0.00774288]

mean value: 0.008190798759460449

key: test_mcc
value: [0.25       0.25819889 0.07142857 0.33928571 0.46428571 0.13363062
 0.33928571 0.46428571 0.33928571 0.49099025]

mean value: 0.3150676906591499

key: train_mcc
value: [0.48788604 0.49441323 0.48933032 0.47900717 0.52059257 0.46076782
 0.4312221  0.41698711 0.44522592 0.43208129]

mean value: 0.46575135687893415

key: test_accuracy
value: [0.625      0.625      0.53333333 0.66666667 0.73333333 0.53333333
 0.66666667 0.73333333 0.66666667 0.73333333]

mean value: 0.6516666666666666

key: train_accuracy
value: [0.74264706 0.74264706 0.74452555 0.73722628 0.75912409 0.72992701
 0.71532847 0.7080292  0.72262774 0.71532847]

mean value: 0.7317410905968227

key: test_fscore
value: [0.625      0.57142857 0.53333333 0.66666667 0.71428571 0.63157895
 0.66666667 0.75       0.66666667 0.71428571]

mean value: 0.6539912280701754

key: train_fscore
value: [0.75524476 0.76510067 0.75177305 0.75675676 0.77241379 0.74125874
 0.71942446 0.71428571 0.72058824 0.72340426]

mean value: 0.7420250432480666

key: test_precision
value: [0.625      0.66666667 0.5        0.625      0.71428571 0.5
 0.71428571 0.75       0.71428571 0.83333333]

mean value: 0.6642857142857143

key: train_precision
value: [0.72       0.7037037  0.73611111 0.70886076 0.73684211 0.71621622
 0.70422535 0.69444444 0.72058824 0.69863014]

mean value: 0.7139622064625399

key: test_recall
value: [0.625      0.5        0.57142857 0.71428571 0.71428571 0.85714286
 0.625      0.75       0.625      0.625     ]

mean value: 0.6607142857142857

key: train_recall
value: [0.79411765 0.83823529 0.76811594 0.8115942  0.8115942  0.76811594
 0.73529412 0.73529412 0.72058824 0.75      ]

mean value: 0.7732949701619778

key: test_roc_auc
value: [0.625      0.625      0.53571429 0.66964286 0.73214286 0.55357143
 0.66964286 0.73214286 0.66964286 0.74107143]

mean value: 0.6553571428571429

key: train_roc_auc
value: [0.74264706 0.74264706 0.74435209 0.73667945 0.75873828 0.72964621
 0.71547315 0.70822677 0.72261296 0.71557971]

mean value: 0.7316602728047741

key: test_jcc
value: [0.45454545 0.4        0.36363636 0.5        0.55555556 0.46153846
 0.5        0.6        0.5        0.55555556]

mean value: 0.4890831390831391

key: train_jcc
value: [0.60674157 0.61956522 0.60227273 0.60869565 0.62921348 0.58888889
 0.56179775 0.55555556 0.56321839 0.56666667]

mean value: 0.5902615907742418

MCC on Blind test: 0.1

Accuracy on Blind test: 0.58

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.05043268 0.03555036 0.05953455 0.03447628 0.03820968 0.0348382
 0.03491855 0.03489041 0.03481722 0.03513861]

mean value: 0.03928065299987793

key: score_time
value: [0.01032662 0.01029825 0.0103786  0.01031733 0.01061702 0.01034212
 0.01034379 0.01036835 0.01033378 0.01031613]

mean value: 0.010364198684692382

key: test_mcc
value: [1.         0.8819171  1.         1.         0.875      1.
 0.87287156 1.         0.87287156 0.875     ]

mean value: 0.9377660225576135

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.9375     1.         1.         0.93333333 1.
 0.93333333 1.         0.93333333 0.93333333]

mean value: 0.9670833333333333

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.94117647 1.         1.         0.93333333 1.
 0.94117647 1.         0.94117647 0.93333333]

mean value: 0.9690196078431372

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.88888889 1.         1.         0.875      1.
 0.88888889 1.         0.88888889 1.        ]

mean value: 0.9541666666666666

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.    1.    1.    1.    1.    1.    1.    1.    1.    0.875]

mean value: 0.9875

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.9375     1.         1.         0.9375     1.
 0.92857143 1.         0.92857143 0.9375    ]

mean value: 0.9669642857142857

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.88888889 1.         1.         0.875      1.
 0.88888889 1.         0.88888889 0.875     ]

mean value: 0.9416666666666667

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.12

Accuracy on Blind test: 0.84

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.01003599 0.01160932 0.01194644 0.01213264 0.0120163  0.01201224
 0.01192141 0.01221108 0.01222897 0.01220608]

mean value: 0.011832046508789062

key: score_time
value: [0.01034403 0.01014495 0.01055765 0.01064253 0.01057649 0.01061487
 0.01058221 0.01066971 0.01062822 0.01060128]

mean value: 0.01053619384765625

key: test_mcc
value: [0.8819171  0.77459667 0.49099025 1.         0.73214286 0.73214286
 0.87287156 0.76376262 0.75592895 0.75592895]

mean value: 0.7760281809053229

key: train_mcc
value: [0.89949371 0.91533482 0.90246052 0.91392776 0.92787101 0.95710706
 0.9139999  0.92951942 0.92791659 0.92951942]

mean value: 0.9217150203470457

key: test_accuracy
value: [0.9375     0.875      0.73333333 1.         0.86666667 0.86666667
 0.93333333 0.86666667 0.86666667 0.86666667]

mean value: 0.88125

key: train_accuracy
value: [0.94852941 0.95588235 0.94890511 0.95620438 0.96350365 0.97810219
 0.95620438 0.96350365 0.96350365 0.96350365]

mean value: 0.9597842421640189

key: test_fscore
value: [0.94117647 0.88888889 0.75       1.         0.85714286 0.85714286
 0.94117647 0.85714286 0.88888889 0.88888889]

mean value: 0.8870448179271708

key: train_fscore
value: [0.95035461 0.95774648 0.95172414 0.95774648 0.96453901 0.9787234
 0.95714286 0.96453901 0.96402878 0.96453901]

mean value: 0.961108376525978

key: test_precision
value: [0.88888889 0.8        0.66666667 1.         0.85714286 0.85714286
 0.88888889 1.         0.8        0.8       ]

mean value: 0.8558730158730159

key: train_precision
value: [0.91780822 0.91891892 0.90789474 0.93150685 0.94444444 0.95833333
 0.93055556 0.93150685 0.94366197 0.93150685]

mean value: 0.9316137728048631

key: test_recall
value: [1.         1.         0.85714286 1.         0.85714286 0.85714286
 1.         0.75       1.         1.        ]

mean value: 0.9321428571428572

key: train_recall
value: [0.98529412 1.         1.         0.98550725 0.98550725 1.
 0.98529412 1.         0.98529412 1.        ]

mean value: 0.99268968456948

key: test_roc_auc
value: [0.9375     0.875      0.74107143 1.         0.86607143 0.86607143
 0.92857143 0.875      0.85714286 0.85714286]

mean value: 0.8803571428571428

key: train_roc_auc
value: [0.94852941 0.95588235 0.94852941 0.95598892 0.96334186 0.97794118
 0.95641517 0.96376812 0.96366155 0.96376812]

mean value: 0.9597826086956522

key: test_jcc
value: [0.88888889 0.8        0.6        1.         0.75       0.75
 0.88888889 0.75       0.8        0.8       ]

mean value: 0.8027777777777778

key: train_jcc
value: [0.90540541 0.91891892 0.90789474 0.91891892 0.93150685 0.95833333
 0.91780822 0.93150685 0.93055556 0.93150685]

mean value: 0.9252355636097525

MCC on Blind test: 0.05

Accuracy on Blind test: 0.64

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.00948811 0.00723052 0.00743365 0.00692177 0.00683284 0.00688004
 0.00742602 0.00687981 0.00724554 0.00696445]

mean value: 0.00733027458190918

key: score_time
value: [0.01050973 0.00838256 0.00804853 0.00779343 0.00792456 0.00785303
 0.00853586 0.00783825 0.00832796 0.00790906]

mean value: 0.008312296867370606

key: test_mcc
value: [0.37796447 0.25819889 0.37796447 0.32732684 0.60714286 0.37796447
 0.49099025 0.33928571 0.33928571 0.46428571]

mean value: 0.3960409397159814

key: train_mcc
value: [0.47243088 0.54894692 0.5182264  0.47592003 0.46076782 0.5335339
 0.4599318  0.4312221  0.47473887 0.47442455]

mean value: 0.4850143267959903

key: test_accuracy
value: [0.6875     0.625      0.66666667 0.66666667 0.8        0.66666667
 0.73333333 0.66666667 0.66666667 0.73333333]

mean value: 0.6912499999999999

key: train_accuracy
value: [0.73529412 0.77205882 0.75912409 0.73722628 0.72992701 0.76642336
 0.72992701 0.71532847 0.73722628 0.73722628]

mean value: 0.7419761700300558

key: test_fscore
value: [0.66666667 0.57142857 0.70588235 0.61538462 0.8        0.70588235
 0.71428571 0.66666667 0.66666667 0.75      ]

mean value: 0.6862863606981253

key: train_fscore
value: [0.74647887 0.7862069  0.76258993 0.75       0.74125874 0.77464789
 0.72992701 0.71942446 0.73913043 0.73529412]

mean value: 0.7484958346591992

key: test_precision
value: [0.71428571 0.66666667 0.6        0.66666667 0.75       0.6
 0.83333333 0.71428571 0.71428571 0.75      ]

mean value: 0.700952380952381

key: train_precision
value: [0.71621622 0.74025974 0.75714286 0.72       0.71621622 0.75342466
 0.72463768 0.70422535 0.72857143 0.73529412]

mean value: 0.729598826685986

key: test_recall
value: [0.625      0.5        0.85714286 0.57142857 0.85714286 0.85714286
 0.625      0.625      0.625      0.75      ]

mean value: 0.6892857142857143

key: train_recall
value: [0.77941176 0.83823529 0.76811594 0.7826087  0.76811594 0.79710145
 0.73529412 0.73529412 0.75       0.73529412]

mean value: 0.7689471440750213

key: test_roc_auc
value: [0.6875     0.625      0.67857143 0.66071429 0.80357143 0.67857143
 0.74107143 0.66964286 0.66964286 0.73214286]

mean value: 0.6946428571428571

key: train_roc_auc
value: [0.73529412 0.77205882 0.75905797 0.73689258 0.72964621 0.76619778
 0.7299659  0.71547315 0.73731884 0.73721228]

mean value: 0.7419117647058824

key: test_jcc
value: [0.5        0.4        0.54545455 0.44444444 0.66666667 0.54545455
 0.55555556 0.5        0.5        0.6       ]

mean value: 0.5257575757575758

key: train_jcc
value: [0.59550562 0.64772727 0.61627907 0.6        0.58888889 0.63218391
 0.57471264 0.56179775 0.5862069  0.58139535]

mean value: 0.5984697399283192

MCC on Blind test: 0.1

Accuracy on Blind test: 0.6

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.00826144 0.00782967 0.00737524 0.00810552 0.00831509 0.00756001
 0.00788832 0.00767446 0.00802422 0.00808358]

mean value: 0.007911753654479981

key: score_time
value: [0.00887156 0.00873542 0.00787902 0.00805974 0.00852108 0.00807381
 0.00853586 0.00868702 0.00864601 0.00846553]

mean value: 0.008447504043579102

key: test_mcc
value: [0.77459667 0.5        0.47245559 0.64465837 0.73214286 0.60714286
 0.64465837 0.87287156 0.64465837 0.6000992 ]

mean value: 0.6493283847542592

key: train_mcc
value: [0.76894131 0.91334626 0.54803747 0.87326937 0.94160273 0.83757093
 0.91597649 0.88476385 0.87099729 0.88476385]

mean value: 0.8439269536443883

key: test_accuracy
value: [0.875      0.75       0.73333333 0.8        0.86666667 0.8
 0.8        0.93333333 0.8        0.8       ]

mean value: 0.8158333333333334

key: train_accuracy
value: [0.875      0.95588235 0.72992701 0.93430657 0.97080292 0.91240876
 0.95620438 0.94160584 0.93430657 0.94160584]

mean value: 0.9152050236152856

key: test_fscore
value: [0.85714286 0.75       0.66666667 0.72727273 0.85714286 0.8
 0.84210526 0.94117647 0.84210526 0.82352941]

mean value: 0.8107141516893839

key: train_fscore
value: [0.85950413 0.95454545 0.63366337 0.93129771 0.97101449 0.92
 0.95774648 0.94285714 0.93617021 0.94285714]

mean value: 0.9049656133144264

key: test_precision
value: [1.         0.75       0.8        1.         0.85714286 0.75
 0.72727273 0.88888889 0.72727273 0.77777778]

mean value: 0.8278354978354978

key: train_precision
value: [0.98113208 0.984375   1.         0.98387097 0.97101449 0.85185185
 0.91891892 0.91666667 0.90410959 0.91666667]

mean value: 0.9428606229112457

key: test_recall
value: [0.75       0.75       0.57142857 0.57142857 0.85714286 0.85714286
 1.         1.         1.         0.875     ]

mean value: 0.8232142857142857

key: train_recall
value: [0.76470588 0.92647059 0.46376812 0.88405797 0.97101449 1.
 1.         0.97058824 0.97058824 0.97058824]

mean value: 0.8921781756180733

key: test_roc_auc
value: [0.875      0.75       0.72321429 0.78571429 0.86607143 0.80357143
 0.78571429 0.92857143 0.78571429 0.79464286]

mean value: 0.8098214285714286

key: train_roc_auc
value: [0.875      0.95588235 0.73188406 0.93467604 0.97080136 0.91176471
 0.95652174 0.94181586 0.93456948 0.94181586]

mean value: 0.9154731457800511

key: test_jcc
value: [0.75       0.6        0.5        0.57142857 0.75       0.66666667
 0.72727273 0.88888889 0.72727273 0.7       ]

mean value: 0.6881529581529582

key: train_jcc
value: [0.75362319 0.91304348 0.46376812 0.87142857 0.94366197 0.85185185
 0.91891892 0.89189189 0.88       0.89189189]

mean value: 0.8380079880422807

MCC on Blind test: 0.06

Accuracy on Blind test: 0.89

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.00997114 0.01002455 0.00783682 0.00783896 0.00782728 0.00736952
 0.00789356 0.00743914 0.00764585 0.0072844 ]

mean value: 0.00811312198638916

key: score_time
value: [0.01067495 0.00957513 0.00809073 0.00825882 0.0082314  0.00791526
 0.00799417 0.00835299 0.00847554 0.00832438]

mean value: 0.008589339256286622

key: test_mcc
value: [0.77459667 0.37796447 0.36689969 0.60714286 0.49099025 0.73214286
 0.6000992  0.73214286 0.75592895 0.73214286]

mean value: 0.6170050660873226

key: train_mcc
value: [0.72669793 0.88580789 0.78788403 0.74493056 0.77817796 0.91597649
 0.92951942 0.85434012 0.86000692 0.91240409]

mean value: 0.8395745411348854

key: test_accuracy
value: [0.875      0.6875     0.6        0.8        0.73333333 0.86666667
 0.8        0.86666667 0.86666667 0.86666667]

mean value: 0.79625

key: train_accuracy
value: [0.84558824 0.94117647 0.88321168 0.86861314 0.88321168 0.95620438
 0.96350365 0.9270073  0.9270073  0.95620438]

mean value: 0.9151728209531989

key: test_fscore
value: [0.85714286 0.66666667 0.7        0.8        0.75       0.85714286
 0.82352941 0.875      0.88888889 0.875     ]

mean value: 0.8093370681605976

key: train_fscore
value: [0.8173913  0.93846154 0.8961039  0.87837838 0.89333333 0.95454545
 0.96453901 0.92537313 0.93055556 0.95588235]

mean value: 0.9154563955087716

key: test_precision
value: [1.         0.71428571 0.53846154 0.75       0.66666667 0.85714286
 0.77777778 0.875      0.8        0.875     ]

mean value: 0.7854334554334554

key: train_precision
value: [1.         0.98387097 0.81176471 0.82278481 0.82716049 1.
 0.93150685 0.93939394 0.88157895 0.95588235]

mean value: 0.9153943066596637

key: test_recall
value: [0.75       0.625      1.         0.85714286 0.85714286 0.85714286
 0.875      0.875      1.         0.875     ]

mean value: 0.8571428571428571

key: train_recall
value: [0.69117647 0.89705882 1.         0.94202899 0.97101449 0.91304348
 1.         0.91176471 0.98529412 0.95588235]

mean value: 0.9267263427109974

key: test_roc_auc
value: [0.875      0.6875     0.625      0.80357143 0.74107143 0.86607143
 0.79464286 0.86607143 0.85714286 0.86607143]

mean value: 0.7982142857142858

key: train_roc_auc
value: [0.84558824 0.94117647 0.88235294 0.86807332 0.88256607 0.95652174
 0.96376812 0.92689685 0.92742967 0.95620205]

mean value: 0.9150575447570333

key: test_jcc
value: [0.75       0.5        0.53846154 0.66666667 0.6        0.75
 0.7        0.77777778 0.8        0.77777778]

mean value: 0.6860683760683761

key: train_jcc
value: [0.69117647 0.88405797 0.81176471 0.78313253 0.80722892 0.91304348
 0.93150685 0.86111111 0.87012987 0.91549296]

mean value: 0.8468644859831611

MCC on Blind test: 0.04

Accuracy on Blind test: 0.85

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.07661414 0.06280541 0.0673337  0.06278038 0.06312895 0.06578207
 0.06545353 0.06469059 0.06454372 0.06606507]

mean value: 0.06591975688934326

key: score_time
value: [0.01440525 0.01476049 0.01512003 0.01427507 0.01462126 0.01519179
 0.01491117 0.01524901 0.01458573 0.01450872]

mean value: 0.01476285457611084

key: test_mcc
value: [1.         0.8819171  0.76376262 0.875      0.73214286 0.87287156
 0.87287156 1.         0.87287156 0.73214286]

mean value: 0.8603580116631793

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.9375     0.86666667 0.93333333 0.86666667 0.93333333
 0.93333333 1.         0.93333333 0.86666667]

mean value: 0.9270833333333334

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.94117647 0.875      0.93333333 0.85714286 0.92307692
 0.94117647 1.         0.94117647 0.875     ]

mean value: 0.9287082525317819

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.88888889 0.77777778 0.875      0.85714286 1.
 0.88888889 1.         0.88888889 0.875     ]

mean value: 0.9051587301587302

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         0.85714286 0.85714286
 1.         1.         1.         0.875     ]

mean value: 0.9589285714285715

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.9375     0.875      0.9375     0.86607143 0.92857143
 0.92857143 1.         0.92857143 0.86607143]

mean value: 0.9267857142857143

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.88888889 0.77777778 0.875      0.75       0.85714286
 0.88888889 1.         0.88888889 0.77777778]

mean value: 0.870436507936508

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.08

Accuracy on Blind test: 0.76

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.02507782 0.02816606 0.04579067 0.0453043  0.04246235 0.02354765
 0.0451479  0.02337193 0.02406144 0.03010583]

mean value: 0.0333035945892334

key: score_time
value: [0.0173595  0.02051401 0.03634977 0.03514004 0.01607704 0.03413272
 0.02627468 0.01672935 0.02149177 0.03459334]

mean value: 0.025866222381591798

key: test_mcc
value: [1.         0.8819171  1.         1.         0.875      0.73214286
 0.87287156 1.         0.87287156 0.875     ]

mean value: 0.9109803082718992

key: train_mcc
value: [1.         1.         1.         1.         1.         1.
 1.         0.98550725 1.         1.        ]

mean value: 0.9985507246376811

key: test_accuracy
value: [1.         0.9375     1.         1.         0.93333333 0.86666667
 0.93333333 1.         0.93333333 0.93333333]

mean value: 0.95375

key: train_accuracy
value: [1.         1.         1.         1.         1.         1.
 1.         0.99270073 1.         1.        ]

mean value: 0.9992700729927008

key: test_fscore
value: [1.         0.94117647 1.         1.         0.93333333 0.85714286
 0.94117647 1.         0.94117647 0.93333333]

mean value: 0.954733893557423

key: train_fscore
value: [1.         1.         1.         1.         1.         1.
 1.         0.99270073 1.         1.        ]

mean value: 0.9992700729927008

key: test_precision
value: [1.         0.88888889 1.         1.         0.875      0.85714286
 0.88888889 1.         0.88888889 1.        ]

mean value: 0.9398809523809524

key: train_precision
value: [1.         1.         1.         1.         1.         1.
 1.         0.98550725 1.         1.        ]

mean value: 0.9985507246376811

key: test_recall
value: [1.         1.         1.         1.         1.         0.85714286
 1.         1.         1.         0.875     ]

mean value: 0.9732142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.9375     1.         1.         0.9375     0.86607143
 0.92857143 1.         0.92857143 0.9375    ]

mean value: 0.9535714285714286

key: train_roc_auc
value: [1.         1.         1.         1.         1.         1.
 1.         0.99275362 1.         1.        ]

mean value: 0.9992753623188406

key: test_jcc
value: [1.         0.88888889 1.         1.         0.875      0.75
 0.88888889 1.         0.88888889 0.875     ]

mean value: 0.9166666666666666

key: train_jcc
value: [1.         1.         1.         1.         1.         1.
 1.         0.98550725 1.         1.        ]

mean value: 0.9985507246376811

MCC on Blind test: 0.13

Accuracy on Blind test: 0.86

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.03250933 0.03897357 0.01707721 0.01700234 0.01710153 0.01721501
 0.04007459 0.0396409  0.03977489 0.04000473]

mean value: 0.02993741035461426

key: score_time
value: [0.01946115 0.01910114 0.01107264 0.01099253 0.01101065 0.01094913
 0.02085972 0.01942563 0.01113796 0.02103782]

mean value: 0.015504837036132812

key: test_mcc
value: [0.8819171  0.75       0.37796447 0.73214286 0.76376262 0.73214286
 0.76376262 0.60714286 0.6000992  0.60714286]

mean value: 0.6816077435069778

key: train_mcc
value: [0.95598573 0.98540068 0.98550418 0.98550725 0.97080136 0.97080136
 0.98550725 0.97080136 0.97120941 0.97080136]

mean value: 0.9752319946905791

key: test_accuracy
value: [0.9375     0.875      0.66666667 0.86666667 0.86666667 0.86666667
 0.86666667 0.8        0.8        0.8       ]

mean value: 0.8345833333333333

key: train_accuracy
value: [0.97794118 0.99264706 0.99270073 0.99270073 0.98540146 0.98540146
 0.99270073 0.98540146 0.98540146 0.98540146]

mean value: 0.9875697724345213

key: test_fscore
value: [0.94117647 0.875      0.70588235 0.85714286 0.875      0.85714286
 0.85714286 0.8        0.82352941 0.8       ]

mean value: 0.8392016806722689

key: train_fscore
value: [0.97777778 0.99259259 0.99280576 0.99270073 0.98550725 0.98550725
 0.99270073 0.98529412 0.98507463 0.98529412]

mean value: 0.9875254940533481

key: test_precision
value: [0.88888889 0.875      0.6        0.85714286 0.77777778 0.85714286
 1.         0.85714286 0.77777778 0.85714286]

mean value: 0.8348015873015873

key: train_precision
value: [0.98507463 1.         0.98571429 1.         0.98550725 0.98550725
 0.98550725 0.98529412 1.         0.98529412]

mean value: 0.989789888700451

key: test_recall
value: [1.         0.875      0.85714286 0.85714286 1.         0.85714286
 0.75       0.75       0.875      0.75      ]

mean value: 0.8571428571428571

key: train_recall
value: [0.97058824 0.98529412 1.         0.98550725 0.98550725 0.98550725
 1.         0.98529412 0.97058824 0.98529412]

mean value: 0.9853580562659847

key: test_roc_auc
value: [0.9375     0.875      0.67857143 0.86607143 0.875      0.86607143
 0.875      0.80357143 0.79464286 0.80357143]

mean value: 0.8375

key: train_roc_auc
value: [0.97794118 0.99264706 0.99264706 0.99275362 0.98540068 0.98540068
 0.99275362 0.98540068 0.98529412 0.98540068]

mean value: 0.9875639386189259

key: test_jcc
value: [0.88888889 0.77777778 0.54545455 0.75       0.77777778 0.75
 0.75       0.66666667 0.7        0.66666667]

mean value: 0.7273232323232323

key: train_jcc
value: [0.95652174 0.98529412 0.98571429 0.98550725 0.97142857 0.97142857
 0.98550725 0.97101449 0.97058824 0.97101449]

mean value: 0.9754018998903909

MCC on Blind test: 0.06

Accuracy on Blind test: 0.66

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.10531926 0.09862638 0.1012013  0.09502292 0.09023929 0.08958817
 0.09062433 0.07970119 0.08648419 0.07744169]

mean value: 0.09142487049102783

key: score_time
value: [0.00943542 0.00918198 0.00938845 0.00950336 0.00970459 0.00936961
 0.00830388 0.00854349 0.00833607 0.00825047]

mean value: 0.009001731872558594

key: test_mcc
value: [0.8819171  0.8819171  1.         1.         0.875      0.87287156
 0.87287156 0.87287156 0.87287156 0.875     ]

mean value: 0.9005320451152271

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     0.9375     1.         1.         0.93333333 0.93333333
 0.93333333 0.93333333 0.93333333 0.93333333]

mean value: 0.9475

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94117647 0.94117647 1.         1.         0.93333333 0.92307692
 0.94117647 0.94117647 0.94117647 0.93333333]

mean value: 0.9495625942684767

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.88888889 0.88888889 1.         1.         0.875      1.
 0.88888889 0.88888889 0.88888889 1.        ]

mean value: 0.9319444444444445

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         1.         0.85714286
 1.         1.         1.         0.875     ]

mean value: 0.9732142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.9375     1.         1.         0.9375     0.92857143
 0.92857143 0.92857143 0.92857143 0.9375    ]

mean value: 0.9464285714285714

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.88888889 0.88888889 1.         1.         0.875      0.85714286
 0.88888889 0.88888889 0.88888889 0.875     ]

mean value: 0.9051587301587302

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.11

Accuracy on Blind test: 0.83

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.00895786 0.01093102 0.01075339 0.01098084 0.01611209 0.01137829
 0.01158309 0.01124573 0.01122022 0.01167703]

mean value: 0.01148395538330078

key: score_time
value: [0.01024771 0.01018643 0.01022196 0.01059794 0.010957   0.01306653
 0.01069069 0.01379347 0.01385617 0.01327443]

mean value: 0.011689233779907226

key: test_mcc
value: [1.         0.67419986 0.75592895 0.75592895 0.75592895 0.53452248
 0.56407607 0.60714286 0.76376262 0.76376262]

mean value: 0.7175253347956024

key: train_mcc
value: [0.98540068 1.         1.         1.         1.         1.
 0.87609014 1.         1.         1.        ]

mean value: 0.9861490818102587

key: test_accuracy
value: [1.         0.8125     0.86666667 0.86666667 0.86666667 0.73333333
 0.73333333 0.8        0.86666667 0.86666667]

mean value: 0.84125

key: train_accuracy
value: [0.99264706 1.         1.         1.         1.         1.
 0.93430657 1.         1.         1.        ]

mean value: 0.9926953628166595

key: test_fscore
value: [1.         0.76923077 0.83333333 0.83333333 0.83333333 0.6
 0.66666667 0.8        0.85714286 0.85714286]

mean value: 0.805018315018315

key: train_fscore
value: [0.99259259 1.         1.         1.         1.         1.
 0.92913386 1.         1.         1.        ]

mean value: 0.9921726450860309

key: test_precision
value: [1.         1.         1.         1.         1.         1.
 1.         0.85714286 1.         1.        ]

mean value: 0.9857142857142858

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.625      0.71428571 0.71428571 0.71428571 0.42857143
 0.5        0.75       0.75       0.75      ]

mean value: 0.6946428571428571

key: train_recall
value: [0.98529412 1.         1.         1.         1.         1.
 0.86764706 1.         1.         1.        ]

mean value: 0.9852941176470589

key: test_roc_auc
value: [1.         0.8125     0.85714286 0.85714286 0.85714286 0.71428571
 0.75       0.80357143 0.875      0.875     ]

mean value: 0.8401785714285714

key: train_roc_auc
value: [0.99264706 1.         1.         1.         1.         1.
 0.93382353 1.         1.         1.        ]

mean value: 0.9926470588235294

key: test_jcc
value: [1.         0.625      0.71428571 0.71428571 0.71428571 0.42857143
 0.5        0.66666667 0.75       0.75      ]

mean value: 0.6863095238095238

key: train_jcc
value: [0.98529412 1.         1.         1.         1.         1.
 0.86764706 1.         1.         1.        ]

mean value: 0.9852941176470589

MCC on Blind test: -0.02

Accuracy on Blind test: 0.95

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.01463962 0.00981808 0.00781131 0.00768018 0.00760412 0.00761986
 0.00743604 0.00750971 0.00748968 0.00746846]

mean value: 0.008507704734802246

key: score_time
value: [0.01040816 0.0082798  0.00810122 0.00809526 0.00800514 0.0080471
 0.00783396 0.00794506 0.00791073 0.00810385]

mean value: 0.008273029327392578

key: test_mcc
value: [0.8819171  0.62994079 0.37796447 0.87287156 0.73214286 0.73214286
 0.75592895 1.         0.75592895 0.6000992 ]

mean value: 0.7338936730461708

key: train_mcc
value: [0.82388584 0.88273483 0.85440207 0.85434012 0.89863497 0.88320546
 0.90025835 0.84026462 0.88360693 0.86948194]

mean value: 0.8690815123547234

key: test_accuracy
value: [0.9375     0.8125     0.66666667 0.93333333 0.86666667 0.86666667
 0.86666667 1.         0.86666667 0.8       ]

mean value: 0.8616666666666667

key: train_accuracy
value: [0.91176471 0.94117647 0.9270073  0.9270073  0.94890511 0.94160584
 0.94890511 0.91970803 0.94160584 0.93430657]

mean value: 0.9341992271361099

key: test_fscore
value: [0.93333333 0.82352941 0.70588235 0.92307692 0.85714286 0.85714286
 0.88888889 1.         0.88888889 0.82352941]

mean value: 0.8701414924944337

key: train_fscore
value: [0.91304348 0.94202899 0.92647059 0.92857143 0.95035461 0.94202899
 0.95035461 0.92086331 0.94202899 0.9352518 ]

mean value: 0.9350996779361157

key: test_precision
value: [1.         0.77777778 0.6        1.         0.85714286 0.85714286
 0.8        1.         0.8        0.77777778]

mean value: 0.846984126984127

key: train_precision
value: [0.9        0.92857143 0.94029851 0.91549296 0.93055556 0.94202899
 0.91780822 0.90140845 0.92857143 0.91549296]

mean value: 0.9220228491043612

key: test_recall
value: [0.875      0.875      0.85714286 0.85714286 0.85714286 0.85714286
 1.         1.         1.         0.875     ]

mean value: 0.9053571428571429

key: train_recall
value: [0.92647059 0.95588235 0.91304348 0.94202899 0.97101449 0.94202899
 0.98529412 0.94117647 0.95588235 0.95588235]

mean value: 0.9488704177323103

key: test_roc_auc
value: [0.9375     0.8125     0.67857143 0.92857143 0.86607143 0.86607143
 0.85714286 1.         0.85714286 0.79464286]

mean value: 0.8598214285714286

key: train_roc_auc
value: [0.91176471 0.94117647 0.92710997 0.92689685 0.94874254 0.94160273
 0.9491688  0.9198636  0.94170929 0.93446292]

mean value: 0.9342497868712702

key: test_jcc
value: [0.875      0.7        0.54545455 0.85714286 0.75       0.75
 0.8        1.         0.8        0.7       ]

mean value: 0.7777597402597403

key: train_jcc
value: [0.84       0.89041096 0.8630137  0.86666667 0.90540541 0.89041096
 0.90540541 0.85333333 0.89041096 0.87837838]

mean value: 0.8783435764531655

MCC on Blind test: 0.07

Accuracy on Blind test: 0.7

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.07311392 0.06029248 0.06152678 0.05939054 0.05958748 0.05935431
 0.05937529 0.05921316 0.0613966  0.06374431]

mean value: 0.06169948577880859

key: score_time
value: [0.00807023 0.00803971 0.00803089 0.00806856 0.00811672 0.00805664
 0.00809479 0.00807714 0.00883532 0.0086627 ]

mean value: 0.008205270767211914

key: test_mcc
value: [0.8819171  0.62994079 0.49099025 0.87287156 0.73214286 0.73214286
 0.75592895 1.         0.75592895 0.6000992 ]

mean value: 0.7451962510483463

key: train_mcc
value: [0.85442069 0.87000211 0.89863497 0.85434012 0.92787101 0.91277477
 0.90025835 0.8555278  0.88360693 0.88668406]

mean value: 0.8844120809526788

key: test_accuracy
value: [0.9375     0.8125     0.73333333 0.93333333 0.86666667 0.86666667
 0.86666667 1.         0.86666667 0.8       ]

mean value: 0.8683333333333334

key: train_accuracy
value: [0.92647059 0.93382353 0.94890511 0.9270073  0.96350365 0.95620438
 0.94890511 0.9270073  0.94160584 0.94160584]

mean value: 0.9415038643194504

/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:163: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:166: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)

key: test_fscore
value: [0.94117647 0.82352941 0.75       0.92307692 0.85714286 0.85714286
 0.88888889 1.         0.88888889 0.82352941]

mean value: 0.8753375709258062

key: train_fscore
value: [0.92857143 0.93617021 0.95035461 0.92857143 0.96453901 0.95714286
 0.95035461 0.92857143 0.94202899 0.94366197]

mean value: 0.9429966539911687

key: test_precision
value: [0.88888889 0.77777778 0.66666667 1.         0.85714286 0.85714286
 0.8        1.         0.8        0.77777778]

mean value: 0.8425396825396825

key: train_precision
value: [0.90277778 0.90410959 0.93055556 0.91549296 0.94444444 0.94366197
 0.91780822 0.90277778 0.92857143 0.90540541]

mean value: 0.9195605127329032

key: test_recall
value: [1.         0.875      0.85714286 0.85714286 0.85714286 0.85714286
 1.         1.         1.         0.875     ]

mean value: 0.9178571428571428

key: train_recall
value: [0.95588235 0.97058824 0.97101449 0.94202899 0.98550725 0.97101449
 0.98529412 0.95588235 0.95588235 0.98529412]

mean value: 0.9678388746803069

key: test_roc_auc
value: [0.9375     0.8125     0.74107143 0.92857143 0.86607143 0.86607143
 0.85714286 1.         0.85714286 0.79464286]

mean value: 0.8660714285714286

key: train_roc_auc
value: [0.92647059 0.93382353 0.94874254 0.92689685 0.96334186 0.95609548
 0.9491688  0.92721654 0.94170929 0.94192242]

mean value: 0.941538789428815

key: test_jcc
value: [0.88888889 0.7        0.6        0.85714286 0.75       0.75
 0.8        1.         0.8        0.7       ]

mean value: 0.7846031746031746

key: train_jcc
value: [0.86666667 0.88       0.90540541 0.86666667 0.93150685 0.91780822
 0.90540541 0.86666667 0.89041096 0.89333333]

mean value: 0.8923870171541405

MCC on Blind test: 0.06

Accuracy on Blind test: 0.66

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.01548505 0.01318932 0.01122952 0.01181602 0.01175117 0.01169848
 0.01112366 0.01132369 0.0109632  0.01237416]

mean value: 0.012095427513122559

key: score_time
value: [0.01040673 0.00819731 0.0085001  0.00783849 0.00782514 0.00842071
 0.0079546  0.00785255 0.00782204 0.00846028]

mean value: 0.008327794075012208

key: test_mcc
value: [0.35       0.35       0.8        1.         0.79056942 0.8
 0.5        0.5        0.25819889 1.        ]

mean value: 0.6348768304789256
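
The "mean value" printed for each key appears to be the plain arithmetic mean of the ten per-fold values. A minimal check for test_mcc, with the fold values copied from the block above (the variable names are illustrative and not taken from the script):

```python
from statistics import fmean

# Per-fold test_mcc values copied from the log output above.
test_mcc = [0.35, 0.35, 0.8, 1.0, 0.79056942, 0.8, 0.5, 0.5, 0.25819889, 1.0]

# Arithmetic mean over the 10 cross-validation folds.
mean_mcc = fmean(test_mcc)
print(mean_mcc)
```

The result agrees with the logged mean to within the rounding of the printed fold values.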

key: train_mcc
value: [0.87044534 0.87035806 0.87044534 0.81836616 0.81836616 0.84412955
 0.84615385 0.84615385 0.84615385 0.84615385]

mean value: 0.8476726003234742

key: test_accuracy
value: [0.66666667 0.66666667 0.88888889 1.         0.88888889 0.88888889
 0.75       0.75       0.625      1.        ]

mean value: 0.8125

key: train_accuracy
value: [0.93506494 0.93506494 0.93506494 0.90909091 0.90909091 0.92207792
 0.92307692 0.92307692 0.92307692 0.92307692]

mean value: 0.9237762237762238

key: test_fscore
value: [0.66666667 0.66666667 0.88888889 1.         0.90909091 0.88888889
 0.75       0.75       0.57142857 1.        ]

mean value: 0.8091630591630592

key: train_fscore
value: [0.93506494 0.93670886 0.93506494 0.90666667 0.90666667 0.92105263
 0.92307692 0.92307692 0.92307692 0.92307692]

mean value: 0.9233532388109337

key: test_precision
value: [0.6        0.6        0.8        1.         0.83333333 1.
 0.75       0.75       0.66666667 1.        ]

mean value: 0.8

key: train_precision
value: [0.94736842 0.925      0.94736842 0.91891892 0.91891892 0.92105263
 0.92307692 0.92307692 0.92307692 0.92307692]

mean value: 0.9270935003829741

key: test_recall
value: [0.75 0.75 1.   1.   1.   0.8  0.75 0.75 0.5  1.  ]

mean value: 0.83

key: train_recall
value: [0.92307692 0.94871795 0.92307692 0.89473684 0.89473684 0.92105263
 0.92307692 0.92307692 0.92307692 0.92307692]

mean value: 0.9197705802968961

key: test_roc_auc
value: [0.675 0.675 0.9   1.    0.875 0.9   0.75  0.75  0.625 1.   ]

mean value: 0.8150000000000001

key: train_roc_auc
value: [0.93522267 0.93488529 0.93522267 0.90890688 0.90890688 0.92206478
 0.92307692 0.92307692 0.92307692 0.92307692]

mean value: 0.9237516869095818

key: test_jcc
value: [0.5        0.5        0.8        1.         0.83333333 0.8
 0.6        0.6        0.4        1.        ]

mean value: 0.7033333333333334
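
Assuming test_jcc is the Jaccard index of the positive class and test_fscore is the F1 score for the same predictions, the two are tied by J = F1 / (2 - F1). A small sketch checking that identity against the per-fold values logged above (the helper name is hypothetical):

```python
import math

def jaccard_from_f1(f1: float) -> float:
    """Jaccard index implied by an F1 score for the same binary predictions."""
    return f1 / (2.0 - f1)

# Per-fold values copied from the test_fscore and test_jcc blocks above.
test_fscore = [0.66666667, 0.66666667, 0.88888889, 1.0, 0.90909091,
               0.88888889, 0.75, 0.75, 0.57142857, 1.0]
test_jcc = [0.5, 0.5, 0.8, 1.0, 0.83333333, 0.8, 0.6, 0.6, 0.4, 1.0]

implied = [jaccard_from_f1(f) for f in test_fscore]
all_match = all(math.isclose(a, b, abs_tol=1e-6) for a, b in zip(implied, test_jcc))
print(all_match)
```

Because one score determines the other, the two columns carry the same information per fold.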

key: train_jcc
value: [0.87804878 0.88095238 0.87804878 0.82926829 0.82926829 0.85365854
 0.85714286 0.85714286 0.85714286 0.85714286]

mean value: 0.8577816492450638

MCC on Blind test: 0.1

Accuracy on Blind test: 0.57
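
The fold-0 cross-validation scores in the Logistic Regression block above can be reproduced from a single confusion matrix. The counts below are an inference from those scores (the log does not print a confusion matrix), so treat them as a hypothetical reconstruction:

```python
import math

# Hypothetical confusion counts for CV fold 0, inferred from the reported
# scores; they are not printed anywhere in the log.
tp, fp, fn, tn = 3, 2, 1, 3

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
# Mean of recall and specificity (balanced accuracy).
balanced = (recall + specificity) / 2
# Matthews correlation coefficient from the four counts.
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(accuracy, precision, recall, balanced, mcc)
```

With these counts the accuracy, precision, recall and MCC match the first entries of test_accuracy, test_precision, test_recall and test_mcc. The balanced value matches the first test_roc_auc entry, which would be expected if ROC AUC was scored on hard class predictions rather than probabilities (an assumption about the script).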

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.28448343 0.26222992 0.30907059 0.30741835 0.29078984 0.30970097
 0.2858026  0.2821455  0.30858493 0.30905151]

mean value: 0.2949277639389038

key: score_time
value: [0.00840044 0.00826621 0.00892925 0.00815058 0.00974989 0.00875688
 0.00868368 0.00955057 0.00915575 0.00842237]

mean value: 0.008806562423706055

key: test_mcc
value: [0.1        0.35       0.8        0.79056942 1.         1.
 1.         0.5        0.57735027 1.        ]

mean value: 0.711791968423172

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.55555556 0.66666667 0.88888889 0.88888889 1.         1.
 1.         0.75       0.75       1.        ]

mean value: 0.85

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.5        0.66666667 0.88888889 0.90909091 1.         1.
 1.         0.75       0.66666667 1.        ]

mean value: 0.8381313131313131

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.5        0.6        0.8        0.83333333 1.         1.
 1.         0.75       1.         1.        ]

mean value: 0.8483333333333334

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.5  0.75 1.   1.   1.   1.   1.   0.75 0.5  1.  ]

mean value: 0.85

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.55  0.675 0.9   0.875 1.    1.    1.    0.75  0.75  1.   ]

mean value: 0.85

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.33333333 0.5        0.8        0.83333333 1.         1.
 1.         0.6        0.5        1.        ]

mean value: 0.7566666666666667

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.05

Accuracy on Blind test: 0.63
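
The key / value / mean value layout used throughout this log is regular enough to parse. A minimal sketch, assuming the blocks keep exactly the shape shown above; the function name and the embedded sample are illustrative and not part of the original script:

```python
import re

def parse_metric_blocks(log_text: str) -> dict:
    """Return {key: (fold_values, reported_mean)} for each metric block."""
    pattern = re.compile(
        r"key: (\w+)\s+value: \[(.*?)\]\s+mean value: (\S+)",
        re.DOTALL,  # fold values may wrap onto a second line
    )
    results = {}
    for key, values, mean in pattern.findall(log_text):
        folds = [float(v) for v in values.split()]
        results[key] = (folds, float(mean))
    return results

# Two blocks copied from the Logistic RegressionCV section above.
sample = """key: test_mcc
value: [0.1        0.35       0.8        0.79056942 1.         1.
 1.         0.5        0.57735027 1.        ]

mean value: 0.711791968423172

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
"""

parsed = parse_metric_blocks(sample)
print(sorted(parsed))
```

Metric keys repeat for every model, so on the full log the text would need to be split on the "Model_name:" lines first; otherwise later models overwrite earlier ones in the returned dict.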

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])
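
Note: the keys printed below (fit_time, score_time, test_*, train_*) have the shape of the dictionary returned by scikit-learn's cross_validate with return_train_score=True. The following is a minimal sketch of how such a run could be set up, not the script that produced this log. The data are synthetic, only four of the logged column names are reused, and the 10-fold StratifiedKFold splitter, handle_unknown="ignore", and sparse_threshold=0 are assumptions added so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data (assumption): 86 rows, balanced binary target.
rng = np.random.default_rng(42)
n = 86
X = pd.DataFrame({
    "ligand_distance": rng.uniform(2, 30, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),
    "active_site": rng.choice([0, 1], n),
})
y = np.array([0, 1] * (n // 2))

num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class", "active_site"]

# Scale numeric columns, one-hot encode categorical ones.
# sparse_threshold=0 forces dense output, which GaussianNB requires.
prep = ColumnTransformer(
    remainder="passthrough",
    sparse_threshold=0,
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ],
)
pipe = Pipeline(steps=[("prep", prep), ("model", GaussianNB())])

# Scorer names chosen so the result keys resemble those in the log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "jcc": "jaccard",
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print each metric in the same key / value / mean layout as the log.
for key, value in scores.items():
    print("\nkey:", key)
    print("value:", value)
    print("\nmean value:", np.mean(value))
```

Fitting the scaler and encoder inside the Pipeline means they are refit on the training part of every fold, so no information from the held-out fold leaks into preprocessing.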

key: fit_time
value: [0.0089488  0.00875568 0.0074203  0.00725079 0.00705767 0.00712848
 0.00707436 0.00722599 0.00702119 0.00702357]

mean value: 0.007490682601928711

key: score_time
value: [0.0106101  0.01032877 0.00850701 0.00852227 0.0084753  0.00805879
 0.00836706 0.00811815 0.00836396 0.00837588]

mean value: 0.008772730827331543

key: test_mcc
value: [ 0.39528471  0.5976143   0.5976143   0.47809144  0.15811388  0.35
  0.25819889  0.37796447  0.25819889 -0.25819889]

mean value: 0.3212882006354006

key: train_mcc
value: [0.54521744 0.52542209 0.52542209 0.53924899 0.54085245 0.53924899
 0.52790958 0.54772256 0.58722022 0.60697698]

mean value: 0.5485241391056351

key: test_accuracy
value: [0.66666667 0.77777778 0.77777778 0.66666667 0.55555556 0.66666667
 0.625      0.625      0.625      0.375     ]

mean value: 0.6361111111111111

key: train_accuracy
value: [0.72727273 0.71428571 0.71428571 0.72727273 0.74025974 0.72727273
 0.71794872 0.73076923 0.75641026 0.76923077]

mean value: 0.7325008325008325

key: test_fscore
value: [0.4        0.66666667 0.66666667 0.57142857 0.5        0.66666667
 0.57142857 0.4        0.57142857 0.28571429]

mean value: 0.53

key: train_fscore
value: [0.63157895 0.60714286 0.60714286 0.61818182 0.65517241 0.61818182
 0.60714286 0.63157895 0.6779661  0.7       ]

mean value: 0.6354088618017069

key: test_precision
value: [1.         1.         1.         1.         0.66666667 0.75
 0.66666667 1.         0.66666667 0.33333333]

mean value: 0.8083333333333333

key: train_precision
value: [1.   1.   1.   1.   0.95 1.   1.   1.   1.   1.  ]

mean value: 0.995

key: test_recall
value: [0.25 0.5  0.5  0.4  0.4  0.6  0.5  0.25 0.5  0.25]

mean value: 0.415

key: train_recall
value: [0.46153846 0.43589744 0.43589744 0.44736842 0.5        0.44736842
 0.43589744 0.46153846 0.51282051 0.53846154]

mean value: 0.4676788124156545

key: test_roc_auc
value: [0.625 0.75  0.75  0.7   0.575 0.675 0.625 0.625 0.625 0.375]

mean value: 0.6325

key: train_roc_auc
value: [0.73076923 0.71794872 0.71794872 0.72368421 0.73717949 0.72368421
 0.71794872 0.73076923 0.75641026 0.76923077]

mean value: 0.732557354925776

key: test_jcc
value: [0.25       0.5        0.5        0.4        0.33333333 0.5
 0.4        0.25       0.4        0.16666667]

mean value: 0.37

key: train_jcc
value: [0.46153846 0.43589744 0.43589744 0.44736842 0.48717949 0.44736842
 0.43589744 0.46153846 0.51282051 0.53846154]

mean value: 0.46639676113360323

MCC on Blind test: 0.08

Accuracy on Blind test: 0.73
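
Note: the two blind-test lines above can look contradictory (accuracy 0.73, MCC 0.08). The sketch below uses invented confusion counts, not the blind-test set of this run, to show how an imbalanced test set can give fairly high accuracy while MCC stays near zero. The counts TN=70, FP=10, FN=17, TP=3 are purely illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Hypothetical confusion counts (assumption, not from this log).
tn, fp, fn, tp = 70, 10, 17, 3

# Rebuild label vectors that realise those counts.
y_true = np.concatenate([np.zeros(tn + fp, dtype=int),
                         np.ones(fn + tp, dtype=int)])
y_pred = np.concatenate([np.zeros(tn, dtype=int), np.ones(fp, dtype=int),
                         np.zeros(fn, dtype=int), np.ones(tp, dtype=int)])

acc = accuracy_score(y_true, y_pred)
mcc = matthews_corrcoef(y_true, y_pred)

print("Accuracy:", round(acc, 2))
print("MCC:", round(mcc, 2))
```

Accuracy is dominated by the majority class here, whereas MCC uses all four cells of the confusion matrix, so a model that rarely finds the minority class is penalised.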

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00678349 0.00722671 0.00688362 0.00727487 0.00721765 0.00729203
 0.00733399 0.00746918 0.00723958 0.00735378]

mean value: 0.007207489013671875

key: score_time
value: [0.00807953 0.00803757 0.00839972 0.00815082 0.00826144 0.0078733
 0.00901914 0.0084033  0.00845194 0.00837016]

mean value: 0.008304691314697266

key: test_mcc
value: [-0.31622777  0.63245553  0.63245553  0.15811388  0.31622777  0.35
  0.57735027  0.77459667 -0.25819889  0.        ]

mean value: 0.2866772995759719

key: train_mcc
value: [0.53279352 0.50745677 0.5064147  0.45639039 0.53591229 0.42943967
 0.51298918 0.46537892 0.59684919 0.41367015]

mean value: 0.49572947743384127

key: test_accuracy
value: [0.33333333 0.77777778 0.77777778 0.55555556 0.66666667 0.66666667
 0.75       0.875      0.375      0.5       ]

mean value: 0.6277777777777778

key: train_accuracy
value: [0.76623377 0.75324675 0.75324675 0.72727273 0.76623377 0.71428571
 0.75641026 0.73076923 0.79487179 0.70512821]

mean value: 0.7467698967698968

key: test_fscore
value: [0.4        0.8        0.8        0.5        0.72727273 0.66666667
 0.66666667 0.85714286 0.44444444 0.5       ]

mean value: 0.6362193362193362

key: train_fscore
value: [0.775      0.7654321  0.75949367 0.73417722 0.775      0.71794872
 0.75949367 0.74698795 0.80952381 0.72289157]

mean value: 0.7565948701272275

key: test_precision
value: [0.33333333 0.66666667 0.66666667 0.66666667 0.66666667 0.75
 1.         1.         0.4        0.5       ]

mean value: 0.665

key: train_precision
value: [0.75609756 0.73809524 0.75       0.70731707 0.73809524 0.7
 0.75       0.70454545 0.75555556 0.68181818]

mean value: 0.7281524302256009

key: test_recall
value: [0.5  1.   1.   0.4  0.8  0.6  0.5  0.75 0.5  0.5 ]

mean value: 0.655

key: train_recall
value: [0.79487179 0.79487179 0.76923077 0.76315789 0.81578947 0.73684211
 0.76923077 0.79487179 0.87179487 0.76923077]

mean value: 0.7879892037786774

key: test_roc_auc
value: [0.35  0.8   0.8   0.575 0.65  0.675 0.75  0.875 0.375 0.5  ]

mean value: 0.635

key: train_roc_auc
value: [0.76585695 0.75269906 0.75303644 0.72773279 0.7668691  0.7145749
 0.75641026 0.73076923 0.79487179 0.70512821]

mean value: 0.7467948717948718

key: test_jcc
value: [0.25       0.66666667 0.66666667 0.33333333 0.57142857 0.5
 0.5        0.75       0.28571429 0.33333333]

mean value: 0.4857142857142857

key: train_jcc
value: [0.63265306 0.62       0.6122449  0.58       0.63265306 0.56
 0.6122449  0.59615385 0.68       0.56603774]

mean value: 0.609198750037025

MCC on Blind test: 0.08

Accuracy on Blind test: 0.5

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00730515 0.00695133 0.00699258 0.00709629 0.00697374 0.00705719
 0.0069747  0.00722718 0.00697184 0.00727534]

mean value: 0.007082533836364746

key: score_time
value: [0.00949669 0.00922227 0.00921607 0.00929904 0.00929213 0.00919104
 0.00917578 0.00927401 0.00928402 0.00928831]

mean value: 0.009273934364318847

key: test_mcc
value: [-0.15811388  0.15811388  0.8         0.63245553  0.31622777  0.8
  0.5         0.25819889  0.25819889  0.57735027]

mean value: 0.4142431346734462

key: train_mcc
value: [0.58541539 0.66239043 0.61039852 0.61039852 0.55870445 0.61066127
 0.64187021 0.64102564 0.62050523 0.56577895]

mean value: 0.6107148606822335

key: test_accuracy
value: [0.44444444 0.55555556 0.88888889 0.77777778 0.66666667 0.88888889
 0.75       0.625      0.625      0.75      ]

mean value: 0.6972222222222222

key: train_accuracy
value: [0.79220779 0.83116883 0.80519481 0.80519481 0.77922078 0.80519481
 0.82051282 0.82051282 0.80769231 0.78205128]

mean value: 0.804895104895105

key: test_fscore
value: [0.28571429 0.6        0.88888889 0.75       0.72727273 0.88888889
 0.75       0.57142857 0.57142857 0.66666667]

mean value: 0.6700288600288601

key: train_fscore
value: [0.78947368 0.83544304 0.81012658 0.8        0.77922078 0.80519481
 0.825      0.82051282 0.81927711 0.79012346]

mean value: 0.8074372274615954

key: test_precision
value: [0.33333333 0.5        0.8        1.         0.66666667 1.
 0.75       0.66666667 0.66666667 1.        ]

mean value: 0.7383333333333333

key: train_precision
value: [0.81081081 0.825      0.8        0.81081081 0.76923077 0.79487179
 0.80487805 0.82051282 0.77272727 0.76190476]

mean value: 0.7970747089649529

key: test_recall
value: [0.25 0.75 1.   0.6  0.8  0.8  0.75 0.5  0.5  0.5 ]

mean value: 0.645

key: train_recall
value: [0.76923077 0.84615385 0.82051282 0.78947368 0.78947368 0.81578947
 0.84615385 0.82051282 0.87179487 0.82051282]

mean value: 0.8189608636977058

key: test_roc_auc
value: [0.425 0.575 0.9   0.8   0.65  0.9   0.75  0.625 0.625 0.75 ]

mean value: 0.7

key: train_roc_auc
value: [0.79251012 0.83097166 0.80499325 0.80499325 0.77935223 0.80533063
 0.82051282 0.82051282 0.80769231 0.78205128]

mean value: 0.8048920377867747

key: test_jcc
value: [0.16666667 0.42857143 0.8        0.6        0.57142857 0.8
 0.6        0.4        0.4        0.5       ]

mean value: 0.5266666666666666

key: train_jcc
value: [0.65217391 0.7173913  0.68085106 0.66666667 0.63829787 0.67391304
 0.70212766 0.69565217 0.69387755 0.65306122]

mean value: 0.6774012472704161

MCC on Blind test: 0.06

Accuracy on Blind test: 0.65

Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.00880122 0.00814819 0.00756073 0.00777268 0.00736904 0.00765371
 0.00792193 0.00778627 0.00790763 0.00790238]

mean value: 0.007882380485534668

key: score_time
value: [0.0086937  0.00912237 0.00853276 0.00825572 0.00864816 0.00843048
 0.0086689  0.00874853 0.00819087 0.00858378]

mean value: 0.008587527275085449

key: test_mcc
value: [0.35       0.1        0.8        1.         0.5976143  0.8
 0.77459667 1.         0.         0.77459667]

mean value: 0.6196807643150164

key: train_mcc
value: [0.84516739 0.82485566 0.84852502 0.848923   0.79675455 0.87044534
 0.74456944 0.8720816  0.77563153 0.84726867]

mean value: 0.8274222215949533

key: test_accuracy
value: [0.66666667 0.55555556 0.88888889 1.         0.77777778 0.88888889
 0.875      1.         0.5        0.875     ]

mean value: 0.8027777777777778

key: train_accuracy
value: [0.92207792 0.90909091 0.92207792 0.92207792 0.8961039  0.93506494
 0.87179487 0.93589744 0.88461538 0.92307692]

mean value: 0.9121878121878122

key: test_fscore
value: [0.66666667 0.5        0.88888889 1.         0.83333333 0.88888889
 0.85714286 1.         0.5        0.85714286]

mean value: 0.7992063492063491

key: train_fscore
value: [0.925      0.91566265 0.92682927 0.925      0.9        0.93506494
 0.875      0.93670886 0.89156627 0.925     ]

mean value: 0.9155831979779763

key: test_precision
value: [0.6        0.5        0.8        1.         0.71428571 1.
 1.         1.         0.5        1.        ]

mean value: 0.8114285714285714

key: train_precision
value: [0.90243902 0.86363636 0.88372093 0.88095238 0.85714286 0.92307692
 0.85365854 0.925      0.84090909 0.90243902]

mean value: 0.8832975131316028

key: test_recall
value: [0.75 0.5  1.   1.   1.   0.8  0.75 1.   0.5  0.75]

mean value: 0.805

key: train_recall
value: [0.94871795 0.97435897 0.97435897 0.97368421 0.94736842 0.94736842
 0.8974359  0.94871795 0.94871795 0.94871795]

mean value: 0.950944669365722

key: test_roc_auc
value: [0.675 0.55  0.9   1.    0.75  0.9   0.875 1.    0.5   0.875]

mean value: 0.8025

key: train_roc_auc
value: [0.9217274  0.90823212 0.92139001 0.92273954 0.89676113 0.93522267
 0.87179487 0.93589744 0.88461538 0.92307692]

mean value: 0.9121457489878543

key: test_jcc
value: [0.5        0.33333333 0.8        1.         0.71428571 0.8
 0.75       1.         0.33333333 0.75      ]

mean value: 0.6980952380952381

key: train_jcc
value: [0.86046512 0.84444444 0.86363636 0.86046512 0.81818182 0.87804878
 0.77777778 0.88095238 0.80434783 0.86046512]

mean value: 0.8448784740404756

MCC on Blind test: 0.09

Accuracy on Blind test: 0.52

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.32428074 0.29084134 0.39761615 0.38947153 0.38101339 0.47083735
 0.4072361  0.57479668 0.46769285 0.39441442]

mean value: 0.40982005596160886

key: score_time
value: [0.01101065 0.01088691 0.01111293 0.01090026 0.0111537  0.01554298
 0.0109508  0.01096511 0.01096678 0.01099515]

mean value: 0.011448526382446289

key: test_mcc
value: [0.1        0.35       0.8        0.8        0.31622777 0.8
 0.5        0.77459667 0.25819889 0.77459667]

mean value: 0.5473619994246965

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.55555556 0.66666667 0.88888889 0.88888889 0.66666667 0.88888889
 0.75       0.875      0.625      0.875     ]

mean value: 0.7680555555555555

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.5        0.66666667 0.88888889 0.88888889 0.72727273 0.88888889
 0.75       0.88888889 0.57142857 0.85714286]

mean value: 0.7628066378066378

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.5        0.6        0.8        1.         0.66666667 1.
 0.75       0.8        0.66666667 1.        ]

mean value: 0.7783333333333333

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.5  0.75 1.   0.8  0.8  0.8  0.75 1.   0.5  0.75]

mean value: 0.765

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.55  0.675 0.9   0.9   0.65  0.9   0.75  0.875 0.625 0.875]

mean value: 0.77

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.33333333 0.5        0.8        0.8        0.57142857 0.8
 0.6        0.8        0.4        0.75      ]

mean value: 0.6354761904761905

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.09

Accuracy on Blind test: 0.54
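
Note: every model block in this log follows the same "key: / value: / mean value:" layout, so the per-metric means can be pulled out mechanically. The sketch below is one possible parser, with a hypothetical helper name (parse_mean_values). It is applied to the MLP test_mcc and train_mcc entries copied from above to compute the train/test gap.

```python
def parse_mean_values(log_text):
    """Map each 'key: <name>' to the float on the next 'mean value:' line."""
    means = {}
    current_key = None
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("key:"):
            current_key = line.split(":", 1)[1].strip()
        elif line.startswith("mean value:") and current_key is not None:
            means[current_key] = float(line.split(":", 1)[1])
            current_key = None
    return means


# Excerpt copied from the MLP block of this log.
sample = """
key: test_mcc
value: [0.1        0.35       0.8        0.8        0.31622777 0.8
 0.5        0.77459667 0.25819889 0.77459667]

mean value: 0.5473619994246965

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
"""

means = parse_mean_values(sample)
gap = means["train_mcc"] - means["test_mcc"]
print("train/test MCC gap:", gap)
```

A large gap between train and test means, as in this excerpt, is one common sign of overfitting during cross-validation.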

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.00931001 0.00951576 0.00988293 0.00786901 0.00728393 0.01152086
 0.00701046 0.00675249 0.00750613 0.01119971]

mean value: 0.008785128593444824

key: score_time
value: [0.01047301 0.01033044 0.00877452 0.00874352 0.00872278 0.01278138
 0.00794363 0.00788903 0.00790906 0.0122633 ]

mean value: 0.009583067893981934

key: test_mcc
value: [0.63245553 1.         1.         1.         0.63245553 1.
 1.         1.         0.77459667 1.        ]

mean value: 0.9039507733308835

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.77777778 1.         1.         1.         0.77777778 1.
 1.         1.         0.875      1.        ]

mean value: 0.9430555555555555

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.8        1.         1.         1.         0.75       1.
 1.         1.         0.85714286 1.        ]

mean value: 0.9407142857142857

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.66666667 1.         1.         1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9666666666666667

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.   1.   1.   1.   0.6  1.   1.   1.   0.75 1.  ]

mean value: 0.935

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.8   1.    1.    1.    0.8   1.    1.    1.    0.875 1.   ]

mean value: 0.9475

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.66666667 1.         1.         1.         0.6        1.
 1.         1.         0.75       1.        ]

mean value: 0.9016666666666666

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.11

Accuracy on Blind test: 0.83
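
Note on reading the fold scores: the first Decision Tree fold above reports accuracy 0.77777778, precision 0.66666667, recall 1.0, fscore 0.8, jcc 0.66666667, roc_auc 0.8 and MCC 0.63245553. The sketch below shows how those figures relate to one another. The confusion-matrix counts (TP=4, FP=2, FN=0, TN=3) are an assumption inferred from the scores; the log does not print them.

```python
import math

# Hypothetical confusion-matrix counts for a 9-sample fold, inferred from the
# fold-0 scores in the log (the log itself does not print these counts).
tp, fp, fn, tn = 4, 2, 0, 3

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
fscore = 2 * precision * recall / (precision + recall)
jcc = tp / (tp + fp + fn)  # Jaccard index of the positive class
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)
# Mean of sensitivity and specificity; equals ROC AUC when it is computed
# from hard class labels rather than probabilities.
balanced = 0.5 * (recall + tn / (tn + fp))

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("fscore", fscore), ("jcc", jcc),
                    ("mcc", mcc), ("balanced", balanced)]:
    print(f"{name}: {value:.8f}")
```

The balanced value matching test_roc_auc for that fold would be consistent with the AUC being scored on predicted labels, though the log does not confirm this.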

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.07991552 0.07531953 0.07922387 0.07759547 0.07554555 0.07641506
 0.08248544 0.07638526 0.07602501 0.07815456]

mean value: 0.07770652770996093

key: score_time
value: [0.01661062 0.01769233 0.01668501 0.01735091 0.01669407 0.01689482
 0.01721072 0.01734948 0.0171845  0.01668453]

mean value: 0.017035698890686034

key: test_mcc
value: [0.55       0.35       0.8        0.8        0.79056942 0.8
 1.         0.77459667 0.25819889 1.        ]

mean value: 0.7123364974030739

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.77777778 0.66666667 0.88888889 0.88888889 0.88888889 0.88888889
 1.         0.875      0.625      1.        ]

mean value: 0.85

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.75       0.66666667 0.88888889 0.88888889 0.90909091 0.88888889
 1.         0.88888889 0.57142857 1.        ]

mean value: 0.8452741702741703

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.75       0.6        0.8        1.         0.83333333 1.
 1.         0.8        0.66666667 1.        ]

mean value: 0.845

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75 0.75 1.   0.8  1.   0.8  1.   1.   0.5  1.  ]

mean value: 0.86

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.775 0.675 0.9   0.9   0.875 0.9   1.    0.875 0.625 1.   ]

mean value: 0.8525

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.6        0.5        0.8        0.8        0.83333333 0.8
 1.         0.8        0.4        1.        ]

mean value: 0.7533333333333334

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.05

Accuracy on Blind test: 0.63
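
Note on the key/value/mean blocks: the key names above (fit_time, score_time, test_mcc, train_mcc, ...) match what scikit-learn's cross_validate returns when given a scoring dictionary and return_train_score=True. The sketch below is a guess at how such output could be produced, not the project's actual script. The synthetic data, the column names num_0..num_4 and cat_0, the fold splitter, and handle_unknown="ignore" are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data; the real feature table is not part of this log.
X_num, y = make_classification(n_samples=120, n_features=5, random_state=42)
X = pd.DataFrame(X_num, columns=[f"num_{i}" for i in range(5)])
rng = np.random.default_rng(42)
X["cat_0"] = rng.choice(["a", "b", "c"], size=len(X))

num_cols = [c for c in X.columns if c.startswith("num_")]
cat_cols = ["cat_0"]

# Same layout as the logged pipeline: scale numeric columns, one-hot encode
# categorical ones, then fit the classifier.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ],
)
pipe = Pipeline(steps=[("prep", prep),
                       ("model", ExtraTreesClassifier(random_state=42))])

# Scorer names chosen so the result keys read test_mcc, train_mcc, and so on.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),  # assumption: AUC on hard labels
    "jcc": "jaccard",
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```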

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00761938 0.00663662 0.00656319 0.00661063 0.00666404 0.00655985
 0.006706   0.00655651 0.00683618 0.00663257]

mean value: 0.006738495826721191

key: score_time
value: [0.00826526 0.00768614 0.0077951  0.00779033 0.00775886 0.00782132
 0.00778651 0.00774288 0.00777555 0.00771451]

mean value: 0.007813644409179688

key: test_mcc
value: [ 0.35        0.1        -0.15811388  0.1        -0.1        -0.5976143
  0.25819889  0.          0.          0.25819889]

mean value: 0.021066959181870643

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.66666667 0.55555556 0.44444444 0.55555556 0.44444444 0.22222222
 0.625      0.5        0.5        0.625     ]

mean value: 0.5138888888888888

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.66666667 0.5        0.28571429 0.6        0.44444444 0.
 0.57142857 0.5        0.5        0.57142857]

mean value: 0.463968253968254

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.6        0.5        0.33333333 0.6        0.5        0.
 0.66666667 0.5        0.5        0.66666667]

mean value: 0.48666666666666664

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75 0.5  0.25 0.6  0.4  0.   0.5  0.5  0.5  0.5 ]

mean value: 0.45

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.675 0.55  0.425 0.55  0.45  0.25  0.625 0.5   0.5   0.625]

mean value: 0.515

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.5        0.33333333 0.16666667 0.42857143 0.28571429 0.
 0.4        0.33333333 0.33333333 0.4       ]

mean value: 0.3180952380952381

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.09

Accuracy on Blind test: 0.52

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [0.95179796 0.94567418 0.94950175 0.96822572 0.93672466 0.96356082
 0.97707057 1.03818297 1.03070283 1.00553799]

mean value: 0.9766979455947876

key: score_time
value: [0.09188795 0.09431767 0.08775377 0.08800793 0.09016871 0.08714199
 0.09610558 0.09580159 0.09604168 0.09132028]

mean value: 0.09185471534729003

key: test_mcc
value: [0.8        0.55       0.8        1.         0.55       1.
 1.         0.77459667 0.77459667 1.        ]

mean value: 0.8249193338482967

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.88888889 0.77777778 0.88888889 1.         0.77777778 1.
 1.         0.875      0.875      1.        ]

mean value: 0.9083333333333333

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.88888889 0.75       0.88888889 1.         0.8        1.
 1.         0.88888889 0.85714286 1.        ]

mean value: 0.9073809523809524

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.8  0.75 0.8  1.   0.8  1.   1.   0.8  1.   1.  ]

mean value: 0.895

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.   0.75 1.   1.   0.8  1.   1.   1.   0.75 1.  ]

mean value: 0.93

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9   0.775 0.9   1.    0.775 1.    1.    0.875 0.875 1.   ]

mean value: 0.91

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.8        0.6        0.8        1.         0.66666667 1.
 1.         0.8        0.75       1.        ]

mean value: 0.8416666666666667

/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.08

Accuracy on Blind test: 0.75

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.92752123 0.89317346 0.80733633 0.90238857 0.82283401 0.9298923
 0.91012096 0.82151413 0.85942483 0.84982991]

mean value: 0.8724035739898681

key: score_time
value: [0.19612741 0.17702603 0.17312717 0.23580909 0.18489385 0.20000648
 0.20895576 0.13845778 0.27115655 0.17170072]

mean value: 0.19572608470916747

key: test_mcc
value: [0.35       0.55       0.8        1.         0.55       1.
 1.         0.5        0.77459667 1.        ]

mean value: 0.7524596669241483

key: train_mcc
value: [1.         1.         1.         1.         1.         1.
 1.         1.         0.97467943 1.        ]

mean value: 0.9974679434480896

key: test_accuracy
value: [0.66666667 0.77777778 0.88888889 1.         0.77777778 1.
 1.         0.75       0.875      1.        ]

mean value: 0.8736111111111111

key: train_accuracy
value: [1.         1.         1.         1.         1.         1.
 1.         1.         0.98717949 1.        ]

mean value: 0.9987179487179487

key: test_fscore
value: [0.66666667 0.75       0.88888889 1.         0.8        1.
 1.         0.75       0.85714286 1.        ]

mean value: 0.8712698412698413

key: train_fscore
value: [1.         1.         1.         1.         1.         1.
 1.         1.         0.98701299 1.        ]

mean value: 0.9987012987012986

key: test_precision
value: [0.6  0.75 0.8  1.   0.8  1.   1.   0.75 1.   1.  ]

mean value: 0.87

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75 0.75 1.   1.   0.8  1.   1.   0.75 0.75 1.  ]

mean value: 0.88

key: train_recall
value: [1.         1.         1.         1.         1.         1.
 1.         1.         0.97435897 1.        ]

mean value: 0.9974358974358974

key: test_roc_auc
value: [0.675 0.775 0.9   1.    0.775 1.    1.    0.75  0.875 1.   ]

mean value: 0.875

key: train_roc_auc
value: [1.         1.         1.         1.         1.         1.
 1.         1.         0.98717949 1.        ]

mean value: 0.9987179487179487

key: test_jcc
value: [0.5        0.6        0.8        1.         0.66666667 1.
 1.         0.6        0.75       1.        ]

mean value: 0.7916666666666666

key: train_jcc
value: [1.         1.         1.         1.         1.         1.
 1.         1.         0.97435897 1.        ]

mean value: 0.9974358974358974

MCC on Blind test: 0.14

Accuracy on Blind test: 0.74
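
Note on the FutureWarning recorded in this log: it states that max_features='auto' is deprecated and that max_features='sqrt' keeps the past behaviour for RandomForestClassifier. The sketch below shows the Random Forest2 configuration with that one argument changed. The synthetic data is a stand-in, and n_estimators and n_jobs are reduced from the logged 1000 and 10 only to keep the example quick.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; not the dataset used in this run.
X, y = make_classification(n_samples=100, n_features=10, random_state=42)

# Same settings as the logged 'Random Forest2', except max_features='sqrt'
# (the replacement named by the warning) and smaller n_estimators/n_jobs.
rf2 = RandomForestClassifier(
    max_features="sqrt",
    min_samples_leaf=5,
    n_estimators=100,
    n_jobs=1,
    oob_score=True,
    random_state=42,
)

# Record any warnings raised during fitting so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    rf2.fit(X, y)

deprecation_hits = [w for w in caught if "max_features" in str(w.message)]
print("max_features warnings:", len(deprecation_hits))
print("OOB score:", rf2.oob_score_)
```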

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01836395 0.00678182 0.00684881 0.00723457 0.00742078 0.00704598
 0.00751448 0.00682497 0.00756073 0.0067997 ]

mean value: 0.00823957920074463

key: score_time
value: [0.01078892 0.00827122 0.00816345 0.00804162 0.00847411 0.00859022
 0.00832844 0.00801682 0.00818849 0.00833344]

mean value: 0.008519673347473144

key: test_mcc
value: [-0.31622777  0.63245553  0.63245553  0.15811388  0.31622777  0.35
  0.57735027  0.77459667 -0.25819889  0.        ]

mean value: 0.2866772995759719

key: train_mcc
value: [0.53279352 0.50745677 0.5064147  0.45639039 0.53591229 0.42943967
 0.51298918 0.46537892 0.59684919 0.41367015]

mean value: 0.49572947743384127

key: test_accuracy
value: [0.33333333 0.77777778 0.77777778 0.55555556 0.66666667 0.66666667
 0.75       0.875      0.375      0.5       ]

mean value: 0.6277777777777778

key: train_accuracy
value: [0.76623377 0.75324675 0.75324675 0.72727273 0.76623377 0.71428571
 0.75641026 0.73076923 0.79487179 0.70512821]

mean value: 0.7467698967698968

key: test_fscore
value: [0.4        0.8        0.8        0.5        0.72727273 0.66666667
 0.66666667 0.85714286 0.44444444 0.5       ]

mean value: 0.6362193362193362

key: train_fscore
value: [0.775      0.7654321  0.75949367 0.73417722 0.775      0.71794872
 0.75949367 0.74698795 0.80952381 0.72289157]

mean value: 0.7565948701272275

key: test_precision
value: [0.33333333 0.66666667 0.66666667 0.66666667 0.66666667 0.75
 1.         1.         0.4        0.5       ]

mean value: 0.665

key: train_precision
value: [0.75609756 0.73809524 0.75       0.70731707 0.73809524 0.7
 0.75       0.70454545 0.75555556 0.68181818]

mean value: 0.7281524302256009

key: test_recall
value: [0.5  1.   1.   0.4  0.8  0.6  0.5  0.75 0.5  0.5 ]

mean value: 0.655

key: train_recall
value: [0.79487179 0.79487179 0.76923077 0.76315789 0.81578947 0.73684211
 0.76923077 0.79487179 0.87179487 0.76923077]

mean value: 0.7879892037786774

key: test_roc_auc
value: [0.35  0.8   0.8   0.575 0.65  0.675 0.75  0.875 0.375 0.5  ]

mean value: 0.635

key: train_roc_auc
value: [0.76585695 0.75269906 0.75303644 0.72773279 0.7668691  0.7145749
 0.75641026 0.73076923 0.79487179 0.70512821]

mean value: 0.7467948717948718

key: test_jcc
value: [0.25       0.66666667 0.66666667 0.33333333 0.57142857 0.5
 0.5        0.75       0.28571429 0.33333333]

mean value: 0.4857142857142857

key: train_jcc
value: [0.63265306 0.62       0.6122449  0.58       0.63265306 0.56
 0.6122449  0.59615385 0.68       0.56603774]

mean value: 0.609198750037025

MCC on Blind test: 0.08

Accuracy on Blind test: 0.5

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.0487535  0.02997184 0.05316138 0.02903056 0.02904987 0.0322907
 0.18433547 0.02987671 0.02673578 0.02919006]

mean value: 0.049239587783813474

key: score_time
value: [0.01463509 0.00997066 0.00984716 0.00954914 0.00983858 0.01014209
 0.01031256 0.01060939 0.01125598 0.00966144]

mean value: 0.010582208633422852

key: test_mcc
value: [0.8        1.         1.         1.         0.8        1.
 1.         0.77459667 0.57735027 1.        ]

mean value: 0.8951946938431109

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.88888889 1.         1.         1.         0.88888889 1.
 1.         0.875      0.75       1.        ]

mean value: 0.9402777777777778

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.88888889 1.         1.         1.         0.88888889 1.
 1.         0.88888889 0.66666667 1.        ]

mean value: 0.9333333333333333

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.8 1.  1.  1.  1.  1.  1.  0.8 1.  1. ]

mean value: 0.96

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.  1.  1.  1.  0.8 1.  1.  1.  0.5 1. ]

mean value: 0.93

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9   1.    1.    1.    0.9   1.    1.    0.875 0.75  1.   ]

mean value: 0.9425

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.8 1.  1.  1.  0.8 1.  1.  0.8 0.5 1. ]

mean value: 0.89

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.09

Accuracy on Blind test: 0.77

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.00983548 0.01052451 0.0106616  0.01091266 0.010957   0.01091719
 0.01095319 0.01184487 0.01091933 0.01090574]

mean value: 0.010843157768249512

key: score_time
value: [0.0105567  0.01009798 0.01041722 0.01042056 0.01047301 0.01049972
 0.01047277 0.01058149 0.01053119 0.01043415]

mean value: 0.010448479652404785

key: test_mcc
value: [0.8        0.8        0.8        0.79056942 1.         0.55
 1.         0.77459667 0.25819889 0.57735027]

mean value: 0.7350715243220365

key: train_mcc
value: [1.         1.         0.97434188 0.97435897 1.         0.97435897
 0.94996791 1.         1.         1.        ]

mean value: 0.987302773890115

key: test_accuracy
value: [0.88888889 0.88888889 0.88888889 0.88888889 1.         0.77777778
 1.         0.875      0.625      0.75      ]

mean value: 0.8583333333333333

key: train_accuracy
value: [1.         1.         0.98701299 0.98701299 1.         0.98701299
 0.97435897 1.         1.         1.        ]

mean value: 0.9935397935397935

key: test_fscore
value: [0.88888889 0.88888889 0.88888889 0.90909091 1.         0.8
 1.         0.88888889 0.57142857 0.66666667]

mean value: 0.8502741702741703

key: train_fscore
value: [1.         1.         0.98734177 0.98701299 1.         0.98701299
 0.975      1.         1.         1.        ]

mean value: 0.9936367746177872

key: test_precision
value: [0.8        0.8        0.8        0.83333333 1.         0.8
 1.         0.8        0.66666667 1.        ]

mean value: 0.85

key: train_precision
value: [1.         1.         0.975      0.97435897 1.         0.97435897
 0.95121951 1.         1.         1.        ]

mean value: 0.987493746091307

key: test_recall
value: [1.  1.  1.  1.  1.  0.8 1.  1.  0.5 0.5]

mean value: 0.88

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9   0.9   0.9   0.875 1.    0.775 1.    0.875 0.625 0.75 ]

mean value: 0.86

key: train_roc_auc
value: [1.         1.         0.98684211 0.98717949 1.         0.98717949
 0.97435897 1.         1.         1.        ]

mean value: 0.9935560053981106

key: test_jcc
value: [0.8        0.8        0.8        0.83333333 1.         0.66666667
 1.         0.8        0.4        0.5       ]

mean value: 0.76

key: train_jcc
value: [1.         1.         0.975      0.97435897 1.         0.97435897
 0.95121951 1.         1.         1.        ]

mean value: 0.987493746091307

MCC on Blind test: 0.06

Accuracy on Blind test: 0.67

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.00919414 0.00788784 0.00760937 0.00748897 0.00734115 0.00743103
 0.00754428 0.00746417 0.00744605 0.00730228]

mean value: 0.007670927047729492

key: score_time
value: [0.01072121 0.0088973  0.00895858 0.00848174 0.00847363 0.00847983
 0.00860381 0.00846124 0.00856113 0.00795674]

mean value: 0.008759522438049316

key: test_mcc
value: [0.55       0.1        0.8        0.8        0.31622777 0.55
 0.57735027 0.57735027 0.         0.57735027]

mean value: 0.4848278573585716

key: train_mcc
value: [0.61257733 0.66463964 0.6374073  0.55962522 0.63928106 0.58485583
 0.64102564 0.56577895 0.66688593 0.56428809]

mean value: 0.6136364988556005

key: test_accuracy
value: [0.77777778 0.55555556 0.88888889 0.88888889 0.66666667 0.77777778
 0.75       0.75       0.5        0.75      ]

mean value: 0.7305555555555555

key: train_accuracy
value: [0.80519481 0.83116883 0.81818182 0.77922078 0.81818182 0.79220779
 0.82051282 0.78205128 0.83333333 0.78205128]

mean value: 0.8062104562104563

key: test_fscore
value: [0.75       0.5        0.88888889 0.88888889 0.72727273 0.8
 0.66666667 0.66666667 0.5        0.66666667]

mean value: 0.7055050505050505

key: train_fscore
value: [0.8        0.82666667 0.81578947 0.76712329 0.80555556 0.78378378
 0.82051282 0.77333333 0.83544304 0.78481013]

mean value: 0.8013018085764565

key: test_precision
value: [0.75       0.5        0.8        1.         0.66666667 0.8
 1.         1.         0.5        1.        ]

mean value: 0.8016666666666666

key: train_precision
value: [0.83333333 0.86111111 0.83783784 0.8        0.85294118 0.80555556
 0.82051282 0.80555556 0.825      0.775     ]

mean value: 0.8216847390376802

key: test_recall
value: [0.75 0.5  1.   0.8  0.8  0.8  0.5  0.5  0.5  0.5 ]

mean value: 0.665

key: train_recall
value: [0.76923077 0.79487179 0.79487179 0.73684211 0.76315789 0.76315789
 0.82051282 0.74358974 0.84615385 0.79487179]

mean value: 0.7827260458839406

key: test_roc_auc
value: [0.775 0.55  0.9   0.9   0.65  0.775 0.75  0.75  0.5   0.75 ]

mean value: 0.73

key: train_roc_auc
value: [0.80566802 0.83164642 0.81848853 0.77867746 0.81747638 0.79183536
 0.82051282 0.78205128 0.83333333 0.78205128]

mean value: 0.8061740890688258

key: test_jcc
value: [0.6        0.33333333 0.8        0.8        0.57142857 0.66666667
 0.5        0.5        0.33333333 0.5       ]

mean value: 0.5604761904761905

key: train_jcc
value: [0.66666667 0.70454545 0.68888889 0.62222222 0.6744186  0.64444444
 0.69565217 0.63043478 0.7173913  0.64583333]

mean value: 0.6690497875621738

MCC on Blind test: 0.09

Accuracy on Blind test: 0.56

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.00780702 0.00762272 0.00795984 0.00789499 0.00773883 0.00785589
 0.00779939 0.00809336 0.00800991 0.00795817]

mean value: 0.007874011993408203

key: score_time
value: [0.00883174 0.00864387 0.00875735 0.00865006 0.00847197 0.00864172
 0.00876093 0.00859165 0.00871468 0.00850964]

mean value: 0.008657360076904297

key: test_mcc
value: [0.31622777 0.15811388 0.8        0.79056942 1.         1.
 0.77459667 0.5        0.5        1.        ]

mean value: 0.6839507733308835

key: train_mcc
value: [1.         0.92480439 0.94935876 1.         0.94804318 0.94935876
 0.87904907 0.85634884 0.97467943 0.90219371]

mean value: 0.93838361447064

key: test_accuracy
value: [0.66666667 0.55555556 0.88888889 0.88888889 1.         1.
 0.875      0.75       0.75       1.        ]

mean value: 0.8375

key: train_accuracy
value: [1.         0.96103896 0.97402597 1.         0.97402597 0.97402597
 0.93589744 0.92307692 0.98717949 0.94871795]

mean value: 0.9677988677988678

key: test_fscore
value: [0.57142857 0.6        0.88888889 0.90909091 1.         1.
 0.88888889 0.75       0.75       1.        ]

mean value: 0.8358297258297258

key: train_fscore
value: [1.         0.96296296 0.97368421 1.         0.97368421 0.97435897
 0.93975904 0.92857143 0.98734177 0.95121951]

mean value: 0.9691582107437596

key: test_precision
value: [0.66666667 0.5        0.8        0.83333333 1.         1.
 0.8        0.75       0.75       1.        ]

mean value: 0.81

key: train_precision
value: [1.         0.92857143 1.         1.         0.97368421 0.95
 0.88636364 0.86666667 0.975      0.90697674]

mean value: 0.9487262686314094

key: test_recall
value: [0.5  0.75 1.   1.   1.   1.   1.   0.75 0.75 1.  ]

mean value: 0.875

key: train_recall
value: [1.         1.         0.94871795 1.         0.97368421 1.
 1.         1.         1.         1.        ]

mean value: 0.9922402159244265

key: test_roc_auc
value: [0.65  0.575 0.9   0.875 1.    1.    0.875 0.75  0.75  1.   ]

mean value: 0.8375

key: train_roc_auc
value: [1.         0.96052632 0.97435897 1.         0.97402159 0.97435897
 0.93589744 0.92307692 0.98717949 0.94871795]

mean value: 0.9678137651821862

key: test_jcc
value: [0.4        0.42857143 0.8        0.83333333 1.         1.
 0.8        0.6        0.6        1.        ]

mean value: 0.7461904761904762

key: train_jcc
value: [1.         0.92857143 0.94871795 1.         0.94871795 0.95
 0.88636364 0.86666667 0.975      0.90697674]

mean value: 0.9411014373223675

MCC on Blind test: 0.09

Accuracy on Blind test: 0.53

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.00939584 0.00922012 0.00787425 0.00773025 0.00760698 0.00704384
 0.00760889 0.00750351 0.00746274 0.00742221]

mean value: 0.007886862754821778

key: score_time
value: [0.01045752 0.01001    0.00872231 0.0086472  0.00853968 0.00859547
 0.0085783  0.00867987 0.00851488 0.008075  ]

mean value: 0.0088820219039917

key: test_mcc
value: [0.5976143  0.55       0.8        0.79056942 0.63245553 1.
 0.77459667 0.5        0.         1.        ]

mean value: 0.6645235920984451

key: train_mcc
value: [0.90109146 1.         0.97435897 0.90109146 0.70243936 0.75611265
 0.94996791 1.         0.46770717 0.9258201 ]

mean value: 0.8578589074717703

key: test_accuracy
value: [0.77777778 0.77777778 0.88888889 0.88888889 0.77777778 1.
 0.875      0.75       0.5        1.        ]

mean value: 0.8236111111111111

key: train_accuracy
value: [0.94805195 1.         0.98701299 0.94805195 0.83116883 0.87012987
 0.97435897 1.         0.67948718 0.96153846]

mean value: 0.91998001998002

key: test_fscore
value: [0.66666667 0.75       0.88888889 0.90909091 0.75       1.
 0.88888889 0.75       0.6        1.        ]

mean value: 0.8203535353535354

key: train_fscore
value: [0.94594595 1.         0.98701299 0.95       0.79365079 0.85294118
 0.975      1.         0.75728155 0.96296296]

mean value: 0.9224795419441336

key: test_precision
value: [1.         0.75       0.8        0.83333333 1.         1.
 0.8        0.75       0.5        1.        ]

mean value: 0.8433333333333334

key: train_precision
value: [1.         1.         1.         0.9047619  1.         0.96666667
 0.95121951 1.         0.609375   0.92857143]

mean value: 0.9360594512195122

key: test_recall
value: [0.5  0.75 1.   1.   0.6  1.   1.   0.75 0.75 1.  ]

mean value: 0.835

key: train_recall
value: [0.8974359  1.         0.97435897 1.         0.65789474 0.76315789
 1.         1.         1.         1.        ]

mean value: 0.9292847503373819

key: test_roc_auc
value: [0.75  0.775 0.9   0.875 0.8   1.    0.875 0.75  0.5   1.   ]

mean value: 0.8225

key: train_roc_auc
value: [0.94871795 1.         0.98717949 0.94871795 0.82894737 0.86875843
 0.97435897 1.         0.67948718 0.96153846]

mean value: 0.9197705802968961

key: test_jcc
value: [0.5        0.6        0.8        0.83333333 0.6        1.
 0.8        0.6        0.42857143 1.        ]

mean value: 0.7161904761904762

key: train_jcc
value: [0.8974359  1.         0.97435897 0.9047619  0.65789474 0.74358974
 0.95121951 1.         0.609375   0.92857143]

mean value: 0.8667207197755176

MCC on Blind test: 0.11

Accuracy on Blind test: 0.65

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.07140326 0.05798793 0.05990863 0.05765224 0.0598948  0.06153941
 0.05800366 0.05805969 0.05758524 0.05807018]

mean value: 0.06001050472259521

key: score_time
value: [0.0154562  0.01460052 0.01546645 0.01415467 0.01506758 0.01572824
 0.01406693 0.01429629 0.01435161 0.01549411]

mean value: 0.01486825942993164

key: test_mcc
value: [0.8        1.         1.         1.         0.63245553 1.
 1.         1.         0.77459667 1.        ]

mean value: 0.9207052201275159

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.88888889 1.         1.         1.         0.77777778 1.
 1.         1.         0.875      1.        ]

mean value: 0.9541666666666666

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.88888889 1.         1.         1.         0.75       1.
 1.         1.         0.88888889 1.        ]

mean value: 0.9527777777777777

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.8 1.  1.  1.  1.  1.  1.  1.  0.8 1. ]

mean value: 0.96

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.  1.  1.  1.  0.6 1.  1.  1.  1.  1. ]

mean value: 0.96

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9   1.    1.    1.    0.8   1.    1.    1.    0.875 1.   ]

mean value: 0.9575

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.8 1.  1.  1.  0.6 1.  1.  1.  0.8 1. ]

mean value: 0.92

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.1

Accuracy on Blind test: 0.81

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.02970028 0.02384472 0.02287769 0.03229713 0.02531958 0.02179098
 0.02337933 0.03095937 0.02274895 0.02188182]

mean value: 0.025479984283447266

key: score_time
value: [0.0158186  0.0168395  0.01607776 0.02253652 0.02111721 0.01882815
 0.02102876 0.02306414 0.01536655 0.0155549 ]

mean value: 0.018623208999633788

key: test_mcc
value: [0.8        1.         1.         1.         0.63245553 1.
 1.         0.77459667 0.57735027 1.        ]

mean value: 0.8784402470464785

key: train_mcc
value: [1.         1.         0.97435897 1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9974358974358974

key: test_accuracy
value: [0.88888889 1.         1.         1.         0.77777778 1.
 1.         0.875      0.75       1.        ]

mean value: 0.9291666666666667

key: train_accuracy
value: [1.         1.         0.98701299 1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9987012987012986

key: test_fscore
value: [0.88888889 1.         1.         1.         0.75       1.
 1.         0.88888889 0.66666667 1.        ]

mean value: 0.9194444444444444

key: train_fscore
value: [1.         1.         0.98701299 1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9987012987012986

key: test_precision
value: [0.8 1.  1.  1.  1.  1.  1.  0.8 1.  1. ]

mean value: 0.96

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.  1.  1.  1.  0.6 1.  1.  1.  0.5 1. ]

mean value: 0.91

key: train_recall
value: [1.         1.         0.97435897 1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9974358974358974

key: test_roc_auc
value: [0.9   1.    1.    1.    0.8   1.    1.    0.875 0.75  1.   ]

mean value: 0.9325

key: train_roc_auc
value: [1.         1.         0.98717949 1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9987179487179487

key: test_jcc
value: [0.8 1.  1.  1.  0.6 1.  1.  0.8 0.5 1. ]

mean value: 0.87

key: train_jcc
value: [1.         1.         0.97435897 1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9974358974358974

MCC on Blind test: 0.12

Accuracy on Blind test: 0.84

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.01199746 0.01234484 0.01242924 0.01289916 0.01300168 0.01250005
 0.01238441 0.01239324 0.01242089 0.01329017]

mean value: 0.012566113471984863

key: score_time
value: [0.0101943  0.01012278 0.01048732 0.01058817 0.0106461  0.01056409
 0.01057553 0.01059437 0.01054907 0.01066661]

mean value: 0.010498833656311036

key: test_mcc
value: [0.55       0.35       0.8        0.79056942 0.79056942 0.8
 0.77459667 0.5        0.25819889 0.77459667]

mean value: 0.6388531058314317

key: train_mcc
value: [1.         1.         0.97434188 1.         1.         0.97435897
 0.97467943 1.         0.97467943 1.        ]

mean value: 0.9898059726472239

key: test_accuracy
value: [0.77777778 0.66666667 0.88888889 0.88888889 0.88888889 0.88888889
 0.875      0.75       0.625      0.875     ]

mean value: 0.8125

key: train_accuracy
value: [1.         1.         0.98701299 1.         1.         0.98701299
 0.98717949 1.         0.98717949 1.        ]

mean value: 0.9948384948384948

key: test_fscore
value: [0.75       0.66666667 0.88888889 0.90909091 0.90909091 0.88888889
 0.85714286 0.75       0.57142857 0.85714286]

mean value: 0.8048340548340548

key: train_fscore
value: [1.         1.         0.98734177 1.         1.         0.98701299
 0.98734177 1.         0.98701299 1.        ]

mean value: 0.9948709518329771

key: test_precision
value: [0.75       0.6        0.8        0.83333333 0.83333333 1.
 1.         0.75       0.66666667 1.        ]

mean value: 0.8233333333333334

key: train_precision
value: [1.         1.         0.975      1.         1.         0.97435897
 0.975      1.         1.         1.        ]

mean value: 0.9924358974358974

key: test_recall
value: [0.75 0.75 1.   1.   1.   0.8  0.75 0.75 0.5  0.75]

mean value: 0.805

key: train_recall
value: [1.         1.         1.         1.         1.         1.
 1.         1.         0.97435897 1.        ]

mean value: 0.9974358974358974

key: test_roc_auc
value: [0.775 0.675 0.9   0.875 0.875 0.9   0.875 0.75  0.625 0.875]

mean value: 0.8125

key: train_roc_auc
value: [1.         1.         0.98684211 1.         1.         0.98717949
 0.98717949 1.         0.98717949 1.        ]

mean value: 0.9948380566801619

key: test_jcc
value: [0.6        0.5        0.8        0.83333333 0.83333333 0.8
 0.75       0.6        0.4        0.75      ]

mean value: 0.6866666666666666

key: train_jcc
value: [1.         1.         0.975      1.         1.         0.97435897
 0.975      1.         0.97435897 1.        ]

mean value: 0.9898717948717949

MCC on Blind test: 0.1

Accuracy on Blind test: 0.59

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.0684216  0.06198835 0.06150103 0.06125021 0.05035377 0.06086302
 0.06419754 0.05684948 0.05581045 0.06373596]

mean value: 0.06049714088439941

key: score_time
value: [0.00865507 0.00866818 0.00822926 0.008461   0.00909662 0.00912499
 0.00889039 0.00892878 0.0091598  0.00874305]

mean value: 0.008795714378356934

key: test_mcc
value: [0.63245553 1.         1.         1.         0.63245553 1.
 1.         1.         0.77459667 1.        ]

mean value: 0.9039507733308835

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.77777778 1.         1.         1.         0.77777778 1.
 1.         1.         0.875      1.        ]

mean value: 0.9430555555555555

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.8        1.         1.         1.         0.75       1.
 1.         1.         0.85714286 1.        ]

mean value: 0.9407142857142857

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.66666667 1.         1.         1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9666666666666667

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.   1.   1.   1.   0.6  1.   1.   1.   0.75 1.  ]

mean value: 0.935

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.8   1.    1.    1.    0.8   1.    1.    1.    0.875 1.   ]

mean value: 0.9475

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.66666667 1.         1.         1.         0.6        1.
 1.         1.         0.75       1.        ]

mean value: 0.9016666666666666

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.08

Accuracy on Blind test: 0.76

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.00811529 0.00809407 0.01050186 0.00907612 0.00753951 0.0079627
 0.00725293 0.00733447 0.00738692 0.00783324]

mean value: 0.008109712600708007

key: score_time
value: [0.01106501 0.01026535 0.00954747 0.00802541 0.0085423  0.00829649
 0.00797725 0.00805521 0.00838113 0.00803828]

mean value: 0.008819389343261718

key: test_mcc
value: [ 0.05976143 -0.31622777  0.31622777  0.          0.47809144 -0.05976143
  0.25819889  0.57735027  0.          0.        ]

mean value: 0.13136406026705444

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.55555556 0.44444444 0.66666667 0.44444444 0.66666667 0.44444444
 0.625      0.75       0.5        0.5       ]

mean value: 0.5597222222222222

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.33333333 0.         0.57142857 0.         0.57142857 0.28571429
 0.57142857 0.66666667 0.33333333 0.        ]

mean value: 0.33333333333333337

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.5        0.         0.66666667 0.         1.         0.5
 0.66666667 1.         0.5        0.        ]

mean value: 0.48333333333333334

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.25 0.   0.5  0.   0.4  0.2  0.5  0.5  0.25 0.  ]

mean value: 0.26

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.525 0.4   0.65  0.5   0.7   0.475 0.625 0.75  0.5   0.5  ]

mean value: 0.5625

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.2        0.         0.4        0.         0.4        0.16666667
 0.4        0.5        0.2        0.        ]

mean value: 0.22666666666666668

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.03

Accuracy on Blind test: 0.51

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.01002216 0.0099225  0.00757122 0.00743675 0.00746179 0.00744224
 0.00728893 0.00753307 0.00745034 0.00752497]

mean value: 0.007965397834777833

key: score_time
value: [0.01054311 0.00975561 0.008003   0.00801349 0.00793147 0.0078249
 0.00790191 0.00796008 0.00788474 0.00783968]

mean value: 0.008365797996520995

key: test_mcc
value: [0.55       0.35       0.8        1.         1.         1.
 0.77459667 0.5        0.5        1.        ]

mean value: 0.7474596669241483
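
Each metric in this log is printed as a `key:` line, a `value:` array with one entry per CV fold (it may wrap across lines), and a `mean value:` line. The following is a minimal sketch, not part of the original pipeline, of how one such block could be parsed and its logged mean re-checked. The function name `parse_metric_block` and the constant `LOG_BLOCK` are hypothetical; the sample text is copied from the `test_mcc` block above.

```python
import re
from statistics import fmean

# Sample block copied from the log above (Ridge Classifier, test_mcc).
LOG_BLOCK = """\
key: test_mcc
value: [0.55       0.35       0.8        1.         1.         1.
 0.77459667 0.5        0.5        1.        ]

mean value: 0.7474596669241483
"""


def parse_metric_block(text):
    """Return (key, per-fold values, logged mean) for one key/value/mean block.

    Hypothetical helper: it assumes the block layout seen in this log.
    """
    key = re.search(r"^key:\s*(\S+)", text, re.MULTILINE).group(1)
    # The array can wrap over several lines, so match up to the closing bracket.
    raw = re.search(r"value:\s*\[(.*?)\]", text, re.DOTALL).group(1)
    values = [float(token) for token in raw.split()]
    logged_mean = float(
        re.search(r"^mean value:\s*(\S+)", text, re.MULTILINE).group(1)
    )
    return key, values, logged_mean


key, values, logged_mean = parse_metric_block(LOG_BLOCK)
recomputed = fmean(values)  # arithmetic mean over the 10 folds
print(key, len(values), recomputed, logged_mean)
```

The recomputed mean differs from the logged one only in the trailing digits, because the per-fold values are printed rounded to eight decimal places.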

key: train_mcc
value: [0.97435897 0.94804318 0.89608637 0.92240216 0.94804318 0.92240216
 0.89861829 0.94871795 1.         0.97467943]

mean value: 0.9433351706022929

key: test_accuracy
value: [0.77777778 0.66666667 0.88888889 1.         1.         1.
 0.875      0.75       0.75       1.        ]

mean value: 0.8708333333333333

key: train_accuracy
value: [0.98701299 0.97402597 0.94805195 0.96103896 0.97402597 0.96103896
 0.94871795 0.97435897 1.         0.98717949]

mean value: 0.9715451215451215

key: test_fscore
value: [0.75       0.66666667 0.88888889 1.         1.         1.
 0.88888889 0.75       0.75       1.        ]

mean value: 0.8694444444444445

key: train_fscore
value: [0.98701299 0.97435897 0.94871795 0.96103896 0.97368421 0.96103896
 0.95       0.97435897 1.         0.98701299]

mean value: 0.971722400406611

key: test_precision
value: [0.75 0.6  0.8  1.   1.   1.   0.8  0.75 0.75 1.  ]

mean value: 0.845

key: train_precision
value: [1.         0.97435897 0.94871795 0.94871795 0.97368421 0.94871795
 0.92682927 0.97435897 1.         1.        ]

mean value: 0.9695385273690793

key: test_recall
value: [0.75 0.75 1.   1.   1.   1.   1.   0.75 0.75 1.  ]

mean value: 0.9

/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:183: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:186: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)

key: train_recall
value: [0.97435897 0.97435897 0.94871795 0.97368421 0.97368421 0.97368421
 0.97435897 0.97435897 1.         0.97435897]

mean value: 0.9741565452091768

key: test_roc_auc
value: [0.775 0.675 0.9   1.    1.    1.    0.875 0.75  0.75  1.   ]

mean value: 0.8725

key: train_roc_auc
value: [0.98717949 0.97402159 0.94804318 0.96120108 0.97402159 0.96120108
 0.94871795 0.97435897 1.         0.98717949]

mean value: 0.9715924426450743

key: test_jcc
value: [0.6 0.5 0.8 1.  1.  1.  0.8 0.6 0.6 1. ]

mean value: 0.79

key: train_jcc
value: [0.97435897 0.95       0.90243902 0.925      0.94871795 0.925
 0.9047619  0.95       1.         0.97435897]

mean value: 0.9454636826588046

MCC on Blind test: 0.06

Accuracy on Blind test: 0.67

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.07551575 0.0629611  0.06268406 0.06178474 0.06372309 0.06233025
 0.06263781 0.06301665 0.06373763 0.06232142]

mean value: 0.06407124996185302

key: score_time
value: [0.00872803 0.00875998 0.00883865 0.0087626  0.00883889 0.00868249
 0.00857091 0.00899076 0.00884104 0.00882053]

mean value: 0.008783388137817382

key: test_mcc
value: [0.8        0.35       0.8        1.         1.         0.8
 1.         0.77459667 0.5        0.77459667]

mean value: 0.7799193338482967

key: train_mcc
value: [0.94804318 0.94804318 0.94804318 0.92240216 0.94804318 0.94804318
 0.94871795 1.         1.         0.94871795]

mean value: 0.9560053981106613

key: test_accuracy
value: [0.88888889 0.66666667 0.88888889 1.         1.         0.88888889
 1.         0.875      0.75       0.875     ]

mean value: 0.8833333333333333

key: train_accuracy
value: [0.97402597 0.97402597 0.97402597 0.96103896 0.97402597 0.97402597
 0.97435897 1.         1.         0.97435897]

mean value: 0.977988677988678

key: test_fscore
value: [0.88888889 0.66666667 0.88888889 1.         1.         0.88888889
 1.         0.88888889 0.75       0.85714286]

mean value: 0.8829365079365079

key: train_fscore
value: [0.97435897 0.97435897 0.97435897 0.96103896 0.97368421 0.97368421
 0.97435897 1.         1.         0.97435897]

mean value: 0.9780202253886464

key: test_precision
value: [0.8  0.6  0.8  1.   1.   1.   1.   0.8  0.75 1.  ]

mean value: 0.875

key: train_precision
value: [0.97435897 0.97435897 0.97435897 0.94871795 0.97368421 0.97368421
 0.97435897 1.         1.         0.97435897]

mean value: 0.9767881241565453

key: test_recall
value: [1.   0.75 1.   1.   1.   0.8  1.   1.   0.75 0.75]

mean value: 0.905

key: train_recall
value: [0.97435897 0.97435897 0.97435897 0.97368421 0.97368421 0.97368421
 0.97435897 1.         1.         0.97435897]

mean value: 0.9792847503373819

key: test_roc_auc
value: [0.9   0.675 0.9   1.    1.    0.9   1.    0.875 0.75  0.875]

mean value: 0.8875

key: train_roc_auc
value: [0.97402159 0.97402159 0.97402159 0.96120108 0.97402159 0.97402159
 0.97435897 1.         1.         0.97435897]

mean value: 0.9780026990553307

key: test_jcc
value: [0.8  0.5  0.8  1.   1.   0.8  1.   0.8  0.6  0.75]

mean value: 0.805

key: train_jcc
value: [0.95       0.95       0.95       0.925      0.94871795 0.94871795
 0.95       1.         1.         0.95      ]

mean value: 0.9572435897435897

MCC on Blind test: 0.06

Accuracy on Blind test: 0.68

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.0179987  0.01539111 0.01446462 0.01311445 0.01305366 0.01297665
 0.01468205 0.01300359 0.01401901 0.01403546]

mean value: 0.014273929595947265

key: score_time
value: [0.01068687 0.00844073 0.00901723 0.00842404 0.00852823 0.00848007
 0.00915885 0.00892854 0.00850725 0.00879216]

mean value: 0.008896398544311523

key: test_mcc
value: [0.51639778 0.62994079 0.73214286 0.49099025 0.87287156 0.87287156
 0.46428571 0.32732684 0.64465837 0.875     ]

mean value: 0.6426485720764821

key: train_mcc
value: [0.808911   0.79446135 0.78111679 0.82629176 0.83951407 0.76668815
 0.81031543 0.8251228  0.81092683 0.81027501]

mean value: 0.8073623185403057

key: test_accuracy
value: [0.75       0.8125     0.86666667 0.73333333 0.93333333 0.93333333
 0.73333333 0.66666667 0.8        0.93333333]

mean value: 0.81625

key: train_accuracy
value: [0.90441176 0.89705882 0.89051095 0.91240876 0.91970803 0.88321168
 0.90510949 0.91240876 0.90510949 0.90510949]

mean value: 0.903504723057106

key: test_fscore
value: [0.77777778 0.8        0.85714286 0.75       0.92307692 0.92307692
 0.75       0.70588235 0.84210526 0.93333333]

mean value: 0.8262395430506886

key: train_fscore
value: [0.9037037  0.89552239 0.89051095 0.91044776 0.91970803 0.88571429
 0.90510949 0.91044776 0.90225564 0.9037037 ]

mean value: 0.9027123709820484

key: test_precision
value: [0.7        0.85714286 0.85714286 0.66666667 1.         1.
 0.75       0.66666667 0.72727273 1.        ]

mean value: 0.8224891774891775

key: train_precision
value: [0.91044776 0.90909091 0.89705882 0.93846154 0.92647059 0.87323944
 0.89855072 0.92424242 0.92307692 0.91044776]

mean value: 0.911108689028196

key: test_recall
value: [0.875      0.75       0.85714286 0.85714286 0.85714286 0.85714286
 0.75       0.75       1.         0.875     ]

mean value: 0.8428571428571429

key: train_recall
value: [0.89705882 0.88235294 0.88405797 0.88405797 0.91304348 0.89855072
 0.91176471 0.89705882 0.88235294 0.89705882]

mean value: 0.8947357203751065

key: test_roc_auc
value: [0.75       0.8125     0.86607143 0.74107143 0.92857143 0.92857143
 0.73214286 0.66071429 0.78571429 0.9375    ]

mean value: 0.8142857142857143

key: train_roc_auc
value: [0.90441176 0.89705882 0.8905584  0.91261722 0.91975703 0.88309889
 0.90515772 0.91229753 0.90494459 0.90505115]

mean value: 0.9034953111679455

key: test_jcc
value: [0.63636364 0.66666667 0.75       0.6        0.85714286 0.85714286
 0.6        0.54545455 0.72727273 0.875     ]

mean value: 0.711504329004329

key: train_jcc
value: [0.82432432 0.81081081 0.80263158 0.83561644 0.85135135 0.79487179
 0.82666667 0.83561644 0.82191781 0.82432432]

mean value: 0.8228131536228147

MCC on Blind test: 0.12

Accuracy on Blind test: 0.66

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.36799169 0.37216139 0.3838408  0.37895703 0.3886342  0.3847723
 0.39137197 0.39352298 0.38147902 0.38831353]

mean value: 0.3831044912338257

key: score_time
value: [0.00858855 0.00922418 0.00917625 0.00932384 0.00937819 0.00943565
 0.00947142 0.00946307 0.00953293 0.00940537]

mean value: 0.009299945831298829

key: test_mcc
value: [0.62994079 0.8819171  0.875      0.49099025 1.         0.73214286
 0.6000992  0.87287156 0.64465837 0.875     ]

mean value: 0.7602620132524002

key: train_mcc
value: [0.85294118 1.         1.         0.88360693 1.         1.
 1.         1.         0.88355744 1.        ]

mean value: 0.9620105545903546
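
Note on the "mean value" lines: each appears to be the arithmetic mean of the 10 per-fold values printed directly above it. A minimal sketch that recomputes the two MCC means from the rounded values in this log and reports the train/test gap; the variable names are illustrative, and the recomputed means agree with the logged ones only up to the rounding of the printed fold values.

```python
from statistics import fmean

# Per-fold values copied from the test_mcc and train_mcc entries above.
test_mcc = [0.62994079, 0.8819171, 0.875, 0.49099025, 1.0,
            0.73214286, 0.6000992, 0.87287156, 0.64465837, 0.875]
train_mcc = [0.85294118, 1.0, 1.0, 0.88360693, 1.0,
             1.0, 1.0, 1.0, 0.88355744, 1.0]

test_mean = fmean(test_mcc)
train_mean = fmean(train_mcc)

# A large positive gap means the model scores better on the folds it was fit on.
gap = train_mean - test_mean

print(f"mean test_mcc:  {test_mean:.6f}")
print(f"mean train_mcc: {train_mean:.6f}")
print(f"train - test:   {gap:.6f}")
```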

key: test_accuracy
value: [0.8125     0.9375     0.93333333 0.73333333 1.         0.86666667
 0.8        0.93333333 0.8        0.93333333]

mean value: 0.875

key: train_accuracy
value: [0.92647059 1.         1.         0.94160584 1.         1.
 1.         1.         0.94160584 1.        ]

mean value: 0.9809682267067411

key: test_fscore
value: [0.82352941 0.94117647 0.93333333 0.75       1.         0.85714286
 0.82352941 0.94117647 0.84210526 0.93333333]

mean value: 0.8845326551673302

key: train_fscore
value: [0.92647059 1.         1.         0.94117647 1.         1.
 1.         1.         0.94029851 1.        ]

mean value: 0.9807945566286216

key: test_precision
value: [0.77777778 0.88888889 0.875      0.66666667 1.         0.85714286
 0.77777778 0.88888889 0.72727273 1.        ]

mean value: 0.8459415584415584

key: train_precision
value: [0.92647059 1.         1.         0.95522388 1.         1.
 1.         1.         0.95454545 1.        ]

mean value: 0.9836239923377763

key: test_recall
value: [0.875      1.         1.         0.85714286 1.         0.85714286
 0.875      1.         1.         0.875     ]

mean value: 0.9339285714285714

key: train_recall
value: [0.92647059 1.         1.         0.92753623 1.         1.
 1.         1.         0.92647059 1.        ]

mean value: 0.9780477408354646

key: test_roc_auc
value: [0.8125     0.9375     0.9375     0.74107143 1.         0.86607143
 0.79464286 0.92857143 0.78571429 0.9375    ]

mean value: 0.8741071428571429

key: train_roc_auc
value: [0.92647059 1.         1.         0.94170929 1.         1.
 1.         1.         0.94149616 1.        ]

mean value: 0.9809676044330776

key: test_jcc
value: [0.7        0.88888889 0.875      0.6        1.         0.75
 0.7        0.88888889 0.72727273 0.875     ]

mean value: 0.8005050505050505

key: train_jcc
value: [0.8630137  1.         1.         0.88888889 1.         1.
 1.         1.         0.88732394 1.        ]

mean value: 0.9639226531180998

MCC on Blind test: 0.0

Accuracy on Blind test: 0.68
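
Note on the blind-test figures above: an MCC of 0.0 together with an accuracy of 0.68 is the pattern produced when a classifier assigns every sample to a single class. The log does not report the blind-test confusion matrix, so this is only one possible explanation. The counts below are hypothetical, chosen solely to reproduce the pattern, and the helpers mcc and accuracy are illustrative names rather than functions from the analysis scripts.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report 0.0 when a row or column of the matrix is empty.
    return numerator / denominator if denominator else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts: 100 samples, every one predicted as class 0.
majority_only = dict(tp=0, tn=68, fp=0, fn=32)

print("accuracy:", accuracy(**majority_only))
print("MCC:     ", mcc(**majority_only))
```

Accuracy rewards the 68 correct majority-class calls, while MCC stays at zero because no positive sample is ever identified.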

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.00949454 0.00907326 0.00779319 0.00751305 0.00748301 0.0074904
 0.00753403 0.00772738 0.00752568 0.00749397]

mean value: 0.007912850379943848

key: score_time
value: [0.01055598 0.01037431 0.00883818 0.00855374 0.00868988 0.00856733
 0.00868702 0.00864244 0.00872922 0.00876641]

mean value: 0.009040451049804688

key: test_mcc
value: [0.37796447 0.25       0.60714286 0.26189246 0.46428571 0.56407607
 0.19642857 0.41931393 0.21821789 0.34247476]

mean value: 0.3701796738627931

key: train_mcc
value: [0.59233863 0.52313884 0.49254979 0.53036644 0.56781069 0.53654458
 0.71021843 0.58848522 0.56432157 0.58903512]

mean value: 0.5694809310571065

key: test_accuracy
value: [0.625      0.625      0.8        0.6        0.73333333 0.73333333
 0.6        0.66666667 0.6        0.66666667]

mean value: 0.665

key: train_accuracy
value: [0.78676471 0.75       0.72992701 0.75912409 0.76642336 0.75182482
 0.84671533 0.7810219  0.77372263 0.77372263]

mean value: 0.7719246457707171

key: test_fscore
value: [0.72727273 0.625      0.8        0.66666667 0.71428571 0.77777778
 0.625      0.76190476 0.57142857 0.73684211]

mean value: 0.7006178324599377

key: train_fscore
value: [0.81045752 0.78205128 0.77300613 0.78431373 0.80246914 0.79012346
 0.82644628 0.80769231 0.7394958  0.80745342]

mean value: 0.7923509054595705

key: test_precision
value: [0.57142857 0.625      0.75       0.54545455 0.71428571 0.63636364
 0.625      0.61538462 0.66666667 0.63636364]

mean value: 0.6385947385947386

key: train_precision
value: [0.72941176 0.69318182 0.67021277 0.71428571 0.69892473 0.68817204
 0.94339623 0.71590909 0.8627451  0.69892473]

mean value: 0.7415163983870607

key: test_recall
value: [1.         0.625      0.85714286 0.85714286 0.71428571 1.
 0.625      1.         0.5        0.875     ]

mean value: 0.8053571428571429

key: train_recall
value: [0.91176471 0.89705882 0.91304348 0.86956522 0.94202899 0.92753623
 0.73529412 0.92647059 0.64705882 0.95588235]

mean value: 0.8725703324808184

key: test_roc_auc
value: [0.625      0.625      0.80357143 0.61607143 0.73214286 0.75
 0.59821429 0.64285714 0.60714286 0.65178571]

mean value: 0.6651785714285714

key: train_roc_auc
value: [0.78676471 0.75       0.72858056 0.75831202 0.76513214 0.75053282
 0.84590793 0.78207587 0.77280477 0.77504263]

mean value: 0.7715153452685422

key: test_jcc
value: [0.57142857 0.45454545 0.66666667 0.5        0.55555556 0.63636364
 0.45454545 0.61538462 0.4        0.58333333]

mean value: 0.5437823287823288

key: train_jcc
value: [0.68131868 0.64210526 0.63       0.64516129 0.67010309 0.65306122
 0.70422535 0.67741935 0.58666667 0.67708333]

mean value: 0.6567144259023844

MCC on Blind test: 0.02

Accuracy on Blind test: 0.47

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00795698 0.00770211 0.00773144 0.00778985 0.00778174 0.00784087
 0.00769806 0.0066731  0.00672388 0.0066855 ]

mean value: 0.007458353042602539

key: score_time
value: [0.00865579 0.00877738 0.00871015 0.00862479 0.00876021 0.00872946
 0.00876045 0.00781512 0.00778174 0.00782132]

mean value: 0.008443641662597656

key: test_mcc
value: [ 0.25       -0.25        0.73214286  0.09449112  0.75592895  0.49099025
  0.33928571 -0.13363062  0.33928571  0.19642857]

mean value: 0.2814922553488389

key: train_mcc
value: [0.50195781 0.54894692 0.44946013 0.47724794 0.37278745 0.44522592
 0.41602728 0.48933032 0.41632915 0.44553401]

mean value: 0.4562846929723249

key: test_accuracy
value: [0.625      0.375      0.86666667 0.53333333 0.86666667 0.73333333
 0.66666667 0.46666667 0.66666667 0.6       ]

mean value: 0.64

key: train_accuracy
value: [0.75       0.77205882 0.72262774 0.73722628 0.68613139 0.72262774
 0.7080292  0.74452555 0.7080292  0.72262774]

mean value: 0.727388364104766

key: test_fscore
value: [0.625      0.375      0.85714286 0.58823529 0.83333333 0.75
 0.66666667 0.6        0.66666667 0.625     ]

mean value: 0.658704481792717

key: train_fscore
value: [0.76056338 0.7862069  0.74324324 0.75342466 0.68148148 0.72463768
 0.70588235 0.73684211 0.71014493 0.72463768]

mean value: 0.7327064407151792

key: test_precision
value: [0.625      0.375      0.85714286 0.5        1.         0.66666667
 0.71428571 0.5        0.71428571 0.625     ]

mean value: 0.6577380952380952

key: train_precision
value: [0.72972973 0.74025974 0.69620253 0.71428571 0.6969697  0.72463768
 0.70588235 0.75384615 0.7        0.71428571]

mean value: 0.7176099315122916

key: test_recall
value: [0.625      0.375      0.85714286 0.71428571 0.71428571 0.85714286
 0.625      0.75       0.625      0.625     ]

mean value: 0.6767857142857143

key: train_recall
value: [0.79411765 0.83823529 0.79710145 0.79710145 0.66666667 0.72463768
 0.70588235 0.72058824 0.72058824 0.73529412]

mean value: 0.7500213128729752

key: test_roc_auc
value: [0.625      0.375      0.86607143 0.54464286 0.85714286 0.74107143
 0.66964286 0.44642857 0.66964286 0.59821429]

mean value: 0.6392857142857143

key: train_roc_auc
value: [0.75       0.77205882 0.72208014 0.73678602 0.68627451 0.72261296
 0.70801364 0.74435209 0.7081202  0.72271952]

mean value: 0.7273017902813299

key: test_jcc
value: [0.45454545 0.23076923 0.75       0.41666667 0.71428571 0.6
 0.5        0.42857143 0.5        0.45454545]

mean value: 0.504938394938395

key: train_jcc
value: [0.61363636 0.64772727 0.59139785 0.6043956  0.51685393 0.56818182
 0.54545455 0.58333333 0.5505618  0.56818182]

mean value: 0.57897243357102

MCC on Blind test: 0.1

Accuracy on Blind test: 0.58
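
The key/value blocks above have the shape of the dictionary returned by scikit-learn's cross_validate when it is called with a scoring dict and return_train_score=True. The script that produced this log is not shown here, so the following is only a minimal sketch under that assumption. The synthetic data, the reduced three-column feature set, the helix/sheet/loop labels and the scorer choices are hypothetical stand-ins, and the 'roc_auc' scorer in particular may differ from whatever the original run used.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data (hypothetical): two numeric and one categorical feature.
rng = np.random.default_rng(42)
n = 120
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 30, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),
})
y = rng.integers(0, 2, n)

# Same layout as the logged pipeline: scale numeric columns, one-hot encode categoricals.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), ["ligand_distance", "rsa"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["ss_class"]),
    ],
)
pipe = Pipeline(steps=[("prep", prep), ("model", BernoulliNB())])

# Scorer names chosen so the result keys match the log (test_mcc, train_jcc, ...).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring, return_train_score=True)

# Print each metric in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {value.mean()}\n")
```

Because the labels here are random, the printed scores carry no meaning; only the structure of the output is comparable to the records in this log.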

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00700665 0.00714684 0.00702357 0.007092   0.00712657 0.0071609
 0.00703955 0.00731921 0.00701737 0.00706315]

mean value: 0.007099580764770508

key: score_time
value: [0.00979042 0.00942588 0.00939441 0.00933671 0.00949192 0.00942111
 0.009372   0.00936747 0.00985074 0.00934005]

mean value: 0.009479069709777832

key: test_mcc
value: [ 0.51639778  0.25819889  0.73214286  0.21821789  0.75592895  0.32732684
 -0.02620712  0.32732684  0.73214286  0.60714286]

mean value: 0.44486186267144306

key: train_mcc
value: [0.72254413 0.69486799 0.68583647 0.72439971 0.62437433 0.68322489
 0.68163703 0.68163703 0.68011153 0.65087548]

mean value: 0.6829508591825769

key: test_accuracy
value: [0.75       0.625      0.86666667 0.6        0.86666667 0.66666667
 0.46666667 0.66666667 0.86666667 0.8       ]

mean value: 0.7175

key: train_accuracy
value: [0.86029412 0.84558824 0.83941606 0.86131387 0.81021898 0.83941606
 0.83941606 0.83941606 0.83941606 0.82481752]

mean value: 0.8399313009875483

key: test_fscore
value: [0.77777778 0.66666667 0.85714286 0.625      0.83333333 0.61538462
 0.2        0.70588235 0.875      0.8       ]

mean value: 0.6956187603246426

key: train_fscore
value: [0.86524823 0.85314685 0.85135135 0.86713287 0.82191781 0.84931507
 0.84507042 0.84507042 0.84285714 0.82857143]

mean value: 0.8469681591792749

key: test_precision
value: [0.7        0.6        0.85714286 0.55555556 1.         0.66666667
 0.5        0.66666667 0.875      0.85714286]

mean value: 0.7278174603174603

key: train_precision
value: [0.83561644 0.81333333 0.79746835 0.83783784 0.77922078 0.80519481
 0.81081081 0.81081081 0.81944444 0.80555556]

mean value: 0.8115293169994922

key: test_recall
value: [0.875      0.75       0.85714286 0.71428571 0.71428571 0.57142857
 0.125      0.75       0.875      0.75      ]

mean value: 0.6982142857142857

key: train_recall
value: [0.89705882 0.89705882 0.91304348 0.89855072 0.86956522 0.89855072
 0.88235294 0.88235294 0.86764706 0.85294118]

mean value: 0.8859121909633418

key: test_roc_auc
value: [0.75       0.625      0.86607143 0.60714286 0.85714286 0.66071429
 0.49107143 0.66071429 0.86607143 0.80357143]

mean value: 0.71875

key: train_roc_auc
value: [0.86029412 0.84558824 0.83887468 0.86104007 0.80978261 0.83898124
 0.8397272  0.8397272  0.83962063 0.82502131]

mean value: 0.8398657289002557

key: test_jcc
value: [0.63636364 0.5        0.75       0.45454545 0.71428571 0.44444444
 0.11111111 0.54545455 0.77777778 0.66666667]

mean value: 0.560064935064935

key: train_jcc
value: [0.7625     0.74390244 0.74117647 0.7654321  0.69767442 0.73809524
 0.73170732 0.73170732 0.72839506 0.70731707]

mean value: 0.7347907434123415

MCC on Blind test: 0.06

Accuracy on Blind test: 0.68

Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.00901961 0.00860167 0.0086236  0.00857639 0.00859451 0.00781941
 0.00762987 0.00763583 0.00852704 0.00775576]

mean value: 0.008278369903564453

key: score_time
value: [0.00887251 0.00862575 0.00865078 0.00868511 0.00861216 0.00795341
 0.00790024 0.00792432 0.00796032 0.00794959]

mean value: 0.008313417434692383

key: test_mcc
value: [0.62994079 0.62994079 0.73214286 0.56407607 0.87287156 0.60714286
 0.33928571 0.18898224 0.75592895 0.875     ]

mean value: 0.6195311823553656

key: train_mcc
value: [0.77949606 0.85331034 0.85540562 0.86948194 0.82629176 0.86939892
 0.8978896  0.83947987 0.85400682 0.86868474]

mean value: 0.8513445663864698

key: test_accuracy
value: [0.8125     0.8125     0.86666667 0.73333333 0.93333333 0.8
 0.66666667 0.6        0.86666667 0.93333333]

mean value: 0.8025

key: train_accuracy
value: [0.88970588 0.92647059 0.9270073  0.93430657 0.91240876 0.93430657
 0.94890511 0.91970803 0.9270073  0.93430657]

mean value: 0.9254132674967797

key: test_fscore
value: [0.82352941 0.8        0.85714286 0.77777778 0.92307692 0.8
 0.66666667 0.66666667 0.88888889 0.93333333]

mean value: 0.8137082525317819

key: train_fscore
value: [0.88888889 0.92753623 0.92957746 0.93333333 0.91044776 0.93617021
 0.94814815 0.91851852 0.92647059 0.93333333]

mean value: 0.9252424481090294

key: test_precision
value: [0.77777778 0.85714286 0.85714286 0.63636364 1.         0.75
 0.71428571 0.6        0.8        1.        ]

mean value: 0.7992712842712842

key: train_precision
value: [0.89552239 0.91428571 0.90410959 0.95454545 0.93846154 0.91666667
 0.95522388 0.92537313 0.92647059 0.94029851]

mean value: 0.9270957461683526

key: test_recall
value: [0.875      0.75       0.85714286 1.         0.85714286 0.85714286
 0.625      0.75       1.         0.875     ]

mean value: 0.8446428571428571

key: train_recall
value: [0.88235294 0.94117647 0.95652174 0.91304348 0.88405797 0.95652174
 0.94117647 0.91176471 0.92647059 0.92647059]

mean value: 0.9239556692242115

key: test_roc_auc
value: [0.8125     0.8125     0.86607143 0.75       0.92857143 0.80357143
 0.66964286 0.58928571 0.85714286 0.9375    ]

mean value: 0.8026785714285715

key: train_roc_auc
value: [0.88970588 0.92647059 0.92679028 0.93446292 0.91261722 0.93414322
 0.9488491  0.91965047 0.92700341 0.93424979]

mean value: 0.9253942881500427

key: test_jcc
value: [0.7        0.66666667 0.75       0.63636364 0.85714286 0.66666667
 0.5        0.5        0.8        0.875     ]

mean value: 0.6951839826839826

key: train_jcc
value: [0.8        0.86486486 0.86842105 0.875      0.83561644 0.88
 0.90140845 0.84931507 0.8630137  0.875     ]

mean value: 0.8612639573680121

MCC on Blind test: 0.13

Accuracy on Blind test: 0.69

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.47060013 0.6176157  0.50893569 0.47440553 0.48640704 0.68746996
 0.48285437 0.49517989 0.48710799 0.6235292 ]

mean value: 0.5334105491638184

key: score_time
value: [0.01105475 0.01343441 0.01317406 0.01111579 0.01340437 0.01400685
 0.01163292 0.01111388 0.01380134 0.01445436]

mean value: 0.012719273567199707

key: test_mcc
value: [0.77459667 0.75       0.87287156 0.49099025 1.         0.73214286
 0.47245559 0.32732684 0.75592895 0.73214286]

mean value: 0.6908455570136127

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.875      0.875      0.93333333 0.73333333 1.         0.86666667
 0.73333333 0.66666667 0.86666667 0.86666667]

mean value: 0.8416666666666667

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.88888889 0.875      0.92307692 0.75       1.         0.85714286
 0.77777778 0.70588235 0.88888889 0.875     ]

mean value: 0.8541657688716512

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.8        0.875      1.         0.66666667 1.         0.85714286
 0.7        0.66666667 0.8        0.875     ]

mean value: 0.824047619047619

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.875      0.85714286 0.85714286 1.         0.85714286
 0.875      0.75       1.         0.875     ]

mean value: 0.8946428571428571

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.875      0.875      0.92857143 0.74107143 1.         0.86607143
 0.72321429 0.66071429 0.85714286 0.86607143]

mean value: 0.8392857142857143

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.8        0.77777778 0.85714286 0.6        1.         0.75
 0.63636364 0.54545455 0.8        0.77777778]

mean value: 0.7544516594516595

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.06

Accuracy on Blind test: 0.69
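
The MLP record above reports a training MCC of 1.0 in every fold, a lower cross-validated test MCC, and a much lower blind-test MCC. A small sketch, using only the values printed in that record, makes the two gaps explicit. The variable names are illustrative and not taken from the original script.

```python
from statistics import mean

# Per-fold values copied from the MLP record in this log.
test_mcc = [0.77459667, 0.75, 0.87287156, 0.49099025, 1.0,
            0.73214286, 0.47245559, 0.32732684, 0.75592895, 0.73214286]
train_mcc = [1.0] * 10
blind_mcc = 0.06  # from the "MCC on Blind test" line

cv_mean = mean(test_mcc)            # reproduces the logged "mean value" for test_mcc
train_cv_gap = mean(train_mcc) - cv_mean
cv_blind_gap = cv_mean - blind_mcc

print(f"mean CV test MCC: {cv_mean:.4f}")
print(f"train - CV gap  : {train_cv_gap:.4f}")
print(f"CV - blind gap  : {cv_blind_gap:.4f}")
```

A large train-to-CV gap is usually read as overfitting to the training folds, while a large CV-to-blind gap suggests that the blind set differs from the data used for cross-validation. The log alone does not say which factor dominates here.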

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01054811 0.01029015 0.00749731 0.00756335 0.00791764 0.00744152
 0.0080564  0.0079093  0.00725198 0.00789857]

mean value: 0.008237433433532716

key: score_time
value: [0.01101589 0.00920248 0.00816894 0.00859499 0.00827861 0.00804591
 0.00809813 0.00824928 0.00804496 0.00812697]

mean value: 0.008582615852355957

key: test_mcc
value: [1.         0.77459667 0.875      0.76376262 1.         0.87287156
 1.         1.         0.87287156 0.875     ]

mean value: 0.9034102406955395

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.875      0.93333333 0.86666667 1.         0.93333333
 1.         1.         0.93333333 0.93333333]

mean value: 0.9475

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.88888889 0.93333333 0.875      1.         0.92307692
 1.         1.         0.94117647 0.93333333]

mean value: 0.9494808949220714

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.8        0.875      0.77777778 1.         1.
 1.         1.         0.88888889 1.        ]

mean value: 0.9341666666666667

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         1.         0.85714286
 1.         1.         1.         0.875     ]

mean value: 0.9732142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.875      0.9375     0.875      1.         0.92857143
 1.         1.         0.92857143 0.9375    ]

mean value: 0.9482142857142857

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.8        0.875      0.77777778 1.         0.85714286
 1.         1.         0.88888889 0.875     ]

mean value: 0.9073809523809524

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.13

Accuracy on Blind test: 0.85
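
The Decision Tree record above pairs a blind-test accuracy of 0.85 with a blind-test MCC of 0.13. The two metrics can diverge this way because accuracy counts only correct predictions, whereas MCC uses all four cells of the confusion matrix. The sketch below computes both, plus the Jaccard score reported as jcc in this log, from raw counts. The counts are invented for illustration and are not the confusion matrix of this run, which the log does not print.

```python
from math import sqrt

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal sum is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def jaccard(tp, fp, fn):
    """Jaccard score for the positive class: TP / (TP + FP + FN)."""
    total = tp + fp + fn
    return 0.0 if total == 0 else tp / total

# Hypothetical, heavily imbalanced blind set: 11 positives, 89 negatives.
tp, tn, fp, fn = 1, 84, 5, 10

acc = accuracy(tp, tn, fp, fn)
m = mcc(tp, tn, fp, fn)
j = jaccard(tp, fp, fn)
print(f"accuracy: {acc:.2f}  MCC: {m:.2f}  Jaccard: {j:.2f}")
```

With these made-up counts the classifier is right on most negatives but finds only one of eleven positives, so accuracy stays high while MCC stays close to zero. Whether class imbalance explains the blind-test figures in this log cannot be confirmed from the output shown.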

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.08002043 0.08039927 0.08053231 0.08475113 0.0849824  0.07976866
 0.07944989 0.08087707 0.07947731 0.0836072 ]

mean value: 0.08138656616210938

key: score_time
value: [0.01744008 0.01705742 0.01676226 0.01815081 0.01665783 0.01678061
 0.01786375 0.0182426  0.01667714 0.01768732]

mean value: 0.017331981658935548

key: test_mcc
value: [0.8819171  0.75       0.87287156 0.66143783 1.         0.87287156
 0.46428571 0.76376262 0.875      0.76376262]

mean value: 0.7905908999279945

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     0.875      0.93333333 0.8        1.         0.93333333
 0.73333333 0.86666667 0.93333333 0.86666667]

mean value: 0.8879166666666667

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94117647 0.875      0.92307692 0.82352941 1.         0.92307692
 0.75       0.85714286 0.93333333 0.85714286]

mean value: 0.8883478776125835

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.88888889 0.875      1.         0.7        1.         1.
 0.75       1.         1.         1.        ]

mean value: 0.9213888888888889

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.875      0.85714286 1.         1.         0.85714286
 0.75       0.75       0.875      0.75      ]

mean value: 0.8714285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.875      0.92857143 0.8125     1.         0.92857143
 0.73214286 0.875      0.9375     0.875     ]

mean value: 0.8901785714285715

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.88888889 0.77777778 0.85714286 0.7        1.         0.85714286
 0.6        0.75       0.875      0.75      ]

mean value: 0.805595238095238

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.1

Accuracy on Blind test: 0.81

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00745249 0.0100553  0.00690699 0.00686359 0.00682211 0.0068984
 0.00738096 0.00688457 0.00711179 0.00706244]

mean value: 0.007343864440917969

key: score_time
value: [0.00842643 0.00823832 0.00787807 0.00817585 0.00788713 0.00785685
 0.00784945 0.00778127 0.00796604 0.00789261]

mean value: 0.007995200157165528

key: test_mcc
value: [1.         0.40451992 0.60714286 0.875      0.76376262 0.33928571
 0.76376262 0.46428571 0.75592895 0.875     ]

mean value: 0.6848688380862632

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.6875     0.8        0.93333333 0.86666667 0.66666667
 0.86666667 0.73333333 0.86666667 0.93333333]

mean value: 0.8354166666666667

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.73684211 0.8        0.93333333 0.875      0.66666667
 0.85714286 0.75       0.88888889 0.93333333]

mean value: 0.8441207184628238

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.63636364 0.75       0.875      0.77777778 0.625
 1.         0.75       0.8        1.        ]

mean value: 0.8214141414141414

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.875      0.85714286 1.         1.         0.71428571
 0.75       0.75       1.         0.875     ]

mean value: 0.8821428571428571

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.6875     0.80357143 0.9375     0.875      0.66964286
 0.875      0.73214286 0.85714286 0.9375    ]

mean value: 0.8375

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.58333333 0.66666667 0.875      0.77777778 0.5
 0.75       0.6        0.8        0.875     ]

mean value: 0.7427777777777778

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.08

Accuracy on Blind test: 0.73

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [0.99027419 1.03364635 0.99537587 0.99380183 1.01279187 1.00695038
 1.00400448 0.99069548 0.98471999 0.9896822 ]

mean value: 1.000194263458252

key: score_time
value: [0.09284711 0.09792686 0.09596872 0.09674335 0.097049   0.09700847
 0.09636211 0.08898997 0.08923626 0.15491176]

mean value: 0.10070436000823975

key: test_mcc
value: [0.8819171  0.8819171  0.875      0.76376262 1.         0.87287156
 0.60714286 0.87287156 1.         0.73214286]

mean value: 0.848762565937602

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     0.9375     0.93333333 0.86666667 1.         0.93333333
 0.8        0.93333333 1.         0.86666667]

mean value: 0.9208333333333334

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94117647 0.94117647 0.93333333 0.875      1.         0.92307692
 0.8        0.94117647 1.         0.875     ]

mean value: 0.9229939668174962

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.88888889 0.88888889 0.875      0.77777778 1.         1.
 0.85714286 0.88888889 1.         0.875     ]

mean value: 0.9051587301587302

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         1.         0.85714286
 0.75       1.         1.         0.875     ]

mean value: 0.9482142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.9375     0.9375     0.875      1.         0.92857143
 0.80357143 0.92857143 1.         0.86607143]

mean value: 0.9214285714285715

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.88888889 0.88888889 0.875      0.77777778 1.         0.85714286
 0.66666667 0.88888889 1.         0.77777778]

mean value: 0.8621031746031745

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.11

Accuracy on Blind test: 0.83

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.79997134 0.8617053  0.82350397 0.86626768 0.86152625 0.8775835
 0.89794159 0.84732342 0.82997847 0.88673472]

mean value: 0.8552536249160767

key: score_time
value: [0.23055267 0.18991017 0.19632292 0.2545855  0.13287044 0.18487072
 0.21556759 0.20604992 0.17664123 0.12801123]

mean value: 0.19153823852539062

key: test_mcc
value: [0.8819171  0.75       0.875      0.76376262 0.87287156 0.73214286
 0.60714286 0.73214286 1.         0.73214286]

mean value: 0.7947122709029568

key: train_mcc
value: [0.97100831 0.94117647 0.95710706 0.98550418 0.95630861 0.97080136
 0.98550418 0.98550725 0.97122151 0.98550725]

mean value: 0.9709646177394017

key: test_accuracy
value: [0.9375     0.875      0.93333333 0.86666667 0.93333333 0.86666667
 0.8        0.86666667 1.         0.86666667]

mean value: 0.8945833333333334

key: train_accuracy
value: [0.98529412 0.97058824 0.97810219 0.99270073 0.97810219 0.98540146
 0.99270073 0.99270073 0.98540146 0.99270073]

mean value: 0.9853692571919279

key: test_fscore
value: [0.94117647 0.875      0.93333333 0.875      0.92307692 0.85714286
 0.8        0.875      1.         0.875     ]

mean value: 0.8954729584141349

key: train_fscore
value: [0.98550725 0.97058824 0.9787234  0.99280576 0.97810219 0.98550725
 0.99259259 0.99270073 0.98550725 0.99270073]

mean value: 0.9854735376303184

key: test_precision
value: [0.88888889 0.875      0.875      0.77777778 1.         0.85714286
 0.85714286 0.875      1.         0.875     ]

mean value: 0.888095238095238

key: train_precision
value: [0.97142857 0.97058824 0.95833333 0.98571429 0.98529412 0.98550725
 1.         0.98550725 0.97142857 0.98550725]

mean value: 0.9799308853976374

key: test_recall
value: [1.         0.875      1.         1.         0.85714286 0.85714286
 0.75       0.875      1.         0.875     ]

mean value: 0.9089285714285714

key: train_recall
value: [1.         0.97058824 1.         1.         0.97101449 0.98550725
 0.98529412 1.         1.         1.        ]

mean value: 0.9912404092071612

key: test_roc_auc
value: [0.9375     0.875      0.9375     0.875      0.92857143 0.86607143
 0.80357143 0.86607143 1.         0.86607143]

mean value: 0.8955357142857143

key: train_roc_auc
value: [0.98529412 0.97058824 0.97794118 0.99264706 0.97815431 0.98540068
 0.99264706 0.99275362 0.98550725 0.99275362]

mean value: 0.9853687127024723

key: test_jcc
value: [0.88888889 0.77777778 0.875      0.77777778 0.85714286 0.75
 0.66666667 0.77777778 1.         0.77777778]

mean value: 0.8148809523809524

key: train_jcc
value: [0.97142857 0.94285714 0.95833333 0.98571429 0.95714286 0.97142857
 0.98529412 0.98550725 0.97142857 0.98550725]

mean value: 0.9714641943734016

MCC on Blind test: 0.1

Accuracy on Blind test: 0.79

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01698375 0.00691271 0.00714707 0.00702286 0.00691867 0.00681043
 0.00706601 0.0070343  0.00683665 0.0071075 ]

mean value: 0.007983994483947755

key: score_time
value: [0.01220894 0.00788188 0.00856304 0.00795507 0.0079546  0.00788164
 0.00790548 0.00793004 0.00796223 0.00810909]

mean value: 0.008435201644897462

key: test_mcc
value: [ 0.25       -0.25        0.73214286  0.09449112  0.75592895  0.49099025
  0.33928571 -0.13363062  0.33928571  0.19642857]

mean value: 0.2814922553488389

key: train_mcc
value: [0.50195781 0.54894692 0.44946013 0.47724794 0.37278745 0.44522592
 0.41602728 0.48933032 0.41632915 0.44553401]

mean value: 0.4562846929723249

key: test_accuracy
value: [0.625      0.375      0.86666667 0.53333333 0.86666667 0.73333333
 0.66666667 0.46666667 0.66666667 0.6       ]

mean value: 0.64

key: train_accuracy
value: [0.75       0.77205882 0.72262774 0.73722628 0.68613139 0.72262774
 0.7080292  0.74452555 0.7080292  0.72262774]

mean value: 0.727388364104766

key: test_fscore
value: [0.625      0.375      0.85714286 0.58823529 0.83333333 0.75
 0.66666667 0.6        0.66666667 0.625     ]

mean value: 0.658704481792717

key: train_fscore
value: [0.76056338 0.7862069  0.74324324 0.75342466 0.68148148 0.72463768
 0.70588235 0.73684211 0.71014493 0.72463768]

mean value: 0.7327064407151792

key: test_precision
value: [0.625      0.375      0.85714286 0.5        1.         0.66666667
 0.71428571 0.5        0.71428571 0.625     ]

mean value: 0.6577380952380952

key: train_precision
value: [0.72972973 0.74025974 0.69620253 0.71428571 0.6969697  0.72463768
 0.70588235 0.75384615 0.7        0.71428571]

mean value: 0.7176099315122916

key: test_recall
value: [0.625      0.375      0.85714286 0.71428571 0.71428571 0.85714286
 0.625      0.75       0.625      0.625     ]

mean value: 0.6767857142857143

key: train_recall
value: [0.79411765 0.83823529 0.79710145 0.79710145 0.66666667 0.72463768
 0.70588235 0.72058824 0.72058824 0.73529412]

mean value: 0.7500213128729752

key: test_roc_auc
value: [0.625      0.375      0.86607143 0.54464286 0.85714286 0.74107143
 0.66964286 0.44642857 0.66964286 0.59821429]

mean value: 0.6392857142857143

key: train_roc_auc
value: [0.75       0.77205882 0.72208014 0.73678602 0.68627451 0.72261296
 0.70801364 0.74435209 0.7081202  0.72271952]

mean value: 0.7273017902813299

key: test_jcc
value: [0.45454545 0.23076923 0.75       0.41666667 0.71428571 0.6
 0.5        0.42857143 0.5        0.45454545]

mean value: 0.504938394938395

key: train_jcc
value: [0.61363636 0.64772727 0.59139785 0.6043956  0.51685393 0.56818182
 0.54545455 0.58333333 0.5505618  0.56818182]

mean value: 0.57897243357102

MCC on Blind test: 0.1

Accuracy on Blind test: 0.58

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.06262994 0.03505611 0.03692508 0.03597021 0.06148291 0.03540158
 0.03483367 0.03505754 0.04629922 0.03492475]

mean value: 0.04185810089111328

key: score_time
value: [0.01055789 0.01049376 0.01050019 0.01044226 0.01041293 0.01036716
 0.01037478 0.0117774  0.01043272 0.01040506]

mean value: 0.010576415061950683

key: test_mcc
value: [1.         0.8819171  0.875      0.76376262 1.         1.
 0.87287156 1.         1.         0.875     ]

mean value: 0.9268551280458139

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.9375     0.93333333 0.86666667 1.         1.
 0.93333333 1.         1.         0.93333333]

mean value: 0.9604166666666667

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.94117647 0.93333333 0.875      1.         1.
 0.94117647 1.         1.         0.93333333]

mean value: 0.9624019607843137

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.88888889 0.875      0.77777778 1.         1.
 0.88888889 1.         1.         1.        ]

mean value: 0.9430555555555555

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.    1.    1.    1.    1.    1.    1.    1.    1.    0.875]

mean value: 0.9875

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.9375     0.9375     0.875      1.         1.
 0.92857143 1.         1.         0.9375    ]

mean value: 0.9616071428571429

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.88888889 0.875      0.77777778 1.         1.
 0.88888889 1.         1.         0.875     ]

mean value: 0.9305555555555556

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.12

Accuracy on Blind test: 0.84

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.01320124 0.01201296 0.01213074 0.01227403 0.01183081 0.01187682
 0.01212811 0.01183581 0.01189661 0.03893161]

mean value: 0.014811873435974121

key: score_time
value: [0.0113101  0.01076269 0.01052332 0.01057029 0.0105257  0.01047421
 0.0105381  0.01045918 0.01049614 0.01063371]

mean value: 0.01062934398651123

key: test_mcc
value: [0.77459667 0.77459667 0.73214286 0.66143783 0.87287156 0.87287156
 0.75592895 0.47245559 0.64465837 1.        ]

mean value: 0.7561560053780203

key: train_mcc
value: [0.92898531 0.92737353 0.91392776 0.97120941 0.91277477 0.94318882
 0.88668406 0.94323594 0.91597649 0.92791659]

mean value: 0.927127267186985

key: test_accuracy
value: [0.875      0.875      0.86666667 0.8        0.93333333 0.93333333
 0.86666667 0.73333333 0.8        1.        ]

mean value: 0.8683333333333334

key: train_accuracy
value: [0.96323529 0.96323529 0.95620438 0.98540146 0.95620438 0.97080292
 0.94160584 0.97080292 0.95620438 0.96350365]

mean value: 0.9627200515242593

key: test_fscore
value: [0.88888889 0.88888889 0.85714286 0.82352941 0.92307692 0.92307692
 0.88888889 0.77777778 0.84210526 1.        ]

mean value: 0.8813375822663748

key: train_fscore
value: [0.96453901 0.96402878 0.95774648 0.98571429 0.95714286 0.97183099
 0.94366197 0.97142857 0.95774648 0.96402878]

mean value: 0.9637868190827705

key: test_precision
value: [0.8        0.8        0.85714286 0.7        1.         1.
 0.8        0.7        0.72727273 1.        ]

mean value: 0.8384415584415584

key: train_precision
value: [0.93150685 0.94366197 0.93150685 0.97183099 0.94366197 0.94520548
 0.90540541 0.94444444 0.91891892 0.94366197]

mean value: 0.9379804848259411

key: test_recall
value: [1.         1.         0.85714286 1.         0.85714286 0.85714286
 1.         0.875      1.         1.        ]

mean value: 0.9446428571428571

key: train_recall
value: [1.         0.98529412 0.98550725 1.         0.97101449 1.
 0.98529412 1.         1.         0.98529412]

mean value: 0.9912404092071612

key: test_roc_auc
value: [0.875      0.875      0.86607143 0.8125     0.92857143 0.92857143
 0.85714286 0.72321429 0.78571429 1.        ]

mean value: 0.8651785714285715

key: train_roc_auc
value: [0.96323529 0.96323529 0.95598892 0.98529412 0.95609548 0.97058824
 0.94192242 0.97101449 0.95652174 0.96366155]

mean value: 0.9627557544757033

key: test_jcc
value: [0.8        0.8        0.75       0.7        0.85714286 0.85714286
 0.8        0.63636364 0.72727273 1.        ]

mean value: 0.7927922077922078

key: train_jcc
value: [0.93150685 0.93055556 0.91891892 0.97183099 0.91780822 0.94520548
 0.89333333 0.94444444 0.91891892 0.93055556]

mean value: 0.9303078260587425

MCC on Blind test: 0.05

Accuracy on Blind test: 0.64

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.00955915 0.0070889  0.00685692 0.00679111 0.00704551 0.0069344
 0.00679803 0.0070312  0.00693274 0.00679612]

mean value: 0.0071834087371826175

key: score_time
value: [0.01106119 0.00804234 0.00776124 0.00791359 0.00784922 0.00808263
 0.0079062  0.00786209 0.00788569 0.00790429]

mean value: 0.008226847648620606

key: test_mcc
value: [ 0.12598816  0.25819889  0.73214286  0.33928571  0.87287156  0.37796447
  0.19642857 -0.13363062  0.46428571  0.6000992 ]

mean value: 0.3833634515705724

key: train_mcc
value: [0.48661135 0.51745489 0.47592003 0.50667322 0.41725962 0.50373224
 0.50394373 0.5339313  0.53314859 0.47473887]

mean value: 0.4953413853595016

key: test_accuracy
value: [0.5625     0.625      0.86666667 0.66666667 0.93333333 0.66666667
 0.6        0.46666667 0.73333333 0.8       ]

mean value: 0.6920833333333334

key: train_accuracy
value: [0.74264706 0.75735294 0.73722628 0.75182482 0.7080292  0.75182482
 0.75182482 0.76642336 0.76642336 0.73722628]

mean value: 0.747080291970803

key: test_fscore
value: [0.58823529 0.57142857 0.85714286 0.66666667 0.92307692 0.70588235
 0.625      0.6        0.75       0.82352941]

mean value: 0.7110962077138547

key: train_fscore
value: [0.75177305 0.76923077 0.75       0.76712329 0.72222222 0.75714286
 0.75362319 0.77142857 0.76811594 0.73913043]

mean value: 0.7549790322558434

key: test_precision
value: [0.55555556 0.66666667 0.85714286 0.625      1.         0.6
 0.625      0.5        0.75       0.77777778]

mean value: 0.6957142857142857

key: train_precision
value: [0.7260274  0.73333333 0.72       0.72727273 0.69333333 0.74647887
 0.74285714 0.75       0.75714286 0.72857143]

mean value: 0.7325017093010533

key: test_recall
value: [0.625      0.5        0.85714286 0.71428571 0.85714286 0.85714286
 0.625      0.75       0.75       0.875     ]

mean value: 0.7410714285714286

key: train_recall
value: [0.77941176 0.80882353 0.7826087  0.8115942  0.75362319 0.76811594
 0.76470588 0.79411765 0.77941176 0.75      ]

mean value: 0.7792412617220801

key: test_roc_auc
value: [0.5625     0.625      0.86607143 0.66964286 0.92857143 0.67857143
 0.59821429 0.44642857 0.73214286 0.79464286]

mean value: 0.6901785714285714

key: train_roc_auc
value: [0.74264706 0.75735294 0.73689258 0.75138534 0.70769395 0.75170503
 0.75191816 0.76662404 0.76651748 0.73731884]

mean value: 0.7470055413469735

key: test_jcc
value: [0.41666667 0.4        0.75       0.5        0.85714286 0.54545455
 0.45454545 0.42857143 0.6        0.7       ]

mean value: 0.5652380952380952

key: train_jcc
value: [0.60227273 0.625      0.6        0.62222222 0.56521739 0.6091954
 0.60465116 0.62790698 0.62352941 0.5862069 ]

mean value: 0.6066202190949461

MCC on Blind test: 0.1

Accuracy on Blind test: 0.6

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.00778508 0.00735497 0.00741458 0.00750518 0.0074389  0.00734568
 0.00739765 0.00764251 0.00755811 0.00754356]

mean value: 0.007498621940612793

key: score_time
value: [0.00792003 0.00796342 0.00831413 0.00790501 0.0078187  0.00797248
 0.00821495 0.00799203 0.0079875  0.00809884]

mean value: 0.008018708229064942

key: test_mcc
value: [0.62994079 0.62994079 0.875      0.19642857 0.87287156 0.87287156
 0.32732684 0.75592895 0.64465837 0.875     ]

mean value: 0.6679967422606682

key: train_mcc
value: [0.88580789 0.91334626 0.89863497 0.83795818 0.91240409 0.83063246
 0.92787101 0.91281179 0.92710997 0.92709446]

mean value: 0.8973671087701672

key: test_accuracy
value: [0.8125     0.8125     0.93333333 0.6        0.93333333 0.93333333
 0.66666667 0.86666667 0.8        0.93333333]

mean value: 0.8291666666666667

key: train_accuracy
value: [0.94117647 0.95588235 0.94890511 0.91240876 0.95620438 0.91240876
 0.96350365 0.95620438 0.96350365 0.96350365]

mean value: 0.9473701159295835

key: test_fscore
value: [0.82352941 0.82352941 0.93333333 0.57142857 0.92307692 0.92307692
 0.70588235 0.88888889 0.84210526 0.93333333]

mean value: 0.8368184412766456

key: train_fscore
value: [0.93846154 0.95714286 0.95035461 0.9047619  0.95652174 0.90769231
 0.96240602 0.95652174 0.96350365 0.96296296]

mean value: 0.9460329323884149

key: test_precision
value: [0.77777778 0.77777778 0.875      0.57142857 1.         1.
 0.66666667 0.8        0.72727273 1.        ]

mean value: 0.819592352092352

key: train_precision
value: [0.98387097 0.93055556 0.93055556 1.         0.95652174 0.96721311
 0.98461538 0.94285714 0.95652174 0.97014925]

mean value: 0.9622860453071885

key: test_recall
value: [0.875      0.875      1.         0.57142857 0.85714286 0.85714286
 0.75       1.         1.         0.875     ]

mean value: 0.8660714285714286

key: train_recall
value: [0.89705882 0.98529412 0.97101449 0.82608696 0.95652174 0.85507246
 0.94117647 0.97058824 0.97058824 0.95588235]

mean value: 0.9329283887468031

key: test_roc_auc
value: [0.8125     0.8125     0.9375     0.59821429 0.92857143 0.92857143
 0.66071429 0.85714286 0.78571429 0.9375    ]

mean value: 0.8258928571428572

key: train_roc_auc
value: [0.94117647 0.95588235 0.94874254 0.91304348 0.95620205 0.91283035
 0.96334186 0.95630861 0.96355499 0.96344842]

mean value: 0.9474531116794545

key: test_jcc
value: [0.7        0.7        0.875      0.4        0.85714286 0.85714286
 0.54545455 0.8        0.72727273 0.875     ]

mean value: 0.7337012987012987

key: train_jcc
value: [0.88405797 0.91780822 0.90540541 0.82608696 0.91666667 0.83098592
 0.92753623 0.91666667 0.92957746 0.92857143]

mean value: 0.8983362926190229

MCC on Blind test: 0.05

Accuracy on Blind test: 0.63

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01016855 0.0098474  0.0079248  0.00727248 0.00720954 0.00730157
 0.00728512 0.0073278  0.00719166 0.00727534]

mean value: 0.007880425453186036

key: score_time
value: [0.010952   0.00936007 0.00861073 0.00789762 0.00792098 0.00781727
 0.00834227 0.00790691 0.00788283 0.00789428]

mean value: 0.008458495140075684

key: test_mcc
value: [0.57735027 0.8819171  0.875      0.33928571 0.87287156 0.87287156
 0.33928571 0.37796447 0.46428571 0.875     ]

mean value: 0.6475832110632131

key: train_mcc
value: [0.63408348 0.8979331  0.77817796 0.83063246 0.92951942 0.81712461
 0.85977656 0.72794365 0.85721269 0.88920184]

mean value: 0.822160576316637

key: test_accuracy
value: [0.75       0.9375     0.93333333 0.66666667 0.93333333 0.93333333
 0.66666667 0.66666667 0.73333333 0.93333333]

mean value: 0.8154166666666667

key: train_accuracy
value: [0.78676471 0.94852941 0.88321168 0.91240876 0.96350365 0.90510949
 0.9270073  0.84671533 0.9270073  0.94160584]

mean value: 0.9041863460712752

key: test_fscore
value: [0.8        0.93333333 0.93333333 0.66666667 0.92307692 0.92307692
 0.66666667 0.61538462 0.75       0.93333333]

mean value: 0.8144871794871795

key: train_fscore
value: [0.82424242 0.94736842 0.89333333 0.90769231 0.96240602 0.91156463
 0.921875   0.8173913  0.92307692 0.9375    ]

mean value: 0.904645035463338

key: test_precision
value: [0.66666667 1.         0.875      0.625      1.         1.
 0.71428571 0.8        0.75       1.        ]

mean value: 0.8430952380952381

key: train_precision
value: [0.70103093 0.96923077 0.82716049 0.96721311 1.         0.85897436
 0.98333333 1.         0.96774194 1.        ]

mean value: 0.9274684933438643

key: test_recall
value: [1.         0.875      1.         0.71428571 0.85714286 0.85714286
 0.625      0.5        0.75       0.875     ]

mean value: 0.8053571428571429

key: train_recall
value: [1.         0.92647059 0.97101449 0.85507246 0.92753623 0.97101449
 0.86764706 0.69117647 0.88235294 0.88235294]

mean value: 0.8974637681159421

key: test_roc_auc
value: [0.75       0.9375     0.9375     0.66964286 0.92857143 0.92857143
 0.66964286 0.67857143 0.73214286 0.9375    ]

mean value: 0.8169642857142857

key: train_roc_auc
value: [0.78676471 0.94852941 0.88256607 0.91283035 0.96376812 0.90462489
 0.92657715 0.84558824 0.92668372 0.94117647]

mean value: 0.9039109121909633

key: test_jcc
value: [0.66666667 0.875      0.875      0.5        0.85714286 0.85714286
 0.5        0.44444444 0.6        0.875     ]

mean value: 0.7050396825396825

key: train_jcc
value: [0.70103093 0.9        0.80722892 0.83098592 0.92753623 0.8375
 0.85507246 0.69117647 0.85714286 0.88235294]

mean value: 0.8290026723550397

MCC on Blind test: 0.06

Accuracy on Blind test: 0.89

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.07523441 0.06551671 0.06416392 0.06420016 0.06557775 0.06523657
 0.06472826 0.06670904 0.06583929 0.06667423]

mean value: 0.06638803482055664

key: score_time
value: [0.01517701 0.01486087 0.01571703 0.01545548 0.01541901 0.01526618
 0.01506066 0.01570487 0.01489067 0.01541162]

mean value: 0.015296339988708496

key: test_mcc
value: [0.8819171  0.8819171  0.875      0.66143783 1.         0.87287156
 1.         0.87287156 0.87287156 0.875     ]

mean value: 0.879388671797445

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     0.9375     0.93333333 0.8        1.         0.93333333
 1.         0.93333333 0.93333333 0.93333333]

mean value: 0.9341666666666667

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94117647 0.94117647 0.93333333 0.82352941 1.         0.92307692
 1.         0.94117647 0.94117647 0.93333333]

mean value: 0.9377978883861237

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.88888889 0.88888889 0.875      0.7        1.         1.
 1.         0.88888889 0.88888889 1.        ]

mean value: 0.9130555555555555

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         1.         0.85714286
 1.         1.         1.         0.875     ]

mean value: 0.9732142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.9375     0.9375     0.8125     1.         0.92857143
 1.         0.92857143 0.92857143 0.9375    ]

mean value: 0.9348214285714286

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.88888889 0.88888889 0.875      0.7        1.         0.85714286
 1.         0.88888889 0.88888889 0.875     ]

mean value: 0.8862698412698412

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.08

Accuracy on Blind test: 0.76

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.03459382 0.04311633 0.02597976 0.02609849 0.03556275 0.03000331
 0.02980828 0.03237772 0.04908109 0.03363228]

mean value: 0.034025382995605466

key: score_time
value: [0.03137994 0.01657486 0.01867056 0.01809192 0.03612328 0.02216148
 0.02189708 0.01990652 0.03687644 0.01487947]

mean value: 0.023656153678894044

key: test_mcc
value: [1.         0.8819171  0.875      0.76376262 0.87287156 0.87287156
 1.         1.         1.         0.875     ]

mean value: 0.9141422841402109

key: train_mcc
value: [1.         1.         0.98550418 1.         0.98550725 1.
 1.         0.98550418 1.         0.98550725]

mean value: 0.9942022851330479

key: test_accuracy
value: [1.         0.9375     0.93333333 0.86666667 0.93333333 0.93333333
 1.         1.         1.         0.93333333]

mean value: 0.95375

key: train_accuracy
value: [1.         1.         0.99270073 1.         0.99270073 1.
 1.         0.99270073 1.         0.99270073]

mean value: 0.997080291970803

key: test_fscore
value: [1.         0.94117647 0.93333333 0.875      0.92307692 0.92307692
 1.         1.         1.         0.93333333]

mean value: 0.9528996983408748

key: train_fscore
value: [1.         1.         0.99280576 1.         0.99270073 1.
 1.         0.99259259 1.         0.99270073]

mean value: 0.9970799807842291

key: test_precision
value: [1.         0.88888889 0.875      0.77777778 1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9541666666666666

key: train_precision
value: [1.         1.         0.98571429 1.         1.         1.
 1.         1.         1.         0.98550725]

mean value: 0.9971221532091097

key: test_recall
value: [1.         1.         1.         1.         0.85714286 0.85714286
 1.         1.         1.         0.875     ]

mean value: 0.9589285714285715

key: train_recall
value: [1.         1.         1.         1.         0.98550725 1.
 1.         0.98529412 1.         1.        ]

mean value: 0.997080136402387

key: test_roc_auc
value: [1.         0.9375     0.9375     0.875      0.92857143 0.92857143
 1.         1.         1.         0.9375    ]

mean value: 0.9544642857142858

key: train_roc_auc
value: [1.         1.         0.99264706 1.         0.99275362 1.
 1.         0.99264706 1.         0.99275362]

mean value: 0.997080136402387

key: test_jcc
value: [1.         0.88888889 0.875      0.77777778 0.85714286 0.85714286
 1.         1.         1.         0.875     ]

mean value: 0.9130952380952381

key: train_jcc
value: [1.         1.         0.98571429 1.         0.98550725 1.
 1.         0.98529412 1.         0.98550725]

mean value: 0.9942022896114968

MCC on Blind test: 0.12

Accuracy on Blind test: 0.84

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.03075981 0.03832865 0.06963396 0.06718922 0.03995085 0.03923106
 0.03885245 0.061131   0.04299402 0.03564787]

mean value: 0.04637188911437988

key: score_time
value: [0.02218819 0.01115394 0.01116896 0.03056479 0.02163672 0.02096963
 0.02151918 0.03110862 0.01723385 0.01872468]

mean value: 0.02062685489654541

key: test_mcc
value: [0.67419986 0.75       0.87287156 0.37796447 1.         0.73214286
 0.46428571 0.46428571 1.         0.76376262]

mean value: 0.7099512797956697

key: train_mcc
value: [0.95598573 0.98540068 0.97080136 0.95630861 0.97080136 0.95630861
 0.97080136 0.97080136 0.97080136 0.97080136]

mean value: 0.9678811811884551

key: test_accuracy
value: [0.8125     0.875      0.93333333 0.66666667 1.         0.86666667
 0.73333333 0.73333333 1.         0.86666667]

mean value: 0.84875

key: train_accuracy
value: [0.97794118 0.99264706 0.98540146 0.97810219 0.98540146 0.97810219
 0.98540146 0.98540146 0.98540146 0.98540146]

mean value: 0.9839201373980249

key: test_fscore
value: [0.84210526 0.875      0.92307692 0.70588235 1.         0.85714286
 0.75       0.75       1.         0.85714286]

mean value: 0.8560350253461708

key: train_fscore
value: [0.97777778 0.99259259 0.98550725 0.97810219 0.98550725 0.97810219
 0.98529412 0.98529412 0.98529412 0.98529412]

mean value: 0.9838765713274273

key: test_precision
value: [0.72727273 0.875      1.         0.6        1.         0.85714286
 0.75       0.75       1.         1.        ]

mean value: 0.8559415584415584

key: train_precision
value: [0.98507463 1.         0.98550725 0.98529412 0.98550725 0.98529412
 0.98529412 0.98529412 0.98529412 0.98529412]

mean value: 0.9867853825501648

key: test_recall
value: [1.         0.875      0.85714286 0.85714286 1.         0.85714286
 0.75       0.75       1.         0.75      ]

mean value: 0.8696428571428572

key: train_recall
value: [0.97058824 0.98529412 0.98550725 0.97101449 0.98550725 0.97101449
 0.98529412 0.98529412 0.98529412 0.98529412]

mean value: 0.9810102301790282

key: test_roc_auc
value: [0.8125     0.875      0.92857143 0.67857143 1.         0.86607143
 0.73214286 0.73214286 1.         0.875     ]

mean value: 0.85

key: train_roc_auc
value: [0.97794118 0.99264706 0.98540068 0.97815431 0.98540068 0.97815431
 0.98540068 0.98540068 0.98540068 0.98540068]

mean value: 0.9839300937766412

key: test_jcc
value: [0.72727273 0.77777778 0.85714286 0.54545455 1.         0.75
 0.6        0.6        1.         0.75      ]

mean value: 0.7607647907647908

key: train_jcc
value: [0.95652174 0.98529412 0.97142857 0.95714286 0.97142857 0.95714286
 0.97101449 0.97101449 0.97101449 0.97101449]

mean value: 0.9683016684934843

MCC on Blind test: 0.06

Accuracy on Blind test: 0.66

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.09868574 0.10024285 0.09119558 0.08083391 0.09357262 0.08993793
 0.10161471 0.10149956 0.09327483 0.08393335]

mean value: 0.0934791088104248

key: score_time
value: [0.00927162 0.00913954 0.00923514 0.00928712 0.00947499 0.00919628
 0.00924039 0.00923944 0.00928307 0.00930262]

mean value: 0.009267020225524902

key: test_mcc
value: [1.         0.8819171  0.875      0.76376262 1.         0.87287156
 1.         1.         0.87287156 0.73214286]

mean value: 0.8998565698544966

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.9375     0.93333333 0.86666667 1.         0.93333333
 1.         1.         0.93333333 0.86666667]

mean value: 0.9470833333333334

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.94117647 0.93333333 0.875      1.         0.92307692
 1.         1.         0.94117647 0.875     ]

mean value: 0.9488763197586727

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.88888889 0.875      0.77777778 1.         1.
 1.         1.         0.88888889 0.875     ]

mean value: 0.9305555555555556

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         1.         0.85714286
 1.         1.         1.         0.875     ]

mean value: 0.9732142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.9375     0.9375     0.875      1.         0.92857143
 1.         1.         0.92857143 0.86607143]

mean value: 0.9473214285714285

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.88888889 0.875      0.77777778 1.         0.85714286
 1.         1.         0.88888889 0.77777778]

mean value: 0.906547619047619

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.11

Accuracy on Blind test: 0.83

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.00916886 0.01091504 0.01081634 0.01080394 0.0134356  0.02716422
 0.01087403 0.01102185 0.01133323 0.01125884]

mean value: 0.012679195404052735

key: score_time
value: [0.01023698 0.01037884 0.01042628 0.01103234 0.01079631 0.01123476
 0.01329875 0.01280212 0.01062632 0.01068163]

mean value: 0.011151432991027832

key: test_mcc
value: [0.8819171  0.67419986 0.75592895 0.75592895 0.75592895 0.53452248
 0.37796447 0.76376262 0.76376262 0.76376262]

mean value: 0.7027678608518798

key: train_mcc
value: [1.         0.90184995 1.         1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9901849950564579

key: test_accuracy
value: [0.9375     0.8125     0.86666667 0.86666667 0.86666667 0.73333333
 0.66666667 0.86666667 0.86666667 0.86666667]

mean value: 0.835

key: train_accuracy
value: [1.         0.94852941 1.         1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9948529411764706

key: test_fscore
value: [0.94117647 0.76923077 0.83333333 0.83333333 0.83333333 0.6
 0.61538462 0.85714286 0.85714286 0.85714286]

mean value: 0.7997220426632191

key: train_fscore
value: [1.         0.94573643 1.         1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9945736434108527

key: test_precision
value: [0.88888889 1.         1.         1.         1.         1.
 0.8        1.         1.         1.        ]

mean value: 0.9688888888888889

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.625      0.71428571 0.71428571 0.71428571 0.42857143
 0.5        0.75       0.75       0.75      ]

mean value: 0.6946428571428571

key: train_recall
value: [1.         0.89705882 1.         1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9897058823529412

key: test_roc_auc
value: [0.9375     0.8125     0.85714286 0.85714286 0.85714286 0.71428571
 0.67857143 0.875      0.875      0.875     ]

mean value: 0.8339285714285715

key: train_roc_auc
value: [1.         0.94852941 1.         1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9948529411764706

key: test_jcc
value: [0.88888889 0.625      0.71428571 0.71428571 0.71428571 0.42857143
 0.44444444 0.75       0.75       0.75      ]

mean value: 0.6779761904761905

key: train_jcc
value: [1.         0.89705882 1.         1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9897058823529412

MCC on Blind test: -0.02

Accuracy on Blind test: 0.95
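
Note on the two blind-test figures above: a high accuracy alongside an MCC near zero is the pattern one would expect when a classifier predicts almost only the majority class on an imbalanced set. The sketch below illustrates that arithmetic only. The labels are hypothetical (the class balance of the blind set is not shown in this log), and the helper names `confusion_counts`, `accuracy` and `mcc` are made up for illustration; they are not functions from the pipeline.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical imbalanced set: 95 negatives, 5 positives (assumed, not from the log).
y_true = [0] * 95 + [1] * 5
# A classifier that always predicts the majority class.
y_pred = [0] * 100

counts = confusion_counts(y_true, y_pred)
acc_value = accuracy(*counts)
mcc_value = mcc(*counts)
print(f"accuracy={acc_value:.2f} mcc={mcc_value:.2f}")
```

Under these assumed labels the accuracy is 0.95 while the MCC is 0, which is why MCC is usually the more informative of the two figures when classes are unbalanced.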

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.01154399 0.01006269 0.00780892 0.00763559 0.00742674 0.00739622
 0.00755072 0.00743032 0.00746632 0.00746202]

mean value: 0.008178353309631348

key: score_time
value: [0.01060176 0.00935245 0.00819874 0.00818491 0.00788951 0.00785613
 0.00786829 0.00791764 0.00788474 0.0078702 ]

mean value: 0.008362436294555664

key: test_mcc
value: [0.75       0.62994079 0.73214286 0.49099025 0.87287156 0.87287156
 0.64465837 0.6000992  0.64465837 0.875     ]

mean value: 0.7113232961000079

key: train_mcc
value: [0.83832595 0.86849267 0.85434012 0.91240409 0.86868474 0.8978896
 0.88360693 0.82480818 0.86948194 0.8555278 ]

mean value: 0.8673562022561286

key: test_accuracy
value: [0.875      0.8125     0.86666667 0.73333333 0.93333333 0.93333333
 0.8        0.8        0.8        0.93333333]

mean value: 0.84875

key: train_accuracy
value: [0.91911765 0.93382353 0.9270073  0.95620438 0.93430657 0.94890511
 0.94160584 0.91240876 0.93430657 0.9270073 ]

mean value: 0.9334693001288106

key: test_fscore
value: [0.875      0.82352941 0.85714286 0.75       0.92307692 0.92307692
 0.84210526 0.82352941 0.84210526 0.93333333]

mean value: 0.8592899386475238

key: train_fscore
value: [0.91970803 0.9352518  0.92857143 0.95652174 0.9352518  0.94964029
 0.94202899 0.91176471 0.9352518  0.92857143]

mean value: 0.9342562000313209

key: test_precision
value: [0.875      0.77777778 0.85714286 0.66666667 1.         1.
 0.72727273 0.77777778 0.72727273 1.        ]

mean value: 0.8408910533910534

key: train_precision
value: [0.91304348 0.91549296 0.91549296 0.95652174 0.92857143 0.94285714
 0.92857143 0.91176471 0.91549296 0.90277778]

mean value: 0.9230586574290872

key: test_recall
value: [0.875      0.875      0.85714286 0.85714286 0.85714286 0.85714286
 1.         0.875      1.         0.875     ]

mean value: 0.8928571428571428

key: train_recall
value: [0.92647059 0.95588235 0.94202899 0.95652174 0.94202899 0.95652174
 0.95588235 0.91176471 0.95588235 0.95588235]

mean value: 0.9458866155157716

key: test_roc_auc
value: [0.875      0.8125     0.86607143 0.74107143 0.92857143 0.92857143
 0.78571429 0.79464286 0.78571429 0.9375    ]

mean value: 0.8455357142857143

key: train_roc_auc
value: [0.91911765 0.93382353 0.92689685 0.95620205 0.93424979 0.9488491
 0.94170929 0.91240409 0.93446292 0.92721654]

mean value: 0.933493179880648

key: test_jcc
value: [0.77777778 0.7        0.75       0.6        0.85714286 0.85714286
 0.72727273 0.7        0.72727273 0.875     ]

mean value: 0.7571608946608946

key: train_jcc
value: [0.85135135 0.87837838 0.86666667 0.91666667 0.87837838 0.90410959
 0.89041096 0.83783784 0.87837838 0.86666667]

mean value: 0.876884487226953

MCC on Blind test: 0.07

Accuracy on Blind test: 0.7

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
       'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.07331181 0.06256533 0.06086087 0.06129169 0.06339002 0.06102061
 0.06144905 0.06290483 0.06220293 0.06425667]

mean value: 0.0633253812789917

key: score_time
value: [0.00836086 0.00896025 0.00843644 0.00837779 0.00837159 0.00880218
 0.00834155 0.00841522 0.00857377 0.00888371]

mean value: 0.008552336692810058

key: test_mcc
value: [0.75       0.62994079 0.73214286 0.66143783 0.87287156 0.87287156
 0.64465837 0.6000992  0.64465837 0.875     ]

mean value: 0.7283680535735243

key: train_mcc
value: [0.83832595 0.87000211 0.88466669 0.91240409 0.86868474 0.89863497
 0.90025835 0.88476385 0.9139999  0.84173622]

mean value: 0.8813476865607188

key: test_accuracy
value: [0.875      0.8125     0.86666667 0.8        0.93333333 0.93333333
 0.8        0.8        0.8        0.93333333]

mean value: 0.8554166666666667

key: train_accuracy
value: [0.91911765 0.93382353 0.94160584 0.95620438 0.93430657 0.94890511
 0.94890511 0.94160584 0.95620438 0.91970803]

mean value: 0.940038643194504

/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:203: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:206: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)

key: test_fscore
value: [0.875      0.82352941 0.85714286 0.82352941 0.92307692 0.92307692
 0.84210526 0.82352941 0.84210526 0.93333333]

mean value: 0.8666428798239943

key: train_fscore
value: [0.91970803 0.93617021 0.94366197 0.95652174 0.9352518  0.95035461
 0.95035461 0.94285714 0.95714286 0.92198582]

mean value: 0.9414008786946603

key: test_precision
value: [0.875      0.77777778 0.85714286 0.7        1.         1.
 0.72727273 0.77777778 0.72727273 1.        ]

mean value: 0.8442243867243867

key: train_precision
value: [0.91304348 0.90410959 0.91780822 0.95652174 0.92857143 0.93055556
 0.91780822 0.91666667 0.93055556 0.89041096]

mean value: 0.920605141004188

key: test_recall
value: [0.875      0.875      0.85714286 1.         0.85714286 0.85714286
 1.         0.875      1.         0.875     ]

mean value: 0.9071428571428571

key: train_recall
value: [0.92647059 0.97058824 0.97101449 0.95652174 0.94202899 0.97101449
 0.98529412 0.97058824 0.98529412 0.95588235]

mean value: 0.9634697357203751

key: test_roc_auc
value: [0.875      0.8125     0.86607143 0.8125     0.92857143 0.92857143
 0.78571429 0.79464286 0.78571429 0.9375    ]

mean value: 0.8526785714285714

key: train_roc_auc
value: [0.91911765 0.93382353 0.9413896  0.95620205 0.93424979 0.94874254
 0.9491688  0.94181586 0.95641517 0.91997016]

mean value: 0.9400895140664962

key: test_jcc
value: [0.77777778 0.7        0.75       0.7        0.85714286 0.85714286
 0.72727273 0.7        0.72727273 0.875     ]

mean value: 0.7671608946608947

key: train_jcc
value: [0.85135135 0.88       0.89333333 0.91666667 0.87837838 0.90540541
 0.90540541 0.89189189 0.91780822 0.85526316]

mean value: 0.8895503809505252

MCC on Blind test: 0.06

Accuracy on Blind test: 0.66