LSHTM_analysis/scripts/ml/log_pnca_config.txt

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data.py:550: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
1.22.4
1.4.1

aaindex_df contains non-numerical data

Total no. of non-numerical columns: 2

Selecting numerical data only

PASS: successfully selected numerical columns only for aaindex_df

Now checking for NA in the remaining aaindex_cols

Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127

Revised df ncols: 123

Checking NA in revised df...

PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df

PASS: ncols match
Expected ncols: 123
Got: 123

Total no. of columns in clean aa_df: 123

Proceeding to merge, expected nrows in merged_df: 424

PASS: my_features_df and aa_df successfully combined
nrows: 424
ncols: 265
count of NULL values before imputation

or_mychisq          102
log10_or_mychisq    102
dtype: int64
count of NULL values AFTER imputation

mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64

PASS: OR values imputed, data ready for ML

No. of numerical features: 43
No. of categorical features: 7

index: 0
ind: 1

Mask count check: True
Original Data
 Counter({1: 114, 0: 71}) Data dim: (185, 50)

-------------------------------------------------------------
Successfully split data: UQ [no aa_index but active site included] training
actual values: training set
imputed values: blind test set
Train data size: (185, 50)
Test data size: (239, 50)
y_train numbers: Counter({1: 114, 0: 71})
y_train ratio: 0.6228070175438597

y_test_numbers: Counter({0: 120, 1: 119})
y_test ratio: 1.0084033613445378
-------------------------------------------------------------
Simple Random OverSampling
 Counter({0: 114, 1: 114})
(228, 50)
Simple Random UnderSampling
 Counter({0: 71, 1: 71})
(142, 50)
Simple Combined Over and UnderSampling
 Counter({0: 114, 1: 114})
(228, 50)
SMOTE_NC OverSampling
 Counter({0: 114, 1: 114})
(228, 50)

#####################################################################

Running ML analysis: UQ [without AA  index but with active site annotations]
Gene name: pncA
Drug name: pyrazinamide

Output directory: /home/tanu/git/Data/pyrazinamide/output/ml/uq_v1/

Sanity checks:
Total input features: 50

Training data size: (185, 50)
Test data size: (239, 50)

Target feature numbers (training data): Counter({1: 114, 0: 71})
Target features ratio (training data): 0.6228070175438597

Target feature numbers (test data): Counter({0: 120, 1: 119})
Target features ratio (test data): 1.0084033613445378

#####################################################################


================================================================

Structural features (n): 34
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================

Evolutionary features (n): 3
These are:
 ['consurf_score', 'snap2_score', 'provean_score']
================================================================

Genomic features (n): 6
These are:
 ['maf', 'logorI']
 ['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================

Categorical features (n): 7
These are:
 ['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================


Pass: No. of features match

#####################################################################


Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.01643085 0.01587701 0.01659703 0.01716876 0.0157814  0.01725698
 0.0168314  0.01593757 0.01981902 0.01665521]

mean value: 0.016835522651672364

key: score_time
value: [0.01110053 0.01039171 0.01039815 0.01039696 0.01042223 0.01038051
 0.01035452 0.01037598 0.01076961 0.01039314]

mean value: 0.010498332977294921

key: test_mcc
value: [0.33796318 0.58655573 0.28690229 0.67460105 0.6761234  0.64465837
 1.         0.12182898 0.67005939 0.52299758]

mean value: 0.5521689989382099

key: train_mcc
value: [0.78194719 0.69251873 0.70439866 0.69166175 0.69166175 0.72007099
 0.73268764 0.74454326 0.77164805 0.75735135]

mean value: 0.7288489368704532

key: test_accuracy
value:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[0.68421053 0.78947368 0.68421053 0.84210526 0.84210526 0.83333333
 1.         0.61111111 0.83333333 0.77777778]

mean value: 0.789766081871345

key: train_accuracy
value: [0.89759036 0.85542169 0.86144578 0.85542169 0.85542169 0.86826347
 0.8742515  0.88023952 0.89221557 0.88622754]

mean value: 0.8726498809609696

key: test_fscore
value: [0.75       0.81818182 0.76923077 0.86956522 0.88888889 0.86956522
 1.         0.72       0.88       0.83333333]

mean value: 0.8398765244417419

key: train_fscore
value: [0.9178744  0.88888889 0.89099526 0.88785047 0.88785047 0.89908257
 0.90322581 0.90654206 0.91666667 0.91079812]

mean value: 0.9009774700333214

key: test_precision
value: [0.69230769 0.9        0.71428571 0.90909091 0.8        0.83333333
 1.         0.64285714 0.78571429 0.76923077]

mean value: 0.8046819846819847

key: train_precision
value: [0.91346154 0.84210526 0.86238532 0.84821429 0.84821429 0.85217391
 0.85964912 0.87387387 0.87610619 0.88181818]

mean value: 0.8658001980381739

key: test_recall
value: [0.81818182 0.75       0.83333333 0.83333333 1.         0.90909091
 1.         0.81818182 1.         0.90909091]

mean value: 0.8871212121212121

key: train_recall
value: [0.9223301  0.94117647 0.92156863 0.93137255 0.93137255 0.95145631
 0.95145631 0.94174757 0.96116505 0.94174757]

mean value: 0.9395393108699791

key: test_roc_auc
value: [0.65909091 0.80357143 0.63095238 0.8452381  0.78571429 0.81168831
 1.         0.55194805 0.78571429 0.74025974]

mean value: 0.7614177489177489

key: train_roc_auc
value: [0.88973648 0.82996324 0.84359681 0.83287377 0.83287377 0.84291566
 0.85072816 0.86149879 0.87120752 0.86931129]

mean value: 0.8524705482921324

key: test_jcc
value: [0.6        0.69230769 0.625      0.76923077 0.8        0.76923077
 1.         0.5625     0.78571429 0.71428571]

mean value: 0.7318269230769231

key: train_jcc
value: [0.84821429 0.8        0.8034188  0.79831933 0.79831933 0.81666667
 0.82352941 0.82905983 0.84615385 0.8362069 ]

mean value: 0.8199888394792046

MCC on Blind test: 0.32

Accuracy on Blind test: 0.64
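
Note on the ConvergenceWarning messages above: they are raised by the lbfgs solver of LogisticRegression, whose default max_iter is 100. The pipeline in this log already applies MinMaxScaler to the numerical columns, so the remaining remedy named by the warning is a larger max_iter. The following is a minimal sketch on synthetic data, not the script that produced this log; the value max_iter=1000, the step names, and the make_classification stand-in data are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the training data (assumption: 185 rows, 20 numeric features).
X, y = make_classification(n_samples=185, n_features=20, random_state=42)

# Same idea as the logged pipeline: scale first, then fit the model.
# max_iter=1000 is an illustrative value, not one taken from the log.
pipe = Pipeline(steps=[
    ("scale", MinMaxScaler()),
    ("model", LogisticRegression(max_iter=1000, random_state=42)),
])

# 10-fold cross-validation, matching the ten values per key in this log.
cv_scores = cross_validate(pipe, X, y, cv=10, scoring="accuracy",
                           return_train_score=True)

print("mean test accuracy:", np.mean(cv_scores["test_score"]))
print("mean train accuracy:", np.mean(cv_scores["train_score"]))
```

If the warning persists after raising max_iter, the scikit-learn documentation linked in the warning lists alternative solvers that can be passed through the solver parameter.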

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.41101408 0.43322134 0.43520188 0.423666   0.40705562 0.42979956
 0.44227242 0.42043042 0.44399595 0.42415285]

mean value: 0.4270810127258301

key: score_time
value: [0.01099348 0.01108956 0.01145554 0.01103401 0.01119256 0.02155566
 0.01103187 0.01108336 0.0113132  0.01128364]

mean value: 0.012203288078308106

key: test_mcc
value: [0.45868247 0.54761905 0.88949918 0.80507649 1.         0.76623377
 0.71350607 0.52299758 0.67005939 0.4025974 ]

mean value: 0.6776271401537742

key: train_mcc
value: [0.93615116 0.87323164 0.8982762  0.88572497 0.91158328 1.
 0.87286094 0.89863369 0.94933931 0.98737524]

mean value: 0.9213176411447679

key: test_accuracy
value: [0.73684211 0.78947368 0.94736842 0.89473684 1.         0.88888889
 0.83333333 0.77777778 0.83333333 0.66666667]

mean value: 0.8368421052631578

key: train_accuracy
value: [0.96987952 0.93975904 0.95180723 0.94578313 0.95783133 1.
 0.94011976 0.95209581 0.9760479  0.99401198]

mean value: 0.9627335690065651

key: test_fscore
value: [0.8        0.83333333 0.96       0.90909091 1.         0.90909091
 0.84210526 0.83333333 0.88       0.66666667]

mean value: 0.8633620414673047

key: train_fscore
value: [0.97607656 0.95238095 0.96153846 0.9569378  0.96650718 1.
 0.95192308 0.96190476 0.98076923 0.99516908]

mean value: 0.9703207096742565

key: test_precision
value: [0.71428571 0.83333333 0.92307692 1.         1.         0.90909091
 1.         0.76923077 0.78571429 0.85714286]

mean value: 0.8791874791874792

key: train_precision
value: [0.96226415 0.92592593 0.94339623 0.93457944 0.94392523 1.
 0.94285714 0.94392523 0.97142857 0.99038462]

mean value: 0.9558686539496802

key: test_recall
value: [0.90909091 0.83333333 1.         0.83333333 1.         0.90909091
 0.72727273 0.90909091 1.         0.54545455]

mean value: 0.8666666666666667

key: train_recall
value: [0.99029126 0.98039216 0.98039216 0.98039216 0.99019608 1.
 0.96116505 0.98058252 0.99029126 1.        ]

mean value: 0.9853702646106987

key: test_roc_auc
value: [0.70454545 0.77380952 0.92857143 0.91666667 1.         0.88311688
 0.86363636 0.74025974 0.78571429 0.7012987 ]

mean value: 0.8297619047619048

key: train_roc_auc
value: [0.9633996  0.92769608 0.94332108 0.93550858 0.94822304 1.
 0.93370752 0.94341626 0.97170813 0.9921875 ]

mean value: 0.9559167791307461

key: test_jcc
value: [0.66666667 0.71428571 0.92307692 0.83333333 1.         0.83333333
 0.72727273 0.71428571 0.78571429 0.5       ]

mean value: 0.7697968697968698

key: train_jcc
value: [0.95327103 0.90909091 0.92592593 0.91743119 0.93518519 1.
 0.90825688 0.9266055  0.96226415 0.99038462]

mean value: 0.9428415392549067

MCC on Blind test: 0.2

Accuracy on Blind test: 0.59
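
Note on the key/value blocks above: the key names (fit_time, score_time, test_mcc, train_mcc, ..., test_jcc, train_jcc) follow the naming that scikit-learn's cross_validate uses when it is given a scoring dictionary and return_train_score=True. The sketch below shows one way such output could be produced. It is an assumption about the general approach, not the script behind this log: the scoring dictionary, the synthetic data, and the choice of GaussianNB are all illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (assumption); the real features are not reproduced here.
X, y = make_classification(n_samples=185, n_features=20, random_state=42)

# Hypothetical scoring dictionary; the dictionary keys become the
# "test_<name>" / "train_<name>" suffixes in the result.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

results = cross_validate(GaussianNB(), X, y, cv=10, scoring=scoring,
                         return_train_score=True)

# Print each per-fold array and its mean in the same layout as this log.
for key, value in results.items():
    print("\nkey:", key)
    print("value:", value)
    print("\nmean value:", np.mean(value))
```

Comparing the train_* and test_* means per metric, as the log does, gives a quick view of how far each model overfits the training folds.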

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.00974631 0.00932074 0.0071764  0.0069828  0.00684166 0.0073626
 0.00683832 0.00736499 0.00704551 0.00715041]

mean value: 0.007582974433898926

key: score_time
value: [0.01076746 0.01019335 0.00825691 0.00814319 0.00808716 0.00803375
 0.00830436 0.00816035 0.00810671 0.00811577]

mean value: 0.008616900444030762

key: test_mcc
value: [ 0.5077524   0.26772484 -0.12677314  0.40849122  0.09356015  0.39594419
  0.44320263  0.0805823   0.0805823   0.56061191]

mean value: 0.2711678789936081

key: train_mcc
value: [0.39956942 0.36799004 0.44276724 0.40782666 0.39882278 0.42873208
 0.40887563 0.43322852 0.42873208 0.41898177]

mean value: 0.41355262056408115

key: test_accuracy
value: [0.73684211 0.68421053 0.52631579 0.73684211 0.63157895 0.72222222
 0.72222222 0.61111111 0.61111111 0.77777778]

mean value: 0.6760233918128655

key: train_accuracy
value: [0.72289157 0.69277108 0.74096386 0.72289157 0.71686747 0.73053892
 0.7245509  0.73652695 0.73053892 0.73053892]

mean value: 0.7249080152947118

key: test_fscore
value: [0.81481481 0.78571429 0.66666667 0.81481481 0.75862069 0.8
 0.81481481 0.74074074 0.74074074 0.84615385]

mean value: 0.7783081414115897

key: train_fscore
value: [0.81147541 0.8        0.81702128 0.80991736 0.80816327 0.81632653
 0.81147541 0.81666667 0.81632653 0.81327801]

mean value: 0.8120650453135811

key: test_precision
value: [0.6875     0.6875     0.6        0.73333333 0.64705882 0.71428571
 0.6875     0.625      0.625      0.73333333]

mean value: 0.6740511204481793

key: train_precision
value: [0.70212766 0.66666667 0.72180451 0.7        0.69230769 0.70422535
 0.70212766 0.71532847 0.70422535 0.71014493]

mean value: 0.7018958288316359

key: test_recall
value: [1.         0.91666667 0.75       0.91666667 0.91666667 0.90909091
 1.         0.90909091 0.90909091 1.        ]

mean value: 0.9227272727272727

key: train_recall
value: [0.96116505 1.         0.94117647 0.96078431 0.97058824 0.97087379
 0.96116505 0.95145631 0.97087379 0.95145631]

mean value: 0.9639539310869979

key: test_roc_auc
value: [0.6875     0.60119048 0.44642857 0.67261905 0.5297619  0.66883117
 0.64285714 0.52597403 0.52597403 0.71428571]

mean value: 0.6015422077922078

key: train_roc_auc
value: [0.64724919 0.6015625  0.68152574 0.65226716 0.64154412 0.65731189
 0.65245752 0.67104066 0.65731189 0.66322816]

mean value: 0.6525498822101656

key: test_jcc
value: [0.6875     0.64705882 0.5        0.6875     0.61111111 0.66666667
 0.6875     0.58823529 0.58823529 0.73333333]

mean value: 0.6397140522875817

key: train_jcc
value: [0.68275862 0.66666667 0.69064748 0.68055556 0.67808219 0.68965517
 0.68275862 0.69014085 0.68965517 0.68531469]

mean value: 0.6836235012609437

MCC on Blind test: 0.44

Accuracy on Blind test: 0.69

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00754261 0.00742865 0.00720763 0.00705886 0.00744486 0.00740099
 0.00711918 0.00755024 0.00717258 0.00742507]

mean value: 0.007335066795349121

key: score_time
value: [0.00887275 0.00809169 0.00855422 0.00822783 0.00839949 0.0080905
 0.00788832 0.00815272 0.00828528 0.00825739]

mean value: 0.008282017707824708

key: test_mcc
value: [ 0.21660006  0.32142857  0.23262105  0.28690229  0.28690229  0.43320011
  0.16116459 -0.24029619  0.40291148  0.40291148]

mean value: 0.2504345746462975

key: train_mcc
value: [0.34619876 0.33098314 0.29538063 0.35569507 0.35404664 0.3240165
 0.35981593 0.37214605 0.27958995 0.33041139]

mean value: 0.3348284059138056

key: test_accuracy
value: [0.63157895 0.68421053 0.63157895 0.68421053 0.68421053 0.72222222
 0.61111111 0.44444444 0.72222222 0.72222222]

mean value: 0.6538011695906433

key: train_accuracy
value: [0.69879518 0.69277108 0.6746988  0.70481928 0.69879518 0.68862275
 0.70658683 0.71257485 0.67065868 0.68862275]

mean value: 0.6936945386335762

key: test_fscore
value: [0.72       0.75       0.69565217 0.76923077 0.76923077 0.76190476
 0.69565217 0.58333333 0.7826087  0.7826087 ]

mean value: 0.7310221372830068

key: train_fscore
value: [0.76635514 0.76497696 0.74766355 0.77625571 0.76190476 0.75925926
 0.77625571 0.78181818 0.74885845 0.75471698]

mean value: 0.7638064697242107

key: test_precision
value: [0.64285714 0.75       0.72727273 0.71428571 0.71428571 0.8
 0.66666667 0.53846154 0.75       0.75      ]

mean value: 0.7053829503829504

key: train_precision
value: [0.73873874 0.72173913 0.71428571 0.72649573 0.74074074 0.72566372
 0.73275862 0.73504274 0.70689655 0.73394495]

mean value: 0.7276306629094831

key: test_recall
value: [0.81818182 0.75       0.66666667 0.83333333 0.83333333 0.72727273
 0.72727273 0.63636364 0.81818182 0.81818182]

mean value: 0.7628787878787879

key: train_recall
value: [0.7961165  0.81372549 0.78431373 0.83333333 0.78431373 0.7961165
 0.82524272 0.83495146 0.7961165  0.77669903]

mean value: 0.8040928992956405

key: test_roc_auc
value: [0.59659091 0.66071429 0.61904762 0.63095238 0.63095238 0.72077922
 0.57792208 0.38961039 0.69480519 0.69480519]

mean value: 0.6216179653679654

key: train_roc_auc
value: [0.66789952 0.65686275 0.64215686 0.66666667 0.67340686 0.65587075
 0.67043386 0.67528823 0.63243325 0.66178701]

mean value: 0.6602805766319473

key: test_jcc
value: [0.5625     0.6        0.53333333 0.625      0.625      0.61538462
 0.53333333 0.41176471 0.64285714 0.64285714]

mean value: 0.5792030273647921

key: train_jcc
value: [0.62121212 0.61940299 0.59701493 0.63432836 0.61538462 0.6119403
 0.63432836 0.64179104 0.59854015 0.60606061]

mean value: 0.6180003458791998

MCC on Blind test: 0.51

Accuracy on Blind test: 0.74

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])
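
For orientation, the pipeline printed above (MinMaxScaler on the numerical columns, OneHotEncoder on the categorical columns, then the classifier) and the key / value / mean value blocks that follow can be reproduced in outline as below. This is a minimal sketch, not the script that wrote this log: the data are synthetic, only three of the column names are borrowed from the log, the category labels are made up, and the use of a shuffled StratifiedKFold with 10 splits is an assumption based on the 10 values reported per key.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data (hypothetical values, not the data behind this log).
rng = np.random.default_rng(42)
n = 120
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 30, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "sheet", "coil"], n),  # made-up labels
})
y = rng.integers(0, 2, n)

num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class"]

# Same layout as the printed pipeline: scale numerical, one-hot categorical.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ],
)
pipe = Pipeline(steps=[("prep", prep), ("model", KNeighborsClassifier())])

# Two of the logged metrics; the CV splitter is an assumption.
scoring = {"mcc": make_scorer(matthews_corrcoef), "accuracy": "accuracy"}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same key / value / mean value shape as the log.
for key, value in scores.items():
    print("\nkey:", key)
    print("value:", value)
    print("\nmean value:", np.mean(value))
```

Keeping the scaler and encoder inside the Pipeline means they are refitted on each training fold, so the held-out fold does not influence the preprocessing.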

key: fit_time
value: [0.00691724 0.00921154 0.00725603 0.00683308 0.00641084 0.0067122
 0.00755239 0.00700617 0.00747585 0.00672269]

mean value: 0.00720980167388916

key: score_time
value: [0.04755116 0.03781438 0.01461935 0.01340771 0.01276135 0.01304817
 0.01399469 0.01276302 0.01416636 0.01354003]

mean value: 0.01936662197113037

key: test_mcc
value: [ 0.33796318  0.14085904  0.32142857 -0.33071891 -0.20865621  0.12182898
 -0.02548236  0.2987013   0.12182898  0.53246753]

mean value: 0.13102201054732363

key: train_mcc
value: [0.51724228 0.58603243 0.6140767  0.51866448 0.57255314 0.57404517
 0.54744208 0.57404517 0.6296076  0.53388143]

mean value: 0.5667590493666902

key: test_accuracy
value: [0.68421053 0.63157895 0.68421053 0.47368421 0.47368421 0.61111111
 0.5        0.66666667 0.61111111 0.77777778]

mean value: 0.6114035087719298

key: train_accuracy
value: [0.77710843 0.80722892 0.81927711 0.77710843 0.80120482 0.80239521
 0.79041916 0.80239521 0.82634731 0.78443114]

mean value: 0.7987915734795469

key: test_fscore
value: [0.75       0.74074074 0.75       0.64285714 0.61538462 0.72
 0.57142857 0.72727273 0.72       0.81818182]

mean value: 0.7055865615865616

key: train_fscore
value: [0.83842795 0.85321101 0.86363636 0.83257919 0.84651163 0.85067873
 0.84304933 0.85067873 0.86995516 0.83486239]

mean value: 0.8483590469525649

key: test_precision
value: [0.69230769 0.66666667 0.75       0.5625     0.57142857 0.64285714
 0.6        0.72727273 0.64285714 0.81818182]

mean value: 0.6674071761571762

key: train_precision
value: [0.76190476 0.80172414 0.80508475 0.77310924 0.80530973 0.79661017
 0.78333333 0.79661017 0.80833333 0.79130435]

mean value: 0.7923323977285066

key: test_recall
value: [0.81818182 0.83333333 0.75       0.75       0.66666667 0.81818182
 0.54545455 0.72727273 0.81818182 0.81818182]

mean value: 0.7545454545454545

key: train_recall
value: [0.93203883 0.91176471 0.93137255 0.90196078 0.89215686 0.91262136
 0.91262136 0.91262136 0.94174757 0.88349515]

mean value: 0.9132400533028746

key: test_roc_auc
value: [0.65909091 0.55952381 0.66071429 0.375      0.4047619  0.55194805
 0.48701299 0.64935065 0.55194805 0.76623377]

mean value: 0.5665584415584416

key: train_roc_auc
value: [0.72792418 0.77619485 0.78599877 0.74004289 0.77420343 0.76881068
 0.75318568 0.76881068 0.79118629 0.75424757]

mean value: 0.7640605028419135

key: test_jcc
value: [0.6        0.58823529 0.6        0.47368421 0.44444444 0.5625
 0.4        0.57142857 0.5625     0.69230769]

mean value: 0.5495100212824671

key: train_jcc
value: [0.72180451 0.744      0.76       0.71317829 0.73387097 0.74015748
 0.72868217 0.74015748 0.76984127 0.71653543]

mean value: 0.7368227607678467

MCC on Blind test: 0.22

Accuracy on Blind test: 0.6
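
As a worked check on how the reported scores relate to one another, the first K-Nearest Neighbors test fold above (precision 0.69230769, recall 0.81818182, accuracy 0.68421053) is consistent with a confusion matrix of TP=9, FP=4, FN=2, TN=4. These counts are inferred from the printed scores and are not printed by the log itself. The sketch below recomputes accuracy, MCC, F-score and the Jaccard index ("jcc") from them using the standard definitions.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, MCC, F-score and Jaccard index from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # MCC denominator is zero when any row or column of the matrix is empty.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    fscore = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0
    jcc = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {"accuracy": accuracy, "mcc": mcc, "fscore": fscore, "jcc": jcc}


# Counts inferred (an assumption) from the first KNN test fold in the log.
fold1 = binary_metrics(tp=9, fp=4, fn=2, tn=4)
for name, value in fold1.items():
    print(f"{name}: {value:.8f}")
```

With only 18 or 19 samples per test fold, a single changed prediction moves accuracy by about 0.05, which is in line with the wide fold-to-fold spread in the test scores.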

Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.00955367 0.00924325 0.00820327 0.00793886 0.00868225 0.00840187
 0.00826693 0.00931406 0.00927305 0.00903201]

mean value: 0.008790922164916993

key: score_time
value: [0.0091145  0.00844264 0.00794554 0.00818419 0.00846839 0.00918674
 0.00848913 0.00863361 0.00857615 0.00812507]

mean value: 0.008516597747802734

key: test_mcc
value: [ 0.34405118  0.14085904 -0.03149704  0.14085904  0.3086067   0.56061191
  0.44320263  0.0805823   0.3040345   0.56061191]

mean value: 0.2851922165850045

key: train_mcc
value: [0.65495721 0.59292706 0.63691667 0.64636933 0.56076174 0.57399753
 0.57517958 0.70283753 0.55505316 0.64203075]

mean value: 0.6141030557952815

key: test_accuracy
value: [0.68421053 0.63157895 0.57894737 0.63157895 0.68421053 0.77777778
 0.72222222 0.61111111 0.66666667 0.77777778]

mean value: 0.6766081871345029

key: train_accuracy
value: [0.8313253  0.79518072 0.8253012  0.8253012  0.78313253 0.79041916
 0.79640719 0.85628743 0.78443114 0.82634731]

mean value: 0.8114133179424284

key: test_fscore
value: [0.76923077 0.74074074 0.71428571 0.74074074 0.8        0.84615385
 0.81481481 0.74074074 0.78571429 0.84615385]

mean value: 0.7798575498575498

key: train_fscore
value: [0.87931034 0.85714286 0.8722467  0.87445887 0.8487395  0.85355649
 0.85470085 0.89380531 0.8487395  0.87445887]

mean value: 0.8657159288311089

key: test_precision
value: [0.66666667 0.66666667 0.625      0.66666667 0.66666667 0.73333333
 0.6875     0.625      0.64705882 0.73333333]

mean value: 0.6717892156862745

key: train_precision
value: [0.79069767 0.75       0.792      0.78294574 0.74264706 0.75
 0.76335878 0.82113821 0.74814815 0.7890625 ]

mean value: 0.7729998107832459

key: test_recall
value: [0.90909091 0.83333333 0.83333333 0.83333333 1.         1.
 1.         0.90909091 1.         1.        ]

mean value: 0.9318181818181819

key: train_recall
value: [0.99029126 1.         0.97058824 0.99019608 0.99019608 0.99029126
 0.97087379 0.98058252 0.98058252 0.98058252]

mean value: 0.9844184275652008

key: test_roc_auc
value: [0.64204545 0.55952381 0.48809524 0.55952381 0.57142857 0.71428571
 0.64285714 0.52597403 0.57142857 0.71428571]

mean value: 0.5989448051948052

key: train_roc_auc
value: [0.78085992 0.734375   0.78216912 0.77634804 0.72166054 0.72952063
 0.74324939 0.81841626 0.72466626 0.77935376]

mean value: 0.759061892354029

key: test_jcc
value: [0.625      0.58823529 0.55555556 0.58823529 0.66666667 0.73333333
 0.6875     0.58823529 0.64705882 0.73333333]

mean value: 0.6413153594771241

key: train_jcc
value: [0.78461538 0.75       0.7734375  0.77692308 0.73722628 0.74452555
 0.74626866 0.808      0.73722628 0.77692308]

mean value: 0.7635145797367737

MCC on Blind test: 0.42

Accuracy on Blind test: 0.67

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.67604995 0.61749172 0.58538461 0.58932328 0.74973798 0.71265984
 0.60874438 0.61769533 0.61801505 0.55964708]

mean value: 0.6334749221801758

key: score_time
value: [0.01328945 0.01198721 0.01105618 0.0122695  0.01297355 0.01261806
 0.01269841 0.01275897 0.01224279 0.01214409]

mean value: 0.01240382194519043

key: test_mcc
value: [0.45868247 0.28690229 0.67460105 0.45361105 0.88949918 0.64465837
 0.71350607 0.12182898 0.2548236  0.2987013 ]

mean value: 0.4796814363849572

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.73684211 0.68421053 0.84210526 0.73684211 0.94736842 0.83333333
 0.83333333 0.61111111 0.66666667 0.66666667]

mean value: 0.7558479532163742

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.8        0.76923077 0.86956522 0.7826087  0.96       0.86956522
 0.84210526 0.72       0.76923077 0.72727273]

mean value: 0.8109578659326944

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.71428571 0.71428571 0.90909091 0.81818182 0.92307692 0.83333333
 1.         0.64285714 0.66666667 0.72727273]

mean value: 0.7949050949050949

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 0.83333333 0.83333333 0.75       1.         0.90909091
 0.72727273 0.81818182 0.90909091 0.72727273]

mean value: 0.8416666666666667

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.70454545 0.63095238 0.8452381  0.73214286 0.92857143 0.81168831
 0.86363636 0.55194805 0.5974026  0.64935065]

mean value: 0.7315476190476191

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.66666667 0.625      0.76923077 0.64285714 0.92307692 0.76923077
 0.72727273 0.5625     0.625      0.57142857]

mean value: 0.688226356976357

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.3

Accuracy on Blind test: 0.65
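
The ConvergenceWarning emitted for the MLP run means the optimizer stopped at max_iter rather than at its tolerance. One way to check for this programmatically, and to try a configuration that may stop earlier, is sketched below. The data are synthetic and the helper name count_convergence_warnings is hypothetical; whether a larger max_iter or early_stopping=True is appropriate for the run logged here is an assumption, not something the log establishes.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (hypothetical, not the data behind this log).
rng = np.random.default_rng(42)
X = rng.uniform(size=(100, 5))
y = (X[:, 0] + X[:, 1] > 1).astype(int)


def count_convergence_warnings(model, X, y):
    """Fit the model and return how many ConvergenceWarnings were raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        model.fit(X, y)
    return sum(issubclass(w.category, ConvergenceWarning) for w in caught)


# A deliberately tiny iteration budget to trigger the warning.
short = MLPClassifier(max_iter=5, random_state=42)
n_short = count_convergence_warnings(short, X, y)
print("warnings with max_iter=5:", n_short, "| iterations run:", short.n_iter_)

# Candidate alternative: larger budget plus validation-based early stopping.
longer = MLPClassifier(max_iter=5000, early_stopping=True, random_state=42)
n_longer = count_convergence_warnings(longer, X, y)
print("warnings with early stopping:", n_longer, "| iterations run:", longer.n_iter_)
```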

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01134348 0.01067019 0.00884771 0.00852466 0.0083487  0.00794363
 0.00782943 0.00811911 0.00781536 0.0099113 ]

mean value: 0.008935356140136718

key: score_time
value: [0.01315331 0.00911045 0.00869799 0.00861573 0.00856686 0.00791645
 0.00785732 0.00792217 0.00788569 0.00925827]

mean value: 0.008898425102233886

key: test_mcc
value: [0.45361105 0.89559105 1.         0.89559105 0.67460105 0.66254135
 0.89188259 0.26856633 0.88640526 0.76623377]

mean value: 0.7395023497912928

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.73684211 0.94736842 1.         0.94736842 0.84210526 0.83333333
 0.94444444 0.66666667 0.94444444 0.88888889]

mean value: 0.8751461988304093

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.7826087  0.95652174 1.         0.95652174 0.86956522 0.85714286
 0.95238095 0.75       0.95652174 0.90909091]

mean value: 0.8990353849049502

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.75       1.         1.         1.         0.90909091 0.9
 1.         0.69230769 0.91666667 0.90909091]

mean value: 0.9077156177156177

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81818182 0.91666667 1.         0.91666667 0.83333333 0.81818182
 0.90909091 0.81818182 1.         0.90909091]

mean value: 0.8939393939393939

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.72159091 0.95833333 1.         0.95833333 0.8452381  0.83766234
 0.95454545 0.62337662 0.92857143 0.88311688]

mean value: 0.8710768398268398

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.64285714 0.91666667 1.         0.91666667 0.76923077 0.75
 0.90909091 0.6        0.91666667 0.83333333]

mean value: 0.8254512154512155

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.04

Accuracy on Blind test: 0.51

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.08982825 0.08871984 0.08893657 0.08521676 0.08517504 0.09118032
 0.09140134 0.09060311 0.08537102 0.08313799]

mean value: 0.08795702457427979

key: score_time
value: [0.01784706 0.01710677 0.01794147 0.01741266 0.0171783  0.01783466
 0.01772738 0.01788592 0.01994085 0.01653624]

mean value: 0.017741131782531738

key: test_mcc
value: [0.33796318 0.65477023 0.65477023 0.54761905 0.88949918 0.76623377
 0.88640526 0.26856633 0.67005939 0.77742884]

mean value: 0.6453315461934368

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.68421053 0.84210526 0.84210526 0.78947368 0.94736842 0.88888889
 0.94444444 0.66666667 0.83333333 0.88888889]

mean value: 0.8327485380116959

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.75       0.88       0.88       0.83333333 0.96       0.90909091
 0.95652174 0.75       0.88       0.91666667]

mean value: 0.8715612648221344

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.69230769 0.84615385 0.84615385 0.83333333 0.92307692 0.90909091
 0.91666667 0.69230769 0.78571429 0.84615385]

mean value: 0.8290959040959041

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81818182 0.91666667 0.91666667 0.83333333 1.         0.90909091
 1.         0.81818182 1.         1.        ]

mean value: 0.9212121212121213

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.65909091 0.81547619 0.81547619 0.77380952 0.92857143 0.88311688
 0.92857143 0.62337662 0.78571429 0.85714286]

mean value: 0.8070346320346321

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.6        0.78571429 0.78571429 0.71428571 0.92307692 0.83333333
 0.91666667 0.6        0.78571429 0.84615385]

mean value: 0.779065934065934

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.36

Accuracy on Blind test: 0.65

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.0070622  0.00688052 0.00693893 0.00693154 0.00710917 0.00692058
 0.00734472 0.0074923  0.00760174 0.00686693]

mean value: 0.007114863395690918

key: score_time
value: [0.00797033 0.00801826 0.00802183 0.0084672  0.00798321 0.00860476
 0.00797367 0.00884962 0.0084374  0.00873232]

mean value: 0.008305859565734864

key: test_mcc
value: [ 0.4719399   0.20935895  0.32142857  0.01163105  0.0952381  -0.06493506
  0.20385888  0.11396058 -0.0805823   0.2548236 ]

mean value: 0.1536722257776471

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.73684211 0.57894737 0.68421053 0.52631579 0.57894737 0.44444444
 0.61111111 0.55555556 0.5        0.66666667]

mean value: 0.5883040935672514

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.76190476 0.6        0.75       0.60869565 0.66666667 0.44444444
 0.66666667 0.6        0.60869565 0.76923077]

mean value: 0.6476304613261135

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.8        0.75       0.75       0.63636364 0.66666667 0.57142857
 0.7        0.66666667 0.58333333 0.66666667]

mean value: 0.6791125541125541

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.72727273 0.5        0.75       0.58333333 0.66666667 0.36363636
 0.63636364 0.54545455 0.63636364 0.90909091]

mean value: 0.6318181818181818

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.73863636 0.60714286 0.66071429 0.50595238 0.54761905 0.46753247
 0.6038961  0.55844156 0.46103896 0.5974026 ]

mean value: 0.5748376623376623

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.61538462 0.42857143 0.6        0.4375     0.5        0.28571429
 0.5        0.42857143 0.4375     0.625     ]

mean value: 0.4858241758241758

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.15

Accuracy on Blind test: 0.57
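
The per-fold scores above are consistent with a single confusion matrix per fold. As a minimal sketch: the counts used below (tp=8, fp=2, fn=3, tn=6) are not printed anywhere in this log. They are inferred from the first-fold Extra Tree values of test_precision (0.8), test_recall (0.72727273) and test_accuracy (0.73684211), and the function name `scores_from_counts` is hypothetical. The sketch shows how MCC, F-score, Jaccard (jcc) and ROC AUC follow from those counts. The logged test_roc_auc matches the balanced-accuracy form used here, which suggests (but does not prove) that the script scores hard 0/1 predictions rather than probabilities.

```python
from math import sqrt


def scores_from_counts(tp, fp, fn, tn):
    """Binary classification scores from confusion-matrix counts.

    Assumes at least one predicted positive and one actual positive,
    so precision and recall are defined.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "fscore": 2 * precision * recall / (precision + recall),
        "precision": precision,
        "recall": recall,
        # With hard class labels the ROC curve has a single operating
        # point, so the area reduces to balanced accuracy.
        "roc_auc": (recall + specificity) / 2,
        "jcc": tp / (tp + fp + fn),
    }


# Counts inferred from the logged fold-1 metrics (an assumption, see above).
fold1 = scores_from_counts(tp=8, fp=2, fn=3, tn=6)
for key, value in fold1.items():
    print(f"test_{key}: {value:.8f}")
```

The printed values can be compared against the first entry of each test_* array in the Extra Tree block.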

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.08540225 1.07859063 1.06257463 1.06920266 1.05264735 1.05303788
 1.1160934  1.0755322  1.05426216 1.04404473]

mean value: 1.0691387891769408

key: score_time
value: [0.09499049 0.09243846 0.09022665 0.09180641 0.08709741 0.08691168
 0.08712554 0.08879185 0.0880568  0.08689451]

mean value: 0.08943397998809814

key: test_mcc
value: [0.45868247 1.         1.         0.77380952 0.88949918 0.76623377
 1.         0.56061191 0.88640526 0.64465837]

mean value: 0.7979900484560085

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.73684211 1.         1.         0.89473684 0.94736842 0.88888889
 1.         0.77777778 0.94444444 0.83333333]

mean value: 0.9023391812865497

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.8        1.         1.         0.91666667 0.96       0.90909091
 1.         0.84615385 0.95652174 0.86956522]

mean value: 0.9257998378433161

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.71428571 1.         1.         0.91666667 0.92307692 0.90909091
 1.         0.73333333 0.91666667 0.83333333]

mean value: 0.8946453546453547

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 1.         1.         0.91666667 1.         0.90909091
 1.         1.         1.         0.90909091]

mean value: 0.9643939393939394

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.70454545 1.         1.         0.88690476 0.92857143 0.88311688
 1.         0.71428571 0.92857143 0.81168831]

mean value: 0.8857683982683983

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.66666667 1.         1.         0.84615385 0.92307692 0.83333333
 1.         0.73333333 0.91666667 0.76923077]

mean value: 0.8688461538461538

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.17

Accuracy on Blind test: 0.56

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
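
The FutureWarning above names its own remedy: set `max_features='sqrt'` explicitly, or drop the parameter. A minimal sketch of the first option follows, assuming the Random Forest2 settings are held in a plain dict before being passed to the estimator. The helper name `replace_deprecated_max_features` and the dict are hypothetical and not taken from the script that produced this log; the parameter values are copied from the `Model func` line.

```python
def replace_deprecated_max_features(params):
    """Return a copy of params with max_features='auto' rewritten to 'sqrt'."""
    updated = dict(params)
    if updated.get("max_features") == "auto":
        updated["max_features"] = "sqrt"
    return updated


# Settings as printed for 'Random Forest2' in this log.
rf2_params = {
    "max_features": "auto",
    "min_samples_leaf": 5,
    "n_estimators": 1000,
    "n_jobs": 10,
    "oob_score": True,
    "random_state": 42,
}

rf2_params_fixed = replace_deprecated_max_features(rf2_params)
print(rf2_params_fixed)

# The result could then be unpacked into the estimator, e.g.
# RandomForestClassifier(**rf2_params_fixed)
```

Per the warning text, 'sqrt' keeps the past behaviour for RandomForestClassifier, so the change should silence the warning without altering the fitted model.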

key: fit_time
value: [1.74963999 0.88859892 0.84024143 0.96218228 0.93069863 0.91661644
 0.85723209 0.860641   0.84781289 0.82444263]

mean value: 0.9678106307983398

key: score_time
value: [0.22211456 0.18948603 0.20391059 0.21200871 0.21816325 0.22368956
 0.13365841 0.19525099 0.19360924 0.23414063]

mean value: 0.20260319709777833

key: test_mcc
value: [0.60553007 0.89559105 0.88949918 0.77380952 0.88949918 0.76623377
 1.         0.39594419 0.88640526 0.77742884]

mean value: 0.7879941063317681

key: train_mcc
value: [0.89849587 0.86235326 0.8501742  0.86235326 0.87457979 0.86499607
 0.86279135 0.89953068 0.87498674 0.8872319 ]

mean value: 0.8737493106163656

key: test_accuracy
value: [0.78947368 0.94736842 0.94736842 0.89473684 0.94736842 0.88888889
 1.         0.72222222 0.94444444 0.88888889]

mean value: 0.8970760233918128

key: train_accuracy
value: [0.95180723 0.93373494 0.92771084 0.93373494 0.93975904 0.93413174
 0.93413174 0.95209581 0.94011976 0.94610778]

mean value: 0.9393333814299113

key: test_fscore
value: [0.84615385 0.95652174 0.96       0.91666667 0.96       0.90909091
 1.         0.8        0.95652174 0.91666667]

mean value: 0.9221621566838958

key: train_fscore
value: [0.96226415 0.94835681 0.94392523 0.94835681 0.95283019 0.94930876
 0.94883721 0.96226415 0.95327103 0.95774648]

mean value: 0.9527160811207689

key: test_precision
value: [0.73333333 1.         0.92307692 0.91666667 0.92307692 0.90909091
 1.         0.71428571 0.91666667 0.84615385]

mean value: 0.8882350982350983

key: train_precision
value: [0.93577982 0.90990991 0.90178571 0.90990991 0.91818182 0.90350877
 0.91071429 0.93577982 0.91891892 0.92727273]

mean value: 0.9171761689150632

key: test_recall
value: [1.         0.91666667 1.         0.91666667 1.         0.90909091
 1.         0.90909091 1.         1.        ]

mean value: 0.9651515151515151

key: train_recall
value: [0.99029126 0.99019608 0.99019608 0.99019608 0.99019608 1.
 0.99029126 0.99029126 0.99029126 0.99029126]

mean value: 0.9912240624405102

key: test_roc_auc
value: [0.75       0.95833333 0.92857143 0.88690476 0.92857143 0.88311688
 1.         0.66883117 0.92857143 0.85714286]

mean value: 0.879004329004329

key: train_roc_auc
value: [0.93959008 0.91697304 0.90916054 0.91697304 0.92478554 0.9140625
 0.91702063 0.94045813 0.92483313 0.93264563]

mean value: 0.9236502256646996

key: test_jcc
value: [0.73333333 0.91666667 0.92307692 0.84615385 0.92307692 0.83333333
 1.         0.66666667 0.91666667 0.84615385]

mean value: 0.8605128205128205

key: train_jcc
value: [0.92727273 0.90178571 0.89380531 0.90178571 0.90990991 0.90350877
 0.90265487 0.92727273 0.91071429 0.91891892]

mean value: 0.9097628946580972

MCC on Blind test: 0.25

Accuracy on Blind test: 0.59

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00777102 0.00735879 0.00705314 0.00707221 0.00719404 0.00715971
 0.00785017 0.00709414 0.00755572 0.00760627]

mean value: 0.00737152099609375

key: score_time
value: [0.00856996 0.00846505 0.00813699 0.00842476 0.00811648 0.00864673
 0.00846338 0.00875568 0.00872493 0.00881457]

mean value: 0.008511853218078614

key: test_mcc
value: [ 0.21660006  0.32142857  0.23262105  0.28690229  0.28690229  0.43320011
  0.16116459 -0.24029619  0.40291148  0.40291148]

mean value: 0.2504345746462975

key: train_mcc
value: [0.34619876 0.33098314 0.29538063 0.35569507 0.35404664 0.3240165
 0.35981593 0.37214605 0.27958995 0.33041139]

mean value: 0.3348284059138056

key: test_accuracy
value: [0.63157895 0.68421053 0.63157895 0.68421053 0.68421053 0.72222222
 0.61111111 0.44444444 0.72222222 0.72222222]

mean value: 0.6538011695906433

key: train_accuracy
value: [0.69879518 0.69277108 0.6746988  0.70481928 0.69879518 0.68862275
 0.70658683 0.71257485 0.67065868 0.68862275]

mean value: 0.6936945386335762

key: test_fscore
value: [0.72       0.75       0.69565217 0.76923077 0.76923077 0.76190476
 0.69565217 0.58333333 0.7826087  0.7826087 ]

mean value: 0.7310221372830068

key: train_fscore
value: [0.76635514 0.76497696 0.74766355 0.77625571 0.76190476 0.75925926
 0.77625571 0.78181818 0.74885845 0.75471698]

mean value: 0.7638064697242107

key: test_precision
value: [0.64285714 0.75       0.72727273 0.71428571 0.71428571 0.8
 0.66666667 0.53846154 0.75       0.75      ]

mean value: 0.7053829503829504

key: train_precision
value: [0.73873874 0.72173913 0.71428571 0.72649573 0.74074074 0.72566372
 0.73275862 0.73504274 0.70689655 0.73394495]

mean value: 0.7276306629094831

key: test_recall
value: [0.81818182 0.75       0.66666667 0.83333333 0.83333333 0.72727273
 0.72727273 0.63636364 0.81818182 0.81818182]

mean value: 0.7628787878787879

key: train_recall
value: [0.7961165  0.81372549 0.78431373 0.83333333 0.78431373 0.7961165
 0.82524272 0.83495146 0.7961165  0.77669903]

mean value: 0.8040928992956405

key: test_roc_auc
value: [0.59659091 0.66071429 0.61904762 0.63095238 0.63095238 0.72077922
 0.57792208 0.38961039 0.69480519 0.69480519]

mean value: 0.6216179653679654

key: train_roc_auc
value: [0.66789952 0.65686275 0.64215686 0.66666667 0.67340686 0.65587075
 0.67043386 0.67528823 0.63243325 0.66178701]

mean value: 0.6602805766319473

key: test_jcc
value: [0.5625     0.6        0.53333333 0.625      0.625      0.61538462
 0.53333333 0.41176471 0.64285714 0.64285714]

mean value: 0.5792030273647921

key: train_jcc
value: [0.62121212 0.61940299 0.59701493 0.63432836 0.61538462 0.6119403
 0.63432836 0.64179104 0.59854015 0.60606061]

mean value: 0.6180003458791998

MCC on Blind test: 0.51

Accuracy on Blind test: 0.74

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.07428098 0.04616737 0.04378724 0.04078841 0.05201912 0.16673613
 0.0367384  0.03498602 0.03795505 0.03922486]

mean value: 0.0572683572769165

key: score_time
value: [0.0104301  0.0102849  0.01059461 0.01035333 0.01026797 0.01001692
 0.00953746 0.00964141 0.00958657 0.00953507]

mean value: 0.010024833679199218

key: test_mcc
value: [0.56729535 0.88949918 0.89559105 1.         0.77380952 0.76623377
 1.         0.39594419 0.88640526 0.66254135]

mean value: 0.7837319665122223

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.78947368 0.94736842 0.94736842 1.         0.89473684 0.88888889
 1.         0.72222222 0.94444444 0.83333333]

mean value: 0.8967836257309941

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.83333333 0.96       0.95652174 1.         0.91666667 0.90909091
 1.         0.8        0.95652174 0.85714286]

mean value: 0.9189277244494636

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.76923077 0.92307692 1.         1.         0.91666667 0.90909091
 1.         0.71428571 0.91666667 0.9       ]

mean value: 0.9049017649017649

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 1.         0.91666667 1.         0.91666667 0.90909091
 1.         0.90909091 1.         0.81818182]

mean value: 0.9378787878787879

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.76704545 0.92857143 0.95833333 1.         0.88690476 0.88311688
 1.         0.66883117 0.92857143 0.83766234]

mean value: 0.8859036796536797

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.71428571 0.92307692 0.91666667 1.         0.84615385 0.83333333
 1.         0.66666667 0.91666667 0.75      ]

mean value: 0.8566849816849816

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.07

Accuracy on Blind test: 0.52

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.01267529 0.01247311 0.01756597 0.03184366 0.03098655 0.03128266
 0.03108382 0.03088689 0.03076506 0.03127789]

mean value: 0.026084089279174806

key: score_time
value: [0.01049232 0.01059413 0.02069259 0.01077557 0.01968479 0.0106771
 0.02043557 0.01927018 0.01058149 0.02066064]

mean value: 0.015386438369750977

key: test_mcc
value: [0.45361105 0.67460105 0.88949918 0.89559105 0.89559105 0.89188259
 0.79772404 0.53246753 0.56061191 0.56980288]

mean value: 0.7161382335698945

key: train_mcc
value: [0.92325474 0.82122399 0.84675102 0.83387364 0.84675102 0.84729198
 0.87296284 0.86004923 0.89835373 0.86032048]

mean value: 0.8610832667133086

key: test_accuracy
value: [0.73684211 0.84210526 0.94736842 0.94736842 0.94736842 0.94444444
 0.88888889 0.77777778 0.77777778 0.77777778]

mean value: 0.8587719298245614

key: train_accuracy
value: [0.96385542 0.91566265 0.92771084 0.92168675 0.92771084 0.92814371
 0.94011976 0.93413174 0.95209581 0.93413174]

mean value: 0.9345249260515114

key: test_fscore
value: [0.7826087  0.86956522 0.96       0.95652174 0.95652174 0.95238095
 0.9        0.81818182 0.84615385 0.8       ]

mean value: 0.8841934008020964

key: train_fscore
value: [0.97087379 0.93203883 0.94230769 0.93719807 0.94230769 0.94285714
 0.95238095 0.94736842 0.96153846 0.9468599 ]

mean value: 0.9475730954818289

key: test_precision
value: [0.75       0.90909091 0.92307692 1.         1.         1.
 1.         0.81818182 0.73333333 0.88888889]

mean value: 0.9022571872571873

key: train_precision
value: [0.97087379 0.92307692 0.9245283  0.92380952 0.9245283  0.92523364
 0.93457944 0.93396226 0.95238095 0.94230769]

mean value: 0.9355280830019537

key: test_recall
value: [0.81818182 0.83333333 1.         0.91666667 0.91666667 0.90909091
 0.81818182 0.81818182 1.         0.72727273]

mean value: 0.8757575757575757

key: train_recall
value: [0.97087379 0.94117647 0.96078431 0.95098039 0.96078431 0.96116505
 0.97087379 0.96116505 0.97087379 0.95145631]

mean value: 0.960013325718637

key: test_roc_auc
value: [0.72159091 0.8452381  0.92857143 0.95833333 0.95833333 0.95454545
 0.90909091 0.76623377 0.71428571 0.79220779]

mean value: 0.8548430735930737

key: train_roc_auc
value: [0.96162737 0.90808824 0.91789216 0.9129902  0.91789216 0.91808252
 0.93074939 0.92589502 0.94637439 0.92885316]

mean value: 0.9268444604783661

key: test_jcc
value: [0.64285714 0.76923077 0.92307692 0.91666667 0.91666667 0.90909091
 0.81818182 0.69230769 0.73333333 0.66666667]

mean value: 0.7988078588078588

key: train_jcc
value: [0.94339623 0.87272727 0.89090909 0.88181818 0.89090909 0.89189189
 0.90909091 0.9        0.92592593 0.89908257]

mean value: 0.9005751158494797

MCC on Blind test: 0.09

Accuracy on Blind test: 0.54
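
The LDA block above reports a mean cross-validation MCC well above the blind-test MCC. The following is a minimal sketch, using only the Python standard library, of how the fold means and the two gaps could be recomputed from the logged values. The fold values are copied from this log; the variable names are illustrative and are not taken from the pipeline code.

```python
from statistics import fmean

# Per-fold MCC values copied from the LDA block of this log.
test_mcc = [0.45361105, 0.67460105, 0.88949918, 0.89559105, 0.89559105,
            0.89188259, 0.79772404, 0.53246753, 0.56061191, 0.56980288]
train_mcc = [0.92325474, 0.82122399, 0.84675102, 0.83387364, 0.84675102,
             0.84729198, 0.87296284, 0.86004923, 0.89835373, 0.86032048]
blind_mcc = 0.09  # "MCC on Blind test" as logged (already rounded)

cv_test_mean = fmean(test_mcc)
cv_train_mean = fmean(train_mcc)

# Gap between training folds and held-out folds within cross-validation.
train_cv_gap = cv_train_mean - cv_test_mean
# Gap between held-out folds and the separate blind test set.
cv_blind_gap = cv_test_mean - blind_mcc

print(f"CV test mean MCC : {cv_test_mean:.4f}")
print(f"CV train mean MCC: {cv_train_mean:.4f}")
print(f"train - CV gap   : {train_cv_gap:.4f}")
print(f"CV - blind gap   : {cv_blind_gap:.4f}")
```

A large CV-to-blind gap of this kind is what one would look at first when judging whether cross-validation scores carry over to the blind set.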

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.00939846 0.00705338 0.00695395 0.00734782 0.00737906 0.00728893
 0.00739002 0.00729275 0.00736165 0.00728703]

mean value: 0.00747530460357666

key: score_time
value: [0.01311707 0.00814056 0.00845981 0.00825524 0.00828815 0.00828743
 0.00835443 0.00829411 0.00827003 0.00832057]

mean value: 0.008778738975524902

key: test_mcc
value: [0.60553007 0.32142857 0.14085904 0.28690229 0.26772484 0.52299758
 0.44320263 0.0805823  0.0805823  0.56061191]

mean value: 0.33104215300057266

key: train_mcc
value: [0.34161624 0.39993512 0.3929602  0.3794614  0.42213076 0.39858139
 0.42337541 0.42542126 0.32037061 0.3808643 ]

mean value: 0.3884716694211574

key: test_accuracy
value: [0.78947368 0.68421053 0.63157895 0.68421053 0.68421053 0.77777778
 0.72222222 0.61111111 0.61111111 0.77777778]

mean value: 0.6973684210526316

key: train_accuracy
value: [0.70481928 0.72289157 0.72289157 0.71686747 0.73493976 0.7245509
 0.73652695 0.73652695 0.69461078 0.71856287]

mean value: 0.7213188081667989

key: test_fscore
value: [0.84615385 0.75       0.74074074 0.76923077 0.78571429 0.83333333
 0.81481481 0.74074074 0.74074074 0.84615385]

mean value: 0.7867623117623117

key: train_fscore
value: [0.79324895 0.80672269 0.79824561 0.79828326 0.8018018  0.80672269
 0.80869565 0.81196581 0.78297872 0.79295154]

mean value: 0.8001616730332606

key: test_precision
value: [0.73333333 0.75       0.66666667 0.71428571 0.6875     0.76923077
 0.6875     0.625      0.625      0.73333333]

mean value: 0.6991849816849817

key: train_precision
value: [0.70149254 0.70588235 0.72222222 0.70992366 0.74166667 0.71111111
 0.73228346 0.72519084 0.6969697  0.72580645]

mean value: 0.7172549007220933

key: test_recall
value: [1.         0.75       0.83333333 0.83333333 0.91666667 0.90909091
 1.         0.90909091 0.90909091 1.        ]

mean value: 0.906060606060606

key: train_recall
value: [0.91262136 0.94117647 0.89215686 0.91176471 0.87254902 0.93203883
 0.90291262 0.9223301  0.89320388 0.87378641]

mean value: 0.9054540262707025

key: test_roc_auc
value: [0.75       0.66071429 0.55952381 0.63095238 0.60119048 0.74025974
 0.64285714 0.52597403 0.52597403 0.71428571]

mean value: 0.6351731601731602

key: train_roc_auc
value: [0.63885036 0.65808824 0.67264093 0.65900735 0.69408701 0.66133192
 0.68583131 0.67991505 0.63410194 0.6712682 ]

mean value: 0.6655122313893195

key: test_jcc
value: [0.73333333 0.6        0.58823529 0.625      0.64705882 0.71428571
 0.6875     0.58823529 0.58823529 0.73333333]

mean value: 0.6505217086834734

key: train_jcc
value: [0.65734266 0.67605634 0.66423358 0.66428571 0.66917293 0.67605634
 0.67883212 0.68345324 0.64335664 0.65693431]

mean value: 0.6669723860782252

MCC on Blind test: 0.51

Accuracy on Blind test: 0.73
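
The `test_fscore` and `test_jcc` rows above are not independent. Assuming `fscore` is the binary F1 score and `jcc` is the Jaccard index of the positive class, the two are linked by J = F1 / (2 - F1). A short sketch that checks this against the Multinomial fold values copied from this log (the helper name is hypothetical):

```python
def jaccard_from_f1(f1: float) -> float:
    """Jaccard index implied by a binary F1 score on the same predictions."""
    return f1 / (2.0 - f1)

# Per-fold values copied from the Multinomial block of this log.
test_fscore = [0.84615385, 0.75, 0.74074074, 0.76923077, 0.78571429,
               0.83333333, 0.81481481, 0.74074074, 0.74074074, 0.84615385]
test_jcc = [0.73333333, 0.6, 0.58823529, 0.625, 0.64705882,
            0.71428571, 0.6875, 0.58823529, 0.58823529, 0.73333333]

derived = [jaccard_from_f1(f) for f in test_fscore]
# Largest disagreement between the derived and the logged Jaccard values.
max_abs_diff = max(abs(d - j) for d, j in zip(derived, test_jcc))
print(f"max |derived - logged| = {max_abs_diff:.2e}")
```

Because one metric is a monotonic function of the other, ranking models by `test_jcc` or by `test_fscore` within a single fold gives the same order.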

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.00765204 0.00983834 0.00933051 0.01027274 0.00990605 0.00989223
 0.01009798 0.01009941 0.01041555 0.01013994]

mean value: 0.009764480590820312

key: score_time
value: [0.00810671 0.00978684 0.00992608 0.0102284  0.01031661 0.01037884
 0.01027703 0.01031566 0.01036739 0.01031637]

mean value: 0.01000199317932129

key: test_mcc
value: [0.33796318 0.54761905 0.65477023 0.7824608  0.80507649 0.76623377
 0.28203804 0.34188173 0.44320263 0.52299758]

mean value: 0.548424349224469

key: train_mcc
value: [0.88657784 0.85954556 0.72631812 0.76988112 0.84858071 0.83737341
 0.56743022 0.76293969 0.77046864 0.79393863]

mean value: 0.7823053934321447

key: test_accuracy
value: [0.68421053 0.78947368 0.84210526 0.89473684 0.89473684 0.88888889
 0.5        0.66666667 0.72222222 0.77777778]

mean value: 0.7660818713450293

key: train_accuracy
value: [0.94578313 0.93373494 0.86746988 0.88554217 0.92771084 0.92215569
 0.7245509  0.8742515  0.88622754 0.89820359]

mean value: 0.8865630185412308

key: test_fscore
value: [0.75       0.83333333 0.88       0.92307692 0.90909091 0.90909091
 0.30769231 0.7        0.81481481 0.83333333]

mean value: 0.7860432530432531

key: train_fscore
value: [0.95566502 0.9468599  0.9009009  0.91479821 0.94059406 0.93596059
 0.7125     0.88888889 0.91555556 0.92376682]

mean value: 0.9035489946317999

key: test_precision
value: [0.69230769 0.83333333 0.84615385 0.85714286 1.         0.90909091
 1.         0.77777778 0.6875     0.76923077]

mean value: 0.8372537185037185

key: train_precision
value: [0.97       0.93333333 0.83333333 0.84297521 0.95       0.95
 1.         0.97674419 0.8442623  0.85833333]

mean value: 0.9158981687740049

key: test_recall
value: [0.81818182 0.83333333 0.91666667 1.         0.83333333 0.90909091
 0.18181818 0.63636364 1.         0.90909091]

mean value: 0.8037878787878788

key: train_recall
value: [0.94174757 0.96078431 0.98039216 1.         0.93137255 0.9223301
 0.55339806 0.81553398 1.         1.        ]

mean value: 0.9105558728345707

key: test_roc_auc
value: [0.65909091 0.77380952 0.81547619 0.85714286 0.91666667 0.88311688
 0.59090909 0.67532468 0.64285714 0.74025974]

mean value: 0.755465367965368

key: train_roc_auc
value: [0.94706426 0.92570466 0.83394608 0.8515625  0.92662377 0.92210255
 0.77669903 0.89214199 0.8515625  0.8671875 ]

mean value: 0.879459484036333

key: test_jcc
value: [0.6        0.71428571 0.78571429 0.85714286 0.83333333 0.83333333
 0.18181818 0.53846154 0.6875     0.71428571]

mean value: 0.6745874958374959

key: train_jcc
value: [0.91509434 0.89908257 0.81967213 0.84297521 0.88785047 0.87962963
 0.55339806 0.8        0.8442623  0.85833333]

mean value: 0.830029802977617

MCC on Blind test: 0.13

Accuracy on Blind test: 0.55
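
Fold 7 of the Passive Aggressive run above combines a precision of 1.0 with a recall of about 0.18, which yields an F-score of about 0.31. A minimal sketch, assuming `fscore` is the usual harmonic mean of precision and recall, that recomputes the logged F-scores from the logged precision and recall values (the function name is illustrative):

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# Per-fold values copied from the Passive Aggressive block of this log.
test_precision = [0.69230769, 0.83333333, 0.84615385, 0.85714286, 1.0,
                  0.90909091, 1.0, 0.77777778, 0.6875, 0.76923077]
test_recall = [0.81818182, 0.83333333, 0.91666667, 1.0, 0.83333333,
               0.90909091, 0.18181818, 0.63636364, 1.0, 0.90909091]
test_fscore = [0.75, 0.83333333, 0.88, 0.92307692, 0.90909091,
               0.90909091, 0.30769231, 0.7, 0.81481481, 0.83333333]

derived = [f1_from_pr(p, r) for p, r in zip(test_precision, test_recall)]
max_abs_diff = max(abs(d - f) for d, f in zip(derived, test_fscore))
print(f"fold 7 F-score = {derived[6]:.4f}, max diff = {max_abs_diff:.2e}")
```

The harmonic mean is dominated by the smaller of the two inputs, which is why perfect precision does not rescue the fold 7 F-score.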

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01018405 0.01003146 0.00991631 0.01017022 0.01003242 0.01013088
 0.00956511 0.00980854 0.01098204 0.01009631]

mean value: 0.010091733932495118

key: score_time
value: [0.01030588 0.01020026 0.0102222  0.01032352 0.01022243 0.01029515
 0.01032782 0.01033735 0.01036525 0.01023245]

mean value: 0.010283231735229492

key: test_mcc
value: [0.5077524  0.51887452 0.72456884 0.80507649 0.6761234  0.66254135
 0.1934765  0.44320263 0.67005939 0.43320011]

mean value: 0.5634875632497535

key: train_mcc
value: [0.73618348 0.59399514 0.70269787 0.86061598 0.60495638 0.88573143
 0.29075534 0.82931725 0.66982421 0.82396818]

mean value: 0.6998045257344441

key: test_accuracy
value: [0.73684211 0.68421053 0.84210526 0.89473684 0.84210526 0.83333333
 0.44444444 0.72222222 0.83333333 0.72222222]

mean value: 0.7555555555555555

key: train_accuracy
value: [0.87349398 0.75301205 0.84337349 0.93373494 0.80120482 0.94610778
 0.50299401 0.91616766 0.80838323 0.91616766]

mean value: 0.8294639636389871

key: test_fscore
value: [0.81481481 0.66666667 0.85714286 0.90909091 0.88888889 0.85714286
 0.16666667 0.81481481 0.88       0.76190476]

mean value: 0.7617133237133237

key: train_fscore
value: [0.9058296  0.75151515 0.86021505 0.94581281 0.86075949 0.95652174
 0.32520325 0.93636364 0.81818182 0.93137255]

mean value: 0.8291775097971825

key: test_precision
value: [0.6875     1.         1.         1.         0.8        0.9
 1.         0.6875     0.78571429 0.8       ]

mean value: 0.8660714285714286

key: train_precision
value: [0.84166667 0.98412698 0.95238095 0.95049505 0.75555556 0.95192308
 1.         0.88034188 0.98630137 0.94059406]

mean value: 0.924338559476902

key: test_recall
value: [1.         0.5        0.75       0.83333333 1.         0.81818182
 0.09090909 1.         1.         0.72727273]

mean value: 0.771969696969697

key: train_recall
value: [0.98058252 0.60784314 0.78431373 0.94117647 1.         0.96116505
 0.19417476 1.         0.69902913 0.9223301 ]

mean value: 0.8090614886731392

key: test_roc_auc
value: [0.6875     0.75       0.875      0.91666667 0.78571429 0.83766234
 0.54545455 0.64285714 0.78571429 0.72077922]

mean value: 0.7547348484848485

key: train_roc_auc
value: [0.83949761 0.79610907 0.86090686 0.93152574 0.7421875  0.94152002
 0.59708738 0.890625   0.84170206 0.91429005]

mean value: 0.8355451292572045

key: test_jcc
value: [0.6875     0.5        0.75       0.83333333 0.8        0.75
 0.09090909 0.6875     0.78571429 0.61538462]

mean value: 0.6500341325341326

key: train_jcc
value: [0.82786885 0.60194175 0.75471698 0.89719626 0.75555556 0.91666667
 0.19417476 0.88034188 0.69230769 0.87155963]

mean value: 0.7392330028027021

MCC on Blind test: 0.22

Accuracy on Blind test: 0.57
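
The mean alone hides how unevenly the SGD classifier behaves across folds. A small sketch, using only the standard library, that summarises the spread of the `test_mcc` values copied from the SGD block above; the summary statistics chosen here are an illustration and are not part of the original pipeline output.

```python
from statistics import fmean, stdev

# Per-fold test MCC copied from the Stochastic GDescent block of this log.
test_mcc = [0.5077524, 0.51887452, 0.72456884, 0.80507649, 0.6761234,
            0.66254135, 0.1934765, 0.44320263, 0.67005939, 0.43320011]

mean_mcc = fmean(test_mcc)
spread = stdev(test_mcc)                  # sample standard deviation
worst_fold = test_mcc.index(min(test_mcc)) + 1  # 1-based fold number
best_fold = test_mcc.index(max(test_mcc)) + 1

print(f"mean={mean_mcc:.4f} sd={spread:.4f} "
      f"worst fold={worst_fold} best fold={best_fold}")
```

With roughly 18 to 19 samples per test fold, a single misclassified sample shifts a fold's score noticeably, so some fold-to-fold variation is expected.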

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.08395481 0.0697968  0.07071424 0.07110596 0.07078338 0.07142019
 0.07246375 0.07124639 0.07080793 0.07235765]

mean value: 0.07246510982513428

key: score_time
value: [0.0151608  0.01497865 0.01519632 0.01489806 0.01543808 0.01554489
 0.01553178 0.01531959 0.01536942 0.01543546]

mean value: 0.015287303924560547

key: test_mcc
value: [0.60553007 0.54761905 1.         0.89559105 0.67460105 0.76623377
 0.79772404 0.52299758 0.88640526 0.48416483]

mean value: 0.7180866701858623

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.78947368 0.78947368 1.         0.94736842 0.84210526 0.88888889
 0.88888889 0.77777778 0.94444444 0.72222222]

mean value: 0.8590643274853801

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.84615385 0.83333333 1.         0.95652174 0.86956522 0.90909091
 0.9        0.83333333 0.95652174 0.73684211]

mean value: 0.8841362222826754

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.73333333 0.83333333 1.         1.         0.90909091 0.90909091
 1.         0.76923077 0.91666667 0.875     ]

mean value: 0.8945745920745921

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.83333333 1.         0.91666667 0.83333333 0.90909091
 0.81818182 0.90909091 1.         0.63636364]

mean value: 0.8856060606060606

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.75       0.77380952 1.         0.95833333 0.8452381  0.88311688
 0.90909091 0.74025974 0.92857143 0.74675325]

mean value: 0.8535173160173161

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.73333333 0.71428571 1.         0.91666667 0.76923077 0.83333333
 0.81818182 0.71428571 0.91666667 0.58333333]

mean value: 0.799931734931735

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: -0.0

Accuracy on Blind test: 0.5
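
An MCC of about zero alongside an accuracy of 0.5 is what a classifier with no discriminative power produces on a balanced set. The sketch below computes MCC from the four confusion-matrix counts. The counts used are hypothetical and chosen only for illustration; the actual blind-test confusion matrix is not present in this log. Returning 0.0 for a zero denominator is a common convention and is assumed here.

```python
from math import sqrt

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column sum is zero
    return (tp * tn - fp * fn) / denom

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts (NOT taken from the log): half right, half wrong.
chance = dict(tp=10, tn=10, fp=10, fn=10)
print(mcc(**chance), accuracy(**chance))

# Hypothetical perfect and fully inverted classifiers for comparison.
print(mcc(tp=20, tn=20, fp=0, fn=0))
print(mcc(tp=0, tn=0, fp=20, fn=20))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays near zero for chance-level predictions regardless of class balance.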

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.02634501 0.02745032 0.03282142 0.02707553 0.03081536 0.03036213
 0.03100204 0.03243303 0.0233736  0.02885413]

mean value: 0.029053258895874023

key: score_time
value: [0.01942372 0.01645255 0.02343774 0.01584172 0.02157259 0.01987481
 0.02784896 0.02355909 0.01557946 0.0171895 ]

mean value: 0.020078015327453614

key: test_mcc
value: [0.56729535 0.89559105 0.89559105 1.         0.67460105 0.76623377
 0.79772404 0.56061191 0.88640526 0.64465837]

mean value: 0.768871184699948

key: train_mcc
value: [1.         0.97457108 0.97457108 1.         0.98740179 0.98737524
 0.96301704 0.97466626 0.98744925 0.94933931]

mean value: 0.9798391039351805

key: test_accuracy
value: [0.78947368 0.94736842 0.94736842 1.         0.84210526 0.88888889
 0.88888889 0.77777778 0.94444444 0.83333333]

mean value: 0.8859649122807017

key: train_accuracy
value: [1.         0.98795181 0.98795181 1.         0.9939759  0.99401198
 0.98203593 0.98802395 0.99401198 0.9760479 ]

mean value: 0.9904011254599235

key: test_fscore
value: [0.83333333 0.95652174 0.95652174 1.         0.86956522 0.90909091
 0.9        0.84615385 0.95652174 0.86956522]

mean value: 0.9097273740752001

key: train_fscore
value: [1.         0.99019608 0.99019608 1.         0.99507389 0.99516908
 0.98522167 0.99029126 0.99512195 0.98076923]

mean value: 0.9922039249615477

key: test_precision
value: [0.76923077 1.         1.         1.         0.90909091 0.90909091
 1.         0.73333333 0.91666667 0.83333333]

mean value: 0.9070745920745921

key: train_precision
value: [1.         0.99019608 0.99019608 1.         1.         0.99038462
 1.         0.99029126 1.         0.97142857]

mean value: 0.9932496605811855

key: test_recall
value: [0.90909091 0.91666667 0.91666667 1.         0.83333333 0.90909091
 0.81818182 1.         1.         0.90909091]

mean value: 0.9212121212121211

key: train_recall
value: [1.         0.99019608 0.99019608 1.         0.99019608 1.
 0.97087379 0.99029126 0.99029126 0.99029126]

mean value: 0.9912335808109651

key: test_roc_auc
value: [0.76704545 0.95833333 0.95833333 1.         0.8452381  0.88311688
 0.90909091 0.71428571 0.92857143 0.81168831]

mean value: 0.8775703463203464

key: train_roc_auc
value: [1.         0.98728554 0.98728554 1.         0.99509804 0.9921875
 0.98543689 0.98733313 0.99514563 0.97170813]

mean value: 0.9901480404054825

key: test_jcc
value: [0.71428571 0.91666667 0.91666667 1.         0.76923077 0.83333333
 0.81818182 0.73333333 0.91666667 0.76923077]

mean value: 0.8387595737595738

key: train_jcc
value: [1.         0.98058252 0.98058252 1.         0.99019608 0.99038462
 0.97087379 0.98076923 0.99029126 0.96226415]

mean value: 0.9845944172615994

MCC on Blind test: 0.1

Accuracy on Blind test: 0.54

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.01885796 0.01930904 0.02562237 0.02129722 0.06075287 0.03211212
 0.04384661 0.03204489 0.0758667  0.05150294]

mean value: 0.038121271133422854

key: score_time
value: [0.01133871 0.01133037 0.0113616  0.02015972 0.02054238 0.01122904
 0.011343   0.01591635 0.02104354 0.01130295]

mean value: 0.01455676555633545

key: test_mcc
value: [ 0.40219983  0.26772484  0.28690229  0.18531233  0.44908871  0.2548236
  0.39594419 -0.05096472  0.3040345   0.67005939]

mean value: 0.31651249570546197

key: train_mcc
value: [0.88606149 0.90075726 0.87457979 0.88685769 0.92515014 0.91320801
 0.89953068 0.91188694 0.87498674 0.94997541]

mean value: 0.9022994142722148

key: test_accuracy
value: [0.68421053 0.68421053 0.68421053 0.63157895 0.73684211 0.66666667
 0.72222222 0.55555556 0.66666667 0.83333333]

mean value: 0.6865497076023391

key: train_accuracy
value: [0.94578313 0.95180723 0.93975904 0.94578313 0.96385542 0.95808383
 0.95209581 0.95808383 0.94011976 0.9760479 ]

mean value: 0.953141908953178

key: test_fscore
value: [0.78571429 0.78571429 0.76923077 0.72       0.82758621 0.76923077
 0.8        0.69230769 0.78571429 0.88      ]

mean value: 0.7815498294808639

key: train_fscore
value: [0.95774648 0.96226415 0.95283019 0.95734597 0.97142857 0.96713615
 0.96226415 0.96682464 0.95327103 0.98095238]

mean value: 0.9632063716206098

key: test_precision
value: [0.64705882 0.6875     0.71428571 0.69230769 0.70588235 0.66666667
 0.71428571 0.6        0.64705882 0.78571429]

mean value: 0.6860760073260074

key: train_precision
value: [0.92727273 0.92727273 0.91818182 0.9266055  0.94444444 0.93636364
 0.93577982 0.94444444 0.91891892 0.96261682]

mean value: 0.9341900860429541

key: test_recall
value: [1.         0.91666667 0.83333333 0.75       1.         0.90909091
 0.90909091 0.81818182 1.         1.        ]

mean value: 0.9136363636363636

key: train_recall
value: [0.99029126 1.         0.99019608 0.99019608 1.         1.
 0.99029126 0.99029126 0.99029126 1.        ]

mean value: 0.9941557205406435

key: test_roc_auc
value: [0.625      0.60119048 0.63095238 0.58928571 0.64285714 0.5974026
 0.66883117 0.48051948 0.57142857 0.78571429]

mean value: 0.6193181818181818

key: train_roc_auc
value: [0.93165357 0.9375     0.92478554 0.93259804 0.953125   0.9453125
 0.94045813 0.94827063 0.92483313 0.96875   ]

mean value: 0.9407286539211154

key: test_jcc
value: [0.64705882 0.64705882 0.625      0.5625     0.70588235 0.625
 0.66666667 0.52941176 0.64705882 0.78571429]

mean value: 0.6441351540616247

key: train_jcc
value: [0.91891892 0.92727273 0.90990991 0.91818182 0.94444444 0.93636364
 0.92727273 0.93577982 0.91071429 0.96261682]

mean value: 0.9291475107022136

MCC on Blind test: 0.33

Accuracy on Blind test: 0.64

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.12610126 0.12060261 0.11894894 0.11790848 0.12793779 0.12083268
 0.12258196 0.12219334 0.12002635 0.11419153]

mean value: 0.121132493019104

key: score_time
value: [0.00943565 0.00874519 0.00873065 0.00975442 0.00927114 0.00965858
 0.00981712 0.01030612 0.00883412 0.00869703]

mean value: 0.009325003623962403

key: test_mcc
value: [0.45361105 1.         0.80507649 1.         0.88949918 0.76623377
 1.         0.56061191 0.88640526 0.64465837]

mean value: 0.8006096026925367

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.73684211 1.         0.89473684 1.         0.94736842 0.88888889
 1.         0.77777778 0.94444444 0.83333333]

mean value: 0.9023391812865497

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.7826087  1.         0.90909091 1.         0.96       0.90909091
 1.         0.84615385 0.95652174 0.86956522]

mean value: 0.9233031316509577

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.75       1.         1.         1.         0.92307692 0.90909091
 1.         0.73333333 0.91666667 0.83333333]

mean value: 0.9065501165501165

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81818182 1.         0.83333333 1.         1.         0.90909091
 1.         1.         1.         0.90909091]

mean value: 0.946969696969697

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.72159091 1.         0.91666667 1.         0.92857143 0.88311688
 1.         0.71428571 0.92857143 0.81168831]

mean value: 0.8904491341991343

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.64285714 1.         0.83333333 1.         0.92307692 0.83333333
 1.         0.73333333 0.91666667 0.76923077]

mean value: 0.8651831501831502

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.14

Accuracy on Blind test: 0.55

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.02101755 0.03171253 0.02527022 0.01208258 0.01173353 0.01204896
 0.01371312 0.01347113 0.01202703 0.01235557]

mean value: 0.01654322147369385

key: score_time
value: [0.01127648 0.01125073 0.01176476 0.01200485 0.01166868 0.01086617
 0.01114273 0.01123142 0.0110383  0.01096892]

mean value: 0.011321306228637695

key: test_mcc
value: [0.4719399  0.40849122 0.09356015 0.44908871 0.56694671 0.26856633
 0.44320263 0.0805823  0.0805823  0.66254135]

mean value: 0.3525501597195837

key: train_mcc
value: [0.6002326  0.50998847 0.67610805 0.54823412 0.49142346 0.64107028
 0.54903745 0.61519707 0.55309666 0.78305013]

mean value: 0.5967438289020631

key: test_accuracy
value: [0.73684211 0.73684211 0.63157895 0.73684211 0.78947368 0.66666667
 0.72222222 0.61111111 0.61111111 0.83333333]

mean value: 0.7076023391812866

key: train_accuracy
value: [0.81325301 0.75903614 0.8373494  0.77710843 0.75903614 0.83233533
 0.77844311 0.82035928 0.79041916 0.89820359]

mean value: 0.8065543611572037

key: test_fscore
value: [0.76190476 0.81481481 0.75862069 0.82758621 0.85714286 0.75
 0.81481481 0.74074074 0.74074074 0.85714286]

mean value: 0.7923508483853311

key: train_fscore
value: [0.85167464 0.83471074 0.88311688 0.84518828 0.83050847 0.87272727
 0.84647303 0.86486486 0.83253589 0.91943128]

mean value: 0.858123135858806

key: test_precision
value: [0.8        0.73333333 0.64705882 0.70588235 0.75       0.69230769
 0.6875     0.625      0.625      0.9       ]

mean value: 0.7166082202111614

key: train_precision
value: [0.83962264 0.72142857 0.79069767 0.73722628 0.73134328 0.82051282
 0.73913043 0.80672269 0.82075472 0.89814815]

mean value: 0.7905587257811302

key: test_recall
value: [0.72727273 0.91666667 0.91666667 1.         1.         0.81818182
 1.         0.90909091 0.90909091 0.81818182]

mean value: 0.9015151515151515

key: train_recall
value: [0.86407767 0.99019608 1.         0.99019608 0.96078431 0.93203883
 0.99029126 0.93203883 0.84466019 0.94174757]

mean value: 0.9446030839520274

key: test_roc_auc
value: [0.73863636 0.67261905 0.5297619  0.64285714 0.71428571 0.62337662
 0.64285714 0.52597403 0.52597403 0.83766234]

mean value: 0.6454004329004329

key: train_roc_auc
value: [0.7971182  0.69041054 0.7890625  0.71384804 0.69914216 0.80195692
 0.71389563 0.78633192 0.7738926  0.88493629]

mean value: 0.7650594784839503

key: test_jcc
value: [0.61538462 0.6875     0.61111111 0.70588235 0.75       0.6
 0.6875     0.58823529 0.58823529 0.75      ]

mean value: 0.6583848667672197

key: train_jcc
value: [0.74166667 0.71631206 0.79069767 0.73188406 0.71014493 0.77419355
 0.73381295 0.76190476 0.71311475 0.85087719]

mean value: 0.7524608590343069

MCC on Blind test: 0.32

Accuracy on Blind test: 0.61

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.0152986  0.01060939 0.01029634 0.01028705 0.01044655 0.01044083
 0.01033711 0.01052427 0.01060605 0.01032233]

mean value: 0.010916852951049804

key: score_time
value: [0.01142406 0.01068401 0.01064181 0.01059437 0.01238728 0.01085711
 0.01055193 0.0106585  0.01086307 0.01084256]

mean value: 0.010950469970703125

key: test_mcc
value: [0.21660006 0.67460105 0.77380952 0.80507649 0.89559105 0.76623377
 0.79772404 0.67005939 0.56061191 0.66254135]

mean value: 0.6822848626279371

key: train_mcc
value: [0.92308458 0.85954556 0.88685769 0.88521749 0.83387364 0.89863369
 0.84736815 0.87286094 0.89835373 0.88573143]

mean value: 0.8791526890998981

key: test_accuracy
value: [0.63157895 0.84210526 0.89473684 0.89473684 0.94736842 0.88888889
 0.88888889 0.83333333 0.77777778 0.83333333]

mean value: 0.8432748538011696

key: train_accuracy
value: [0.96385542 0.93373494 0.94578313 0.94578313 0.92168675 0.95209581
 0.92814371 0.94011976 0.95209581 0.94610778]

mean value: 0.9429406247745473

key: test_fscore
value: [0.72       0.86956522 0.91666667 0.90909091 0.95652174 0.90909091
 0.9        0.88       0.84615385 0.85714286]

mean value: 0.8764232144666927

key: train_fscore
value: [0.97115385 0.9468599  0.95734597 0.95652174 0.93719807 0.96190476
 0.94230769 0.95192308 0.96153846 0.95652174]

mean value: 0.9543275259667182

key: test_precision
value: [0.64285714 0.90909091 0.91666667 1.         1.         0.90909091
 1.         0.78571429 0.73333333 0.9       ]

mean value: 0.8796753246753246

key: train_precision
value: [0.96190476 0.93333333 0.9266055  0.94285714 0.92380952 0.94392523
 0.93333333 0.94285714 0.95238095 0.95192308]

mean value: 0.9412930005631284

key: test_recall
value: [0.81818182 0.83333333 0.91666667 0.83333333 0.91666667 0.90909091
 0.81818182 1.         1.         0.81818182]

mean value: 0.8863636363636364

key: train_recall
value: [0.98058252 0.96078431 0.99019608 0.97058824 0.95098039 0.98058252
 0.95145631 0.96116505 0.97087379 0.96116505]

mean value: 0.967837426232629

key: test_roc_auc
value: [0.59659091 0.8452381  0.88690476 0.91666667 0.95833333 0.88311688
 0.90909091 0.78571429 0.71428571 0.83766234]

mean value: 0.8333603896103896

key: train_roc_auc
value: [0.95854523 0.92570466 0.93259804 0.93841912 0.9129902  0.94341626
 0.92104066 0.93370752 0.94637439 0.94152002]

mean value: 0.9354316099417114

key: test_jcc
value: [0.5625     0.76923077 0.84615385 0.83333333 0.91666667 0.83333333
 0.81818182 0.78571429 0.73333333 0.75      ]

mean value: 0.7848447385947386

key: train_jcc
value: [0.94392523 0.89908257 0.91818182 0.91666667 0.88181818 0.9266055
 0.89090909 0.90825688 0.92592593 0.91666667]

mean value: 0.9128038537941651

MCC on Blind test: 0.21

Accuracy on Blind test: 0.59

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:122: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:125: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.0855453  0.08357239 0.08917785 0.08271241 0.08259749 0.08286834
 0.08288574 0.08441114 0.08273721 0.08266902]

mean value: 0.08391768932342529

key: score_time
value: [0.01079178 0.01085567 0.01092529 0.01075959 0.01109052 0.01090479
 0.0108192  0.01083088 0.01145196 0.01102829]

mean value: 0.010945796966552734

key: test_mcc
value: [0.21660006 0.67460105 0.77380952 0.89559105 0.89559105 0.76623377
 0.79772404 0.67005939 0.56061191 0.56980288]

mean value: 0.6820624726317762

key: train_mcc
value: [0.92308458 0.85980258 0.88685769 0.83387364 0.83400835 0.89863369
 0.87296284 0.87286094 0.89835373 0.86032048]

mean value: 0.8740758514702531

key: test_accuracy
value: [0.63157895 0.84210526 0.89473684 0.94736842 0.94736842 0.88888889
 0.88888889 0.83333333 0.77777778 0.77777778]

mean value: 0.8429824561403508

key: train_accuracy
value: [0.96385542 0.93373494 0.94578313 0.92168675 0.92168675 0.95209581
 0.94011976 0.94011976 0.95209581 0.93413174]

mean value: 0.9405309862203304

key: test_fscore
value: [0.72       0.86956522 0.91666667 0.95652174 0.95652174 0.90909091
 0.9        0.88       0.84615385 0.8       ]

mean value: 0.8754520117563596

key: train_fscore
value: [0.97115385 0.94634146 0.95734597 0.93719807 0.93779904 0.96190476
 0.95238095 0.95192308 0.96153846 0.9468599 ]

mean value: 0.9524445547956408

key: test_precision
value: [0.64285714 0.90909091 0.91666667 1.         1.         0.90909091
 1.         0.78571429 0.73333333 0.88888889]

mean value: 0.8785642135642135

key: train_precision
value: [0.96190476 0.94174757 0.9266055  0.92380952 0.91588785 0.94392523
 0.93457944 0.94285714 0.95238095 0.94230769]

mean value: 0.9386005674027249

key: test_recall
value: [0.81818182 0.83333333 0.91666667 0.91666667 0.91666667 0.90909091
 0.81818182 1.         1.         0.72727273]

mean value: 0.8856060606060606

key: train_recall
value: [0.98058252 0.95098039 0.99019608 0.95098039 0.96078431 0.98058252
 0.97087379 0.96116505 0.97087379 0.95145631]

mean value: 0.9668475157053112

key: test_roc_auc
value: [0.59659091 0.8452381  0.88690476 0.95833333 0.95833333 0.88311688
 0.90909091 0.78571429 0.71428571 0.79220779]

mean value: 0.8329816017316017

key: train_roc_auc
value: [0.95854523 0.9286152  0.93259804 0.9129902  0.91007966 0.94341626
 0.93074939 0.93370752 0.94637439 0.92885316]

mean value: 0.9325929046780524

key: test_jcc
value: [0.5625     0.76923077 0.84615385 0.91666667 0.91666667 0.83333333
 0.81818182 0.78571429 0.73333333 0.66666667]

mean value: 0.7848447385947386

key: train_jcc
value: [0.94392523 0.89814815 0.91818182 0.88181818 0.88288288 0.9266055
 0.90909091 0.90825688 0.92592593 0.89908257]

mean value: 0.9093918053821166

MCC on Blind test: 0.1

Accuracy on Blind test: 0.54

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.02180219 0.02042246 0.01844001 0.01872444 0.01910233 0.01665592
 0.0174613  0.01886535 0.02107906 0.0179646 ]

mean value: 0.019051766395568846

key: score_time
value: [0.01086044 0.01106286 0.01095486 0.01095009 0.01091909 0.01115131
 0.01068163 0.010638   0.01093936 0.01097393]

mean value: 0.01091315746307373

key: test_mcc
value: [0.58002308 0.48856385 0.41096386 0.56490196 0.74242424 0.74047959
 0.82575758 0.91666667 0.83205029 0.63636364]

mean value: 0.6738194748889874

key: train_mcc
value: [0.80500813 0.77565201 0.79548704 0.77563066 0.7469525  0.76601619
 0.76597166 0.76597166 0.73817726 0.81557242]

mean value: 0.7750439506475604

key: test_accuracy
value: [0.7826087  0.73913043 0.69565217 0.7826087  0.86956522 0.86956522
 0.91304348 0.95652174 0.90909091 0.81818182]

mean value: 0.833596837944664

key: train_accuracy
value: [0.90243902 0.88780488 0.89756098 0.88780488 0.87317073 0.88292683
 0.88292683 0.88292683 0.86893204 0.90776699]

mean value: 0.887426000473597

key: test_fscore
value: [0.73684211 0.75       0.72       0.76190476 0.86956522 0.88
 0.91666667 0.95652174 0.91666667 0.81818182]

mean value: 0.832634897520481

key: train_fscore
value: [0.90384615 0.88780488 0.89655172 0.88888889 0.875      0.88349515
 0.88118812 0.88118812 0.86699507 0.90731707]

mean value: 0.8872275175238942

key: test_precision
value: [0.875      0.69230769 0.64285714 0.8        0.90909091 0.84615385
 0.91666667 1.         0.84615385 0.81818182]

mean value: 0.8346411921411921

key: train_precision
value: [0.8952381  0.89215686 0.91       0.88461538 0.85849057 0.875
 0.89       0.89       0.88       0.91176471]

mean value: 0.8887265614518667

key: test_recall
value: [0.63636364 0.81818182 0.81818182 0.72727273 0.83333333 0.91666667
 0.91666667 0.91666667 1.         0.81818182]

mean value: 0.8401515151515152

key: train_recall
value: [0.91262136 0.88349515 0.88349515 0.89320388 0.89215686 0.89215686
 0.87254902 0.87254902 0.85436893 0.90291262]

mean value: 0.8859508852084523

key: test_roc_auc
value: [0.77651515 0.74242424 0.70075758 0.78030303 0.87121212 0.86742424
 0.91287879 0.95833333 0.90909091 0.81818182]

mean value: 0.8337121212121211

key: train_roc_auc
value: [0.90238911 0.887826   0.89762993 0.88777841 0.8732629  0.88297164
 0.88287645 0.88287645 0.86893204 0.90776699]

mean value: 0.8874309918142015

key: test_jcc
value: [0.58333333 0.6        0.5625     0.61538462 0.76923077 0.78571429
 0.84615385 0.91666667 0.84615385 0.69230769]

mean value: 0.7217445054945055

key: train_jcc
value: [0.8245614  0.79824561 0.8125     0.8        0.77777778 0.79130435
 0.78761062 0.78761062 0.76521739 0.83035714]

mean value: 0.7975184916247269

MCC on Blind test: 0.35

Accuracy on Blind test: 0.67

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.70506549 0.62887144 0.6596272  0.79791594 0.68545508 0.64333749
 0.76110959 0.69955564 0.68501639 0.77351475]

mean value: 0.7039469003677368

key: score_time
value: [0.01389217 0.01472902 0.01154828 0.01132441 0.01137733 0.01125288
 0.02320051 0.01134515 0.01451206 0.01475286]

mean value: 0.013793468475341797

key: test_mcc
value: [0.76277007 0.66414149 0.48856385 0.74047959 0.74242424 0.82575758
 0.65151515 0.58930667 0.68313005 0.83205029]

mean value: 0.6980138984393582

key: train_mcc
value: [0.92211753 0.97077583 0.93174679 0.87320324 0.91259644 0.91259644
 0.88292404 0.94163576 1.         1.        ]

mean value: 0.9347596066723304

key: test_accuracy
value: [0.86956522 0.82608696 0.73913043 0.86956522 0.86956522 0.91304348
 0.82608696 0.7826087  0.81818182 0.90909091]

mean value: 0.8422924901185771

key: train_accuracy
value: [0.96097561 0.98536585 0.96585366 0.93658537 0.95609756 0.95609756
 0.94146341 0.97073171 1.         1.        ]

mean value: 0.9673170731707317

key: test_fscore
value: [0.84210526 0.83333333 0.75       0.85714286 0.86956522 0.91666667
 0.83333333 0.76190476 0.77777778 0.91666667]

mean value: 0.8358495877374597

key: train_fscore
value: [0.96153846 0.98550725 0.96618357 0.93719807 0.95652174 0.95652174
 0.94117647 0.97029703 1.         1.        ]

mean value: 0.9674944328979426

key: test_precision
value: [1.         0.76923077 0.69230769 0.9        0.90909091 0.91666667
 0.83333333 0.88888889 1.         0.84615385]

mean value: 0.8755672105672105

key: train_precision
value: [0.95238095 0.98076923 0.96153846 0.93269231 0.94285714 0.94285714
 0.94117647 0.98       1.         1.        ]

mean value: 0.9634271708683473

key: test_recall
value: [0.72727273 0.90909091 0.81818182 0.81818182 0.83333333 0.91666667
 0.83333333 0.66666667 0.63636364 1.        ]

mean value: 0.8159090909090909

key: train_recall
value: [0.97087379 0.99029126 0.97087379 0.94174757 0.97058824 0.97058824
 0.94117647 0.96078431 1.         1.        ]

mean value: 0.9716923662668951

key: test_roc_auc
value: [0.86363636 0.82954545 0.74242424 0.86742424 0.87121212 0.91287879
 0.82575758 0.78787879 0.81818182 0.90909091]

mean value: 0.8428030303030303

key: train_roc_auc
value: [0.96092709 0.98534171 0.96582905 0.93656006 0.9561679  0.9561679
 0.94146202 0.97068342 1.         1.        ]

mean value: 0.9673139158576052

key: test_jcc
value: [0.72727273 0.71428571 0.6        0.75       0.76923077 0.84615385
 0.71428571 0.61538462 0.63636364 0.84615385]

mean value: 0.721913086913087

key: train_jcc
value: [0.92592593 0.97142857 0.93457944 0.88181818 0.91666667 0.91666667
 0.88888889 0.94230769 1.         1.        ]

mean value: 0.937828203295493

MCC on Blind test: 0.04

Accuracy on Blind test: 0.52
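
Note: the fold scores above can be summarised offline. The sketch below is an assumed post-processing step, not part of the logged pipeline. It recomputes the mean test_mcc and train_mcc for LogisticRegressionCV from the values printed in this log and compares them with the logged blind-test MCC. All variable names are hypothetical.

```python
from statistics import fmean

# Fold scores copied from the LogisticRegressionCV block of this log.
test_mcc = [0.76277007, 0.66414149, 0.48856385, 0.74047959, 0.74242424,
            0.82575758, 0.65151515, 0.58930667, 0.68313005, 0.83205029]
train_mcc = [0.92211753, 0.97077583, 0.93174679, 0.87320324, 0.91259644,
             0.91259644, 0.88292404, 0.94163576, 1.0, 1.0]
blind_mcc = 0.04  # "MCC on Blind test" as logged (already rounded)

mean_test = fmean(test_mcc)    # mean over the 10 CV test folds
mean_train = fmean(train_mcc)  # mean over the 10 CV training folds

# Gap between training and CV test folds, and between CV test and blind test.
train_cv_gap = mean_train - mean_test
cv_blind_gap = mean_test - blind_mcc

print(f"mean test_mcc : {mean_test:.4f}")
print(f"mean train_mcc: {mean_train:.4f}")
print(f"train - CV gap: {train_cv_gap:.4f}")
print(f"CV - blind gap: {cv_blind_gap:.4f}")
```

A large CV-to-blind gap such as the one computed here may point to overfitting or to a blind set that differs from the training data; the log alone does not say which.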

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01285195 0.00952053 0.00782275 0.00767875 0.00741482 0.0068872
 0.00694108 0.00817084 0.00734329 0.00712538]

mean value: 0.0081756591796875

key: score_time
value: [0.01059294 0.00885248 0.00852776 0.00859118 0.00863194 0.00855207
 0.00864601 0.00876474 0.00838137 0.00814605]

mean value: 0.008768653869628907

key: test_mcc
value: [0.11236664 0.43929769 0.44411739 0.41096386 0.47923384 0.50168817
 0.40451992 0.55048188 0.47140452 0.20412415]

mean value: 0.4018198054310791

key: train_mcc
value: [0.3148712  0.50657911 0.52847427 0.5185658  0.43504485 0.51678072
 0.4680327  0.45392287 0.49379046 0.43864549]

mean value: 0.4674707470646864

key: test_accuracy
value: [0.52173913 0.65217391 0.69565217 0.69565217 0.69565217 0.73913043
 0.65217391 0.73913043 0.68181818 0.59090909]

mean value: 0.6664031620553359

key: train_accuracy
value: [0.60487805 0.72682927 0.73658537 0.72682927 0.68292683 0.73170732
 0.70243902 0.69756098 0.7184466  0.68932039]

mean value: 0.7017523087852238

key: test_fscore
value: [0.64516129 0.73333333 0.74074074 0.72       0.77419355 0.78571429
 0.75       0.8        0.75862069 0.66666667]

mean value: 0.7374430554819876

key: train_fscore
value: [0.71378092 0.77777778 0.78571429 0.78125    0.74903475 0.77911647
 0.76078431 0.75590551 0.77165354 0.75193798]

mean value: 0.7626955550457906

key: test_precision
value: [0.5        0.57894737 0.625      0.64285714 0.63157895 0.6875
 0.6        0.66666667 0.61111111 0.5625    ]

mean value: 0.6106161236424394

key: train_precision
value: [0.56111111 0.65771812 0.66442953 0.65359477 0.61783439 0.65986395
 0.63398693 0.63157895 0.64900662 0.62580645]

mean value: 0.6354930823444798

key: test_recall
value: [0.90909091 1.         0.90909091 0.81818182 1.         0.91666667
 1.         1.         1.         0.81818182]

mean value: 0.9371212121212121

key: train_recall
value: [0.98058252 0.95145631 0.96116505 0.97087379 0.95098039 0.95098039
 0.95098039 0.94117647 0.95145631 0.94174757]

mean value: 0.9551399200456882

key: test_roc_auc
value: [0.53787879 0.66666667 0.70454545 0.70075758 0.68181818 0.73106061
 0.63636364 0.72727273 0.68181818 0.59090909]

mean value: 0.6659090909090909

key: train_roc_auc
value: [0.60303636 0.72572816 0.73548449 0.72563297 0.68422806 0.73277175
 0.70364554 0.69874358 0.7184466  0.68932039]

mean value: 0.7017037883114411

key: test_jcc
value: [0.47619048 0.57894737 0.58823529 0.5625     0.63157895 0.64705882
 0.6        0.66666667 0.61111111 0.5       ]

mean value: 0.5862288687404786

key: train_jcc
value: [0.55494505 0.63636364 0.64705882 0.64102564 0.59876543 0.63815789
 0.61392405 0.60759494 0.62820513 0.60248447]

mean value: 0.6168525070295942

MCC on Blind test: 0.37

Accuracy on Blind test: 0.65
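
Note: the test_jcc values logged above appear consistent with the test_fscore values through the standard binary-classification identity J = F1 / (2 - F1), assuming both metrics were computed for the same positive class. The sketch below is a sanity check on the logged numbers only; it does not reproduce how the script computed them, and the helper name `jaccard_from_f1` is hypothetical.

```python
# Fold scores copied from the Gaussian NB block of this log.
test_fscore = [0.64516129, 0.73333333, 0.74074074, 0.72, 0.77419355,
               0.78571429, 0.75, 0.8, 0.75862069, 0.66666667]
test_jcc = [0.47619048, 0.57894737, 0.58823529, 0.5625, 0.63157895,
            0.64705882, 0.6, 0.66666667, 0.61111111, 0.5]


def jaccard_from_f1(f1):
    """Convert a binary F1 score to the Jaccard index: J = F1 / (2 - F1)."""
    return f1 / (2.0 - f1)


# Largest absolute disagreement between the derived and the logged Jaccard values.
max_abs_diff = max(
    abs(jaccard_from_f1(f) - j) for f, j in zip(test_fscore, test_jcc)
)
print(f"max |J(F1) - logged jcc| = {max_abs_diff:.2e}")
```

If the difference stays within rounding error, the two metrics carry the same information for these folds, so reporting one of them may be sufficient.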

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.0081985  0.00712252 0.00714064 0.00715804 0.00717282 0.00716138
 0.00722837 0.00712657 0.00720763 0.00717926]

mean value: 0.007269573211669922

key: score_time
value: [0.00871158 0.00801897 0.0079174  0.00812721 0.00795984 0.00803661
 0.00796866 0.00801921 0.00809813 0.00810313]

mean value: 0.008096075057983399

key: test_mcc
value: [0.30240737 0.05427825 0.03816905 0.3030303  0.42228828 0.30240737
 0.03816905 0.65151515 0.37796447 0.27272727]

mean value: 0.2762956564879194

key: train_mcc
value: [0.35623111 0.34638101 0.37560698 0.3463735  0.28783552 0.36612372
 0.35687769 0.3658258  0.32044877 0.39058328]

mean value: 0.3512287378448915

key: test_accuracy
value: [0.65217391 0.52173913 0.52173913 0.65217391 0.69565217 0.65217391
 0.52173913 0.82608696 0.68181818 0.63636364]

mean value: 0.6361660079051383

key: train_accuracy
value: [0.67804878 0.67317073 0.68780488 0.67317073 0.64390244 0.68292683
 0.67804878 0.68292683 0.66019417 0.69417476]

mean value: 0.6754368932038836

key: test_fscore
value: [0.6        0.56       0.47619048 0.63636364 0.75862069 0.69230769
 0.56       0.83333333 0.72       0.63636364]

mean value: 0.6473179464213947

key: train_fscore
value: [0.68571429 0.67942584 0.69230769 0.67317073 0.64390244 0.67336683
 0.68571429 0.67980296 0.65686275 0.70967742]

mean value: 0.6779945226077302

key: test_precision
value: [0.66666667 0.5        0.5        0.63636364 0.64705882 0.64285714
 0.53846154 0.83333333 0.64285714 0.63636364]

mean value: 0.6243961920432509

key: train_precision
value: [0.6728972  0.66981132 0.68571429 0.67647059 0.6407767  0.69072165
 0.66666667 0.68316832 0.66336634 0.6754386 ]

mean value: 0.6725031656102882

key: test_recall
value: [0.54545455 0.63636364 0.45454545 0.63636364 0.91666667 0.75
 0.58333333 0.83333333 0.81818182 0.63636364]

mean value: 0.681060606060606

key: train_recall
value: [0.69902913 0.68932039 0.69902913 0.66990291 0.64705882 0.65686275
 0.70588235 0.67647059 0.65048544 0.74757282]

mean value: 0.6841614315629164

key: test_roc_auc
value: [0.64772727 0.52651515 0.51893939 0.65151515 0.68560606 0.64772727
 0.51893939 0.82575758 0.68181818 0.63636364]

mean value: 0.634090909090909

key: train_roc_auc
value: [0.67794594 0.67309157 0.68774986 0.67318675 0.64391776 0.6828003
 0.67818389 0.68289549 0.66019417 0.69417476]

mean value: 0.6754140491147915

key: test_jcc
value: [0.42857143 0.38888889 0.3125     0.46666667 0.61111111 0.52941176
 0.38888889 0.71428571 0.5625     0.46666667]

mean value: 0.4869491129785247

key: train_jcc
value: [0.52173913 0.51449275 0.52941176 0.50735294 0.47482014 0.50757576
 0.52173913 0.51492537 0.48905109 0.55      ]

mean value: 0.5131108089860595

MCC on Blind test: 0.35

Accuracy on Blind test: 0.67

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00741982 0.00747585 0.00764775 0.00689459 0.00716424 0.00761509
 0.00779819 0.00776052 0.00681782 0.00763607]

mean value: 0.007422995567321777

key: score_time
value: [0.01033688 0.00987959 0.00997186 0.00971317 0.00995612 0.00998878
 0.00992155 0.01000547 0.00932527 0.00970054]

mean value: 0.00987992286682129

key: test_mcc
value: [-0.12878788  0.3030303   0.12878788  0.38932432  0.56490196  0.15096491
  0.50460839  0.65909298  0.32539569  0.37796447]

mean value: 0.3275283023309783

key: train_mcc
value: [0.61013747 0.58290698 0.56242364 0.62329827 0.62174364 0.55771431
 0.60061066 0.58363235 0.58722022 0.59402749]

mean value: 0.5923715024669853

key: test_accuracy
value: [0.43478261 0.65217391 0.56521739 0.69565217 0.7826087  0.56521739
 0.69565217 0.82608696 0.63636364 0.68181818]

mean value: 0.6535573122529644

key: train_accuracy
value: [0.80487805 0.7902439  0.7804878  0.8097561  0.8097561  0.77560976
 0.8        0.7902439  0.79126214 0.7961165 ]

mean value: 0.7948354250532796

key: test_fscore
value: [0.43478261 0.63636364 0.54545455 0.66666667 0.8        0.5
 0.58823529 0.84615385 0.5        0.63157895]

mean value: 0.6149235544820415

key: train_fscore
value: [0.80952381 0.78172589 0.77386935 0.8        0.8        0.75531915
 0.79396985 0.77720207 0.77720207 0.78787879]

mean value: 0.785669097572126

key: test_precision
value: [0.41666667 0.63636364 0.54545455 0.7        0.76923077 0.625
 1.         0.78571429 0.8        0.75      ]

mean value: 0.7028429903429904

key: train_precision
value: [0.79439252 0.81914894 0.80208333 0.84782609 0.83870968 0.8255814
 0.81443299 0.82417582 0.83333333 0.82105263]

mean value: 0.8220736731371573

key: test_recall
value: [0.45454545 0.63636364 0.54545455 0.63636364 0.83333333 0.41666667
 0.41666667 0.91666667 0.36363636 0.54545455]

mean value: 0.5765151515151515

key: train_recall
value: [0.82524272 0.74757282 0.74757282 0.75728155 0.76470588 0.69607843
 0.7745098  0.73529412 0.72815534 0.75728155]

mean value: 0.7533695031410622

key: test_roc_auc
value: [0.43560606 0.65151515 0.56439394 0.69318182 0.78030303 0.5719697
 0.70833333 0.8219697  0.63636364 0.68181818]

mean value: 0.6545454545454545

key: train_roc_auc
value: [0.80477822 0.79045307 0.78064915 0.81001333 0.80953741 0.77522368
 0.79987626 0.78997716 0.79126214 0.7961165 ]

mean value: 0.7947886921758995

key: test_jcc
value: [0.27777778 0.46666667 0.375      0.5        0.66666667 0.33333333
 0.41666667 0.73333333 0.33333333 0.46153846]

mean value: 0.45643162393162395

key: train_jcc
value: [0.68       0.64166667 0.63114754 0.66666667 0.66666667 0.60683761
 0.65833333 0.63559322 0.63559322 0.65      ]

mean value: 0.6472504921832513

MCC on Blind test: 0.17

Accuracy on Blind test: 0.59

Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.00950766 0.0092063  0.0089941  0.00913739 0.00906396 0.00937152
 0.00898767 0.0092783  0.00926185 0.00896335]

mean value: 0.009177207946777344

key: score_time
value: [0.00941825 0.00845361 0.00863886 0.00895977 0.00840282 0.00849152
 0.00842237 0.00841475 0.00835538 0.00848985]

mean value: 0.008604717254638673

key: test_mcc
value: [0.30240737 0.48856385 0.38932432 0.48075018 0.65151515 0.66414149
 0.65151515 0.74242424 0.63636364 0.36514837]

mean value: 0.5372153759035299

key: train_mcc
value: [0.81500527 0.7606076  0.79704499 0.77749321 0.72682277 0.73662669
 0.70844205 0.76709739 0.76829494 0.738735  ]

mean value: 0.7596169926310354

key: test_accuracy
value: [0.65217391 0.73913043 0.69565217 0.73913043 0.82608696 0.82608696
 0.82608696 0.86956522 0.81818182 0.68181818]

mean value: 0.7673913043478261

key: train_accuracy
value: [0.90731707 0.87804878 0.89756098 0.88780488 0.86341463 0.86829268
 0.85365854 0.88292683 0.88349515 0.86893204]

mean value: 0.8791451574709922

key: test_fscore
value: [0.6        0.75       0.66666667 0.7        0.83333333 0.81818182
 0.83333333 0.86956522 0.81818182 0.66666667]

mean value: 0.7555928853754941

key: train_fscore
value: [0.90640394 0.87179487 0.89447236 0.88442211 0.8627451  0.86829268
 0.84848485 0.87878788 0.88       0.86567164]

mean value: 0.8761075435073197

key: test_precision
value: [0.66666667 0.69230769 0.7        0.77777778 0.83333333 0.9
 0.83333333 0.90909091 0.81818182 0.7       ]

mean value: 0.783069153069153

key: train_precision
value: [0.92       0.92391304 0.92708333 0.91666667 0.8627451  0.86407767
 0.875      0.90625    0.90721649 0.8877551 ]

mean value: 0.8990707408306566

key: test_recall
value: [0.54545455 0.81818182 0.63636364 0.63636364 0.83333333 0.75
 0.83333333 0.83333333 0.81818182 0.63636364]

mean value: 0.7340909090909091

key: train_recall
value: [0.89320388 0.82524272 0.86407767 0.85436893 0.8627451  0.87254902
 0.82352941 0.85294118 0.85436893 0.84466019]

mean value: 0.854768703597944

key: test_roc_auc
value: [0.64772727 0.74242424 0.69318182 0.73484848 0.82575758 0.82954545
 0.82575758 0.87121212 0.81818182 0.68181818]

mean value: 0.7670454545454546

key: train_roc_auc
value: [0.90738626 0.87830763 0.89772511 0.88796878 0.86341138 0.86831334
 0.85351228 0.88278127 0.88349515 0.86893204]

mean value: 0.8791833238149629

key: test_jcc
value: [0.42857143 0.6        0.5        0.53846154 0.71428571 0.69230769
 0.71428571 0.76923077 0.69230769 0.5       ]

mean value: 0.6149450549450549

key: train_jcc
value: [0.82882883 0.77272727 0.80909091 0.79279279 0.75862069 0.76724138
 0.73684211 0.78378378 0.78571429 0.76315789]

mean value: 0.779879994190339

MCC on Blind test: 0.37

Accuracy on Blind test: 0.69

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.7920084  0.75999999 0.51944804 0.73570108 0.81945181 0.88329148
 0.80979872 0.6771915  0.85314083 0.79645777]

mean value: 0.7646489620208741

key: score_time
value: [0.0137279  0.01358175 0.01116061 0.01169181 0.01440668 0.01173091
 0.01179957 0.01326561 0.01169682 0.01486826]

mean value: 0.01279299259185791

key: test_mcc
value: [0.47727273 0.58930667 0.41096386 0.91605722 0.74242424 0.74047959
 0.91605722 0.82575758 0.83205029 0.64715023]

mean value: 0.7097519634627414

key: train_mcc
value: [0.90516294 0.89271776 0.80864195 0.87320324 0.95126594 0.88361919
 0.90261781 0.84407425 0.87415728 0.90291262]

mean value: 0.8838372984769908

key: test_accuracy
value: [0.73913043 0.7826087  0.69565217 0.95652174 0.86956522 0.86956522
 0.95652174 0.91304348 0.90909091 0.81818182]

mean value: 0.8509881422924901

key: train_accuracy
value: [0.95121951 0.94634146 0.90243902 0.93658537 0.97560976 0.94146341
 0.95121951 0.92195122 0.9368932  0.95145631]

mean value: 0.941517878285579

key: test_fscore
value: [0.72727273 0.8        0.72       0.95238095 0.86956522 0.88
 0.96       0.91666667 0.91666667 0.8       ]

mean value: 0.8542552230378317

key: train_fscore
value: [0.95327103 0.9468599  0.90740741 0.93719807 0.97560976 0.94230769
 0.95145631 0.9223301  0.93779904 0.95145631]

mean value: 0.942569561637334

key: test_precision
value: [0.72727273 0.71428571 0.64285714 1.         0.90909091 0.84615385
 0.92307692 0.91666667 0.84615385 0.88888889]

mean value: 0.8414446664446664

key: train_precision
value: [0.91891892 0.94230769 0.86725664 0.93269231 0.97087379 0.9245283
 0.94230769 0.91346154 0.9245283  0.95145631]

mean value: 0.9288331487717255

key: test_recall
value: [0.72727273 0.90909091 0.81818182 0.90909091 0.83333333 0.91666667
 1.         0.91666667 1.         0.72727273]

mean value: 0.8757575757575757

key: train_recall
value: [0.99029126 0.95145631 0.95145631 0.94174757 0.98039216 0.96078431
 0.96078431 0.93137255 0.95145631 0.95145631]

mean value: 0.9571197411003236

key: test_roc_auc
value: [0.73863636 0.78787879 0.70075758 0.95454545 0.87121212 0.86742424
 0.95454545 0.91287879 0.90909091 0.81818182]

mean value: 0.8515151515151514

key: train_roc_auc
value: [0.95102798 0.94631639 0.90219874 0.93656006 0.97563297 0.94155721
 0.95126594 0.92199695 0.9368932  0.95145631]

mean value: 0.9414905768132495

key: test_jcc
value: [0.57142857 0.66666667 0.5625     0.90909091 0.76923077 0.78571429
 0.92307692 0.84615385 0.84615385 0.66666667]

mean value: 0.7546682484182484

key: train_jcc
value: [0.91071429 0.89908257 0.83050847 0.88181818 0.95238095 0.89090909
 0.90740741 0.85585586 0.88288288 0.90740741]

mean value: 0.8918967107759675

MCC on Blind test: 0.29

Accuracy on Blind test: 0.64

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.0114994  0.01038074 0.00875664 0.00789666 0.00787902 0.00791883
 0.008286   0.00795555 0.00806046 0.00805378]

mean value: 0.00866870880126953

key: score_time
value: [0.0112102  0.00881648 0.00798893 0.00786543 0.00786877 0.00795722
 0.00800657 0.00791478 0.00787568 0.00790453]

mean value: 0.008340859413146972

key: test_mcc
value: [0.74047959 0.41096386 0.74242424 0.91666667 0.58930667 0.83971912
 0.58002308 0.91666667 0.83205029 0.91287093]

mean value: 0.7481171113402942

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.86956522 0.69565217 0.86956522 0.95652174 0.7826087  0.91304348
 0.7826087  0.95652174 0.90909091 0.95454545]

mean value: 0.8689723320158103

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.85714286 0.72       0.86956522 0.95652174 0.76190476 0.90909091
 0.81481481 0.95652174 0.9        0.95238095]

mean value: 0.8697942990986469

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.9        0.64285714 0.83333333 0.91666667 0.88888889 1.
 0.73333333 1.         1.         1.        ]

mean value: 0.8915079365079365

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81818182 0.81818182 0.90909091 1.         0.66666667 0.83333333
 0.91666667 0.91666667 0.81818182 0.90909091]

mean value: 0.8606060606060606

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.86742424 0.70075758 0.87121212 0.95833333 0.78787879 0.91666667
 0.77651515 0.95833333 0.90909091 0.95454545]

mean value: 0.8700757575757576

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.75       0.5625     0.76923077 0.91666667 0.61538462 0.83333333
 0.6875     0.91666667 0.81818182 0.90909091]

mean value: 0.7778554778554778

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.03

Accuracy on Blind test: 0.51

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.09773445 0.08627319 0.08551788 0.08611631 0.08678699 0.08618236
 0.08573103 0.08577466 0.08596325 0.08555889]

mean value: 0.08716390132904053

key: score_time
value: [0.01986361 0.01779342 0.01704097 0.01689577 0.01845622 0.01685524
 0.01741266 0.01715446 0.0169692  0.01674438]

mean value: 0.01751859188079834

key: test_mcc
value: [0.74047959 0.76764947 0.56818182 0.82575758 0.82575758 0.91605722
 0.65909298 1.         0.83205029 0.83205029]

mean value: 0.7967076829215525

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.86956522 0.86956522 0.7826087  0.91304348 0.91304348 0.95652174
 0.82608696 1.         0.90909091 0.90909091]

mean value: 0.8948616600790513

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.85714286 0.88       0.7826087  0.90909091 0.91666667 0.96
 0.84615385 1.         0.91666667 0.9       ]

mean value: 0.896832964137312

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.9        0.78571429 0.75       0.90909091 0.91666667 0.92307692
 0.78571429 1.         0.84615385 1.        ]

mean value: 0.8816416916416916

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81818182 1.         0.81818182 0.90909091 0.91666667 1.
 0.91666667 1.         1.         0.81818182]

mean value: 0.9196969696969697

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.86742424 0.875      0.78409091 0.91287879 0.91287879 0.95454545
 0.8219697  1.         0.90909091 0.90909091]

mean value: 0.8946969696969697

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.75       0.78571429 0.64285714 0.83333333 0.84615385 0.92307692
 0.73333333 1.         0.84615385 0.81818182]

mean value: 0.8178804528804529

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.28

Accuracy on Blind test: 0.62

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00724578 0.00700259 0.00706482 0.00705457 0.00700569 0.00709224
 0.00697279 0.00716543 0.00719452 0.00722599]

mean value: 0.007102441787719726

key: score_time
value: [0.00805378 0.00790071 0.00796628 0.00789952 0.00804639 0.00786138
 0.00799799 0.00784135 0.00800538 0.00841856]

mean value: 0.007999134063720704

key: test_mcc
value: [ 0.48075018  0.47727273  0.47727273 -0.04545455  0.56490196  0.31298622
  0.48075018  0.38932432  0.54772256  0.63636364]

mean value: 0.4321889948051151

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.73913043 0.73913043 0.73913043 0.47826087 0.7826087  0.65217391
 0.73913043 0.69565217 0.77272727 0.81818182]

mean value: 0.7156126482213438

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.7        0.72727273 0.72727273 0.45454545 0.8        0.63636364
 0.76923077 0.72       0.76190476 0.81818182]

mean value: 0.7114771894771895

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.77777778 0.72727273 0.72727273 0.45454545 0.76923077 0.7
 0.71428571 0.69230769 0.8        0.81818182]

mean value: 0.7180874680874682

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.63636364 0.72727273 0.72727273 0.45454545 0.83333333 0.58333333
 0.83333333 0.75       0.72727273 0.81818182]

mean value: 0.7090909090909091

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.73484848 0.73863636 0.73863636 0.47727273 0.78030303 0.65530303
 0.73484848 0.69318182 0.77272727 0.81818182]

mean value: 0.7143939393939394

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.53846154 0.57142857 0.57142857 0.29411765 0.66666667 0.46666667
 0.625      0.5625     0.61538462 0.69230769]

mean value: 0.5603961969403146

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: -0.1

Accuracy on Blind test: 0.45
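
Note on reading the per-fold scores: the fourth Extra Tree fold above is the only one with a negative test_mcc. The sketch below shows how that value follows from confusion-matrix counts. The counts (TP=5, FP=6, FN=6, TN=6) are not printed in the log; they are an assumption, inferred from that fold's reported accuracy (11/23), precision (5/11) and recall (5/11). The helper name `mcc` is illustrative only and is not taken from the pipeline code.

```python
from math import sqrt

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when any marginal total is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Assumed counts, inferred from the fold's logged accuracy, precision and recall.
tp, fp, fn, tn = 5, 6, 6, 6

fold_mcc = mcc(tp, fp, fn, tn)
fold_accuracy = (tp + tn) / (tp + fp + fn + tn)
print(round(fold_mcc, 8), round(fold_accuracy, 8))
```

Under those assumed counts the result agrees with the logged fold values (test_mcc -0.04545455, test_accuracy 0.47826087), which suggests the fold was close to chance level rather than systematically inverted.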

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.09740305 1.13326406 1.09444618 1.15521884 1.09004188 1.09466553
 1.09249401 1.09514403 1.09620023 1.09324002]

mean value: 1.1042117834091187

key: score_time
value: [0.08939648 0.09209704 0.08982658 0.08881688 0.0895195  0.08898306
 0.0902555  0.0890646  0.09453082 0.08895087]

mean value: 0.09014413356781006

key: test_mcc
value: [0.83743579 0.58930667 0.58930667 1.         0.74242424 0.91666667
 0.82575758 1.         1.         0.81818182]

mean value: 0.8319079425560323

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91304348 0.7826087  0.7826087  1.         0.86956522 0.95652174
 0.91304348 1.         1.         0.90909091]

mean value: 0.9126482213438735

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.9        0.8        0.8        1.         0.86956522 0.95652174
 0.91666667 1.         1.         0.90909091]

mean value: 0.9151844532279315

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.71428571 0.71428571 1.         0.90909091 1.
 0.91666667 1.         1.         0.90909091]

mean value: 0.9163419913419913

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81818182 0.90909091 0.90909091 1.         0.83333333 0.91666667
 0.91666667 1.         1.         0.90909091]

mean value: 0.9212121212121211

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.90909091 0.78787879 0.78787879 1.         0.87121212 0.95833333
 0.91287879 1.         1.         0.90909091]

mean value: 0.9136363636363636

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.81818182 0.66666667 0.66666667 1.         0.76923077 0.91666667
 0.84615385 1.         1.         0.83333333]

mean value: 0.8516899766899767

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.15

Accuracy on Blind test: 0.55
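
Note on post-processing: every metric in this log is written as a `key:` line, a bracketed `value:` array that may wrap onto a second line, and a `mean value:` line. A minimal sketch, assuming that layout holds for the whole file, of reading such blocks back into Python and recomputing the mean. The function name `parse_cv_blocks` is hypothetical, and the sample text is the Random Forest test_mcc block copied from above.

```python
import re
from statistics import fmean

def parse_cv_blocks(text):
    """Return {key: [per-fold floats]} for each 'key:' / 'value: [...]' block.

    If the same key occurs more than once (for example, several models in
    one text), later blocks overwrite earlier ones, so split the log per
    model before calling this.
    """
    pattern = re.compile(r"key: (\w+)\s*\nvalue: \[([^\]]*)\]")
    return {key: [float(tok) for tok in body.split()]
            for key, body in pattern.findall(text)}

sample = """key: test_mcc
value: [0.83743579 0.58930667 0.58930667 1.         0.74242424 0.91666667
 0.82575758 1.         1.         0.81818182]

mean value: 0.8319079425560323
"""

scores = parse_cv_blocks(sample)
mean_test_mcc = fmean(scores["test_mcc"])
print(len(scores["test_mcc"]), round(mean_test_mcc, 6))
```

The recomputed mean differs from the logged `mean value` only in the trailing digits, because the array is printed with eight decimal places.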

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.8280468  0.83096433 0.89360881 0.91945672 0.93384838 0.88851166
 0.8646996  0.91655803 0.83419728 0.87553668]

mean value: 0.8785428285598755

key: score_time
value: [0.22438312 0.13418078 0.20978713 0.20942521 0.23700547 0.21879888
 0.21615219 0.20876241 0.20610762 0.21533132]

mean value: 0.20799341201782226

key: test_mcc
value: [0.76277007 0.6992059  0.58930667 1.         0.74242424 0.91666667
 0.74047959 1.         0.73029674 0.63636364]

mean value: 0.7817513515626897

key: train_mcc
value: [0.97077583 0.961154   0.98067223 0.961154   0.96116136 0.96116136
 0.94219063 0.96097468 0.97091955 0.94245853]

mean value: 0.9612622141858389

key: test_accuracy
value: [0.86956522 0.82608696 0.7826087  1.         0.86956522 0.95652174
 0.86956522 1.         0.86363636 0.81818182]

mean value: 0.8855731225296443

key: train_accuracy
value: [0.98536585 0.9804878  0.9902439  0.9804878  0.9804878  0.9804878
 0.97073171 0.9804878  0.98543689 0.97087379]

mean value: 0.9805091167416529

key: test_fscore
value: [0.84210526 0.84615385 0.8        1.         0.86956522 0.95652174
 0.88       1.         0.86956522 0.81818182]

mean value: 0.8882093101406603

key: train_fscore
value: [0.98550725 0.98076923 0.99038462 0.98076923 0.98058252 0.98058252
 0.97115385 0.98039216 0.98550725 0.97142857]

mean value: 0.9807077192665552

key: test_precision
value: [1.         0.73333333 0.71428571 1.         0.90909091 1.
 0.84615385 1.         0.83333333 0.81818182]

mean value: 0.8854378954378954

key: train_precision
value: [0.98076923 0.97142857 0.98095238 0.97142857 0.97115385 0.97115385
 0.95283019 0.98039216 0.98076923 0.95327103]

mean value: 0.9714149051235051

key: test_recall
value: [0.72727273 1.         0.90909091 1.         0.83333333 0.91666667
 0.91666667 1.         0.90909091 0.81818182]

mean value: 0.9030303030303031

key: train_recall
value: [0.99029126 0.99029126 1.         0.99029126 0.99019608 0.99019608
 0.99019608 0.98039216 0.99029126 0.99029126]

mean value: 0.9902436702836475

key: test_roc_auc
value: [0.86363636 0.83333333 0.78787879 1.         0.87121212 0.95833333
 0.86742424 1.         0.86363636 0.81818182]

mean value: 0.8863636363636364

key: train_roc_auc
value: [0.98534171 0.98043975 0.99019608 0.98043975 0.98053493 0.98053493
 0.97082619 0.98048734 0.98543689 0.97087379]

mean value: 0.9805111364934324

key: test_jcc
value: [0.72727273 0.73333333 0.66666667 1.         0.76923077 0.91666667
 0.78571429 1.         0.76923077 0.69230769]

mean value: 0.8060422910422911

key: train_jcc
value: [0.97142857 0.96226415 0.98095238 0.96226415 0.96190476 0.96190476
 0.94392523 0.96153846 0.97142857 0.94444444]

mean value: 0.9622055489133606

MCC on Blind test: 0.26

Accuracy on Blind test: 0.61

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01719785 0.00701189 0.00701404 0.00702286 0.0074749  0.00695348
 0.00701451 0.00692368 0.00781441 0.00701284]

mean value: 0.008144044876098632

key: score_time
value: [0.01571059 0.00787878 0.00795388 0.00786066 0.00811124 0.00786138
 0.00785327 0.00791693 0.00871611 0.00790095]

mean value: 0.008776378631591798

key: test_mcc
value: [0.30240737 0.05427825 0.03816905 0.3030303  0.42228828 0.30240737
 0.03816905 0.65151515 0.37796447 0.27272727]

mean value: 0.2762956564879194

key: train_mcc
value: [0.35623111 0.34638101 0.37560698 0.3463735  0.28783552 0.36612372
 0.35687769 0.3658258  0.32044877 0.39058328]

mean value: 0.3512287378448915

key: test_accuracy
value: [0.65217391 0.52173913 0.52173913 0.65217391 0.69565217 0.65217391
 0.52173913 0.82608696 0.68181818 0.63636364]

mean value: 0.6361660079051383

key: train_accuracy
value: [0.67804878 0.67317073 0.68780488 0.67317073 0.64390244 0.68292683
 0.67804878 0.68292683 0.66019417 0.69417476]

mean value: 0.6754368932038836

key: test_fscore
value: [0.6        0.56       0.47619048 0.63636364 0.75862069 0.69230769
 0.56       0.83333333 0.72       0.63636364]

mean value: 0.6473179464213947

key: train_fscore
value: [0.68571429 0.67942584 0.69230769 0.67317073 0.64390244 0.67336683
 0.68571429 0.67980296 0.65686275 0.70967742]

mean value: 0.6779945226077302

key: test_precision
value: [0.66666667 0.5        0.5        0.63636364 0.64705882 0.64285714
 0.53846154 0.83333333 0.64285714 0.63636364]

mean value: 0.6243961920432509

key: train_precision
value: [0.6728972  0.66981132 0.68571429 0.67647059 0.6407767  0.69072165
 0.66666667 0.68316832 0.66336634 0.6754386 ]

mean value: 0.6725031656102882

key: test_recall
value: [0.54545455 0.63636364 0.45454545 0.63636364 0.91666667 0.75
 0.58333333 0.83333333 0.81818182 0.63636364]

mean value: 0.681060606060606

key: train_recall
value: [0.69902913 0.68932039 0.69902913 0.66990291 0.64705882 0.65686275
 0.70588235 0.67647059 0.65048544 0.74757282]

mean value: 0.6841614315629164

key: test_roc_auc
value: [0.64772727 0.52651515 0.51893939 0.65151515 0.68560606 0.64772727
 0.51893939 0.82575758 0.68181818 0.63636364]

mean value: 0.634090909090909

key: train_roc_auc
value: [0.67794594 0.67309157 0.68774986 0.67318675 0.64391776 0.6828003
 0.67818389 0.68289549 0.66019417 0.69417476]

mean value: 0.6754140491147915

key: test_jcc
value: [0.42857143 0.38888889 0.3125     0.46666667 0.61111111 0.52941176
 0.38888889 0.71428571 0.5625     0.46666667]

mean value: 0.4869491129785247

key: train_jcc
value: [0.52173913 0.51449275 0.52941176 0.50735294 0.47482014 0.50757576
 0.52173913 0.51492537 0.48905109 0.55      ]

mean value: 0.5131108089860595

MCC on Blind test: 0.35

Accuracy on Blind test: 0.67
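
Note on the jcc columns: within a single fold, the Jaccard score and the F-score are computed from the same TP, FP and FN counts, so one determines the other through J = F / (2 - F). The sketch below checks that identity against the Naive Bayes test_fscore and test_jcc arrays copied from above. The function name `jaccard_from_f1` is illustrative. The identity holds per fold only; it does not carry over to the `mean value` lines, since the mean of a nonlinear function is not that function of the mean.

```python
def jaccard_from_f1(f1):
    """Jaccard index implied by an F1 score built from the same counts."""
    return f1 / (2.0 - f1)

# Per-fold values copied from the Naive Bayes block of this log.
test_fscore = [0.6, 0.56, 0.47619048, 0.63636364, 0.75862069,
               0.69230769, 0.56, 0.83333333, 0.72, 0.63636364]
test_jcc = [0.42857143, 0.38888889, 0.3125, 0.46666667, 0.61111111,
            0.52941176, 0.38888889, 0.71428571, 0.5625, 0.46666667]

derived = [jaccard_from_f1(f) for f in test_fscore]
max_abs_diff = max(abs(d - j) for d, j in zip(derived, test_jcc))
print(max_abs_diff < 1e-6)
```

Because the two columns carry the same information fold by fold, ranking models on both adds no independent evidence.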

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.08559012 0.14609933 0.03756452 0.03884339 0.04227829 0.08023286
 0.03739309 0.06376338 0.03956223 0.03932667]

mean value: 0.061065387725830075

key: score_time
value: [0.01105213 0.01027513 0.01021671 0.00977039 0.00970769 0.01002121
 0.00953698 0.00958061 0.00958157 0.00957847]

mean value: 0.00993208885192871

key: test_mcc
value: [0.83743579 0.58930667 0.66414149 0.91605722 0.74242424 0.91666667
 0.91605722 1.         1.         0.81818182]

mean value: 0.8400271120875227

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91304348 0.7826087  0.82608696 0.95652174 0.86956522 0.95652174
 0.95652174 1.         1.         0.90909091]

mean value: 0.9169960474308301

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.9        0.8        0.83333333 0.95238095 0.86956522 0.95652174
 0.96       1.         1.         0.90909091]

mean value: 0.9180892151326934

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.71428571 0.76923077 1.         0.90909091 1.
 0.92307692 1.         1.         0.90909091]

mean value: 0.9224775224775225

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81818182 0.90909091 0.90909091 0.90909091 0.83333333 0.91666667
 1.         1.         1.         0.90909091]

mean value: 0.9204545454545454

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.90909091 0.78787879 0.82954545 0.95454545 0.87121212 0.95833333
 0.95454545 1.         1.         0.90909091]

mean value: 0.9174242424242425

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.81818182 0.66666667 0.71428571 0.90909091 0.76923077 0.91666667
 0.92307692 1.         1.         0.83333333]

mean value: 0.8550532800532801

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.07

Accuracy on Blind test: 0.53
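
Note on cross-validation versus blind-test scores: for the models reported so far in this section, the mean test_mcc and the blind-test MCC can be set side by side. A minimal sketch, with the numbers copied from the blocks above (mean test_mcc rounded to four decimals) and the dictionary layout chosen only for illustration:

```python
# (mean CV test_mcc, MCC on blind test), copied from this log.
results = {
    "Extra Tree": (0.4322, -0.10),
    "Random Forest": (0.8319, 0.15),
    "Random Forest2": (0.7818, 0.26),
    "Naive Bayes": (0.2763, 0.35),
    "XGBoost": (0.8400, 0.07),
}

# Positive gap: the model scored higher in cross-validation than on the blind set.
gaps = {name: round(cv - blind, 4) for name, (cv, blind) in results.items()}
ranked = sorted(gaps, key=gaps.get, reverse=True)

for name in ranked:
    print(f"{name:<15} gap = {gaps[name]:+.4f}")
```

The ordering shows which models lose the most between the two evaluations. It does not by itself establish why, and the log alone does not say.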

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.01432657 0.03284144 0.03134632 0.03231716 0.03199744 0.03253031
 0.03206563 0.03227401 0.03221512 0.03237677]

mean value: 0.0304290771484375

key: score_time
value: [0.0105865  0.02112436 0.02062201 0.0215013  0.01899457 0.01899886
 0.01989794 0.02078581 0.01071429 0.02165031]

mean value: 0.01848759651184082

key: test_mcc
value: [0.48075018 0.65151515 0.39393939 1.         0.66414149 0.91666667
 0.58002308 0.91666667 0.75592895 0.81818182]

mean value: 0.7177813381485896

key: train_mcc
value: [0.90310636 0.89271776 0.91224062 0.86358877 0.87320324 0.88292404
 0.89271776 0.86341138 0.89358299 0.84481947]

mean value: 0.882231240068856

key: test_accuracy
value: [0.73913043 0.82608696 0.69565217 1.         0.82608696 0.95652174
 0.7826087  0.95652174 0.86363636 0.90909091]

mean value: 0.8555335968379447

key: train_accuracy
value: [0.95121951 0.94634146 0.95609756 0.93170732 0.93658537 0.94146341
 0.94634146 0.93170732 0.94660194 0.9223301 ]

mean value: 0.9410395453469098

key: test_fscore
value: [0.7        0.81818182 0.69565217 1.         0.81818182 0.95652174
 0.81481481 0.95652174 0.84210526 0.90909091]

mean value: 0.8511070275601168

key: train_fscore
value: [0.95238095 0.9468599  0.95609756 0.93137255 0.93596059 0.94117647
 0.94581281 0.93137255 0.94736842 0.92156863]

mean value: 0.9409970432884046

key: test_precision
value: [0.77777778 0.81818182 0.66666667 1.         0.9        1.
 0.73333333 1.         1.         0.90909091]

mean value: 0.8805050505050505

key: train_precision
value: [0.93457944 0.94230769 0.96078431 0.94059406 0.94059406 0.94117647
 0.95049505 0.93137255 0.93396226 0.93069307]

mean value: 0.9406558966668068

key: test_recall
value: [0.63636364 0.81818182 0.72727273 1.         0.75       0.91666667
 0.91666667 0.91666667 0.72727273 0.90909091]

mean value: 0.8318181818181818

key: train_recall
value: [0.97087379 0.95145631 0.95145631 0.9223301  0.93137255 0.94117647
 0.94117647 0.93137255 0.96116505 0.91262136]

mean value: 0.9415000951837046

key: test_roc_auc
value: [0.73484848 0.82575758 0.6969697  1.         0.82954545 0.95833333
 0.77651515 0.95833333 0.86363636 0.90909091]

mean value: 0.8553030303030302

key: train_roc_auc
value: [0.95112317 0.94631639 0.95612031 0.93175328 0.93656006 0.94146202
 0.94631639 0.93170569 0.94660194 0.9223301 ]

mean value: 0.9410289358461832

key: test_jcc
value: [0.53846154 0.69230769 0.53333333 1.         0.69230769 0.91666667
 0.6875     0.91666667 0.72727273 0.83333333]

mean value: 0.7537849650349651

key: train_jcc
value: [0.90909091 0.89908257 0.91588785 0.87155963 0.87962963 0.88888889
 0.89719626 0.87155963 0.9        0.85454545]

mean value: 0.8887440829166801

MCC on Blind test: 0.07

Accuracy on Blind test: 0.53

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.02144146 0.00733709 0.00703239 0.00698829 0.00709367 0.0077188
 0.00785375 0.00805616 0.00777936 0.00778151]

mean value: 0.008908247947692871

key: score_time
value: [0.00882101 0.00820684 0.00806975 0.00794053 0.00805497 0.00866318
 0.00886822 0.00857925 0.00864053 0.00864553]

mean value: 0.008448982238769531

key: test_mcc
value: [0.30240737 0.05427825 0.21969697 0.3030303  0.39727608 0.30240737
 0.39393939 0.56818182 0.29277002 0.09090909]

mean value: 0.29248966588600495

key: train_mcc
value: [0.37650652 0.40495245 0.36648346 0.34638101 0.41611143 0.36642547
 0.29790481 0.32736295 0.38836782 0.33048671]

mean value: 0.3620982636826904

key: test_accuracy
value: [0.65217391 0.52173913 0.60869565 0.65217391 0.69565217 0.65217391
 0.69565217 0.7826087  0.63636364 0.54545455]

mean value: 0.6442687747035574

key: train_accuracy
value: [0.68780488 0.70243902 0.68292683 0.67317073 0.70731707 0.68292683
 0.64878049 0.66341463 0.69417476 0.66504854]

mean value: 0.6808003788775752

key: test_fscore
value: [0.6        0.56       0.60869565 0.63636364 0.74074074 0.69230769
 0.69565217 0.7826087  0.69230769 0.54545455]

mean value: 0.6554130828913438

key: train_fscore
value: [0.70093458 0.70813397 0.69483568 0.67942584 0.71698113 0.68899522
 0.65384615 0.66985646 0.69565217 0.67298578]

mean value: 0.6881646985269205

key: test_precision
value: [0.66666667 0.5        0.58333333 0.63636364 0.66666667 0.64285714
 0.72727273 0.81818182 0.6        0.54545455]

mean value: 0.6386796536796537

key: train_precision
value: [0.67567568 0.69811321 0.67272727 0.66981132 0.69090909 0.6728972
 0.64150943 0.65420561 0.69230769 0.65740741]

mean value: 0.6725563905029608

key: test_recall
value: [0.54545455 0.63636364 0.63636364 0.63636364 0.83333333 0.75
 0.66666667 0.75       0.81818182 0.54545455]

mean value: 0.6818181818181818

key: train_recall
value: [0.72815534 0.7184466  0.7184466  0.68932039 0.74509804 0.70588235
 0.66666667 0.68627451 0.69902913 0.68932039]

mean value: 0.7046640015229393

key: test_roc_auc
value: [0.64772727 0.52651515 0.60984848 0.65151515 0.68939394 0.64772727
 0.6969697  0.78409091 0.63636364 0.54545455]

mean value: 0.643560606060606

key: train_roc_auc
value: [0.68760708 0.70236056 0.68275271 0.67309157 0.70750048 0.68303826
 0.64886731 0.6635256  0.69417476 0.66504854]

mean value: 0.6807966876070817

key: test_jcc
value: [0.42857143 0.38888889 0.4375     0.46666667 0.58823529 0.52941176
 0.53333333 0.64285714 0.52941176 0.375     ]

mean value: 0.49198762838468724

key: train_jcc
value: [0.53956835 0.54814815 0.5323741  0.51449275 0.55882353 0.52554745
 0.48571429 0.50359712 0.53333333 0.50714286]

mean value: 0.5248741920974376

MCC on Blind test: 0.39

Accuracy on Blind test: 0.69

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.00869298 0.01086235 0.01095939 0.01100326 0.01040626 0.01017833
 0.01074862 0.01032758 0.01047254 0.01042032]

mean value: 0.010407161712646485

key: score_time
value: [0.00884461 0.01061177 0.01056266 0.01120257 0.01041794 0.01044512
 0.01040959 0.01039767 0.01039672 0.0103991 ]

mean value: 0.010368776321411134

key: test_mcc
value: [0.65909298 0.47923384 0.5164589  0.69084928 0.74242424 0.74242424
 0.76277007 0.69084928 0.83205029 0.54232614]

mean value: 0.6658479277717093

key: train_mcc
value: [0.8373082  0.60342152 0.90672005 0.74004127 0.84982541 0.84787319
 0.87166073 0.56519801 0.82977382 0.63500064]

mean value: 0.7686822837406401

key: test_accuracy
value: [0.82608696 0.69565217 0.73913043 0.82608696 0.86956522 0.86956522
 0.86956522 0.82608696 0.90909091 0.72727273]

mean value: 0.8158102766798419

key: train_accuracy
value: [0.91707317 0.77073171 0.95121951 0.85365854 0.92195122 0.92195122
 0.93170732 0.74146341 0.90776699 0.79126214]

mean value: 0.8708785223774568

key: test_fscore
value: [0.8        0.53333333 0.76923077 0.77777778 0.86956522 0.86956522
 0.88888889 0.85714286 0.91666667 0.625     ]

mean value: 0.7907170727822902

key: train_fscore
value: [0.92093023 0.70807453 0.9537037  0.82954545 0.92592593 0.91752577
 0.93577982 0.79377432 0.91555556 0.73939394]

mean value: 0.8640209254619995

key: test_precision
value: [0.88888889 1.         0.66666667 1.         0.90909091 0.90909091
 0.8        0.75       0.84615385 1.        ]

mean value: 0.8769891219891219

key: train_precision
value: [0.88392857 0.98275862 0.91150442 1.         0.87719298 0.9673913
 0.87931034 0.65806452 0.8442623  0.98387097]

mean value: 0.8988284027481475

key: test_recall
value: [0.72727273 0.36363636 0.90909091 0.63636364 0.83333333 0.83333333
 1.         1.         1.         0.45454545]

mean value: 0.7757575757575758

key: train_recall
value: [0.96116505 0.55339806 1.         0.70873786 0.98039216 0.87254902
 1.         1.         1.         0.59223301]

mean value: 0.8668475157053113

key: test_roc_auc
value: [0.8219697  0.68181818 0.74621212 0.81818182 0.87121212 0.87121212
 0.86363636 0.81818182 0.90909091 0.72727273]

mean value: 0.8128787878787879

key: train_roc_auc
value: [0.91685703 0.77179707 0.95098039 0.85436893 0.92223491 0.9217114
 0.93203883 0.74271845 0.90776699 0.79126214]

mean value: 0.8711736150770988

key: test_jcc
value: [0.66666667 0.36363636 0.625      0.63636364 0.76923077 0.76923077
 0.8        0.75       0.84615385 0.45454545]

mean value: 0.6680827505827506

key: train_jcc
value: [0.85344828 0.54807692 0.91150442 0.70873786 0.86206897 0.84761905
 0.87931034 0.65806452 0.8442623  0.58653846]

mean value: 0.769963111850876

MCC on Blind test: 0.26

Accuracy on Blind test: 0.63

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01068044 0.01032329 0.01064396 0.01066589 0.01067901 0.01030278
 0.01019835 0.01055288 0.01074505 0.01002216]

mean value: 0.0104813814163208

key: score_time
value: [0.01096964 0.01078129 0.01076531 0.01084399 0.01041245 0.01038766
 0.01039839 0.0104928  0.01040888 0.01040125]

mean value: 0.010586166381835937

key: test_mcc
value: [0.56490196 0.40451992 0.33371191 0.74047959 0.74242424 0.58002308
 0.74242424 0.91666667 1.         0.31622777]

mean value: 0.6341379361751176

key: train_mcc
value: [0.84083863 0.65525342 0.84965937 0.84102851 0.91330072 0.82620413
 0.82825757 0.85400014 0.87581131 0.46017899]

mean value: 0.7944532795671018

key: test_accuracy
value: [0.7826087  0.65217391 0.65217391 0.86956522 0.86956522 0.7826087
 0.86956522 0.95652174 1.         0.59090909]

mean value: 0.8025691699604743

key: train_accuracy
value: [0.91707317 0.8        0.92195122 0.91707317 0.95609756 0.90731707
 0.91219512 0.92682927 0.9368932  0.67475728]

mean value: 0.8870187070802747

key: test_fscore
value: [0.76190476 0.42857143 0.69230769 0.85714286 0.86956522 0.81481481
 0.86956522 0.95652174 1.         0.30769231]

mean value: 0.7558086036346906

key: train_fscore
value: [0.92237443 0.75151515 0.9266055  0.9119171  0.9569378  0.91402715
 0.90721649 0.92537313 0.93896714 0.51798561]

mean value: 0.8672919508970722

key: test_precision
value: [0.8        1.         0.6        0.9        0.90909091 0.73333333
 0.90909091 1.         1.         1.        ]

mean value: 0.8851515151515151

key: train_precision
value: [0.87068966 1.         0.87826087 0.97777778 0.93457944 0.8487395
 0.95652174 0.93939394 0.90909091 1.        ]

mean value: 0.9315053825181348

key: test_recall
value: [0.72727273 0.27272727 0.81818182 0.81818182 0.83333333 0.91666667
 0.83333333 0.91666667 1.         0.18181818]

mean value: 0.7318181818181818

key: train_recall
value: [0.98058252 0.60194175 0.98058252 0.85436893 0.98039216 0.99019608
 0.8627451  0.91176471 0.97087379 0.34951456]

mean value: 0.848296211688559

key: test_roc_auc
value: [0.78030303 0.63636364 0.65909091 0.86742424 0.87121212 0.77651515
 0.87121212 0.95833333 1.         0.59090909]

mean value: 0.8011363636363636

key: train_roc_auc
value: [0.91676185 0.80097087 0.92166381 0.91738054 0.9562155  0.9077194
 0.91195507 0.92675614 0.9368932  0.67475728]

mean value: 0.8871073672187322

key: test_jcc
value: [0.61538462 0.27272727 0.52941176 0.75       0.76923077 0.6875
 0.76923077 0.91666667 1.         0.18181818]

mean value: 0.6491970039764158

key: train_jcc
value: [0.8559322  0.60194175 0.86324786 0.83809524 0.91743119 0.84166667
 0.83018868 0.86111111 0.88495575 0.34951456]

mean value: 0.7844085017308544

MCC on Blind test: 0.21

Accuracy on Blind test: 0.61

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.08768725 0.0763936  0.07663751 0.07690692 0.07599807 0.07673168
 0.07730794 0.07686925 0.07642341 0.07606792]

mean value: 0.07770235538482666

key: score_time
value: [0.01560497 0.01552868 0.0159018  0.01562333 0.01548648 0.01561403
 0.01570988 0.0160079  0.01545119 0.0155468 ]

mean value: 0.015647506713867186

key: test_mcc
value: [0.91605722 0.6992059  0.74242424 0.83743579 0.74242424 0.91666667
 0.91605722 0.83971912 1.         1.        ]

mean value: 0.8609990412070886

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.95652174 0.82608696 0.86956522 0.91304348 0.86956522 0.95652174
 0.95652174 0.91304348 1.         1.        ]

mean value: 0.9260869565217391

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.95238095 0.84615385 0.86956522 0.9        0.86956522 0.95652174
 0.96       0.90909091 1.         1.        ]

mean value: 0.9263277881538751

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.73333333 0.83333333 1.         0.90909091 1.
 0.92307692 1.         1.         1.        ]

mean value: 0.9398834498834499

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 1.         0.90909091 0.81818182 0.83333333 0.91666667
 1.         0.83333333 1.         1.        ]

mean value: 0.921969696969697

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.95454545 0.83333333 0.87121212 0.90909091 0.87121212 0.95833333
 0.95454545 0.91666667 1.         1.        ]

mean value: 0.9268939393939394

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.90909091 0.73333333 0.76923077 0.81818182 0.76923077 0.91666667
 0.92307692 0.83333333 1.         1.        ]

mean value: 0.8672144522144523

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: -0.02

Accuracy on Blind test: 0.49

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.03209019 0.02829218 0.03167534 0.03128099 0.03293514 0.02934527
 0.03381133 0.02705407 0.03058004 0.02823019]

mean value: 0.030529475212097167

key: score_time
value: [0.01726365 0.02387595 0.02088284 0.02278829 0.02153826 0.01731133
 0.01608276 0.01754308 0.02704978 0.01536942]

mean value: 0.01997053623199463

key: test_mcc
value: [0.83743579 0.39393939 0.66414149 1.         0.74242424 0.91666667
 0.91605722 1.         0.91287093 0.81818182]

mean value: 0.820171755257551

key: train_mcc
value: [0.98048734 0.99029034 0.99029034 0.99029126 0.98067223 0.98048734
 0.99029034 0.99029034 0.96189066 0.99033794]

mean value: 0.9845328141111404

key: test_accuracy
value: [0.91304348 0.69565217 0.82608696 1.         0.86956522 0.95652174
 0.95652174 1.         0.95454545 0.90909091]

mean value: 0.908102766798419

key: train_accuracy
value: [0.9902439  0.99512195 0.99512195 0.99512195 0.9902439  0.9902439
 0.99512195 0.99512195 0.98058252 0.99514563]

mean value: 0.992206961875444

key: test_fscore
value: [0.9        0.69565217 0.83333333 1.         0.86956522 0.95652174
 0.96       1.         0.95238095 0.90909091]

mean value: 0.9076544325239977

key: train_fscore
value: [0.99029126 0.99516908 0.99516908 0.99512195 0.99009901 0.99019608
 0.99507389 0.99507389 0.98019802 0.99512195]

mean value: 0.9921514220211729

key: test_precision
value: [1.         0.66666667 0.76923077 1.         0.90909091 1.
 0.92307692 1.         1.         0.90909091]

mean value: 0.9177156177156177

key: train_precision
value: [0.99029126 0.99038462 0.99038462 1.         1.         0.99019608
 1.         1.         1.         1.        ]

mean value: 0.9961256571336525

key: test_recall
value: [0.81818182 0.72727273 0.90909091 1.         0.83333333 0.91666667
 1.         1.         0.90909091 0.90909091]

mean value: 0.9022727272727272

key: train_recall
value: [0.99029126 1.         1.         0.99029126 0.98039216 0.99019608
 0.99019608 0.99019608 0.96116505 0.99029126]

mean value: 0.988301922710832

key: test_roc_auc
value: [0.90909091 0.6969697  0.82954545 1.         0.87121212 0.95833333
 0.95454545 1.         0.95454545 0.90909091]

mean value: 0.9083333333333333

key: train_roc_auc
value: [0.99024367 0.99509804 0.99509804 0.99514563 0.99019608 0.99024367
 0.99509804 0.99509804 0.98058252 0.99514563]

mean value: 0.992194936226918

key: test_jcc
value: [0.81818182 0.53333333 0.71428571 1.         0.76923077 0.91666667
 0.92307692 1.         0.90909091 0.83333333]

mean value: 0.8417199467199468

key: train_jcc
value: [0.98076923 0.99038462 0.99038462 0.99029126 0.98039216 0.98058252
 0.99019608 0.99019608 0.96116505 0.99029126]

mean value: 0.984465287235133

MCC on Blind test: 0.07

Accuracy on Blind test: 0.53

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.0418961  0.05101323 0.05104518 0.05136704 0.05105376 0.05128527
 0.05111003 0.04877734 0.05096364 0.04899049]

mean value: 0.04975020885467529

key: score_time
value: [0.02236867 0.01667118 0.022753   0.02080393 0.02091503 0.02079797
 0.01660824 0.01153994 0.01939178 0.01141834]

mean value: 0.018326807022094726

key: test_mcc
value: [0.12336594 0.39393939 0.56490196 0.39727608 0.74047959 0.41096386
 0.58930667 0.65151515 0.48795004 0.2773501 ]

mean value: 0.46370487732579513

key: train_mcc
value: [0.95126131 0.92355447 0.90401389 0.93283198 0.93209539 0.92211753
 0.91257158 0.91325992 0.90308289 0.89358299]

mean value: 0.9188371932639088

key: test_accuracy
value: [0.56521739 0.69565217 0.7826087  0.69565217 0.86956522 0.69565217
 0.7826087  0.82608696 0.72727273 0.63636364]

mean value: 0.7276679841897233

key: train_accuracy
value: [0.97560976 0.96097561 0.95121951 0.96585366 0.96585366 0.96097561
 0.95609756 0.95609756 0.95145631 0.94660194]

mean value: 0.9590741179256452

key: test_fscore
value: [0.44444444 0.69565217 0.76190476 0.63157895 0.88       0.66666667
 0.76190476 0.83333333 0.66666667 0.6       ]

mean value: 0.69421517562021

key: train_fscore
value: [0.97584541 0.96       0.95       0.96517413 0.96517413 0.96039604
 0.95522388 0.95477387 0.95098039 0.94581281]

mean value: 0.9583380658920831

key: test_precision
value: [0.57142857 0.66666667 0.8        0.75       0.84615385 0.77777778
 0.88888889 0.83333333 0.85714286 0.66666667]

mean value: 0.7658058608058608

key: train_precision
value: [0.97115385 0.98969072 0.97938144 0.98979592 0.97979798 0.97
 0.96969697 0.97938144 0.96039604 0.96      ]

mean value: 0.9749294361867525

key: test_recall
value: [0.36363636 0.72727273 0.72727273 0.54545455 0.91666667 0.58333333
 0.66666667 0.83333333 0.54545455 0.54545455]

mean value: 0.6454545454545455

key: train_recall
value: [0.98058252 0.93203883 0.9223301  0.94174757 0.95098039 0.95098039
 0.94117647 0.93137255 0.94174757 0.93203883]

mean value: 0.9424995240814773

key: test_roc_auc
value: [0.55681818 0.6969697  0.78030303 0.68939394 0.86742424 0.70075758
 0.78787879 0.82575758 0.72727273 0.63636364]

mean value: 0.7268939393939393

key: train_roc_auc
value: [0.97558538 0.96111746 0.95136113 0.96597183 0.96578146 0.96092709
 0.95602513 0.95597754 0.95145631 0.94660194]

mean value: 0.9590805254140491

key: test_jcc
value: [0.28571429 0.53333333 0.61538462 0.46153846 0.78571429 0.5
 0.61538462 0.71428571 0.5        0.42857143]

mean value: 0.543992673992674

key: train_jcc
value: [0.95283019 0.92307692 0.9047619  0.93269231 0.93269231 0.92380952
 0.91428571 0.91346154 0.90654206 0.89719626]

mean value: 0.9201348726216474

MCC on Blind test: 0.32

Accuracy on Blind test: 0.66

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.15189385 0.14156413 0.14368677 0.14271593 0.13952732 0.14208126
 0.13928676 0.1380887  0.14191723 0.14071012]

mean value: 0.14214720726013183

key: score_time
value: [0.00953889 0.00933433 0.0094347  0.00952435 0.00953817 0.00918531
 0.00900173 0.00848866 0.00957823 0.00928164]

mean value: 0.009290599822998047

key: test_mcc
value: [0.76277007 0.5164589  0.66414149 0.83743579 0.74242424 0.91666667
 0.91605722 0.91666667 1.         0.81818182]

mean value: 0.8090802870032009

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.86956522 0.73913043 0.82608696 0.91304348 0.86956522 0.95652174
 0.95652174 0.95652174 1.         0.90909091]

mean value: 0.899604743083004

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.84210526 0.76923077 0.83333333 0.9        0.86956522 0.95652174
 0.96       0.95652174 1.         0.90909091]

mean value: 0.8996368970465081

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.66666667 0.76923077 1.         0.90909091 1.
 0.92307692 1.         1.         0.90909091]

mean value: 0.9177156177156177

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.72727273 0.90909091 0.90909091 0.81818182 0.83333333 0.91666667
 1.         0.91666667 1.         0.90909091]

mean value: 0.8939393939393939

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.86363636 0.74621212 0.82954545 0.90909091 0.87121212 0.95833333
 0.95454545 0.95833333 1.         0.90909091]

mean value: 0.9

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.72727273 0.625      0.71428571 0.81818182 0.76923077 0.91666667
 0.92307692 0.91666667 1.         0.83333333]

mean value: 0.8243714618714619

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.12

Accuracy on Blind test: 0.54

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.01663566 0.01202774 0.01275182 0.01185322 0.01200604 0.01168084
 0.01452398 0.0116775  0.01183033 0.01218224]

mean value: 0.012716937065124511

key: score_time
value: [0.01137972 0.01086974 0.01099038 0.01081109 0.01094103 0.01092815
 0.01124692 0.01074839 0.01087403 0.01108217]

mean value: 0.010987162590026855

key: test_mcc
value: [0.15096491 0.56879646 0.29359034 0.33371191 0.55048188 0.65909298
 0.40451992 0.55048188 0.56694671 0.54232614]

mean value: 0.4620913131591764

key: train_mcc
value: [0.58647158 0.65859127 0.56715421 0.63490794 0.52720108 0.49387839
 0.4975669  0.55024014 0.59539971 0.63353022]

mean value: 0.5744941432717103

key: test_accuracy
value: [0.56521739 0.73913043 0.60869565 0.65217391 0.73913043 0.82608696
 0.65217391 0.73913043 0.77272727 0.72727273]

mean value: 0.7021739130434782

key: train_accuracy
value: [0.76097561 0.8097561  0.76585366 0.79512195 0.72195122 0.73170732
 0.69756098 0.73170732 0.77669903 0.78640777]

mean value: 0.7577740942457968

key: test_fscore
value: [0.61538462 0.78571429 0.68965517 0.69230769 0.8        0.84615385
 0.75       0.8        0.8        0.78571429]

mean value: 0.7564929897688518

key: train_fscore
value: [0.80632411 0.83817427 0.80165289 0.82786885 0.77992278 0.76987448
 0.76691729 0.78764479 0.81147541 0.824     ]

mean value: 0.8013854877176021

key: test_precision
value: [0.53333333 0.64705882 0.55555556 0.6        0.66666667 0.78571429
 0.6        0.66666667 0.71428571 0.64705882]

mean value: 0.6416339869281046

key: train_precision
value: [0.68       0.73188406 0.69784173 0.71631206 0.6433121  0.67153285
 0.62195122 0.64968153 0.70212766 0.70068027]

mean value: 0.6815323469811392

key: test_recall
value: [0.72727273 1.         0.90909091 0.81818182 1.         0.91666667
 1.         1.         0.90909091 1.        ]

mean value: 0.928030303030303

key: train_recall
value: [0.99029126 0.98058252 0.94174757 0.98058252 0.99019608 0.90196078
 1.         1.         0.96116505 1.        ]

mean value: 0.9746525794783933

key: test_roc_auc
value: [0.5719697  0.75       0.62121212 0.65909091 0.72727273 0.8219697
 0.63636364 0.72727273 0.77272727 0.72727273]

mean value: 0.7015151515151515

key: train_roc_auc
value: [0.75985151 0.80891871 0.76499143 0.79421283 0.72325338 0.73253379
 0.69902913 0.73300971 0.77669903 0.78640777]

mean value: 0.7578907291071768

key: test_jcc
value: [0.44444444 0.64705882 0.52631579 0.52941176 0.66666667 0.73333333
 0.6        0.66666667 0.66666667 0.64705882]

mean value: 0.6127622979016167

key: train_jcc
value: [0.67549669 0.72142857 0.66896552 0.70629371 0.63924051 0.62585034
 0.62195122 0.64968153 0.68275862 0.70068027]

mean value: 0.6692346971143661

MCC on Blind test: 0.36

Accuracy on Blind test: 0.61

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.01373434 0.01035643 0.01028872 0.01046038 0.01034403 0.01036429
 0.01039052 0.01046538 0.01036501 0.01036739]

mean value: 0.010713648796081544

key: score_time
value: [0.01092839 0.01048851 0.01048732 0.01047397 0.01050997 0.01047373
 0.01039219 0.0105195  0.01046062 0.01039433]

mean value: 0.010512852668762207

key: test_mcc
value: [0.62050523 0.74242424 0.47727273 0.91605722 0.66414149 0.82575758
 0.82575758 0.83971912 0.91287093 0.73029674]

mean value: 0.7554802857310763

key: train_mcc
value: [0.87320324 0.85404174 0.8742382  0.82504775 0.87320324 0.83447633
 0.86356283 0.85368872 0.86424061 0.86424061]

mean value: 0.8579943273085286

key: test_accuracy
value: [0.7826087  0.86956522 0.73913043 0.95652174 0.82608696 0.91304348
 0.91304348 0.91304348 0.95454545 0.86363636]

mean value: 0.8731225296442687

key: train_accuracy
value: [0.93658537 0.92682927 0.93658537 0.91219512 0.93658537 0.91707317
 0.93170732 0.92682927 0.93203883 0.93203883]

mean value: 0.9288467913805352

key: test_fscore
value: [0.70588235 0.86956522 0.72727273 0.95238095 0.81818182 0.91666667
 0.91666667 0.90909091 0.95238095 0.85714286]

mean value: 0.862523112011603

key: train_fscore
value: [0.93719807 0.92610837 0.93532338 0.91089109 0.93596059 0.91542289
 0.93069307 0.92610837 0.93137255 0.93137255]

mean value: 0.9280450932646102

key: test_precision
value: [1.         0.83333333 0.72727273 1.         0.9        0.91666667
 0.91666667 1.         1.         0.9       ]

mean value: 0.9193939393939394

key: train_precision
value: [0.93269231 0.94       0.95918367 0.92929293 0.94059406 0.92929293
 0.94       0.93069307 0.94059406 0.94059406]

mean value: 0.9382937087272306

key: test_recall
value: [0.54545455 0.90909091 0.72727273 0.90909091 0.75       0.91666667
 0.91666667 0.83333333 0.90909091 0.81818182]

mean value: 0.8234848484848485

key: train_recall
value: [0.94174757 0.91262136 0.91262136 0.89320388 0.93137255 0.90196078
 0.92156863 0.92156863 0.9223301  0.9223301 ]

mean value: 0.9181324957167333

key: test_roc_auc
value: [0.77272727 0.87121212 0.73863636 0.95454545 0.82954545 0.91287879
 0.91287879 0.91666667 0.95454545 0.86363636]

mean value: 0.8727272727272727

key: train_roc_auc
value: [0.93656006 0.92689891 0.93670284 0.91228822 0.93656006 0.91699981
 0.9316581  0.92680373 0.93203883 0.93203883]

mean value: 0.9288549400342662

key: test_jcc
value: [0.54545455 0.76923077 0.57142857 0.90909091 0.69230769 0.84615385
 0.84615385 0.83333333 0.90909091 0.75      ]

mean value: 0.7672244422244422

key: train_jcc
value: [0.88181818 0.86238532 0.87850467 0.83636364 0.87962963 0.8440367
 0.87037037 0.86238532 0.87155963 0.87155963]

mean value: 0.8658613096583602

MCC on Blind test: 0.21

Accuracy on Blind test: 0.6

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:143: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:146: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.08771014 0.08218527 0.08186293 0.08213329 0.14164805 0.1171093
 0.08225846 0.0825932  0.08844352 0.08227658]

mean value: 0.09282207489013672

key: score_time
value: [0.01075149 0.01060557 0.01062846 0.01066661 0.01069403 0.01066709
 0.01072025 0.01071143 0.01065779 0.01065183]

mean value: 0.010675454139709472

key: test_mcc
value: [0.69084928 0.74242424 0.39393939 1.         0.66414149 0.91666667
 0.65909298 0.91666667 0.91287093 0.91287093]

mean value: 0.7809522578121664

key: train_mcc
value: [0.87321531 0.85404174 0.90261781 0.85404174 0.87320324 0.88292404
 0.88308106 0.86341138 0.87382759 0.8544092 ]

mean value: 0.8714773106645599

key: test_accuracy
value: [0.82608696 0.86956522 0.69565217 1.         0.82608696 0.95652174
 0.82608696 0.95652174 0.95454545 0.95454545]

mean value: 0.8865612648221344

key: train_accuracy
value: [0.93658537 0.92682927 0.95121951 0.92682927 0.93658537 0.94146341
 0.94146341 0.93170732 0.9368932  0.92718447]

mean value: 0.9356760596732181

key: test_fscore
value: [0.77777778 0.86956522 0.69565217 1.         0.81818182 0.95652174
 0.84615385 0.95652174 0.95238095 0.95652174]

mean value: 0.8829277003190047

key: train_fscore
value: [0.93658537 0.92610837 0.95098039 0.92610837 0.93596059 0.94117647
 0.94059406 0.93137255 0.93658537 0.92682927]

mean value: 0.9352300811072124

key: test_precision
value: [1.         0.83333333 0.66666667 1.         0.9        1.
 0.78571429 1.         1.         0.91666667]

mean value: 0.9102380952380953

key: train_precision
value: [0.94117647 0.94       0.96039604 0.94       0.94059406 0.94117647
 0.95       0.93137255 0.94117647 0.93137255]

mean value: 0.9417264608813822

key: test_recall
value: [0.63636364 0.90909091 0.72727273 1.         0.75       0.91666667
 0.91666667 0.91666667 0.90909091 1.        ]

mean value: 0.8681818181818182

key: train_recall
value: [0.93203883 0.91262136 0.94174757 0.91262136 0.93137255 0.94117647
 0.93137255 0.93137255 0.93203883 0.9223301 ]

mean value: 0.9288692175899487

key: test_roc_auc
value: [0.81818182 0.87121212 0.6969697  1.         0.82954545 0.95833333
 0.8219697  0.95833333 0.95454545 0.95454545]

mean value: 0.8863636363636364

key: train_roc_auc
value: [0.93660765 0.92689891 0.95126594 0.92689891 0.93656006 0.94146202
 0.94141443 0.93170569 0.9368932  0.92718447]

mean value: 0.9356891300209405

key: test_jcc
value: [0.63636364 0.76923077 0.53333333 1.         0.69230769 0.91666667
 0.73333333 0.91666667 0.90909091 0.91666667]

mean value: 0.8023659673659673

key: train_jcc
value: [0.88073394 0.86238532 0.90654206 0.86238532 0.87962963 0.88888889
 0.88785047 0.87155963 0.88073394 0.86363636]

mean value: 0.8784345570656983

MCC on Blind test: 0.09

Accuracy on Blind test: 0.54
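
Note on reading the per-fold keys above: the following is a minimal sketch, not the script's own code, of how the logged metrics relate to a binary confusion matrix. The counts used (TP=7, FP=0, FN=4, TN=12) are an assumption inferred from the first RidgeClassifierCV fold (test_precision 1.0, test_recall 0.63636364, test_accuracy 0.82608696); the log does not print them. The helper name `binary_metrics` is hypothetical. The `roc_auc` line assumes the score was computed from hard class predictions, in which case it equals the mean of recall and specificity; that is consistent with this fold but is not confirmed by the log.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute the metrics named in the log from confusion-matrix counts.

    No zero-division guards: this sketch assumes every denominator is non-zero.
    """
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),        # F1 score
        "jcc": tp / (tp + fp + fn),                   # Jaccard index
        "mcc": (tp * tn - fp * fn) / mcc_denominator,  # Matthews correlation
        "roc_auc": (recall + specificity) / 2,        # hard-label AUC (assumed)
    }


# Counts inferred from fold 1 of the RidgeClassifierCV block (an assumption).
fold1 = binary_metrics(tp=7, fp=0, fn=4, tn=12)
for name, value in fold1.items():
    print(f"{name}: {value:.8f}")
```

Under these assumed counts the computed values agree with the first entry of each test_* array in the RidgeClassifierCV block, which suggests the fold held 23 samples with 11 positives.
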

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.01983237 0.0227282  0.0234642  0.02106881 0.01955771 0.02060318
 0.02057695 0.02065086 0.02190328 0.02053022]

mean value: 0.021091580390930176

key: score_time
value: [0.01070762 0.01104808 0.01071906 0.01067495 0.01204062 0.0107677
 0.01073122 0.01068878 0.01067734 0.01063418]

mean value: 0.0108689546585083

key: test_mcc
value: [0.58002308 0.48856385 0.23262105 0.65909298 0.65909298 0.83971912
 0.91605722 0.82575758 1.         0.27272727]

mean value: 0.6473655141770323

key: train_mcc
value: [0.78548989 0.77565201 0.83417421 0.74754561 0.77565201 0.75613935
 0.76601619 0.77565201 0.77673564 0.78640777]

mean value: 0.7779464673314072

key: test_accuracy
value: [0.7826087  0.73913043 0.60869565 0.82608696 0.82608696 0.91304348
 0.95652174 0.91304348 1.         0.63636364]

mean value: 0.8201581027667985

key: train_accuracy
value: [0.89268293 0.88780488 0.91707317 0.87317073 0.88780488 0.87804878
 0.88292683 0.88780488 0.88834951 0.89320388]

mean value: 0.8888870471228985

key: test_fscore
value: [0.73684211 0.75       0.64       0.8        0.84615385 0.90909091
 0.96       0.91666667 1.         0.63636364]

mean value: 0.8195117163538216

key: train_fscore
value: [0.89423077 0.88780488 0.9178744  0.87735849 0.88780488 0.87804878
 0.88349515 0.88780488 0.88888889 0.89320388]

mean value: 0.8896514988581322

key: test_precision
value: [0.875      0.69230769 0.57142857 0.88888889 0.78571429 1.
 0.92307692 0.91666667 1.         0.63636364]

mean value: 0.8289446664446665

key: train_precision
value: [0.88571429 0.89215686 0.91346154 0.85321101 0.88349515 0.87378641
 0.875      0.88349515 0.88461538 0.89320388]

mean value: 0.8838139663234891

key: test_recall
value: [0.63636364 0.81818182 0.72727273 0.72727273 0.91666667 0.83333333
 1.         0.91666667 1.         0.63636364]

mean value: 0.8212121212121212

key: train_recall
value: [0.90291262 0.88349515 0.9223301  0.90291262 0.89215686 0.88235294
 0.89215686 0.89215686 0.89320388 0.89320388]

mean value: 0.895688178183895

key: test_roc_auc
value: [0.77651515 0.74242424 0.61363636 0.8219697  0.8219697  0.91666667
 0.95454545 0.91287879 1.         0.63636364]

mean value: 0.8196969696969697

key: train_roc_auc
value: [0.89263278 0.887826   0.9170474  0.87302494 0.887826   0.87806967
 0.88297164 0.887826   0.88834951 0.89320388]

mean value: 0.8888777841233582

key: test_jcc
value: [0.58333333 0.6        0.47058824 0.66666667 0.73333333 0.83333333
 0.92307692 0.84615385 1.         0.46666667]

mean value: 0.7123152337858221

key: train_jcc
value: [0.80869565 0.79824561 0.84821429 0.78151261 0.79824561 0.7826087
 0.79130435 0.79824561 0.8        0.80701754]

mean value: 0.8014089972373388

MCC on Blind test: 0.35

Accuracy on Blind test: 0.67

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
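
Note on the warning above: the pipeline already applies MinMaxScaler to the numerical columns, so the remaining suggestion in the message is to raise `max_iter`. The following is a hedged sketch on synthetic data, not the script's own code, that counts ConvergenceWarning messages for a small and a large iteration budget. The dataset, the values 5 and 5000, and the helper name `count_convergence_warnings` are illustrative assumptions; the log only shows that LogisticRegressionCV ran with its default `max_iter`.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data; the real features come from the script's own dataframe.
X, y = make_classification(n_samples=200, n_features=20, random_state=42)


def count_convergence_warnings(max_iter):
    """Fit a scaled LogisticRegressionCV and count ConvergenceWarning messages."""
    model = Pipeline([
        ("scale", MinMaxScaler()),
        ("model", LogisticRegressionCV(max_iter=max_iter, random_state=42)),
    ])
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, not just the first
        model.fit(X, y)
    return sum(issubclass(w.category, ConvergenceWarning) for w in caught)


n_low = count_convergence_warnings(max_iter=5)      # deliberately too small
n_high = count_convergence_warnings(max_iter=5000)  # generous budget
print(f"warnings with max_iter=5: {n_low}, with max_iter=5000: {n_high}")
```

A larger `max_iter` trades fit time for fewer unconverged fits; whether that changes the reported scores on the real data would need to be checked by rerunning the pipeline.
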
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
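
The same ConvergenceWarning is emitted once per fit, so it repeats many times in this log. The warning text itself points at max_iter and data scaling as the remedies. The sketch below addresses only the repetition, using the standard-library warnings filter. It is a minimal illustration under assumptions: ConvergenceWarningStandIn and fit_all_folds are hypothetical stand-ins (the real category is sklearn.exceptions.ConvergenceWarning), and the fit count of 10 is arbitrary.

```python
import warnings


class ConvergenceWarningStandIn(UserWarning):
    """Hypothetical stand-in for sklearn.exceptions.ConvergenceWarning."""


def fit_all_folds(n_fits):
    # Each simulated fit raises the same warning from the same source line,
    # which mirrors the identical blocks seen in the log.
    for _ in range(n_fits):
        warnings.warn("lbfgs failed to converge (status=1)",
                      ConvergenceWarningStandIn)


def count_emitted(action, n_fits=10):
    # Record warnings instead of printing them, under the given filter action.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action, ConvergenceWarningStandIn)
        fit_all_folds(n_fits)
    return len(caught)


n_always = count_emitted("always")  # every occurrence is reported
n_once = count_emitted("once")      # repeats of the same message are dropped
print(n_always, n_once)
```

The "once" action only suppresses repeats of the same message and category, so a different warning would still appear in the log.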
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.63725948 0.8160131  0.64619517 0.67831397 0.81363416 0.64932156
 0.64309788 0.77185845 0.69442797 0.65856528]

mean value: 0.7008687019348144

key: score_time
value: [0.01391268 0.01416373 0.01514912 0.01428485 0.01425004 0.01425481
 0.01451612 0.01446199 0.01425672 0.01421475]

mean value: 0.014346480369567871

key: test_mcc
value: [0.58002308 0.74242424 0.58930667 0.74047959 0.74242424 0.82575758
 0.74242424 0.58930667 0.75592895 0.75592895]

mean value: 0.7064004193432867

key: train_mcc
value: [1.         0.99029034 0.93174679 1.         0.96116136 0.93174679
 0.87355997 0.92194936 0.99033794 0.97091955]

mean value: 0.9571712098991773

key: test_accuracy
value: [0.7826087  0.86956522 0.7826087  0.86956522 0.86956522 0.91304348
 0.86956522 0.7826087  0.86363636 0.86363636]

mean value: 0.8466403162055336

key: train_accuracy
value: [1.         0.99512195 0.96585366 1.         0.9804878  0.96585366
 0.93658537 0.96097561 0.99514563 0.98543689]

mean value: 0.9785460573052333

key: test_fscore
value: [0.73684211 0.86956522 0.8        0.85714286 0.86956522 0.91666667
 0.86956522 0.76190476 0.84210526 0.88      ]

mean value: 0.8403357306309251

key: train_fscore
value: [1.         0.99516908 0.96618357 1.         0.98058252 0.96551724
 0.93719807 0.96078431 0.99512195 0.98550725]

mean value: 0.978606400161065

key: test_precision
value: [0.875      0.83333333 0.71428571 0.9        0.90909091 0.91666667
 0.90909091 0.88888889 1.         0.78571429]

mean value: 0.8732070707070707

key: train_precision
value: [1.         0.99038462 0.96153846 1.         0.97115385 0.97029703
 0.92380952 0.96078431 1.         0.98076923]

mean value: 0.9758737021084138

key: test_recall
value: [0.63636364 0.90909091 0.90909091 0.81818182 0.83333333 0.91666667
 0.83333333 0.66666667 0.72727273 1.        ]

mean value: 0.825

key: train_recall
value: [1.         1.         0.97087379 1.         0.99019608 0.96078431
 0.95098039 0.96078431 0.99029126 0.99029126]

mean value: 0.9814201408718828

key: test_roc_auc
value: [0.77651515 0.87121212 0.78787879 0.86742424 0.87121212 0.91287879
 0.87121212 0.78787879 0.86363636 0.86363636]

mean value: 0.8473484848484848

key: train_roc_auc
value: [1.         0.99509804 0.96582905 1.         0.98053493 0.96582905
 0.93665524 0.96097468 0.99514563 0.98543689]

mean value: 0.9785503521797069

key: test_jcc
value: [0.58333333 0.76923077 0.66666667 0.75       0.76923077 0.84615385
 0.76923077 0.61538462 0.72727273 0.78571429]

mean value: 0.7282217782217782

key: train_jcc
value: [1.         0.99038462 0.93457944 1.         0.96190476 0.93333333
 0.88181818 0.9245283  0.99029126 0.97142857]

mean value: 0.9588268467144515

MCC on Blind test: 0.08

Accuracy on Blind test: 0.54
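
The per-fold metrics above can all be derived from one confusion matrix per fold. The sketch below recomputes the first test fold of the LogisticRegressionCV block. The counts (TP=7, FP=1, FN=4, TN=11) are not printed in the log; they are an assumption inferred from the logged precision 0.875, recall 0.63636364 and accuracy 0.7826087. The function name confusion_metrics is hypothetical. The logged roc_auc values are consistent with AUC computed on hard class predictions, which reduces to the mean of recall and specificity.

```python
import math


def confusion_metrics(tp, fp, fn, tn):
    """Compute the metrics reported in the log from confusion-matrix counts."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Matthews correlation coefficient: denominator is zero only when a
    # whole row or column of the confusion matrix is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),              # Jaccard index
        "roc_auc": (recall + specificity) / 2,   # AUC for hard predictions
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Assumed counts for test fold 0 (inferred, not printed in the log).
fold0 = confusion_metrics(tp=7, fp=1, fn=4, tn=11)
for name, value in fold0.items():
    print(f"{name}: {value:.8f}")
```

Under these assumed counts the results agree with the first entry of each test_* array above.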

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.00974512 0.00931644 0.00723195 0.00713229 0.00715876 0.00732064
 0.00752759 0.00799894 0.00718284 0.00744987]

mean value: 0.00780644416809082

key: score_time
value: [0.01071715 0.00873137 0.00839925 0.0081389  0.008322   0.00864697
 0.00856805 0.0087285  0.00848913 0.00818849]

mean value: 0.00869297981262207

key: test_mcc
value: [0.2096648  0.56879646 0.29359034 0.31298622 0.32232919 0.65151515
 0.40451992 0.01343038 0.54232614 0.29277002]

mean value: 0.36119286228684955

key: train_mcc
value: [0.42185455 0.44881052 0.49019032 0.45523737 0.44991626 0.44991626
 0.4598332  0.51034181 0.41615085 0.43864549]

mean value: 0.45408966367968817

key: test_accuracy
value: [0.56521739 0.73913043 0.60869565 0.65217391 0.60869565 0.82608696
 0.65217391 0.52173913 0.72727273 0.63636364]

mean value: 0.6537549407114625

key: train_accuracy
value: [0.65853659 0.69268293 0.71707317 0.69268293 0.68780488 0.68780488
 0.69756098 0.73658537 0.67961165 0.68932039]

mean value: 0.6939663746152025

key: test_fscore
value: [0.66666667 0.78571429 0.68965517 0.66666667 0.72727273 0.83333333
 0.75       0.66666667 0.78571429 0.69230769]

mean value: 0.7263997496756118

key: train_fscore
value: [0.74452555 0.75675676 0.77165354 0.75862069 0.75384615 0.75384615
 0.7578125  0.7768595  0.74418605 0.75193798]

mean value: 0.7570044879996563

key: test_precision
value: [0.52631579 0.64705882 0.55555556 0.61538462 0.57142857 0.83333333
 0.6        0.52380952 0.64705882 0.6       ]

mean value: 0.6119945036044108

key: train_precision
value: [0.59649123 0.62820513 0.64900662 0.62658228 0.62025316 0.62025316
 0.62987013 0.67142857 0.61935484 0.62580645]

mean value: 0.6287251578008078

key: test_recall
value: [0.90909091 1.         0.90909091 0.72727273 1.         0.83333333
 1.         0.91666667 1.         0.81818182]

mean value: 0.9113636363636364

key: train_recall
value: [0.99029126 0.95145631 0.95145631 0.96116505 0.96078431 0.96078431
 0.95098039 0.92156863 0.93203883 0.94174757]

mean value: 0.9522272986864648

key: test_roc_auc
value: [0.57954545 0.75       0.62121212 0.65530303 0.59090909 0.82575758
 0.63636364 0.50378788 0.72727273 0.63636364]

mean value: 0.6526515151515152

key: train_roc_auc
value: [0.65691034 0.69141443 0.71592423 0.69136684 0.68913002 0.68913002
 0.69879117 0.73748334 0.67961165 0.68932039]

mean value: 0.693908242908814

key: test_jcc
value: [0.5        0.64705882 0.52631579 0.5        0.57142857 0.71428571
 0.6        0.5        0.64705882 0.52941176]

mean value: 0.5735559486952676

key: train_jcc
value: [0.59302326 0.60869565 0.62820513 0.61111111 0.60493827 0.60493827
 0.61006289 0.63513514 0.59259259 0.60248447]

mean value: 0.609118678337316

MCC on Blind test: 0.48

Accuracy on Blind test: 0.71
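
Each "mean value" line is the arithmetic mean of the ten fold scores printed above it. The sketch below reproduces the Gaussian NB test_mcc mean from the rounded values in the log and also reports the fold-to-fold spread, which the log does not print. The spread figures are an addition for illustration, not part of the original output.

```python
from statistics import fmean, stdev

# test_mcc per fold for Gaussian NB, copied from the log (rounded to 8 places).
test_mcc = [
    0.2096648, 0.56879646, 0.29359034, 0.31298622, 0.32232919,
    0.65151515, 0.40451992, 0.01343038, 0.54232614, 0.29277002,
]

cv_mean = fmean(test_mcc)      # should match the logged "mean value"
cv_std = stdev(test_mcc)       # sample standard deviation across folds
cv_min, cv_max = min(test_mcc), max(test_mcc)

print(f"mean={cv_mean:.8f} std={cv_std:.4f} range=[{cv_min}, {cv_max}]")
```

With only 10 folds of about 23 samples each, the range across folds is wide, so the mean alone can hide how unstable a model is.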

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.0070138  0.00690508 0.00694633 0.00693202 0.00696373 0.00683618
 0.00693846 0.0069778  0.0068872  0.00699329]

mean value: 0.006939387321472168

key: score_time
value: [0.00785279 0.00785279 0.00780988 0.00785208 0.0078671  0.00781655
 0.00785685 0.00782061 0.0078702  0.00784469]

mean value: 0.007844352722167968

key: test_mcc
value: [ 0.39393939  0.06579517 -0.03816905  0.38932432  0.33946383  0.56490196
  0.33946383  0.21452908  0.54772256  0.36514837]

mean value: 0.318211945518085

key: train_mcc
value: [0.39749865 0.36390677 0.37171873 0.369368   0.40852696 0.36225341
 0.37286188 0.38354703 0.38043802 0.34401398]

mean value: 0.37541334377025193

key: test_accuracy
value: [0.69565217 0.52173913 0.47826087 0.69565217 0.65217391 0.7826087
 0.65217391 0.60869565 0.77272727 0.68181818]

mean value: 0.6541501976284585

key: train_accuracy
value: [0.69756098 0.67804878 0.68292683 0.68292683 0.70243902 0.67804878
 0.68292683 0.68780488 0.68932039 0.66504854]

mean value: 0.6847051858868103

key: test_fscore
value: [0.69565217 0.59259259 0.5        0.66666667 0.73333333 0.8
 0.73333333 0.66666667 0.7826087  0.66666667]

mean value: 0.6837520128824477

key: train_fscore
value: [0.71559633 0.71052632 0.71111111 0.70588235 0.71889401 0.7027027
 0.70852018 0.71428571 0.7037037  0.70638298]

mean value: 0.7097605398121303

key: test_precision
value: [0.66666667 0.5        0.46153846 0.7        0.61111111 0.76923077
 0.61111111 0.6        0.75       0.7       ]

mean value: 0.6369658119658119

key: train_precision
value: [0.67826087 0.648      0.6557377  0.66101695 0.67826087 0.65
 0.65289256 0.6557377  0.67256637 0.62878788]

mean value: 0.6581260910571809

key: test_recall
value: [0.72727273 0.72727273 0.54545455 0.63636364 0.91666667 0.83333333
 0.91666667 0.75       0.81818182 0.63636364]

mean value: 0.7507575757575757

key: train_recall
value: [0.75728155 0.78640777 0.77669903 0.75728155 0.76470588 0.76470588
 0.7745098  0.78431373 0.73786408 0.80582524]

mean value: 0.7709594517418618

key: test_roc_auc
value: [0.6969697  0.53030303 0.48106061 0.69318182 0.64015152 0.78030303
 0.64015152 0.60227273 0.77272727 0.68181818]

mean value: 0.6518939393939394

key: train_roc_auc
value: [0.69726823 0.67751761 0.68246716 0.68256235 0.70274129 0.67846945
 0.68337141 0.68827337 0.68932039 0.66504854]

mean value: 0.6847039786788501

key: test_jcc
value: [0.53333333 0.42105263 0.33333333 0.5        0.57894737 0.66666667
 0.57894737 0.5        0.64285714 0.5       ]

mean value: 0.5255137844611529

key: train_jcc
value: [0.55714286 0.55102041 0.55172414 0.54545455 0.56115108 0.54166667
 0.54861111 0.55555556 0.54285714 0.54605263]

mean value: 0.5501236135597817

MCC on Blind test: 0.47

Accuracy on Blind test: 0.73
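
The "key: / value: / mean value:" blocks follow a regular layout, so they can be read back into a table for comparison across models. The sketch below parses that layout with a regular expression. The names BLOCK_RE and parse_metric_blocks are hypothetical, and SAMPLE is a two-block excerpt copied from the BernoulliNB section above. It assumes array values never contain a closing bracket and that each mean follows its array directly.

```python
import re

# One block: a metric name, a bracketed array (possibly wrapped over several
# lines), then the logged mean.
BLOCK_RE = re.compile(
    r"key: (?P<key>\w+)\s*\n"
    r"value: \[(?P<values>[^\]]*)\]\s*"
    r"mean value: (?P<mean>\S+)"
)

SAMPLE = """\
key: test_mcc
value: [ 0.39393939  0.06579517 -0.03816905  0.38932432  0.33946383  0.56490196
  0.33946383  0.21452908  0.54772256  0.36514837]

mean value: 0.318211945518085

key: test_accuracy
value: [0.69565217 0.52173913 0.47826087 0.69565217 0.65217391 0.7826087
 0.65217391 0.60869565 0.77272727 0.68181818]

mean value: 0.6541501976284585
"""


def parse_metric_blocks(text):
    """Return {metric_name: (fold_values, logged_mean)} for each block found."""
    blocks = {}
    for match in BLOCK_RE.finditer(text):
        values = [float(v) for v in match.group("values").split()]
        blocks[match.group("key")] = (values, float(match.group("mean")))
    return blocks


parsed = parse_metric_blocks(SAMPLE)
for key, (values, logged_mean) in parsed.items():
    recomputed = sum(values) / len(values)
    print(key, len(values), logged_mean, abs(recomputed - logged_mean) < 1e-7)
```

Because the same metric keys recur for every model, a full parser would also need to track the preceding "Model_name:" line to keep the blocks apart.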

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00674987 0.00651455 0.00655532 0.00653768 0.0065999  0.007195
 0.00682569 0.00735092 0.00722623 0.0072701 ]

mean value: 0.006882524490356446

key: score_time
value: [0.01375031 0.0088625  0.00889778 0.01025677 0.00893879 0.00916982
 0.00883722 0.00970483 0.00950575 0.00965738]

mean value: 0.0097581148147583

key: test_mcc
value: [0.21452908 0.39393939 0.33371191 0.48075018 0.39727608 0.12878788
 0.25495628 0.30240737 0.48795004 0.18898224]

mean value: 0.31832904389236555

key: train_mcc
value: [0.63902904 0.59060621 0.60982579 0.63382493 0.67133261 0.60982579
 0.68889027 0.67805807 0.59504408 0.6617241 ]

mean value: 0.6378160892933147

key: test_accuracy
value: [0.60869565 0.69565217 0.65217391 0.73913043 0.69565217 0.56521739
 0.60869565 0.65217391 0.72727273 0.59090909]

mean value: 0.6535573122529644

key: train_accuracy
value: [0.8195122  0.79512195 0.80487805 0.81463415 0.83414634 0.80487805
 0.84390244 0.83902439 0.7961165  0.83009709]

mean value: 0.8182311153208619

key: test_fscore
value: [0.52631579 0.69565217 0.69230769 0.7        0.74074074 0.58333333
 0.52631579 0.69230769 0.66666667 0.52631579]

mean value: 0.6349955667690221

key: train_fscore
value: [0.82125604 0.8        0.80769231 0.80412371 0.82474227 0.8019802
 0.83838384 0.83743842 0.78571429 0.83568075]

mean value: 0.8157011822658049

key: test_precision
value: [0.625      0.66666667 0.6        0.77777778 0.66666667 0.58333333
 0.71428571 0.64285714 0.85714286 0.625     ]

mean value: 0.6758730158730158

key: train_precision
value: [0.81730769 0.78504673 0.8        0.85714286 0.86956522 0.81
 0.86458333 0.84158416 0.82795699 0.80909091]

mean value: 0.8282277885901212

key: test_recall
value: [0.45454545 0.72727273 0.81818182 0.63636364 0.83333333 0.58333333
 0.41666667 0.75       0.54545455 0.45454545]

mean value: 0.621969696969697

key: train_recall
value: [0.82524272 0.81553398 0.81553398 0.75728155 0.78431373 0.79411765
 0.81372549 0.83333333 0.74757282 0.86407767]

mean value: 0.8050732914525033

key: test_roc_auc
value: [0.60227273 0.6969697  0.65909091 0.73484848 0.68939394 0.56439394
 0.61742424 0.64772727 0.72727273 0.59090909]

mean value: 0.6530303030303031

key: train_roc_auc
value: [0.8194841  0.79502189 0.80482581 0.81491529 0.83390444 0.80482581
 0.84375595 0.83899676 0.7961165  0.83009709]

mean value: 0.8181943651246907

key: test_jcc
value: [0.35714286 0.53333333 0.52941176 0.53846154 0.58823529 0.41176471
 0.35714286 0.52941176 0.5        0.35714286]

mean value: 0.47020469726352077

key: train_jcc
value: [0.69672131 0.66666667 0.67741935 0.67241379 0.70175439 0.66942149
 0.72173913 0.72033898 0.64705882 0.71774194]

mean value: 0.6891275872151366

MCC on Blind test: 0.3

Accuracy on Blind test: 0.65

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.00949764 0.00926423 0.00878119 0.00982475 0.00883627 0.00911903
 0.00922585 0.00873303 0.00967121 0.00942421]

mean value: 0.00923774242401123

key: score_time
value: [0.00828195 0.00830245 0.00834298 0.00822258 0.00852466 0.00888777
 0.00881886 0.00821614 0.00887179 0.00847054]

mean value: 0.008493971824645997

key: test_mcc
value: [0.47727273 0.48856385 0.3030303  0.48075018 0.58002308 0.76764947
 0.74047959 0.58002308 0.68313005 0.09090909]

mean value: 0.5191831414788004

key: train_mcc
value: [0.76709739 0.73662669 0.77590489 0.75693529 0.75611614 0.69845687
 0.70790488 0.71798813 0.73789886 0.74813718]

mean value: 0.7403066312362871

key: test_accuracy
value: [0.73913043 0.73913043 0.65217391 0.73913043 0.7826087  0.86956522
 0.86956522 0.7826087  0.81818182 0.54545455]

mean value: 0.7537549407114624

key: train_accuracy
value: [0.88292683 0.86829268 0.88780488 0.87804878 0.87804878 0.84878049
 0.85365854 0.85853659 0.86893204 0.87378641]

mean value: 0.8698816007577551

key: test_fscore
value: [0.72727273 0.75       0.63636364 0.7        0.81481481 0.85714286
 0.88       0.81481481 0.77777778 0.54545455]

mean value: 0.7503641173641173

key: train_fscore
value: [0.88679245 0.86829268 0.88995215 0.88151659 0.87684729 0.85167464
 0.85576923 0.86124402 0.86829268 0.87619048]

mean value: 0.8716572217358802

key: test_precision
value: [0.72727273 0.69230769 0.63636364 0.77777778 0.73333333 1.
 0.84615385 0.73333333 1.         0.54545455]

mean value: 0.7691996891996892

key: train_precision
value: [0.86238532 0.87254902 0.87735849 0.86111111 0.88118812 0.8317757
 0.83962264 0.8411215  0.87254902 0.85981308]

mean value: 0.8599474002688899

key: test_recall
value: [0.72727273 0.81818182 0.63636364 0.63636364 0.91666667 0.75
 0.91666667 0.91666667 0.63636364 0.54545455]

mean value: 0.75

key: train_recall
value: [0.91262136 0.86407767 0.90291262 0.90291262 0.87254902 0.87254902
 0.87254902 0.88235294 0.86407767 0.89320388]

mean value: 0.8839805825242718

key: test_roc_auc
value: [0.73863636 0.74242424 0.65151515 0.73484848 0.77651515 0.875
 0.86742424 0.77651515 0.81818182 0.54545455]

mean value: 0.7526515151515151

key: train_roc_auc
value: [0.88278127 0.86831334 0.88773082 0.8779269  0.87802208 0.84889587
 0.85375024 0.8586522  0.86893204 0.87378641]

mean value: 0.8698791166952218

key: test_jcc
value: [0.57142857 0.6        0.46666667 0.53846154 0.6875     0.75
 0.78571429 0.6875     0.63636364 0.375     ]

mean value: 0.6098634698634698

key: train_jcc
value: [0.79661017 0.76724138 0.80172414 0.78813559 0.78070175 0.74166667
 0.74789916 0.75630252 0.76724138 0.77966102]

mean value: 0.7727183777937642

MCC on Blind test: 0.45

Accuracy on Blind test: 0.72

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.60935068 0.83438659 0.7129097  0.18024588 0.55924106 0.85111785
 0.75755429 0.51584601 0.85538912 0.78311491]

mean value: 0.6659156084060669

key: score_time
value: [0.01095915 0.01519394 0.01093698 0.0109117  0.01094437 0.01403475
 0.02076578 0.01094198 0.01319242 0.01348639]

mean value: 0.013136744499206543

key: test_mcc
value: [0.62050523 0.58930667 0.21969697 0.69084928 0.65151515 0.82575758
 0.83743579 0.50168817 0.91287093 0.48795004]

mean value: 0.6337575793672167

key: train_mcc
value: [0.88361919 0.86356283 0.91435567 0.5161037  0.84404459 0.8742382
 0.88447331 0.78922439 0.88349515 0.86407767]

mean value: 0.831719470643712

key: test_accuracy
value: [0.7826087  0.7826087  0.60869565 0.82608696 0.82608696 0.91304348
 0.91304348 0.73913043 0.95454545 0.72727273]

mean value: 0.8073122529644269

key: train_accuracy
value: [0.94146341 0.93170732 0.95609756 0.75121951 0.92195122 0.93658537
 0.94146341 0.89268293 0.94174757 0.93203883]

mean value: 0.9146957139474308

key: test_fscore
value: [0.70588235 0.8        0.60869565 0.77777778 0.83333333 0.91666667
 0.92307692 0.78571429 0.95238095 0.66666667]

mean value: 0.7970194610731695

key: train_fscore
value: [0.94059406 0.93269231 0.95477387 0.72131148 0.92079208 0.93779904
 0.94285714 0.89719626 0.94174757 0.93203883]

mean value: 0.9121802646431316

key: test_precision
value: [1.         0.71428571 0.58333333 1.         0.83333333 0.91666667
 0.85714286 0.6875     1.         0.85714286]

mean value: 0.8449404761904762

key: train_precision
value: [0.95959596 0.92380952 0.98958333 0.825      0.93       0.91588785
 0.91666667 0.85714286 0.94174757 0.93203883]

mean value: 0.9191472598782621

key: test_recall
value: [0.54545455 0.90909091 0.63636364 0.63636364 0.83333333 0.91666667
 1.         0.91666667 0.90909091 0.54545455]

mean value: 0.7848484848484848

key: train_recall
value: [0.9223301  0.94174757 0.9223301  0.6407767  0.91176471 0.96078431
 0.97058824 0.94117647 0.94174757 0.93203883]

mean value: 0.9085284599276604

key: test_roc_auc
value: [0.77272727 0.78787879 0.60984848 0.81818182 0.82575758 0.91287879
 0.90909091 0.73106061 0.95454545 0.72727273]

mean value: 0.8049242424242424

key: train_roc_auc
value: [0.94155721 0.9316581  0.95626309 0.7517609  0.92190177 0.93670284
 0.9416048  0.89291833 0.94174757 0.93203883]

mean value: 0.9148153436131735

key: test_jcc
value: [0.54545455 0.66666667 0.4375     0.63636364 0.71428571 0.84615385
 0.85714286 0.64705882 0.90909091 0.5       ]

mean value: 0.6759716998687587

key: train_jcc
value: [0.88785047 0.87387387 0.91346154 0.56410256 0.85321101 0.88288288
 0.89189189 0.81355932 0.88990826 0.87272727]

mean value: 0.8443469079318687

MCC on Blind test: 0.32

Accuracy on Blind test: 0.66

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01073289 0.01022744 0.00823569 0.00801015 0.00778437 0.00778627
 0.0077095  0.00775599 0.00761223 0.00784159]

mean value: 0.00836961269378662

key: score_time
value: [0.01050305 0.00813007 0.00807691 0.00800776 0.00780082 0.00781107
 0.00772762 0.00769615 0.00769758 0.00771093]

mean value: 0.008116197586059571

key: test_mcc
value: [0.91666667 0.58930667 0.76277007 0.83743579 0.82575758 0.83971912
 1.         0.91666667 0.81818182 0.73029674]

mean value: 0.8236801120713376

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.95652174 0.7826087  0.86956522 0.91304348 0.91304348 0.91304348
 1.         0.95652174 0.90909091 0.86363636]

mean value: 0.9077075098814229

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.95652174 0.8        0.84210526 0.9        0.91666667 0.90909091
 1.         0.95652174 0.90909091 0.85714286]

mean value: 0.9047140083410107

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.91666667 0.71428571 1.         1.         0.91666667 1.
 1.         1.         0.90909091 0.9       ]

mean value: 0.9356709956709957

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.90909091 0.72727273 0.81818182 0.91666667 0.83333333
 1.         0.91666667 0.90909091 0.81818182]

mean value: 0.8848484848484849

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.95833333 0.78787879 0.86363636 0.90909091 0.91287879 0.91666667
 1.         0.95833333 0.90909091 0.86363636]

mean value: 0.9079545454545455

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.91666667 0.66666667 0.72727273 0.81818182 0.84615385 0.83333333
 1.         0.91666667 0.83333333 0.75      ]

mean value: 0.8308275058275059

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.04

Accuracy on Blind test: 0.51

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.0877099  0.08413386 0.08387136 0.08425379 0.08437729 0.0846684
 0.08445048 0.08430481 0.08534908 0.08467293]

mean value: 0.08477919101715088

key: score_time
value: [0.01659155 0.01651287 0.01647377 0.01637912 0.01636243 0.01650667
 0.01641607 0.01648188 0.01631093 0.01695395]

mean value: 0.016498923301696777

key: test_mcc
value: [0.48075018 0.76764947 0.66414149 0.91605722 0.74047959 1.
 1.         1.         0.81818182 0.81818182]

mean value: 0.8205441588214263

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.73913043 0.86956522 0.82608696 0.95652174 0.86956522 1.
 1.         1.         0.90909091 0.90909091]

mean value: 0.9079051383399209

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.7        0.88       0.83333333 0.95238095 0.88       1.
 1.         1.         0.90909091 0.90909091]

mean value: 0.9063896103896104

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.77777778 0.78571429 0.76923077 1.         0.84615385 1.
 1.         1.         0.90909091 0.90909091]

mean value: 0.8997058497058497

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.63636364 1.         0.90909091 0.90909091 0.91666667 1.
 1.         1.         0.90909091 0.90909091]

mean value: 0.918939393939394

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.73484848 0.875      0.82954545 0.95454545 0.86742424 1.
 1.         1.         0.90909091 0.90909091]

mean value: 0.9079545454545455

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.53846154 0.78571429 0.71428571 0.90909091 0.78571429 1.
 1.         1.         0.83333333 0.83333333]

mean value: 0.83999333999334

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.33

Accuracy on Blind test: 0.63

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00697589 0.00684834 0.00684047 0.00689197 0.00688601 0.00683427
 0.00684476 0.00691366 0.00708032 0.00721502]

mean value: 0.006933069229125977

key: score_time
value: [0.00782681 0.00778484 0.00773478 0.00783086 0.0077734  0.00773144
 0.00774217 0.00777936 0.00781941 0.00777721]

mean value: 0.007780027389526367

key: test_mcc
value: [0.39393939 0.66414149 0.03816905 0.50168817 0.47727273 0.44411739
 0.83971912 0.31252706 0.68313005 0.36514837]

mean value: 0.4719852822351723

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.69565217 0.82608696 0.52173913 0.73913043 0.73913043 0.69565217
 0.91304348 0.65217391 0.81818182 0.68181818]

mean value: 0.7282608695652174

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.69565217 0.83333333 0.47619048 0.66666667 0.75       0.63157895
 0.90909091 0.71428571 0.77777778 0.69565217]

mean value: 0.7150228172539385

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.66666667 0.76923077 0.5        0.85714286 0.75       0.85714286
 1.         0.625      1.         0.66666667]

mean value: 0.7691849816849816

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.72727273 0.90909091 0.45454545 0.54545455 0.75       0.5
 0.83333333 0.83333333 0.63636364 0.72727273]

mean value: 0.6916666666666667

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.6969697  0.82954545 0.51893939 0.73106061 0.73863636 0.70454545
 0.91666667 0.64393939 0.81818182 0.68181818]

mean value: 0.728030303030303

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.53333333 0.71428571 0.3125     0.5        0.6        0.46153846
 0.83333333 0.55555556 0.63636364 0.53333333]

mean value: 0.5680243367743367

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.24

Accuracy on Blind test: 0.62

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.09106922 1.09859252 1.08210254 1.08648729 1.08477712 1.10239196
 1.09634829 1.08665371 1.15152287 1.16227579]

mean value: 1.1042221307754516

key: score_time
value: [0.09068799 0.14433622 0.09434557 0.09718037 0.09297609 0.09529161
 0.09361553 0.08816409 0.09734464 0.0969758 ]

mean value: 0.09909179210662841

key: test_mcc
value: [0.74047959 0.6992059  0.66414149 1.         0.91605722 0.91666667
 1.         1.         0.91287093 0.91287093]

mean value: 0.8762292726696667

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.86956522 0.82608696 0.82608696 1.         0.95652174 0.95652174
 1.         1.         0.95454545 0.95454545]

mean value: 0.9343873517786562

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.85714286 0.84615385 0.83333333 1.         0.96       0.95652174
 1.         1.         0.95652174 0.95652174]

mean value: 0.9366195254021341

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.9        0.73333333 0.76923077 1.         0.92307692 1.
 1.         1.         0.91666667 0.91666667]

mean value: 0.9158974358974359

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81818182 1.         0.90909091 1.         1.         0.91666667
 1.         1.         1.         1.        ]

mean value: 0.9643939393939394

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.86742424 0.83333333 0.82954545 1.         0.95454545 0.95833333
 1.         1.         0.95454545 0.95454545]

mean value: 0.9352272727272727

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.75       0.73333333 0.71428571 1.         0.92307692 0.91666667
 1.         1.         0.91666667 0.91666667]

mean value: 0.8870695970695971

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.14

Accuracy on Blind test: 0.55

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.84751892 0.83743048 0.99197769 0.88181758 0.90272188 0.897192
 0.83777547 0.89692569 0.9466176  0.89896107]

mean value: 0.893893837928772

key: score_time
value: [0.18853641 0.20011663 0.1846509  0.19059825 0.2371254  0.21251607
 0.20379901 0.20380235 0.17668462 0.19526935]

mean value: 0.19930989742279054

key: test_mcc
value: [0.65909298 0.6992059  0.58930667 0.76277007 0.83743579 0.83971912
 0.91605722 0.91605722 0.91287093 0.91287093]

mean value: 0.8045386839254043

key: train_mcc
value: [0.90516294 0.94216887 0.93386476 0.91325992 0.92355447 0.90523324
 0.90523324 0.88720829 0.92389898 0.91473626]

mean value: 0.915432096195379

key: test_accuracy
value: [0.82608696 0.82608696 0.7826087  0.86956522 0.91304348 0.91304348
 0.95652174 0.95652174 0.95454545 0.95454545]

mean value: 0.8952569169960475

key: train_accuracy
value: [0.95121951 0.97073171 0.96585366 0.95609756 0.96097561 0.95121951
 0.95121951 0.94146341 0.96116505 0.95631068]

mean value: 0.9566256215960217

key: test_fscore
value: [0.8        0.84615385 0.8        0.84210526 0.92307692 0.90909091
 0.96       0.96       0.95652174 0.95652174]

mean value: 0.8953470419740442

key: train_fscore
value: [0.95327103 0.97142857 0.96713615 0.95734597 0.96190476 0.95283019
 0.95283019 0.94392523 0.96226415 0.95774648]

mean value: 0.9580682723989425

key: test_precision
value: [0.88888889 0.73333333 0.71428571 1.         0.85714286 1.
 0.92307692 0.92307692 0.91666667 0.91666667]

mean value: 0.8873137973137973

key: train_precision
value: [0.91891892 0.95327103 0.93636364 0.93518519 0.93518519 0.91818182
 0.91818182 0.90178571 0.93577982 0.92727273]

mean value: 0.9280125848126148

key: test_recall
value: [0.72727273 1.         0.90909091 0.72727273 1.         0.83333333
 1.         1.         1.         1.        ]

mean value: 0.9196969696969697

key: train_recall
value: [0.99029126 0.99029126 1.         0.98058252 0.99019608 0.99019608
 0.99019608 0.99019608 0.99029126 0.99029126]

mean value: 0.9902531886541024

key: test_roc_auc
value: [0.8219697  0.83333333 0.78787879 0.86363636 0.90909091 0.91666667
 0.95454545 0.95454545 0.95454545 0.95454545]

mean value: 0.8950757575757575

key: train_roc_auc
value: [0.95102798 0.97063583 0.96568627 0.95597754 0.96111746 0.95140872
 0.95140872 0.94169998 0.96116505 0.95631068]

mean value: 0.9566438225775747

key: test_jcc
value: [0.66666667 0.73333333 0.66666667 0.72727273 0.85714286 0.83333333
 0.92307692 0.92307692 0.91666667 0.91666667]

mean value: 0.8163902763902764

key: train_jcc
value: [0.91071429 0.94444444 0.93636364 0.91818182 0.9266055  0.90990991
 0.90990991 0.89380531 0.92727273 0.91891892]

mean value: 0.919612646503732

MCC on Blind test: 0.34

Accuracy on Blind test: 0.64

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.02035141 0.00765848 0.00785589 0.00786734 0.00790954 0.0078063
 0.00794077 0.00780129 0.00784397 0.00801635]

mean value: 0.00910513401031494

key: score_time
value: [0.01214242 0.00866485 0.00880075 0.00867152 0.00868607 0.00868559
 0.0086391  0.00873423 0.00867414 0.00874686]

mean value: 0.009044551849365234

key: test_mcc
value: [ 0.39393939  0.06579517 -0.03816905  0.38932432  0.33946383  0.56490196
  0.33946383  0.21452908  0.54772256  0.36514837]

mean value: 0.318211945518085

key: train_mcc
value: [0.39749865 0.36390677 0.37171873 0.369368   0.40852696 0.36225341
 0.37286188 0.38354703 0.38043802 0.34401398]

mean value: 0.37541334377025193

key: test_accuracy
value: [0.69565217 0.52173913 0.47826087 0.69565217 0.65217391 0.7826087
 0.65217391 0.60869565 0.77272727 0.68181818]

mean value: 0.6541501976284585

key: train_accuracy
value: [0.69756098 0.67804878 0.68292683 0.68292683 0.70243902 0.67804878
 0.68292683 0.68780488 0.68932039 0.66504854]

mean value: 0.6847051858868103

key: test_fscore
value: [0.69565217 0.59259259 0.5        0.66666667 0.73333333 0.8
 0.73333333 0.66666667 0.7826087  0.66666667]

mean value: 0.6837520128824477

key: train_fscore
value: [0.71559633 0.71052632 0.71111111 0.70588235 0.71889401 0.7027027
 0.70852018 0.71428571 0.7037037  0.70638298]

mean value: 0.7097605398121303

key: test_precision
value: [0.66666667 0.5        0.46153846 0.7        0.61111111 0.76923077
 0.61111111 0.6        0.75       0.7       ]

mean value: 0.6369658119658119

key: train_precision
value: [0.67826087 0.648      0.6557377  0.66101695 0.67826087 0.65
 0.65289256 0.6557377  0.67256637 0.62878788]

mean value: 0.6581260910571809

key: test_recall
value: [0.72727273 0.72727273 0.54545455 0.63636364 0.91666667 0.83333333
 0.91666667 0.75       0.81818182 0.63636364]

mean value: 0.7507575757575757

key: train_recall
value: [0.75728155 0.78640777 0.77669903 0.75728155 0.76470588 0.76470588
 0.7745098  0.78431373 0.73786408 0.80582524]

mean value: 0.7709594517418618

key: test_roc_auc
value: [0.6969697  0.53030303 0.48106061 0.69318182 0.64015152 0.78030303
 0.64015152 0.60227273 0.77272727 0.68181818]

mean value: 0.6518939393939394

key: train_roc_auc
value: [0.69726823 0.67751761 0.68246716 0.68256235 0.70274129 0.67846945
 0.68337141 0.68827337 0.68932039 0.66504854]

mean value: 0.6847039786788501

key: test_jcc
value: [0.53333333 0.42105263 0.33333333 0.5        0.57894737 0.66666667
 0.57894737 0.5        0.64285714 0.5       ]

mean value: 0.5255137844611529

key: train_jcc
value: [0.55714286 0.55102041 0.55172414 0.54545455 0.56115108 0.54166667
 0.54861111 0.55555556 0.54285714 0.54605263]

mean value: 0.5501236135597817

MCC on Blind test: 0.47

Accuracy on Blind test: 0.73

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.09073043 0.0420742  0.04870391 0.04456186 0.04261518 0.04289222
 0.04772949 0.04729652 0.04701734 0.04742146]

mean value: 0.05010426044464111

key: score_time
value: [0.00988626 0.01027012 0.01072621 0.00994897 0.00984526 0.0098393
 0.01010942 0.01019359 0.01010847 0.01034665]

mean value: 0.010127425193786621

key: test_mcc
value: [0.82575758 0.6992059  0.74242424 0.91605722 0.74242424 0.91666667
 0.91605722 1.         0.91287093 0.81818182]

mean value: 0.8489645823067301

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91304348 0.82608696 0.86956522 0.95652174 0.86956522 0.95652174
 0.95652174 1.         0.95454545 0.90909091]

mean value: 0.9211462450592885

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.90909091 0.84615385 0.86956522 0.95238095 0.86956522 0.95652174
 0.96       1.         0.95652174 0.90909091]

mean value: 0.9228890529760094

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.90909091 0.73333333 0.83333333 1.         0.90909091 1.
 0.92307692 1.         0.91666667 0.90909091]

mean value: 0.9133682983682984

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 1.         0.90909091 0.90909091 0.83333333 0.91666667
 1.         1.         1.         0.90909091]

mean value: 0.9386363636363636

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.91287879 0.83333333 0.87121212 0.95454545 0.87121212 0.95833333
 0.95454545 1.         0.95454545 0.90909091]

mean value: 0.921969696969697

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.83333333 0.73333333 0.76923077 0.90909091 0.76923077 0.91666667
 0.92307692 1.         0.91666667 0.83333333]

mean value: 0.8603962703962704

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.07

Accuracy on Blind test: 0.52

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.01050711 0.02742529 0.02827191 0.03134918 0.03139067 0.03144336
 0.03741717 0.03171349 0.02641296 0.03151488]

mean value: 0.028744602203369142

key: score_time
value: [0.01013732 0.0210259  0.01963973 0.01852012 0.0103941  0.02017403
 0.01831055 0.01890373 0.02068543 0.01523781]

mean value: 0.017302870750427246

key: test_mcc
value: [0.69084928 0.65151515 0.39393939 0.91605722 0.65151515 0.91666667
 0.74047959 0.91605722 0.75592895 0.91287093]

mean value: 0.7545879558265087

key: train_mcc
value: [0.86404384 0.86356283 0.9024367  0.83417421 0.87321531 0.83418999
 0.85370265 0.86358877 0.86407767 0.8544092 ]

mean value: 0.8607401153973653

key: test_accuracy
value: [0.82608696 0.82608696 0.69565217 0.95652174 0.82608696 0.95652174
 0.86956522 0.95652174 0.86363636 0.95454545]

mean value: 0.8731225296442688

key: train_accuracy
value: [0.93170732 0.93170732 0.95121951 0.91707317 0.93658537 0.91707317
 0.92682927 0.93170732 0.93203883 0.92718447]

mean value: 0.9303125739995264

key: test_fscore
value: [0.77777778 0.81818182 0.69565217 0.95238095 0.83333333 0.95652174
 0.88       0.96       0.84210526 0.95238095]

mean value: 0.8668334010256207

key: train_fscore
value: [0.93333333 0.93269231 0.95145631 0.9178744  0.93658537 0.91707317
 0.92682927 0.93203883 0.93203883 0.92682927]

mean value: 0.9306751090914163

key: test_precision
value: [1.         0.81818182 0.66666667 1.         0.83333333 1.
 0.84615385 0.92307692 1.         1.        ]

mean value: 0.9087412587412588

key: train_precision
value: [0.91588785 0.92380952 0.95145631 0.91346154 0.93203883 0.91262136
 0.9223301  0.92307692 0.93203883 0.93137255]

mean value: 0.9258093821728087

key: test_recall
value: [0.63636364 0.81818182 0.72727273 0.90909091 0.83333333 0.91666667
 0.91666667 1.         0.72727273 0.90909091]

mean value: 0.8393939393939394

key: train_recall
value: [0.95145631 0.94174757 0.95145631 0.9223301  0.94117647 0.92156863
 0.93137255 0.94117647 0.93203883 0.9223301 ]

mean value: 0.935665334094803

key: test_roc_auc
value: [0.81818182 0.82575758 0.6969697  0.95454545 0.82575758 0.95833333
 0.86742424 0.95454545 0.86363636 0.95454545]

mean value: 0.871969696969697

key: train_roc_auc
value: [0.93161051 0.9316581  0.95121835 0.9170474  0.93660765 0.91709499
 0.92685132 0.93175328 0.93203883 0.92718447]

mean value: 0.9303064915286503

key: test_jcc
value: [0.63636364 0.69230769 0.53333333 0.90909091 0.71428571 0.91666667
 0.78571429 0.92307692 0.72727273 0.90909091]

mean value: 0.7747202797202797

key: train_jcc
value: [0.875      0.87387387 0.90740741 0.84821429 0.88073394 0.84684685
 0.86363636 0.87272727 0.87272727 0.86363636]

mean value: 0.8704803631523815

MCC on Blind test: 0.1

Accuracy on Blind test: 0.55
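
Note: the per-fold LDA scores logged above can be cross-checked against a single confusion matrix. The sketch below assumes that fold 1 held 23 test samples with TP=7, FN=4, FP=0, TN=12. These counts are inferred from the logged accuracy, precision and recall for that fold; the pipeline does not print them. It also assumes that test_roc_auc is computed from hard class predictions, in which case it equals balanced accuracy. The function name binary_metrics is hypothetical.

```python
import math

# Assumed confusion-matrix counts for LDA fold 1 (inferred, not logged).
tp, fn, fp, tn = 7, 4, 0, 12

def binary_metrics(tp, fn, fp, tn):
    """Standard binary-classification metrics from confusion-matrix counts.

    No guard against empty rows or columns; fine for the counts used here.
    """
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),
        "roc_auc": (recall + specificity) / 2,  # balanced accuracy
        "mcc": (tp * tn - fp * fn) / mcc_den,
    }

metrics = binary_metrics(tp, fn, fp, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.8f}")
```

Under these assumed counts the recomputed values agree with the first entry of each logged test_* array for LDA, which suggests the logged metrics for that fold all derive from the same set of hard predictions.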

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.00954151 0.00753498 0.0070641  0.00683141 0.00684738 0.00718474
 0.00700259 0.00671387 0.00674605 0.00678587]

mean value: 0.007225251197814942

key: score_time
value: [0.00951552 0.00836635 0.0077734  0.00772834 0.00780702 0.00820851
 0.00769925 0.00763559 0.00767422 0.00770903]

mean value: 0.008011722564697265

key: test_mcc
value: [ 0.38932432  0.58930667  0.23262105  0.38932432  0.38932432  0.66414149
  0.56490196 -0.06579517  0.29277002  0.36514837]

mean value: 0.3811067348212412

key: train_mcc
value: [0.48336719 0.44537263 0.49337247 0.42577585 0.49527272 0.41611143
 0.48421652 0.47567594 0.44763689 0.43896694]

mean value: 0.4605768597645512

key: test_accuracy
value: [0.69565217 0.7826087  0.60869565 0.69565217 0.69565217 0.82608696
 0.7826087  0.47826087 0.63636364 0.68181818]

mean value: 0.6883399209486166

key: train_accuracy
value: [0.74146341 0.72195122 0.74634146 0.71219512 0.74634146 0.70731707
 0.74146341 0.73658537 0.72330097 0.7184466 ]

mean value: 0.7295406109400899

key: test_fscore
value: [0.66666667 0.8        0.64       0.66666667 0.72       0.81818182
 0.8        0.57142857 0.69230769 0.66666667]

mean value: 0.7041918081918082

key: train_fscore
value: [0.74881517 0.73488372 0.75471698 0.7255814  0.75700935 0.71698113
 0.74881517 0.74766355 0.73239437 0.73148148]

mean value: 0.7398342306115098

key: test_precision
value: [0.7        0.71428571 0.57142857 0.7        0.69230769 0.9
 0.76923077 0.5        0.6        0.7       ]

mean value: 0.6847252747252747

key: train_precision
value: [0.73148148 0.70535714 0.73394495 0.69642857 0.72321429 0.69090909
 0.72477064 0.71428571 0.70909091 0.69911504]

mean value: 0.7128597836345258

key: test_recall
value: [0.63636364 0.90909091 0.72727273 0.63636364 0.75       0.75
 0.83333333 0.66666667 0.81818182 0.63636364]

mean value: 0.7363636363636363

key: train_recall
value: [0.76699029 0.76699029 0.77669903 0.75728155 0.79411765 0.74509804
 0.7745098  0.78431373 0.75728155 0.76699029]

mean value: 0.7690272225395012

key: test_roc_auc
value: [0.69318182 0.78787879 0.61363636 0.69318182 0.69318182 0.82954545
 0.78030303 0.46969697 0.63636364 0.68181818]

mean value: 0.6878787878787879

key: train_roc_auc
value: [0.74133828 0.72173044 0.74619265 0.71197411 0.74657339 0.70750048
 0.74162383 0.73681706 0.72330097 0.7184466 ]

mean value: 0.7295497810774796

key: test_jcc
value: [0.5        0.66666667 0.47058824 0.5        0.5625     0.69230769
 0.66666667 0.4        0.52941176 0.5       ]

mean value: 0.5488141025641026

key: train_jcc
value: [0.59848485 0.58088235 0.60606061 0.56934307 0.60902256 0.55882353
 0.59848485 0.59701493 0.57777778 0.57664234]

mean value: 0.5872536846384988

MCC on Blind test: 0.43

Accuracy on Blind test: 0.71
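
Note: each "key / value / mean value" block in this log can be parsed back into numbers, for example to confirm that the logged mean is the plain arithmetic mean of the ten fold scores. The sketch below is one way to do that with the standard library only. The regular expression and the helper name parse_cv_blocks are assumptions made for illustration, and the excerpt is copied from the MultinomialNB test_mcc block above.

```python
import re
from statistics import fmean

# Excerpt copied from the MultinomialNB block of this log.
log_text = """\
key: test_mcc
value: [ 0.38932432  0.58930667  0.23262105  0.38932432  0.38932432  0.66414149
  0.56490196 -0.06579517  0.29277002  0.36514837]

mean value: 0.3811067348212412
"""

# One block = "key:" line, bracketed array (may wrap lines), blank line, "mean value:" line.
BLOCK_RE = re.compile(
    r"key: (?P<key>\w+)\nvalue: \[(?P<values>[^\]]*)\]\n\nmean value: (?P<mean>\S+)"
)

def parse_cv_blocks(text):
    """Return {key: (fold_values, logged_mean)} for each key/value/mean block."""
    parsed = {}
    for match in BLOCK_RE.finditer(text):
        folds = [float(tok) for tok in match["values"].split()]
        parsed[match["key"]] = (folds, float(match["mean"]))
    return parsed

blocks = parse_cv_blocks(log_text)
folds, logged_mean = blocks["test_mcc"]
print(len(folds), fmean(folds), logged_mean)
```

The fold values are printed to 8 decimal places, so the recomputed mean matches the logged one only up to rounding. The parsed folds also make the one negative fold (-0.06579517) easy to spot.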

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.00825429 0.01047587 0.01012659 0.01016808 0.01010203 0.00998354
 0.01053905 0.01024556 0.01085734 0.01041651]

mean value: 0.010116887092590333

key: score_time
value: [0.00801587 0.01022434 0.01021957 0.01031947 0.01033974 0.01022935
 0.01024055 0.01027536 0.01019835 0.01020384]

mean value: 0.010026645660400391

key: test_mcc
value: [0.65909298 0.42228828 0.37057951 0.76764947 0.76277007 0.74242424
 0.55048188 0.40451992 0.91287093 0.75592895]

mean value: 0.6348606236325591

key: train_mcc
value: [0.85690497 0.73153872 0.87320324 0.70302948 0.80930285 0.8300002
 0.72436632 0.58762141 0.80469539 0.81866523]

mean value: 0.7739327829063736

key: test_accuracy
value: [0.82608696 0.69565217 0.65217391 0.86956522 0.86956522 0.86956522
 0.73913043 0.65217391 0.95454545 0.86363636]

mean value: 0.799209486166008

key: train_accuracy
value: [0.92682927 0.85365854 0.93658537 0.83414634 0.89756098 0.91219512
 0.84390244 0.75609756 0.89805825 0.90776699]

mean value: 0.8766800852474544

key: test_fscore
value: [0.8        0.58823529 0.71428571 0.88       0.88888889 0.86956522
 0.8        0.75       0.95238095 0.88      ]

mean value: 0.8123356067064507

key: train_fscore
value: [0.93023256 0.83333333 0.93719807 0.85714286 0.9058296  0.90625
 0.86440678 0.80314961 0.89005236 0.91162791]

mean value: 0.8839223061619048

key: test_precision
value: [0.88888889 0.83333333 0.58823529 0.78571429 0.8        0.90909091
 0.66666667 0.6        1.         0.78571429]

mean value: 0.7857643663526016

key: train_precision
value: [0.89285714 0.97402597 0.93269231 0.75555556 0.83471074 0.96666667
 0.76119403 0.67105263 0.96590909 0.875     ]

mean value: 0.8629664142938084

key: test_recall
value: [0.72727273 0.45454545 0.90909091 1.         1.         0.83333333
 1.         1.         0.90909091 1.        ]

mean value: 0.8833333333333333

key: train_recall
value: [0.97087379 0.72815534 0.94174757 0.99029126 0.99019608 0.85294118
 1.         1.         0.82524272 0.95145631]

mean value: 0.9250904245193223

key: test_roc_auc
value: [0.8219697  0.68560606 0.66287879 0.875      0.86363636 0.87121212
 0.72727273 0.63636364 0.95454545 0.86363636]

mean value: 0.7962121212121211

key: train_roc_auc
value: [0.92661336 0.85427375 0.93656006 0.83338093 0.89801066 0.91190748
 0.84466019 0.75728155 0.89805825 0.90776699]

mean value: 0.8768513230534933

key: test_jcc
value: [0.66666667 0.41666667 0.55555556 0.78571429 0.8        0.76923077
 0.66666667 0.6        0.90909091 0.78571429]

mean value: 0.6955305805305805

key: train_jcc
value: [0.86956522 0.71428571 0.88181818 0.75       0.82786885 0.82857143
 0.76119403 0.67105263 0.80188679 0.83760684]

mean value: 0.7943849686015007

MCC on Blind test: 0.24

Accuracy on Blind test: 0.62

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.00993443 0.01050615 0.00986814 0.00974298 0.01031899 0.01094747
 0.00993729 0.01018286 0.01007271 0.01059127]

mean value: 0.010210227966308594

key: score_time
value: [0.01083398 0.01023293 0.01024818 0.01021266 0.01024604 0.01035571
 0.01024151 0.01023602 0.01023817 0.01028705]

mean value: 0.01031322479248047

key: test_mcc
value: [0.56490196 0.58930667 0.48075018 1.         0.47923384 0.82575758
 0.76277007 0.58930667 0.83205029 0.63636364]

mean value: 0.6760440880009018

key: train_mcc
value: [0.78910244 0.834498   0.74442173 0.81555702 0.74362503 0.82697375
 0.85470694 0.87321531 0.82432211 0.83815726]

mean value: 0.814457959189338

key: test_accuracy
value: [0.7826087  0.7826087  0.73913043 1.         0.69565217 0.91304348
 0.86956522 0.7826087  0.90909091 0.81818182]

mean value: 0.8292490118577075

key: train_accuracy
value: [0.88780488 0.91219512 0.86341463 0.90731707 0.85853659 0.91219512
 0.92682927 0.93658537 0.90776699 0.91747573]

mean value: 0.903012076722709

key: test_fscore
value: [0.76190476 0.8        0.7        1.         0.77419355 0.91666667
 0.88888889 0.76190476 0.91666667 0.81818182]

mean value: 0.8338407112600661

key: train_fscore
value: [0.89777778 0.91891892 0.84782609 0.90995261 0.87445887 0.91509434
 0.92822967 0.93658537 0.91402715 0.91370558]

mean value: 0.9056576368372846

key: test_precision
value: [0.8        0.71428571 0.77777778 1.         0.63157895 0.91666667
 0.8        0.88888889 0.84615385 0.81818182]

mean value: 0.8193533659323133

key: train_precision
value: [0.82786885 0.85714286 0.96296296 0.88888889 0.78294574 0.88181818
 0.90654206 0.93203883 0.8559322  0.95744681]

mean value: 0.8853587382632707

key: test_recall
value: [0.72727273 0.90909091 0.63636364 1.         1.         0.91666667
 1.         0.66666667 1.         0.81818182]

mean value: 0.8674242424242424

key: train_recall
value: [0.98058252 0.99029126 0.75728155 0.93203883 0.99019608 0.95098039
 0.95098039 0.94117647 0.98058252 0.87378641]

mean value: 0.934789644012945

key: test_roc_auc
value: [0.78030303 0.78787879 0.73484848 1.         0.68181818 0.91287879
 0.86363636 0.78787879 0.90909091 0.81818182]

mean value: 0.8276515151515151

key: train_roc_auc
value: [0.88735009 0.9118123  0.86393489 0.90719589 0.85917571 0.9123834
 0.92694651 0.93660765 0.90776699 0.91747573]

mean value: 0.903064915286503

key: test_jcc
value: [0.61538462 0.66666667 0.53846154 1.         0.63157895 0.84615385
 0.8        0.61538462 0.84615385 0.69230769]

mean value: 0.7252091767881241

key: train_jcc
value: [0.81451613 0.85       0.73584906 0.83478261 0.77692308 0.84347826
 0.86607143 0.88073394 0.84166667 0.8411215 ]

mean value: 0.8285142667643652

MCC on Blind test: 0.15

Accuracy on Blind test: 0.57

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.09048128 0.06841111 0.06838012 0.06843686 0.06842685 0.06856799
 0.06842709 0.06839943 0.06843877 0.06855869]

mean value: 0.07065281867980958

key: score_time
value: [0.01429415 0.01409531 0.0138483  0.01383138 0.01385355 0.01397872
 0.01384473 0.01389337 0.01391649 0.01385307]

mean value: 0.013940906524658203

key: test_mcc
value: [0.82575758 0.58930667 0.74242424 0.83743579 0.74242424 0.91666667
 0.91605722 0.91666667 0.73029674 0.81818182]

mean value: 0.8035217636234444

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91304348 0.7826087  0.86956522 0.91304348 0.86956522 0.95652174
 0.95652174 0.95652174 0.86363636 0.90909091]

mean value: 0.8990118577075099

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.90909091 0.8        0.86956522 0.9        0.86956522 0.95652174
 0.96       0.95652174 0.85714286 0.90909091]

mean value: 0.8987498588368154

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.90909091 0.71428571 0.83333333 1.         0.90909091 1.
 0.92307692 1.         0.9        0.90909091]

mean value: 0.9097968697968698

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 0.90909091 0.90909091 0.81818182 0.83333333 0.91666667
 1.         0.91666667 0.81818182 0.90909091]

mean value: 0.8939393939393939

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.91287879 0.78787879 0.87121212 0.90909091 0.87121212 0.95833333
 0.95454545 0.95833333 0.86363636 0.90909091]

mean value: 0.8996212121212122

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.83333333 0.66666667 0.76923077 0.81818182 0.76923077 0.91666667
 0.92307692 0.91666667 0.75       0.83333333]

mean value: 0.8196386946386947

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.04

Accuracy on Blind test: 0.51

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.03047633 0.02374792 0.02200127 0.03589177 0.03106284 0.02499914
 0.04321909 0.03124619 0.04236913 0.03466034]

mean value: 0.0319674015045166

key: score_time
value: [0.02318501 0.01713753 0.02661037 0.02022576 0.02133465 0.01644993
 0.08056641 0.02024651 0.02388358 0.01804686]

mean value: 0.02676866054534912

key: test_mcc
value: [0.91605722 0.58930667 0.65151515 1.         0.74242424 0.91666667
 0.91605722 0.91666667 0.91287093 0.91287093]

mean value: 0.8474435701866356

key: train_mcc
value: [0.99029126 0.98048734 1.         1.         0.99029034 1.
 0.99029034 0.99029034 0.9613463  0.98076744]

mean value: 0.9883763362991548

key: test_accuracy
value: [0.95652174 0.7826087  0.82608696 1.         0.86956522 0.95652174
 0.95652174 0.95652174 0.95454545 0.95454545]

mean value: 0.9213438735177866

key: train_accuracy
value: [0.99512195 0.9902439  1.         1.         0.99512195 1.
 0.99512195 0.99512195 0.98058252 0.99029126]

mean value: 0.994160549372484

key: test_fscore
value: [0.95238095 0.8        0.81818182 1.         0.86956522 0.95652174
 0.96       0.95652174 0.95652174 0.95652174]

mean value: 0.9226214944475815

key: train_fscore
value: [0.99512195 0.99029126 1.         1.         0.99507389 1.
 0.99507389 0.99507389 0.98039216 0.99019608]

mean value: 0.9941223123526399

key: test_precision
value: [1.         0.71428571 0.81818182 1.         0.90909091 1.
 0.92307692 1.         0.91666667 0.91666667]

mean value: 0.9197968697968698

key: train_precision
value: [1.         0.99029126 1.         1.         1.         1.
 1.         1.         0.99009901 1.        ]

mean value: 0.9980390272036912

key: test_recall
value: [0.90909091 0.90909091 0.81818182 1.         0.83333333 0.91666667
 1.         0.91666667 1.         1.        ]

mean value: 0.9303030303030303

key: train_recall
value: [0.99029126 0.99029126 1.         1.         0.99019608 1.
 0.99019608 0.99019608 0.97087379 0.98058252]

mean value: 0.9902627070245574

key: test_roc_auc
value: [0.95454545 0.78787879 0.82575758 1.         0.87121212 0.95833333
 0.95454545 0.95833333 0.95454545 0.95454545]

mean value: 0.921969696969697

key: train_roc_auc
value: [0.99514563 0.99024367 1.         1.         0.99509804 1.
 0.99509804 0.99509804 0.98058252 0.99029126]

mean value: 0.9941557205406435

key: test_jcc
value: [0.90909091 0.66666667 0.69230769 1.         0.76923077 0.91666667
 0.92307692 0.91666667 0.91666667 0.91666667]

mean value: 0.8627039627039627

key: train_jcc
value: [0.99029126 0.98076923 1.         1.         0.99019608 1.
 0.99019608 0.99019608 0.96153846 0.98058252]

mean value: 0.9883769714009577

MCC on Blind test: 0.14

Accuracy on Blind test: 0.55

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.0428648  0.05020165 0.05070519 0.05125546 0.05086875 0.05098724
 0.05125928 0.05102658 0.0502708  0.05003095]

mean value: 0.049947071075439456

key: score_time
value: [0.02246642 0.02030492 0.02142429 0.02147484 0.02218461 0.02291584
 0.02219915 0.02224135 0.01964617 0.02045941]

mean value: 0.02153170108795166

key: test_mcc
value: [0.48075018 0.48856385 0.39393939 0.65909298 0.56490196 0.6992059
 0.91666667 0.38932432 0.68313005 0.18257419]

mean value: 0.5458149483128502

key: train_mcc
value: [0.94306341 0.92211753 0.9024367  0.90261781 0.92211753 0.9024367
 0.93175328 0.91224062 0.90291262 0.88366175]

mean value: 0.9125357952633087

key: test_accuracy
value: [0.73913043 0.73913043 0.69565217 0.82608696 0.7826087  0.82608696
 0.95652174 0.69565217 0.81818182 0.59090909]

mean value: 0.76699604743083

key: train_accuracy
value: [0.97073171 0.96097561 0.95121951 0.95121951 0.96097561 0.95121951
 0.96585366 0.95609756 0.95145631 0.94174757]

mean value: 0.9561496566421974

key: test_fscore
value: [0.7        0.75       0.69565217 0.8        0.8        0.8
 0.95652174 0.72       0.77777778 0.57142857]

mean value: 0.7571380262249827

key: train_fscore
value: [0.97169811 0.96153846 0.95145631 0.95098039 0.96039604 0.95098039
 0.96585366 0.95609756 0.95145631 0.94230769]

mean value: 0.9562764931842805

key: test_precision
value: [0.77777778 0.69230769 0.66666667 0.88888889 0.76923077 1.
 1.         0.69230769 1.         0.6       ]

mean value: 0.8087179487179487

key: train_precision
value: [0.94495413 0.95238095 0.95145631 0.96039604 0.97       0.95098039
 0.96116505 0.95145631 0.95145631 0.93333333]

mean value: 0.9527578826498

key: test_recall
value: [0.63636364 0.81818182 0.72727273 0.72727273 0.83333333 0.66666667
 0.91666667 0.75       0.63636364 0.54545455]

mean value: 0.7257575757575757

key: train_recall
value: [1.         0.97087379 0.95145631 0.94174757 0.95098039 0.95098039
 0.97058824 0.96078431 0.95145631 0.95145631]

mean value: 0.9600323624595469

key: test_roc_auc
value: [0.73484848 0.74242424 0.6969697  0.8219697  0.78030303 0.83333333
 0.95833333 0.69318182 0.81818182 0.59090909]

mean value: 0.7670454545454545

key: train_roc_auc
value: [0.97058824 0.96092709 0.95121835 0.95126594 0.96092709 0.95121835
 0.96587664 0.95612031 0.95145631 0.94174757]

mean value: 0.9561345897582334

key: test_jcc
value: [0.53846154 0.6        0.53333333 0.66666667 0.66666667 0.66666667
 0.91666667 0.5625     0.63636364 0.4       ]

mean value: 0.6187325174825175

key: train_jcc
value: [0.94495413 0.92592593 0.90740741 0.90654206 0.92380952 0.90654206
 0.93396226 0.91588785 0.90740741 0.89090909]

mean value: 0.9163347710667489

MCC on Blind test: 0.34

Accuracy on Blind test: 0.67

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.14630938 0.13025904 0.12856793 0.12697887 0.1226666  0.12366438
 0.12613249 0.12406492 0.1238749  0.1242547 ]

mean value: 0.127677321434021

key: score_time
value: [0.0091486  0.00918722 0.00929856 0.0083158  0.00833082 0.00817156
 0.00841713 0.00818658 0.00882602 0.0082829 ]

mean value: 0.0086165189743042

key: test_mcc
value: [0.82575758 0.58930667 0.74242424 1.         0.74242424 1.
 1.         1.         0.91287093 0.91287093]

mean value: 0.8725654585542313

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91304348 0.7826087  0.86956522 1.         0.86956522 1.
 1.         1.         0.95454545 0.95454545]

mean value: 0.9343873517786562

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.90909091 0.8        0.86956522 1.         0.86956522 1.
 1.         1.         0.95652174 0.95652174]

mean value: 0.9361264822134387

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.90909091 0.71428571 0.83333333 1.         0.90909091 1.
 1.         1.         0.91666667 0.91666667]

mean value: 0.9199134199134199

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 0.90909091 0.90909091 1.         0.83333333 1.
 1.         1.         1.         1.        ]

mean value: 0.956060606060606

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.91287879 0.78787879 0.87121212 1.         0.87121212 1.
 1.         1.         0.95454545 0.95454545]

mean value: 0.9352272727272727

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.83333333 0.66666667 0.76923077 1.         0.76923077 1.
 1.         1.         0.91666667 0.91666667]

mean value: 0.8871794871794871

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.14

Accuracy on Blind test: 0.55

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.00901484 0.01138854 0.01411247 0.01156902 0.01335311 0.01166534
 0.01175451 0.01167417 0.01154208 0.01187825]

mean value: 0.011795234680175782

key: score_time
value: [0.01050758 0.01051354 0.0105617  0.01048422 0.0105226  0.01054072
 0.01047564 0.01055694 0.01061535 0.01278687]

mean value: 0.010756516456604004

key: test_mcc
value: [0.17236256 0.6992059  0.29359034 0.76764947 0.22268089 0.74242424
 0.         0.22268089 0.56694671 0.61237244]

mean value: 0.4299913432106543

key: train_mcc
value: [0.56341118 0.60589978 0.60122852 0.61135735 0.48234717 0.56859428
 0.4515346  0.56519801 0.56644742 0.60352167]

mean value: 0.5619539992238989

key: test_accuracy
value: [0.56521739 0.82608696 0.60869565 0.86956522 0.56521739 0.86956522
 0.52173913 0.56521739 0.77272727 0.77272727]

mean value: 0.6936758893280632

key: train_accuracy
value: [0.74634146 0.7804878  0.7902439  0.7804878  0.68780488 0.76585366
 0.66829268 0.74146341 0.76213592 0.76699029]

mean value: 0.7490101823348331

key: test_fscore
value: [0.64285714 0.84615385 0.68965517 0.88       0.70588235 0.86956522
 0.68571429 0.70588235 0.8        0.81481481]

mean value: 0.7640525185227539

key: train_fscore
value: [0.796875   0.81632653 0.81545064 0.81781377 0.76119403 0.8
 0.75       0.79377432 0.8        0.81102362]

mean value: 0.7962457910535393

key: test_precision
value: [0.52941176 0.73333333 0.55555556 0.78571429 0.54545455 0.90909091
 0.52173913 0.54545455 0.71428571 0.6875    ]

mean value: 0.6527539784029553

key: train_precision
value: [0.66666667 0.70422535 0.73076923 0.70138889 0.61445783 0.69565217
 0.6        0.65806452 0.69014085 0.68211921]

mean value: 0.6743484710173275

key: test_recall
value: [0.81818182 1.         0.90909091 1.         1.         0.83333333
 1.         1.         0.90909091 1.        ]

mean value: 0.946969696969697

key: train_recall
value: [0.99029126 0.97087379 0.9223301  0.98058252 1.         0.94117647
 1.         1.         0.95145631 1.        ]

mean value: 0.9756710451170759

key: test_roc_auc
value: [0.57575758 0.83333333 0.62121212 0.875      0.54545455 0.87121212
 0.5        0.54545455 0.77272727 0.77272727]

mean value: 0.6912878787878788

key: train_roc_auc
value: [0.74514563 0.77955454 0.78959642 0.77950695 0.68932039 0.76670474
 0.66990291 0.74271845 0.76213592 0.76699029]

mean value: 0.7491576242147344

key: test_jcc
value: [0.47368421 0.73333333 0.52631579 0.78571429 0.54545455 0.76923077
 0.52173913 0.54545455 0.66666667 0.6875    ]

mean value: 0.6255093276288928

key: train_jcc
value: [0.66233766 0.68965517 0.6884058  0.69178082 0.61445783 0.66666667
 0.6        0.65806452 0.66666667 0.68211921]

mean value: 0.6620154339856393

MCC on Blind test: 0.32

Accuracy on Blind test: 0.6

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.01143074 0.0101707  0.01010108 0.0101788  0.01018381 0.01013422
 0.01016164 0.01012945 0.01014853 0.010252  ]

mean value: 0.01028909683227539

key: score_time
value: [0.01026726 0.0102675  0.01020479 0.01026201 0.01026845 0.01027465
 0.01028585 0.01047206 0.01028395 0.0104382 ]

mean value: 0.010302472114562988

key: test_mcc
value: [0.62050523 0.74242424 0.39393939 0.83743579 0.74047959 0.91666667
 0.91605722 0.82575758 1.         0.83205029]

mean value: 0.7825316005376914

key: train_mcc
value: [0.84404459 0.82455974 0.86409538 0.82438607 0.86356283 0.81495251
 0.83417421 0.84389872 0.83499081 0.83499081]

mean value: 0.8383655666865866

key: test_accuracy
value: [0.7826087  0.86956522 0.69565217 0.91304348 0.86956522 0.95652174
 0.95652174 0.91304348 1.         0.90909091]

mean value: 0.8865612648221344

key: train_accuracy
value: [0.92195122 0.91219512 0.93170732 0.91219512 0.93170732 0.90731707
 0.91707317 0.92195122 0.91747573 0.91747573]

mean value: 0.919104901728629

key: test_fscore
value: [0.70588235 0.86956522 0.69565217 0.9        0.88       0.95652174
 0.96       0.91666667 1.         0.9       ]

mean value: 0.8784288150042625

key: train_fscore
value: [0.92307692 0.91176471 0.93069307 0.91262136 0.93069307 0.90547264
 0.91625616 0.92156863 0.9178744  0.9178744 ]

mean value: 0.9187895340969339

key: test_precision
value: [1.         0.83333333 0.66666667 1.         0.84615385 1.
 0.92307692 0.91666667 1.         1.        ]

mean value: 0.9185897435897435

key: train_precision
value: [0.91428571 0.92079208 0.94949495 0.91262136 0.94       0.91919192
 0.92079208 0.92156863 0.91346154 0.91346154]

mean value: 0.9225669804985783

key: test_recall
value: [0.54545455 0.90909091 0.72727273 0.81818182 0.91666667 0.91666667
 1.         0.91666667 1.         0.81818182]

mean value: 0.8568181818181818

key: train_recall
value: [0.93203883 0.90291262 0.91262136 0.91262136 0.92156863 0.89215686
 0.91176471 0.92156863 0.9223301  0.9223301 ]

mean value: 0.915191319246145

key: test_roc_auc
value: [0.77272727 0.87121212 0.6969697  0.90909091 0.86742424 0.95833333
 0.95454545 0.91287879 1.         0.90909091]

mean value: 0.8852272727272728

key: train_roc_auc
value: [0.92190177 0.91224062 0.93180088 0.91219303 0.9316581  0.90724348
 0.9170474  0.92194936 0.91747573 0.91747573]

mean value: 0.9190986103179135

key: test_jcc
value: [0.54545455 0.76923077 0.53333333 0.81818182 0.78571429 0.91666667
 0.92307692 0.84615385 1.         0.81818182]

mean value: 0.7955994005994006

key: train_jcc
value: [0.85714286 0.83783784 0.87037037 0.83928571 0.87037037 0.82727273
 0.84545455 0.85454545 0.84821429 0.84821429]

mean value: 0.8498708448708449

MCC on Blind test: 0.19

Accuracy on Blind test: 0.59

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:163: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:166: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.1247797  0.08746672 0.08165932 0.08129358 0.08163166 0.08223915
 0.0814786  0.08128548 0.08197618 0.09512329]

mean value: 0.08789336681365967

key: score_time
value: [0.01067257 0.01061177 0.01049924 0.01049209 0.01054454 0.01051307
 0.01054621 0.0105319  0.01048279 0.01061583]

mean value: 0.010550999641418457

key: test_mcc
value: [0.69084928 0.65151515 0.39393939 0.91605722 0.74047959 0.91666667
 0.74047959 0.91605722 0.83205029 0.83205029]

mean value: 0.7630144710301748

key: train_mcc
value: [0.85400014 0.87352395 0.89272796 0.81467733 0.86356283 0.84389872
 0.84389872 0.86358877 0.83499081 0.83499081]

mean value: 0.8519860051204752

key: test_accuracy
value: [0.82608696 0.82608696 0.69565217 0.95652174 0.86956522 0.95652174
 0.86956522 0.95652174 0.90909091 0.90909091]

mean value: 0.8774703557312253

key: train_accuracy
value: [0.92682927 0.93658537 0.94634146 0.90731707 0.93170732 0.92195122
 0.92195122 0.93170732 0.91747573 0.91747573]

mean value: 0.925934170021312

key: test_fscore
value: [0.77777778 0.81818182 0.69565217 0.95238095 0.88       0.95652174
 0.88       0.96       0.9        0.9       ]

mean value: 0.8720514461384027

key: train_fscore
value: [0.92822967 0.93779904 0.94634146 0.90731707 0.93069307 0.92156863
 0.92156863 0.93203883 0.9178744  0.9178744 ]

mean value: 0.9261305196150217

key: test_precision
value: [1.         0.81818182 0.66666667 1.         0.84615385 1.
 0.84615385 0.92307692 1.         1.        ]

mean value: 0.91002331002331

key: train_precision
value: [0.91509434 0.9245283  0.95098039 0.91176471 0.94       0.92156863
 0.92156863 0.92307692 0.91346154 0.91346154]

mean value: 0.9235504994450611

key: test_recall
value: [0.63636364 0.81818182 0.72727273 0.90909091 0.91666667 0.91666667
 0.91666667 1.         0.81818182 0.81818182]

mean value: 0.8477272727272728

key: train_recall
value: [0.94174757 0.95145631 0.94174757 0.90291262 0.92156863 0.92156863
 0.92156863 0.94117647 0.9223301  0.9223301 ]

mean value: 0.9288406624785837

key: test_roc_auc
value: [0.81818182 0.82575758 0.6969697  0.95454545 0.86742424 0.95833333
 0.86742424 0.95454545 0.90909091 0.90909091]

mean value: 0.8761363636363636

key: train_roc_auc
value: [0.92675614 0.93651247 0.94636398 0.90733866 0.9316581  0.92194936
 0.92194936 0.93175328 0.91747573 0.91747573]

mean value: 0.9259232819341329

key: test_jcc
value: [0.63636364 0.69230769 0.53333333 0.90909091 0.78571429 0.91666667
 0.78571429 0.92307692 0.81818182 0.81818182]

mean value: 0.7818631368631369

key: train_jcc
value: [0.86607143 0.88288288 0.89814815 0.83035714 0.87037037 0.85454545
 0.85454545 0.87272727 0.84821429 0.84821429]

mean value: 0.8626076726076726
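
The key/value/mean blocks above have the shape of the dictionary returned by scikit-learn's cross_validate when it is given several scorers and return_train_score=True. The following is a minimal sketch of how such output could be produced, not the script that wrote this log: the synthetic data, the scorer choices, and the StratifiedKFold settings are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption: the real feature table is not in this log).
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Scale the features, then fit the classifier, as in the logged pipeline.
pipe = Pipeline(steps=[("prep", MinMaxScaler()),
                       ("model", LogisticRegression(random_state=42))])

# Scorer names are chosen to reproduce the keys seen in the log (assumption).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

# 10 stratified folds (assumption: fold settings are not stated in the log).
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print each per-fold array followed by its mean, mirroring the log layout.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Each array holds one entry per fold, so a large gap between the train_* and test_* means for the same metric is a quick indicator of overfitting.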

MCC on Blind test: 0.12

Accuracy on Blind test: 0.56

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.01771116 0.01344275 0.0129807  0.01292109 0.01246119 0.01508546
 0.01303816 0.0129621  0.01367259 0.01364589]

mean value: 0.013792109489440919

key: score_time
value: [0.0105226  0.00815105 0.00787663 0.00783443 0.00784397 0.00815797
 0.00775981 0.00775981 0.0080657  0.0077579 ]

mean value: 0.008172988891601562

key: test_mcc
value: [ 0.56407607  0.875       0.63245553  0.57735027  0.57735027  0.57735027
  1.         -0.14285714  0.31622777  0.28867513]

mean value: 0.5265628172174828

key: train_mcc
value: [0.8114612  0.76470609 0.70321085 0.75       0.73446466 0.78278036
 0.71910121 0.75146915 0.73446466 0.78163175]

mean value: 0.7533289937125471

key: test_accuracy
value: [0.73333333 0.93333333 0.78571429 0.78571429 0.78571429 0.78571429
 1.         0.42857143 0.64285714 0.64285714]

mean value: 0.7523809523809524

key: train_accuracy
value: [0.90551181 0.88188976 0.8515625  0.875      0.8671875  0.890625
 0.859375   0.875      0.8671875  0.890625  ]

mean value: 0.876396407480315

key: test_fscore
value: [0.77777778 0.93333333 0.82352941 0.76923077 0.76923077 0.76923077
 1.         0.42857143 0.70588235 0.66666667]

mean value: 0.7643453278747396

key: train_fscore
value: [0.9047619  0.88372093 0.85271318 0.875      0.86821705 0.89393939
 0.86153846 0.87878788 0.86821705 0.89230769]

mean value: 0.8779203548389595

key: test_precision
value: [0.63636364 1.         0.7        0.83333333 0.83333333 0.83333333
 1.         0.42857143 0.6        0.625     ]

mean value: 0.7489935064935065

key: train_precision
value: [0.91935484 0.86363636 0.84615385 0.875      0.86153846 0.86764706
 0.84848485 0.85294118 0.86153846 0.87878788]

mean value: 0.8675082934143655

key: test_recall
value: [1.         0.875      1.         0.71428571 0.71428571 0.71428571
 1.         0.42857143 0.85714286 0.71428571]

mean value: 0.8017857142857143

key: train_recall
value: [0.890625  0.9047619 0.859375  0.875     0.875     0.921875  0.875
 0.90625   0.875     0.90625  ]

mean value: 0.8889136904761905

key: test_roc_auc
value: [0.75       0.9375     0.78571429 0.78571429 0.78571429 0.78571429
 1.         0.42857143 0.64285714 0.64285714]

mean value: 0.7544642857142857

key: train_roc_auc
value: [0.90562996 0.88206845 0.8515625  0.875      0.8671875  0.890625
 0.859375   0.875      0.8671875  0.890625  ]

mean value: 0.8764260912698413

key: test_jcc
value: [0.63636364 0.875      0.7        0.625      0.625      0.625
 1.         0.27272727 0.54545455 0.5       ]

mean value: 0.6404545454545454

key: train_jcc
value: [0.82608696 0.79166667 0.74324324 0.77777778 0.76712329 0.80821918
 0.75675676 0.78378378 0.76712329 0.80555556]

mean value: 0.782733649373018

MCC on Blind test: 0.38

Accuracy on Blind test: 0.69
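
The two blind-test figures above are single-number summaries on held-out data. As an illustration only, the sketch below computes both metrics with scikit-learn on a small set of hypothetical labels and predictions; none of these values come from the logged run.

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Hypothetical true labels and predictions for a held-out ("blind") set.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

# MCC uses all four confusion-matrix cells, so it penalises errors in
# either class; accuracy is simply the fraction of correct predictions.
mcc = matthews_corrcoef(y_true, y_pred)
acc = accuracy_score(y_true, y_pred)

print("MCC on Blind test:", round(mcc, 2))
print("Accuracy on Blind test:", round(acc, 2))
```

MCC ranges from -1 to 1 with 0 corresponding to chance-level prediction, which is why a blind-test MCC can look much weaker than the matching accuracy.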

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
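
The warning above offers two remedies: scale the data or raise max_iter. The logged pipeline already applies MinMaxScaler to its numerical columns, so the sketch below tries the second option. It is a hedged illustration on synthetic data; the value max_iter=5000 is an arbitrary assumption, not a setting taken from the original script.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption: the real feature table is not in this log).
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Same pipeline shape as in the log, but with a larger iteration budget
# for the lbfgs solver (5000 is an illustrative choice).
pipe = Pipeline(steps=[("prep", MinMaxScaler()),
                       ("model", LogisticRegressionCV(max_iter=5000,
                                                      random_state=42))])

# Record warnings during fitting so convergence problems can be counted
# instead of flooding the log.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", category=ConvergenceWarning)
    pipe.fit(X, y)

n_conv = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print("ConvergenceWarnings raised:", n_conv)
```

Counting the warnings rather than suppressing them keeps the information that some fits did not converge while avoiding one repeated message per internal fit.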
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.38585448 0.3669436  0.36463881 0.37112308 0.37720537 0.37863564
 0.39239931 0.38488078 0.38222742 0.37554836]

mean value: 0.37794568538665774

key: score_time
value: [0.00842977 0.00810385 0.00817847 0.00810599 0.00813341 0.00875115
 0.00849652 0.00883865 0.00872874 0.0085032 ]

mean value: 0.008426976203918458

key: test_mcc
value: [0.56407607 0.46428571 0.71428571 0.4472136  0.8660254  0.57735027
 1.         0.4472136  0.42857143 0.42857143]

mean value: 0.5937593224506033

key: train_mcc
value: [1.         0.95287698 1.         0.9379581  1.         0.95417386
 0.98449518 0.96922337 1.         0.95324137]

mean value: 0.9751968870278498

key: test_accuracy
value: [0.73333333 0.73333333 0.85714286 0.71428571 0.92857143 0.78571429
 1.         0.71428571 0.71428571 0.71428571]

mean value: 0.7895238095238095

key: train_accuracy
value: [1.         0.97637795 1.         0.96875    1.         0.9765625
 0.9921875  0.984375   1.         0.9765625 ]

mean value: 0.9874815452755905

key: test_fscore
value: [0.77777778 0.75       0.85714286 0.66666667 0.92307692 0.76923077
 1.         0.66666667 0.71428571 0.71428571]

mean value: 0.7839133089133089

key: train_fscore
value: [1.         0.97637795 1.         0.96923077 1.         0.97709924
 0.99224806 0.98461538 1.         0.97674419]

mean value: 0.9876315591305296

key: test_precision
value: [0.63636364 0.75       0.85714286 0.8        1.         0.83333333
 1.         0.8        0.71428571 0.71428571]

mean value: 0.8105411255411256

key: train_precision
value: [1.         0.96875    1.         0.95454545 1.         0.95522388
 0.98461538 0.96969697 1.         0.96923077]

mean value: 0.9802062458685593

key: test_recall
value: [1.         0.75       0.85714286 0.57142857 0.85714286 0.71428571
 1.         0.57142857 0.71428571 0.71428571]

mean value: 0.775

key: train_recall
value: [1.         0.98412698 1.         0.984375   1.         1.
 1.         1.         1.         0.984375  ]

mean value: 0.9952876984126984

key: test_roc_auc
value: [0.75       0.73214286 0.85714286 0.71428571 0.92857143 0.78571429
 1.         0.71428571 0.71428571 0.71428571]

mean value: 0.7910714285714286

key: train_roc_auc
value: [1.         0.97643849 1.         0.96875    1.         0.9765625
 0.9921875  0.984375   1.         0.9765625 ]

mean value: 0.9874875992063492

key: test_jcc
value: [0.63636364 0.6        0.75       0.5        0.85714286 0.625
 1.         0.5        0.55555556 0.55555556]

mean value: 0.6579617604617605

key: train_jcc
value: [1.         0.95384615 1.         0.94029851 1.         0.95522388
 0.98461538 0.96969697 1.         0.95454545]

mean value: 0.9758226350763665

MCC on Blind test: 0.09

Accuracy on Blind test: 0.54
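
The `key:` / `value:` / `mean value:` blocks above have the shape of the dictionary returned by scikit-learn's `cross_validate` when it is given a named scoring dictionary and `return_train_score=True`. The sketch below shows one way such output could be produced. The scorer definitions, the 10-fold splitter and the synthetic data are all assumptions; the actual script is not shown in this log. The hard-label `roc_auc` scorer is a guess based on the logged `test_roc_auc` values matching `test_accuracy` in the balanced folds.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the real training data (assumption).
X, y = make_classification(n_samples=185, n_features=20, random_state=42)

# Hypothetical scorer names chosen to mirror the keys seen in the log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),  # scored on hard label predictions
    "jcc": "jaccard",
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(
    GaussianNB(), X, y, cv=cv, scoring=scoring, return_train_score=True
)

# Each entry is an array with one value per fold.
for key, value in scores.items():
    print("\nkey:", key)
    print("value:", value)
    print("\nmean value:", value.mean())
```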

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.00919986 0.0087204  0.0068109  0.00678849 0.00648975 0.00657201
 0.00651431 0.00667238 0.00647902 0.00650811]

mean value: 0.0070755243301391605

key: score_time
value: [0.0102222  0.01021814 0.00809312 0.00810909 0.00769401 0.00771046
 0.0077498  0.00773859 0.00767064 0.0077889 ]

mean value: 0.008299493789672851

key: test_mcc
value: [ 0.56407607  0.34247476  0.2773501   0.63245553  0.52223297  0.
  0.52223297 -0.17407766  0.17407766  0.31622777]

mean value: 0.3177050166425868

key: train_mcc
value: [0.42609813 0.36309219 0.42452948 0.45355737 0.43819207 0.40213949
 0.40574111 0.51298918 0.40574111 0.46530981]

mean value: 0.4297389942041204

key: test_accuracy
value: [0.73333333 0.66666667 0.57142857 0.78571429 0.71428571 0.5
 0.71428571 0.42857143 0.57142857 0.64285714]

mean value: 0.6328571428571429

key: train_accuracy
value: [0.68503937 0.61417323 0.6875     0.6875     0.6953125  0.6640625
 0.671875   0.734375   0.671875   0.7109375 ]

mean value: 0.6822650098425197

key: test_fscore
value: [0.77777778 0.73684211 0.7        0.82352941 0.77777778 0.58823529
 0.77777778 0.55555556 0.66666667 0.70588235]

mean value: 0.7110044719642242

key: train_fscore
value: [0.75       0.72       0.74683544 0.75609756 0.75159236 0.73939394
 0.74074074 0.77922078 0.74074074 0.76129032]

mean value: 0.7485911883378328

key: test_precision
value: [0.63636364 0.63636364 0.53846154 0.7        0.63636364 0.5
 0.63636364 0.45454545 0.54545455 0.6       ]

mean value: 0.5883916083916084

key: train_precision
value: [0.625      0.5625     0.62765957 0.62       0.6344086  0.6039604
 0.6122449  0.66666667 0.6122449  0.64835165]

mean value: 0.6213036683594909

key: test_recall
value: [1.         0.875      1.         1.         1.         0.71428571
 1.         0.71428571 0.85714286 0.85714286]

mean value: 0.9017857142857143

key: train_recall
value: [0.9375   1.       0.921875 0.96875  0.921875 0.953125 0.9375   0.9375
 0.9375   0.921875]

mean value: 0.94375

key: test_roc_auc
value: [0.75       0.65178571 0.57142857 0.78571429 0.71428571 0.5
 0.71428571 0.42857143 0.57142857 0.64285714]

mean value: 0.6330357142857143

key: train_roc_auc
value: [0.68303571 0.6171875  0.6875     0.6875     0.6953125  0.6640625
 0.671875   0.734375   0.671875   0.7109375 ]

mean value: 0.6823660714285714

key: test_jcc
value: [0.63636364 0.58333333 0.53846154 0.7        0.63636364 0.41666667
 0.63636364 0.38461538 0.5        0.54545455]

mean value: 0.5577622377622378

key: train_jcc
value: [0.6        0.5625     0.5959596  0.60784314 0.60204082 0.58653846
 0.58823529 0.63829787 0.58823529 0.61458333]

mean value: 0.5984233804988544

MCC on Blind test: 0.42

Accuracy on Blind test: 0.69
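
The two blind-test lines report scores on data held out from cross-validation. A minimal sketch of that evaluation step follows, assuming a simple stratified hold-out split on synthetic data; how the blind set was actually built in this run is not shown in the log.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data and a 70/30 split are assumptions for illustration only.
X, y = make_classification(n_samples=200, n_features=20, random_state=42)
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Fit on the training portion only, then predict the untouched blind set.
model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_blind)

blind_mcc = round(matthews_corrcoef(y_blind, y_pred), 2)
blind_acc = round(accuracy_score(y_blind, y_pred), 2)
print("MCC on Blind test:", blind_mcc)
print("Accuracy on Blind test:", blind_acc)
```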

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00683665 0.00669193 0.00670457 0.00669694 0.00665402 0.00670123
 0.00669932 0.00667048 0.00668812 0.0066514 ]

mean value: 0.006699466705322265

key: score_time
value: [0.00776768 0.0077312  0.0078032  0.00768924 0.0077219  0.00774479
 0.00771761 0.00768328 0.00771379 0.00770187]

mean value: 0.0077274560928344725

key: test_mcc
value: [-0.19642857  0.47245559  0.63245553 -0.14285714  0.          0.4472136
  0.4472136   0.          0.28867513  0.1490712 ]

mean value: 0.20977989331042105

key: train_mcc
value: [0.35590281 0.40535457 0.36154406 0.34995662 0.43771378 0.36480373
 0.40704579 0.34391797 0.37665889 0.375     ]

mean value: 0.37778982176154485

key: test_accuracy
value: [0.4        0.73333333 0.78571429 0.42857143 0.5        0.71428571
 0.71428571 0.5        0.64285714 0.57142857]

mean value: 0.599047619047619

key: train_accuracy
value: [0.67716535 0.7007874  0.6796875  0.671875   0.71875    0.6796875
 0.703125   0.671875   0.6875     0.6875    ]

mean value: 0.6877952755905512

key: test_fscore
value: [0.4        0.77777778 0.82352941 0.42857143 0.53333333 0.66666667
 0.75       0.53333333 0.66666667 0.5       ]

mean value: 0.6079878618113912

key: train_fscore
value: [0.6962963  0.71641791 0.6962963  0.7        0.71428571 0.70503597
 0.71212121 0.67692308 0.70149254 0.6875    ]

mean value: 0.7006369014906811

key: test_precision
value: [0.375      0.7        0.7        0.42857143 0.5        0.8
 0.66666667 0.5        0.625      0.6       ]

mean value: 0.5895238095238096

key: train_precision
value: [0.66197183 0.67605634 0.66197183 0.64473684 0.72580645 0.65333333
 0.69117647 0.66666667 0.67142857 0.6875    ]

mean value: 0.6740648335734973

key: test_recall
value: [0.42857143 0.875      1.         0.42857143 0.57142857 0.57142857
 0.85714286 0.57142857 0.71428571 0.42857143]

mean value: 0.6446428571428571

key: train_recall
value: [0.734375   0.76190476 0.734375   0.765625   0.703125   0.765625
 0.734375   0.6875     0.734375   0.6875    ]

mean value: 0.7308779761904762

key: test_roc_auc
value: [0.40178571 0.72321429 0.78571429 0.42857143 0.5        0.71428571
 0.71428571 0.5        0.64285714 0.57142857]

mean value: 0.5982142857142857

key: train_roc_auc
value: [0.67671131 0.70126488 0.6796875  0.671875   0.71875    0.6796875
 0.703125   0.671875   0.6875     0.6875    ]

mean value: 0.687797619047619

key: test_jcc
value: [0.25       0.63636364 0.7        0.27272727 0.36363636 0.5
 0.6        0.36363636 0.5        0.33333333]

mean value: 0.45196969696969697

key: train_jcc
value: [0.53409091 0.55813953 0.53409091 0.53846154 0.55555556 0.54444444
 0.55294118 0.51162791 0.54022989 0.52380952]

mean value: 0.5393391383841405

MCC on Blind test: 0.39

Accuracy on Blind test: 0.69

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00661182 0.00731397 0.0074687  0.00730467 0.00664449 0.00739288
 0.00724483 0.00748181 0.0071733  0.00716472]

mean value: 0.007180118560791015

key: score_time
value: [0.00903392 0.00946617 0.01499748 0.01391649 0.00954628 0.01419783
 0.01409173 0.00956845 0.00938296 0.00936103]

mean value: 0.011356234550476074

key: test_mcc
value: [ 0.34247476  0.04029115  0.14285714 -0.1490712  -0.31622777  0.28867513
  0.14285714  0.14285714  0.14285714  0.1490712 ]

mean value: 0.0926641847918602

key: train_mcc
value: [0.52955101 0.59052579 0.59491308 0.5172058  0.48729852 0.51568795
 0.37518324 0.50221186 0.5787612  0.53229065]

mean value: 0.5223629097954794

key: test_accuracy
value: [0.66666667 0.53333333 0.57142857 0.42857143 0.35714286 0.64285714
 0.57142857 0.57142857 0.57142857 0.57142857]

mean value: 0.5485714285714286

key: train_accuracy
value: [0.76377953 0.79527559 0.796875   0.7578125  0.7421875  0.7578125
 0.6875     0.75       0.7890625  0.765625  ]

mean value: 0.7605930118110236

key: test_fscore
value: [0.54545455 0.63157895 0.57142857 0.5        0.18181818 0.66666667
 0.57142857 0.57142857 0.57142857 0.5       ]

mean value: 0.53112326270221

key: train_fscore
value: [0.7761194  0.79365079 0.79032258 0.76691729 0.72727273 0.75590551
 0.68253968 0.73770492 0.784      0.75806452]

mean value: 0.7572497426299365

key: test_precision
value: [0.75       0.54545455 0.57142857 0.44444444 0.25       0.625
 0.57142857 0.57142857 0.57142857 0.6       ]

mean value: 0.5500613275613275

key: train_precision
value: [0.74285714 0.79365079 0.81666667 0.73913043 0.77192982 0.76190476
 0.69354839 0.77586207 0.80327869 0.78333333]

mean value: 0.7682162102343593

key: test_recall
value: [0.42857143 0.75       0.57142857 0.57142857 0.14285714 0.71428571
 0.57142857 0.57142857 0.57142857 0.42857143]

mean value: 0.5321428571428571

key: train_recall
value: [0.8125     0.79365079 0.765625   0.796875   0.6875     0.75
 0.671875   0.703125   0.765625   0.734375  ]

mean value: 0.7481150793650794

key: test_roc_auc
value: [0.65178571 0.51785714 0.57142857 0.42857143 0.35714286 0.64285714
 0.57142857 0.57142857 0.57142857 0.57142857]

mean value: 0.5455357142857142

key: train_roc_auc
value: [0.76339286 0.7952629  0.796875   0.7578125  0.7421875  0.7578125
 0.6875     0.75       0.7890625  0.765625  ]

mean value: 0.7605530753968254

key: test_jcc
value: [0.375      0.46153846 0.4        0.33333333 0.1        0.5
 0.4        0.4        0.4        0.33333333]

mean value: 0.3703205128205128

key: train_jcc
value: [0.63414634 0.65789474 0.65333333 0.62195122 0.57142857 0.60759494
 0.51807229 0.58441558 0.64473684 0.61038961]

mean value: 0.6103963465355565

MCC on Blind test: 0.2

Accuracy on Blind test: 0.6

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.00949454 0.00895095 0.0088582  0.00882721 0.00884652 0.00871706
 0.00863242 0.00808167 0.00882864 0.00886631]

mean value: 0.00881035327911377

key: score_time
value: [0.00895262 0.00876713 0.00880933 0.00889349 0.00878167 0.00939393
 0.00883389 0.00905347 0.00875926 0.00885344]

mean value: 0.008909821510314941

key: test_mcc
value: [ 0.49099025  0.6000992   0.31622777  0.42857143  0.28867513  0.57735027
  0.57735027 -0.17407766  0.          0.1490712 ]

mean value: 0.3254257861286581

key: train_mcc
value: [0.76388889 0.75156113 0.70389875 0.68884672 0.67195703 0.73518314
 0.62776482 0.67261436 0.67195703 0.76571848]

mean value: 0.7053390350757779

key: test_accuracy
value: [0.73333333 0.8        0.64285714 0.71428571 0.64285714 0.78571429
 0.78571429 0.42857143 0.5        0.57142857]

mean value: 0.6604761904761904

key: train_accuracy
value: [0.88188976 0.87401575 0.8515625  0.84375    0.8359375  0.8671875
 0.8125     0.8359375  0.8359375  0.8828125 ]

mean value: 0.8521530511811024

key: test_fscore
value: [0.75       0.82352941 0.70588235 0.71428571 0.61538462 0.76923077
 0.76923077 0.55555556 0.53333333 0.5       ]

mean value: 0.673643252172664

key: train_fscore
value: [0.88188976 0.87878788 0.85496183 0.84848485 0.83464567 0.87022901
 0.82089552 0.83969466 0.83464567 0.88372093]

mean value: 0.8547955778438756

key: test_precision
value: [0.66666667 0.77777778 0.6        0.71428571 0.66666667 0.83333333
 0.83333333 0.45454545 0.5        0.6       ]

mean value: 0.6646608946608946

key: train_precision
value: [0.88888889 0.84057971 0.8358209  0.82352941 0.84126984 0.85074627
 0.78571429 0.82089552 0.84126984 0.87692308]

mean value: 0.8405637742542732

key: test_recall
value: [0.85714286 0.875      0.85714286 0.71428571 0.57142857 0.71428571
 0.71428571 0.71428571 0.57142857 0.42857143]

mean value: 0.7017857142857142

key: train_recall
value: [0.875      0.92063492 0.875      0.875      0.828125   0.890625
 0.859375   0.859375   0.828125   0.890625  ]

mean value: 0.870188492063492

key: test_roc_auc
value: [0.74107143 0.79464286 0.64285714 0.71428571 0.64285714 0.78571429
 0.78571429 0.42857143 0.5        0.57142857]

mean value: 0.6607142857142857

key: train_roc_auc
value: [0.88194444 0.87437996 0.8515625  0.84375    0.8359375  0.8671875
 0.8125     0.8359375  0.8359375  0.8828125 ]

mean value: 0.8521949404761905

key: test_jcc
value: [0.6        0.7        0.54545455 0.55555556 0.44444444 0.625
 0.625      0.38461538 0.36363636 0.33333333]

mean value: 0.5177039627039627

key: train_jcc
value: [0.78873239 0.78378378 0.74666667 0.73684211 0.71621622 0.77027027
 0.69620253 0.72368421 0.71621622 0.79166667]

mean value: 0.747028106162106

MCC on Blind test: 0.33

Accuracy on Blind test: 0.67

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.4545188  0.45730591 0.4523077  0.58290815 0.4624598  0.48649025
 0.46417975 0.6089313  0.45516968 0.47067261]

mean value: 0.4894943952560425

key: score_time
value: [0.01082301 0.01304746 0.01290703 0.01316142 0.01308894 0.01309848
 0.01328969 0.01329231 0.01081371 0.01333547]

mean value: 0.012685751914978028

key: test_mcc
value: [0.49099025 0.05455447 0.42857143 0.42857143 0.57735027 0.57735027
 1.         0.14285714 0.         0.1490712 ]

mean value: 0.38493164624692183

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.73333333 0.53333333 0.71428571 0.71428571 0.78571429 0.78571429
 1.         0.57142857 0.5        0.57142857]

mean value: 0.690952380952381

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.75       0.58823529 0.71428571 0.71428571 0.76923077 0.76923077
 1.         0.57142857 0.53333333 0.5       ]

mean value: 0.6910030165912519

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.66666667 0.55555556 0.71428571 0.71428571 0.83333333 0.83333333
 1.         0.57142857 0.5        0.6       ]

mean value: 0.6988888888888889

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.85714286 0.625      0.71428571 0.71428571 0.71428571 0.71428571
 1.         0.57142857 0.57142857 0.42857143]

mean value: 0.6910714285714286

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.74107143 0.52678571 0.71428571 0.71428571 0.78571429 0.78571429
 1.         0.57142857 0.5        0.57142857]

mean value: 0.6910714285714286

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.6        0.41666667 0.55555556 0.55555556 0.625      0.625
 1.         0.4        0.36363636 0.33333333]

mean value: 0.5474747474747474

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.15

Accuracy on Blind test: 0.57

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01064634 0.00947118 0.00758362 0.00736046 0.00720119 0.00724173
 0.00704122 0.00714111 0.00716519 0.00737357]

mean value: 0.007822561264038085

key: score_time
value: [0.01261735 0.00878739 0.00803566 0.0079627  0.00771618 0.00781584
 0.00776696 0.00778174 0.00768185 0.00766587]

mean value: 0.00838315486907959

key: test_mcc
value: [0.66143783 0.875      1.         0.52223297 0.71428571 0.8660254
 0.74535599 0.1490712  0.8660254  0.28867513]

mean value: 0.6688109643082562

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.8        0.93333333 1.         0.71428571 0.85714286 0.92857143
 0.85714286 0.57142857 0.92857143 0.64285714]

mean value: 0.8233333333333334

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.82352941 0.93333333 1.         0.6        0.85714286 0.93333333
 0.83333333 0.625      0.93333333 0.61538462]

mean value: 0.8154390217625511

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.7        1.         1.         1.         0.85714286 0.875
 1.         0.55555556 0.875      0.66666667]

mean value: 0.8529365079365079

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.875      1.         0.42857143 0.85714286 1.
 0.71428571 0.71428571 1.         0.57142857]

mean value: 0.8160714285714286

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.8125     0.9375     1.         0.71428571 0.85714286 0.92857143
 0.85714286 0.57142857 0.92857143 0.64285714]

mean value: 0.8250000000000001

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.7        0.875      1.         0.42857143 0.75       0.875
 0.71428571 0.45454545 0.875      0.44444444]

mean value: 0.7116847041847042

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.13

Accuracy on Blind test: 0.55

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.07988238 0.07995939 0.07999921 0.08072162 0.07959414 0.08084035
 0.08072901 0.08017445 0.08056831 0.07979488]

mean value: 0.08022637367248535

key: score_time
value: [0.01623416 0.01625252 0.01616454 0.01613855 0.01608205 0.01615548
 0.01745152 0.01737761 0.01639533 0.01741838]

mean value: 0.016567015647888185

key: test_mcc
value: [0.37796447 0.875      0.8660254  0.71428571 0.8660254  0.8660254
 1.         0.14285714 0.57735027 0.28867513]

mean value: 0.6574208945289839

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.66666667 0.93333333 0.92857143 0.85714286 0.92857143 0.92857143
 1.         0.57142857 0.78571429 0.64285714]

mean value: 0.8242857142857143

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.70588235 0.93333333 0.93333333 0.85714286 0.92307692 0.92307692
 1.         0.57142857 0.8        0.61538462]

mean value: 0.8262658909717733

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.6        1.         0.875      0.85714286 1.         1.
 1.         0.57142857 0.75       0.66666667]

mean value: 0.8320238095238095

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.85714286 0.875      1.         0.85714286 0.85714286 0.85714286
 1.         0.57142857 0.85714286 0.57142857]

mean value: 0.8303571428571428

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.67857143 0.9375     0.92857143 0.85714286 0.92857143 0.92857143
 1.         0.57142857 0.78571429 0.64285714]

mean value: 0.8258928571428572

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.54545455 0.875      0.875      0.75       0.85714286 0.85714286
 1.         0.4        0.66666667 0.44444444]

mean value: 0.7270851370851371

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.28

Accuracy on Blind test: 0.64

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00677133 0.00666976 0.00663257 0.00673056 0.00663328 0.00662422
 0.00719857 0.00701356 0.00682855 0.00671649]

mean value: 0.006781888008117676

key: score_time
value: [0.00769711 0.00769949 0.00777817 0.00801635 0.00804901 0.00826836
 0.00769639 0.00777459 0.00769544 0.00789642]

mean value: 0.007857131958007812

key: test_mcc
value: [-0.07142857  0.33928571  0.28867513  0.28867513  0.42857143  0.
  0.4472136   0.1490712   0.          0.        ]

mean value: 0.1870063634618141

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.46666667 0.66666667 0.64285714 0.64285714 0.71428571 0.5
 0.71428571 0.57142857 0.5        0.5       ]

mean value: 0.5919047619047619

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.42857143 0.66666667 0.61538462 0.66666667 0.71428571 0.46153846
 0.75       0.5        0.58823529 0.36363636]

mean value: 0.5754985210867564

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.42857143 0.71428571 0.66666667 0.625      0.71428571 0.5
 0.66666667 0.6        0.5        0.5       ]

mean value: 0.591547619047619

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.42857143 0.625      0.57142857 0.71428571 0.71428571 0.42857143
 0.85714286 0.42857143 0.71428571 0.28571429]

mean value: 0.5767857142857142

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.46428571 0.66964286 0.64285714 0.64285714 0.71428571 0.5
 0.71428571 0.57142857 0.5        0.5       ]

mean value: 0.5919642857142857

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.27272727 0.5        0.44444444 0.5        0.55555556 0.3
 0.6        0.33333333 0.41666667 0.22222222]

mean value: 0.41449494949494947

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.24

Accuracy on Blind test: 0.62

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.03270388 1.02527833 1.038306   1.02501392 1.02789044 1.00910234
 1.033144   1.01801777 1.01170015 1.01512265]

mean value: 1.0236279487609863

key: score_time
value: [0.09661889 0.0889287  0.09118915 0.0894289  0.09080982 0.08694053
 0.09180784 0.09045506 0.08735704 0.09318399]

mean value: 0.09067199230194092

key: test_mcc
value: [0.56407607 0.875      1.         0.71428571 0.71428571 1.
 1.         0.         0.8660254  0.57735027]

mean value: 0.7311023176363259

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.73333333 0.93333333 1.         0.85714286 0.85714286 1.
 1.         0.5        0.92857143 0.78571429]

mean value: 0.8595238095238095

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.77777778 0.93333333 1.         0.85714286 0.85714286 1.
 1.         0.53333333 0.93333333 0.8       ]

mean value: 0.8692063492063492

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.63636364 1.         1.         0.85714286 0.85714286 1.
 1.         0.5        0.875      0.75      ]

mean value: 0.847564935064935

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.875      1.         0.85714286 0.85714286 1.
 1.         0.57142857 1.         0.85714286]

mean value: 0.9017857142857143

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.75       0.9375     1.         0.85714286 0.85714286 1.
 1.         0.5        0.92857143 0.78571429]

mean value: 0.8616071428571429

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.63636364 0.875      1.         0.75       0.75       1.
 1.         0.36363636 0.875      0.66666667]

mean value: 0.7916666666666666

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.2

Accuracy on Blind test: 0.59

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.85148787 0.9046824  0.85188794 0.83856058 0.89779782 0.83885241
 0.90653539 0.82567453 0.8553915  0.97453666]

mean value: 0.8745407104492188

key: score_time
value: [0.34471321 0.19580126 0.22999787 0.1582098  0.21987033 0.21252227
 0.18508577 0.23894954 0.15599632 0.22014427]

mean value: 0.21612906455993652

key: test_mcc
value: [ 0.56407607  0.875       0.74535599  0.71428571  0.57735027  0.8660254
  0.8660254  -0.14285714  0.71428571  0.42857143]

mean value: 0.6208118858361914

key: train_mcc
value: [0.93745372 0.93748452 0.92288947 0.92288947 0.9379581  0.95417386
 0.95324137 0.93933644 0.90802522 0.93933644]

mean value: 0.9352788622064109

key: test_accuracy
value: [0.73333333 0.93333333 0.85714286 0.85714286 0.78571429 0.92857143
 0.92857143 0.42857143 0.85714286 0.71428571]

mean value: 0.8023809523809524

key: train_accuracy
value: [0.96850394 0.96850394 0.9609375  0.9609375  0.96875    0.9765625
 0.9765625  0.96875    0.953125   0.96875   ]

mean value: 0.9671382874015748

key: test_fscore
value: [0.77777778 0.93333333 0.875      0.85714286 0.76923077 0.92307692
 0.92307692 0.42857143 0.85714286 0.71428571]

mean value: 0.8058638583638583

key: train_fscore
value: [0.96923077 0.96875    0.96183206 0.96183206 0.96923077 0.97709924
 0.97674419 0.96969697 0.95454545 0.96969697]

mean value: 0.967865847722607

key: test_precision
value: [0.63636364 1.         0.77777778 0.85714286 0.83333333 1.
 1.         0.42857143 0.85714286 0.71428571]

mean value: 0.8104617604617604

key: train_precision
value: [0.95454545 0.95384615 0.94029851 0.94029851 0.95454545 0.95522388
 0.96923077 0.94117647 0.92647059 0.94117647]

mean value: 0.9476812257101985

key: test_recall
value: [1.         0.875      1.         0.85714286 0.71428571 0.85714286
 0.85714286 0.42857143 0.85714286 0.71428571]

mean value: 0.8160714285714286

key: train_recall
value: [0.984375   0.98412698 0.984375   0.984375   0.984375   1.
 0.984375   1.         0.984375   1.        ]

mean value: 0.9890376984126984

key: test_roc_auc
value: [0.75       0.9375     0.85714286 0.85714286 0.78571429 0.92857143
 0.92857143 0.42857143 0.85714286 0.71428571]

mean value: 0.8044642857142857

key: train_roc_auc
value: [0.96837798 0.96862599 0.9609375  0.9609375  0.96875    0.9765625
 0.9765625  0.96875    0.953125   0.96875   ]

mean value: 0.9671378968253969

key: test_jcc
value: [0.63636364 0.875      0.77777778 0.75       0.625      0.85714286
 0.85714286 0.27272727 0.75       0.55555556]

mean value: 0.6956709956709957

key: train_jcc
value: [0.94029851 0.93939394 0.92647059 0.92647059 0.94029851 0.95522388
 0.95454545 0.94117647 0.91304348 0.94117647]

mean value: 0.9378097885369711

MCC on Blind test: 0.29

Accuracy on Blind test: 0.63

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01670504 0.00717616 0.00681663 0.00686646 0.00743771 0.00687003
 0.00733042 0.00698042 0.00724292 0.00684094]

mean value: 0.008026671409606934

key: score_time
value: [0.0158987  0.00824714 0.00798559 0.0079875  0.00846529 0.00800943
 0.00837207 0.00800943 0.00845408 0.00799131]

mean value: 0.008942055702209472

key: test_mcc
value: [-0.19642857  0.47245559  0.63245553 -0.14285714  0.          0.4472136
  0.4472136   0.          0.28867513  0.1490712 ]

mean value: 0.20977989331042105

key: train_mcc
value: [0.35590281 0.40535457 0.36154406 0.34995662 0.43771378 0.36480373
 0.40704579 0.34391797 0.37665889 0.375     ]

mean value: 0.37778982176154485

key: test_accuracy
value: [0.4        0.73333333 0.78571429 0.42857143 0.5        0.71428571
 0.71428571 0.5        0.64285714 0.57142857]

mean value: 0.599047619047619

key: train_accuracy
value: [0.67716535 0.7007874  0.6796875  0.671875   0.71875    0.6796875
 0.703125   0.671875   0.6875     0.6875    ]

mean value: 0.6877952755905512

key: test_fscore
value: [0.4        0.77777778 0.82352941 0.42857143 0.53333333 0.66666667
 0.75       0.53333333 0.66666667 0.5       ]

mean value: 0.6079878618113912

key: train_fscore
value: [0.6962963  0.71641791 0.6962963  0.7        0.71428571 0.70503597
 0.71212121 0.67692308 0.70149254 0.6875    ]

mean value: 0.7006369014906811

key: test_precision
value: [0.375      0.7        0.7        0.42857143 0.5        0.8
 0.66666667 0.5        0.625      0.6       ]

mean value: 0.5895238095238096

key: train_precision
value: [0.66197183 0.67605634 0.66197183 0.64473684 0.72580645 0.65333333
 0.69117647 0.66666667 0.67142857 0.6875    ]

mean value: 0.6740648335734973

key: test_recall
value: [0.42857143 0.875      1.         0.42857143 0.57142857 0.57142857
 0.85714286 0.57142857 0.71428571 0.42857143]

mean value: 0.6446428571428571

key: train_recall
value: [0.734375   0.76190476 0.734375   0.765625   0.703125   0.765625
 0.734375   0.6875     0.734375   0.6875    ]

mean value: 0.7308779761904762

key: test_roc_auc
value: [0.40178571 0.72321429 0.78571429 0.42857143 0.5        0.71428571
 0.71428571 0.5        0.64285714 0.57142857]

mean value: 0.5982142857142857

key: train_roc_auc
value: [0.67671131 0.70126488 0.6796875  0.671875   0.71875    0.6796875
 0.703125   0.671875   0.6875     0.6875    ]

mean value: 0.687797619047619

key: test_jcc
value: [0.25       0.63636364 0.7        0.27272727 0.36363636 0.5
 0.6        0.36363636 0.5        0.33333333]

mean value: 0.45196969696969697

key: train_jcc
value: [0.53409091 0.55813953 0.53409091 0.53846154 0.55555556 0.54444444
 0.55294118 0.51162791 0.54022989 0.52380952]

mean value: 0.5393391383841405

MCC on Blind test: 0.39

Accuracy on Blind test: 0.69

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.06879473 0.03695536 0.03799057 0.03917074 0.03711796 0.03794122
 0.04741096 0.0387218  0.04068446 0.03751087]

mean value: 0.042229866981506346

key: score_time
value: [0.00955296 0.00969815 0.0103898  0.0105443  0.01052046 0.01191258
 0.01033878 0.01035762 0.01077509 0.0104599 ]

mean value: 0.010454964637756348

key: test_mcc
value: [0.66143783 1.         1.         0.8660254  0.71428571 1.
 1.         0.57735027 0.8660254  0.71428571]

mean value: 0.8399410333096079

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.8        1.         1.         0.92857143 0.85714286 1.
 1.         0.78571429 0.92857143 0.85714286]

mean value: 0.9157142857142857

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.82352941 1.         1.         0.92307692 0.85714286 1.
 1.         0.76923077 0.93333333 0.85714286]

mean value: 0.9163456151691446

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.7        1.         1.         1.         0.85714286 1.
 1.         0.83333333 0.875      0.85714286]

mean value: 0.9122619047619047

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         0.85714286 0.85714286 1.
 1.         0.71428571 1.         0.85714286]

mean value: 0.9285714285714286

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.8125     1.         1.         0.92857143 0.85714286 1.
 1.         0.78571429 0.92857143 0.85714286]

mean value: 0.9169642857142858

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.7        1.         1.         0.85714286 0.75       1.
 1.         0.625      0.875      0.75      ]

mean value: 0.8557142857142856

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: -0.04

Accuracy on Blind test: 0.48

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.01021028 0.01137257 0.01115346 0.01160097 0.01162314 0.01166773
 0.01157808 0.01191115 0.01153421 0.01156759]

mean value: 0.011421918869018555

key: score_time
value: [0.01013207 0.01012731 0.01019645 0.01036048 0.01039767 0.01046324
 0.01033401 0.01034856 0.01044607 0.01029301]

mean value: 0.010309886932373048

key: test_mcc
value: [0.66143783 0.46428571 0.8660254  0.74535599 0.63245553 0.57735027
 0.8660254  0.71428571 0.28867513 0.4472136 ]

mean value: 0.6263110587724456

key: train_mcc
value: [0.93745372 0.90550595 0.92198755 0.9375     0.92198755 0.95324137
 0.95417386 0.9379581  0.95324137 0.90669283]

mean value: 0.9329742313821212

key: test_accuracy
value: [0.8        0.73333333 0.92857143 0.85714286 0.78571429 0.78571429
 0.92857143 0.85714286 0.64285714 0.71428571]

mean value: 0.8033333333333333

key: train_accuracy
value: [0.96850394 0.95275591 0.9609375  0.96875    0.9609375  0.9765625
 0.9765625  0.96875    0.9765625  0.953125  ]

mean value: 0.9663447342519685

key: test_fscore
value: [0.82352941 0.75       0.92307692 0.83333333 0.72727273 0.76923077
 0.92307692 0.85714286 0.66666667 0.66666667]

mean value: 0.7939996278231571

key: train_fscore
value: [0.96923077 0.95238095 0.96062992 0.96875    0.96062992 0.97674419
 0.97709924 0.96923077 0.97674419 0.95384615]

mean value: 0.9665286095942575

key: test_precision
value: [0.7        0.75       1.         1.         1.         0.83333333
 1.         0.85714286 0.625      0.8       ]

mean value: 0.856547619047619

key: train_precision
value: [0.95454545 0.95238095 0.96825397 0.96875    0.96825397 0.96923077
 0.95522388 0.95454545 0.96923077 0.93939394]

mean value: 0.9599809156432291

key: test_recall
value: [1.         0.75       0.85714286 0.71428571 0.57142857 0.71428571
 0.85714286 0.85714286 0.71428571 0.57142857]

mean value: 0.7607142857142857

key: train_recall
value: [0.984375   0.95238095 0.953125   0.96875    0.953125   0.984375
 1.         0.984375   0.984375   0.96875   ]

mean value: 0.9733630952380953

key: test_roc_auc
value: [0.8125     0.73214286 0.92857143 0.85714286 0.78571429 0.78571429
 0.92857143 0.85714286 0.64285714 0.71428571]

mean value: 0.8044642857142857

key: train_roc_auc
value: [0.96837798 0.95275298 0.9609375  0.96875    0.9609375  0.9765625
 0.9765625  0.96875    0.9765625  0.953125  ]

mean value: 0.9663318452380952

key: test_jcc
value: [0.7        0.6        0.85714286 0.71428571 0.57142857 0.625
 0.85714286 0.75       0.5        0.5       ]

mean value: 0.6675

key: train_jcc
value: [0.94029851 0.90909091 0.92424242 0.93939394 0.92424242 0.95454545
 0.95522388 0.94029851 0.95454545 0.91176471]

mean value: 0.9353646207465347

MCC on Blind test: 0.02

Accuracy on Blind test: 0.51

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.01899028 0.00702119 0.00694108 0.00665832 0.00675297 0.00682497
 0.00661945 0.00672174 0.00678945 0.00670505]

mean value: 0.008002448081970214

key: score_time
value: [0.01023316 0.00807333 0.0078764  0.00824809 0.00777531 0.00779223
 0.0078361  0.00779271 0.00800109 0.00779223]

mean value: 0.00814206600189209

key: test_mcc
value: [ 0.56407607  0.6000992   0.40824829  0.14285714  0.          0.4472136
  0.28867513 -0.1490712   0.         -0.1490712 ]

mean value: 0.21530270393825499

key: train_mcc
value: [0.40417056 0.34191645 0.4113018  0.47245559 0.39067269 0.34646743
 0.29691125 0.43943537 0.4429404  0.39105486]

mean value: 0.39373264045213846

key: test_accuracy
value: [0.73333333 0.8        0.64285714 0.57142857 0.5        0.71428571
 0.64285714 0.42857143 0.5        0.42857143]

mean value: 0.5961904761904762

key: train_accuracy
value: [0.7007874  0.66929134 0.703125   0.734375   0.6953125  0.671875
 0.6484375  0.71875    0.71875    0.6953125 ]

mean value: 0.695601624015748

key: test_fscore
value: [0.77777778 0.82352941 0.73684211 0.57142857 0.58823529 0.66666667
 0.66666667 0.33333333 0.53333333 0.33333333]

mean value: 0.6031146493685193

key: train_fscore
value: [0.72058824 0.68656716 0.72463768 0.75       0.69767442 0.69117647
 0.65116279 0.73134328 0.73913043 0.70229008]

mean value: 0.7094570555223779

key: test_precision
value: [0.63636364 0.77777778 0.58333333 0.57142857 0.5        0.8
 0.625      0.4        0.5        0.4       ]

mean value: 0.579390331890332

key: train_precision
value: [0.68055556 0.64788732 0.67567568 0.70833333 0.69230769 0.65277778
 0.64615385 0.7        0.68918919 0.68656716]

mean value: 0.6779447558115836

key: test_recall
value: [1.         0.875      1.         0.57142857 0.71428571 0.57142857
 0.71428571 0.28571429 0.57142857 0.28571429]

mean value: 0.6589285714285714

key: train_recall
value: [0.765625   0.73015873 0.78125    0.796875   0.703125   0.734375
 0.65625    0.765625   0.796875   0.71875   ]

mean value: 0.744890873015873

key: test_roc_auc
value: [0.75       0.79464286 0.64285714 0.57142857 0.5        0.71428571
 0.64285714 0.42857143 0.5        0.42857143]

mean value: 0.5973214285714286

key: train_roc_auc
value: [0.70027282 0.66976687 0.703125   0.734375   0.6953125  0.671875
 0.6484375  0.71875    0.71875    0.6953125 ]

mean value: 0.6955977182539682

key: test_jcc
value: [0.63636364 0.7        0.58333333 0.4        0.41666667 0.5
 0.5        0.2        0.36363636 0.2       ]

mean value: 0.45

key: train_jcc
value: [0.56321839 0.52272727 0.56818182 0.6        0.53571429 0.52808989
 0.48275862 0.57647059 0.5862069  0.54117647]

mean value: 0.5504544231133333

MCC on Blind test: 0.38

Accuracy on Blind test: 0.69

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.00755644 0.00730038 0.00778127 0.00782037 0.00711942 0.00752926
 0.00717807 0.0077436  0.00730443 0.00738955]

mean value: 0.00747227668762207

key: score_time
value: [0.00840449 0.00777626 0.00787759 0.00786304 0.00776672 0.00783229
 0.00777388 0.00788474 0.00788546 0.00791168]

mean value: 0.007897615432739258

key: test_mcc
value: [0.56407607 0.60714286 0.57735027 0.40824829 0.40824829 0.1490712
 1.         0.28867513 0.57735027 0.31622777]

mean value: 0.48963901503792373

key: train_mcc
value: [0.69592496 0.84250992 0.92198755 0.72374686 0.60141677 0.78756153
 0.62554324 0.84375    0.8226036  0.7617394 ]

mean value: 0.7626783838444016

key: test_accuracy
value: [0.73333333 0.8        0.78571429 0.64285714 0.64285714 0.57142857
 1.         0.64285714 0.78571429 0.64285714]

mean value: 0.7247619047619047

key: train_accuracy
value: [0.82677165 0.92125984 0.9609375  0.84375    0.765625   0.8828125
 0.78125    0.921875   0.90625    0.8671875 ]

mean value: 0.8677718996062992

key: test_fscore
value: [0.77777778 0.8        0.76923077 0.44444444 0.44444444 0.625
 1.         0.61538462 0.76923077 0.70588235]

mean value: 0.6951395173453997

key: train_fscore
value: [0.85333333 0.92063492 0.96124031 0.81481481 0.69387755 0.8951049
 0.82051282 0.921875   0.89830508 0.88275862]

mean value: 0.866245735093413

key: test_precision
value: [0.63636364 0.85714286 0.83333333 1.         1.         0.55555556
 1.         0.66666667 0.83333333 0.6       ]

mean value: 0.7982395382395382

key: train_precision
value: [0.74418605 0.92063492 0.95384615 1.         1.         0.81012658
 0.69565217 0.921875   0.98148148 0.79012346]

mean value: 0.8817925815455832

key: test_recall
value: [1.         0.75       0.71428571 0.28571429 0.28571429 0.71428571
 1.         0.57142857 0.71428571 0.85714286]

mean value: 0.6892857142857143

key: train_recall
value: [1.         0.92063492 0.96875    0.6875     0.53125    1.
 1.         0.921875   0.828125   1.        ]

mean value: 0.885813492063492

key: test_roc_auc
value: [0.75       0.80357143 0.78571429 0.64285714 0.64285714 0.57142857
 1.         0.64285714 0.78571429 0.64285714]

mean value: 0.7267857142857143

key: train_roc_auc
value: [0.82539683 0.92125496 0.9609375  0.84375    0.765625   0.8828125
 0.78125    0.921875   0.90625    0.8671875 ]

mean value: 0.8676339285714285

key: test_jcc
value: [0.63636364 0.66666667 0.625      0.28571429 0.28571429 0.45454545
 1.         0.44444444 0.625      0.54545455]

mean value: 0.5568903318903319

key: train_jcc
value: [0.74418605 0.85294118 0.92537313 0.6875     0.53125    0.81012658
 0.69565217 0.85507246 0.81538462 0.79012346]

mean value: 0.7707609649444953

MCC on Blind test: 0.28

Accuracy on Blind test: 0.63

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.00977015 0.00938821 0.00748348 0.00760007 0.00707746 0.00716782
 0.00706244 0.00752711 0.00705504 0.00710464]

mean value: 0.0077236413955688475

key: score_time
value: [0.01031303 0.00984597 0.00794601 0.00786161 0.00812817 0.00785327
 0.00777006 0.008039   0.0077517  0.0078032 ]

mean value: 0.008331203460693359

key: test_mcc
value: [0.46770717 0.76376262 0.57735027 0.57735027 0.52223297 0.74535599
 0.8660254  0.40824829 0.57735027 0.4472136 ]

mean value: 0.5952596846856876

key: train_mcc
value: [0.70849191 0.58496906 0.8138413  0.8542422  0.63764677 0.60141677
 0.81409158 0.72374686 0.72932496 0.90669283]

mean value: 0.7374464237869991

key: test_accuracy
value: [0.66666667 0.86666667 0.78571429 0.78571429 0.71428571 0.85714286
 0.92857143 0.64285714 0.78571429 0.71428571]

mean value: 0.7747619047619048

key: train_accuracy
value: [0.83464567 0.75590551 0.8984375  0.921875   0.7890625  0.765625
 0.90625    0.84375    0.8515625  0.953125  ]

mean value: 0.8520238681102362

key: test_fscore
value: [0.73684211 0.85714286 0.8        0.76923077 0.6        0.83333333
 0.93333333 0.73684211 0.76923077 0.75      ]

mean value: 0.7785955272797378

key: train_fscore
value: [0.8590604  0.67368421 0.90780142 0.92753623 0.73267327 0.69387755
 0.90909091 0.86486486 0.82882883 0.95384615]

mean value: 0.8351263838512551

key: test_precision
value: [0.58333333 1.         0.75       0.83333333 1.         1.
 0.875      0.58333333 0.83333333 0.66666667]

mean value: 0.8125

key: train_precision
value: [0.75294118 1.         0.83116883 0.86486486 1.         1.
 0.88235294 0.76190476 0.9787234  0.93939394]

mean value: 0.9011349919234776

key: test_recall
value: [1.         0.75       0.85714286 0.71428571 0.42857143 0.71428571
 1.         1.         0.71428571 0.85714286]

mean value: 0.8035714285714286

key: train_recall
value: [1.         0.50793651 1.         1.         0.578125   0.53125
 0.9375     1.         0.71875    0.96875   ]

mean value: 0.8242311507936508

key: test_roc_auc
value: [0.6875     0.875      0.78571429 0.78571429 0.71428571 0.85714286
 0.92857143 0.64285714 0.78571429 0.71428571]

mean value: 0.7776785714285714

key: train_roc_auc
value: [0.83333333 0.75396825 0.8984375  0.921875   0.7890625  0.765625
 0.90625    0.84375    0.8515625  0.953125  ]

mean value: 0.8516989087301587

key: test_jcc
value: [0.58333333 0.75       0.66666667 0.625      0.42857143 0.71428571
 0.875      0.58333333 0.625      0.6       ]

mean value: 0.6451190476190476

key: train_jcc
value: [0.75294118 0.50793651 0.83116883 0.86486486 0.578125   0.53125
 0.83333333 0.76190476 0.70769231 0.91176471]

mean value: 0.7280981489253548

MCC on Blind test: 0.16

Accuracy on Blind test: 0.57

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.0766685  0.06215739 0.06248546 0.06247211 0.06284809 0.0624218
 0.06243992 0.06250739 0.06269693 0.06271696]

mean value: 0.06394145488739014

key: score_time
value: [0.01422691 0.0138514  0.01398492 0.01401758 0.01408148 0.01418185
 0.01409602 0.01414037 0.01412201 0.01409864]

mean value: 0.014080119132995606

key: test_mcc
value: [0.66143783 1.         0.8660254  1.         0.8660254  0.8660254
 1.         0.28867513 0.71428571 0.57735027]

mean value: 0.7839825157189617

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.8        1.         0.92857143 1.         0.92857143 0.92857143
 1.         0.64285714 0.85714286 0.78571429]

mean value: 0.8871428571428571

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.82352941 1.         0.92307692 1.         0.93333333 0.93333333
 1.         0.66666667 0.85714286 0.76923077]

mean value: 0.8906313294548589

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.7        1.         1.         1.         0.875      0.875
 1.         0.625      0.85714286 0.83333333]

mean value: 0.876547619047619

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         0.85714286 1.         1.         1.
 1.         0.71428571 0.85714286 0.71428571]

mean value: 0.9142857142857143

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.8125     1.         0.92857143 1.         0.92857143 0.92857143
 1.         0.64285714 0.85714286 0.78571429]

mean value: 0.8883928571428572

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.7        1.         0.85714286 1.         0.875      0.875
 1.         0.5        0.75       0.625     ]

mean value: 0.8182142857142857

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: -0.06

Accuracy on Blind test: 0.47

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.02604771 0.02509189 0.03977823 0.03567004 0.04593778 0.0436461
 0.04469895 0.03649855 0.02204108 0.02417731]

mean value: 0.034358763694763185

key: score_time
value: [0.01960158 0.01544213 0.03531432 0.02577138 0.03630829 0.03665662
 0.03008604 0.02398133 0.01645422 0.02618575]

mean value: 0.026580166816711426

key: test_mcc
value: [0.66143783 0.875      1.         0.8660254  0.4472136  0.71428571
 1.         0.31622777 0.71428571 0.8660254 ]

mean value: 0.7460501425423249

key: train_mcc
value: [1.         0.9689752  0.96922337 1.         0.95324137 1.
 1.         1.         1.         0.98449518]

mean value: 0.9875935124101023

key: test_accuracy
value: [0.8        0.93333333 1.         0.92857143 0.71428571 0.85714286
 1.         0.64285714 0.85714286 0.92857143]

mean value: 0.8661904761904762

key: train_accuracy
value: [1.         0.98425197 0.984375   1.         0.9765625  1.
 1.         1.         1.         0.9921875 ]

mean value: 0.9937376968503937

key: test_fscore
value: [0.82352941 0.93333333 1.         0.92307692 0.66666667 0.85714286
 1.         0.70588235 0.85714286 0.92307692]

mean value: 0.8689851325145442

key: train_fscore
value: [1.         0.98387097 0.98412698 1.         0.97637795 1.
 1.         1.         1.         0.99212598]

mean value: 0.9936501888876793

key: test_precision
value: [0.7        1.         1.         1.         0.8        0.85714286
 1.         0.6        0.85714286 1.        ]

mean value: 0.8814285714285715

key: train_precision
value: [1.         1.         1.         1.         0.98412698 1.
 1.         1.         1.         1.        ]

mean value: 0.9984126984126984

key: test_recall
value: [1.         0.875      1.         0.85714286 0.57142857 0.85714286
 1.         0.85714286 0.85714286 0.85714286]

mean value: 0.8732142857142857

key: train_recall
value: [1.         0.96825397 0.96875    1.         0.96875    1.
 1.         1.         1.         0.984375  ]

mean value: 0.9890128968253968

key: test_roc_auc
value: [0.8125     0.9375     1.         0.92857143 0.71428571 0.85714286
 1.         0.64285714 0.85714286 0.92857143]

mean value: 0.8678571428571429

key: train_roc_auc
value: [1.         0.98412698 0.984375   1.         0.9765625  1.
 1.         1.         1.         0.9921875 ]

mean value: 0.9937251984126985

key: test_jcc
value: [0.7        0.875      1.         0.85714286 0.5        0.75
 1.         0.54545455 0.75       0.85714286]

mean value: 0.7834740259740259

key: train_jcc
value: [1.         0.96825397 0.96875    1.         0.95384615 1.
 1.         1.         1.         0.984375  ]

mean value: 0.9875225122100122

MCC on Blind test: 0.07

Accuracy on Blind test: 0.53

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.029562   0.03599644 0.0356493  0.03583241 0.035743   0.01903772
 0.02019906 0.02188826 0.0147891  0.04324818]

mean value: 0.02919454574584961

key: score_time
value: [0.02010798 0.01939225 0.01914334 0.01897407 0.01091695 0.01093078
 0.01089573 0.01088786 0.01081705 0.01087356]

mean value: 0.014293956756591796

key: test_mcc
value: [ 0.60714286  0.46428571  0.14285714  0.28867513 -0.1490712   0.4472136
  0.28867513  0.14285714  0.          0.        ]

mean value: 0.22326355233324546

key: train_mcc
value: [0.95287698 0.96850198 0.95324137 0.96922337 0.9379581  0.98449518
 0.92198755 0.95417386 0.95417386 0.96875   ]

mean value: 0.9565382271774051

key: test_accuracy
value: [0.8        0.73333333 0.57142857 0.64285714 0.42857143 0.71428571
 0.64285714 0.57142857 0.5        0.5       ]

mean value: 0.6104761904761905

key: train_accuracy
value: [0.97637795 0.98425197 0.9765625  0.984375   0.96875    0.9921875
 0.9609375  0.9765625  0.9765625  0.984375  ]

mean value: 0.9780942421259843

key: test_fscore
value: [0.8        0.75       0.57142857 0.61538462 0.33333333 0.66666667
 0.61538462 0.57142857 0.46153846 0.36363636]

mean value: 0.5748801198801199

key: train_fscore
value: [0.97637795 0.98412698 0.97637795 0.98412698 0.96825397 0.99224806
 0.96062992 0.976      0.976      0.984375  ]

mean value: 0.9778516825295094

key: test_precision
value: [0.75       0.75       0.57142857 0.66666667 0.4        0.8
 0.66666667 0.57142857 0.5        0.5       ]

mean value: 0.6176190476190476

key: train_precision
value: [0.98412698 0.98412698 0.98412698 1.         0.98387097 0.98461538
 0.96825397 1.         1.         0.984375  ]

mean value: 0.987349627299224

key: test_recall
value: [0.85714286 0.75       0.57142857 0.57142857 0.28571429 0.57142857
 0.57142857 0.57142857 0.42857143 0.28571429]

mean value: 0.5464285714285714

key: train_recall
value: [0.96875    0.98412698 0.96875    0.96875    0.953125   1.
 0.953125   0.953125   0.953125   0.984375  ]

mean value: 0.9687251984126984

key: test_roc_auc
value: [0.80357143 0.73214286 0.57142857 0.64285714 0.42857143 0.71428571
 0.64285714 0.57142857 0.5        0.5       ]

mean value: 0.6107142857142857

key: train_roc_auc
value: [0.97643849 0.98425099 0.9765625  0.984375   0.96875    0.9921875
 0.9609375  0.9765625  0.9765625  0.984375  ]

mean value: 0.9781001984126985

key: test_jcc
value: [0.66666667 0.6        0.4        0.44444444 0.2        0.5
 0.44444444 0.4        0.3        0.22222222]

mean value: 0.41777777777777775

key: train_jcc
value: [0.95384615 0.96875    0.95384615 0.96875    0.93846154 0.98461538
 0.92424242 0.953125   0.953125   0.96923077]

mean value: 0.9567992424242424

MCC on Blind test: 0.29

Accuracy on Blind test: 0.64

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.1164434  0.10044718 0.10379887 0.09790683 0.10323715 0.10379982
 0.10525608 0.10259223 0.1029501  0.10186577]

mean value: 0.10382974147796631

key: score_time
value: [0.00966978 0.00841308 0.00894523 0.00920153 0.00936794 0.00911546
 0.00893807 0.00919414 0.00939775 0.00913119]

mean value: 0.009137415885925293

key: test_mcc
value: [0.66143783 1.         1.         0.8660254  0.8660254  0.8660254
 1.         0.4472136  0.8660254  0.42857143]

mean value: 0.8001324466975289

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.8        1.         1.         0.92857143 0.92857143 0.92857143
 1.         0.71428571 0.92857143 0.71428571]

mean value: 0.8942857142857144

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.82352941 1.         1.         0.92307692 0.93333333 0.93333333
 1.         0.75       0.93333333 0.71428571]

mean value: 0.9010892049127344

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.7        1.         1.         1.         0.875      0.875
 1.         0.66666667 0.875      0.71428571]

mean value: 0.8705952380952381

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         0.85714286 1.         1.
 1.         0.85714286 1.         0.71428571]

mean value: 0.9428571428571428

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.8125     1.         1.         0.92857143 0.92857143 0.92857143
 1.         0.71428571 0.92857143 0.71428571]

mean value: 0.8955357142857143

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.7        1.         1.         0.85714286 0.875      0.875
 1.         0.6        0.875      0.55555556]

mean value: 0.8337698412698412

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.04

Accuracy on Blind test: 0.51

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.00956202 0.01067328 0.01087284 0.01171899 0.0111239  0.01345611
 0.01117897 0.01136255 0.011199   0.01894307]

mean value: 0.012009072303771972

key: score_time
value: [0.01031303 0.01022744 0.01023555 0.01084542 0.01065707 0.01072264
 0.01063037 0.01066041 0.0107913  0.01101661]

mean value: 0.010609984397888184

key: test_mcc
value: [0.66143783 0.18898224 0.40824829 0.52223297 0.4472136  0.28867513
 0.52223297 0.17407766 0.2773501  0.17407766]

mean value: 0.36645284305875925

key: train_mcc
value: [0.70849191 0.59989919 0.64978629 0.57735027 0.76571848 0.73658951
 0.58937969 0.7617394  0.71641857 0.71125407]

mean value: 0.6816627369382001

key: test_accuracy
value: [0.8        0.6        0.64285714 0.71428571 0.71428571 0.64285714
 0.71428571 0.57142857 0.57142857 0.57142857]

mean value: 0.6542857142857142

key: train_accuracy
value: [0.83464567 0.76377953 0.796875   0.75       0.8828125  0.859375
 0.7578125  0.8671875  0.84375    0.8359375 ]

mean value: 0.8192175196850393

key: test_fscore
value: [0.82352941 0.66666667 0.73684211 0.77777778 0.66666667 0.61538462
 0.77777778 0.66666667 0.7        0.66666667]

mean value: 0.7097978354634701

key: train_fscore
value: [0.8590604  0.80769231 0.83116883 0.8        0.88188976 0.87323944
 0.80503145 0.88275862 0.8630137  0.8590604 ]

mean value: 0.8462914910490185

key: test_precision
value: [0.7        0.6        0.58333333 0.63636364 0.8        0.66666667
 0.63636364 0.54545455 0.53846154 0.54545455]

mean value: 0.6252097902097902

key: train_precision
value: [0.75294118 0.67741935 0.71111111 0.66666667 0.88888889 0.79487179
 0.67368421 0.79012346 0.76829268 0.75294118]

mean value: 0.7476940519561616

key: test_recall
value: [1.         0.75       1.         1.         0.57142857 0.57142857
 1.         0.85714286 1.         0.85714286]

mean value: 0.8607142857142857

key: train_recall
value: [1.       1.       1.       1.       0.875    0.96875  1.       1.
 0.984375 1.      ]

mean value: 0.9828125

key: test_roc_auc
value: [0.8125     0.58928571 0.64285714 0.71428571 0.71428571 0.64285714
 0.71428571 0.57142857 0.57142857 0.57142857]

mean value: 0.6544642857142857

key: train_roc_auc
value: [0.83333333 0.765625   0.796875   0.75       0.8828125  0.859375
 0.7578125  0.8671875  0.84375    0.8359375 ]

mean value: 0.8192708333333334

key: test_jcc
value: [0.7        0.5        0.58333333 0.63636364 0.5        0.44444444
 0.63636364 0.5        0.53846154 0.5       ]

mean value: 0.5538966588966588

key: train_jcc
value: [0.75294118 0.67741935 0.71111111 0.66666667 0.78873239 0.775
 0.67368421 0.79012346 0.75903614 0.75294118]

mean value: 0.7347655691818613

MCC on Blind test: 0.27

Accuracy on Blind test: 0.64

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.01034141 0.01005673 0.00851965 0.00838447 0.00835919 0.00818896
 0.00820589 0.00816202 0.00818706 0.00815248]

mean value: 0.008655786514282227

key: score_time
value: [0.01045799 0.00901937 0.00879979 0.00867701 0.00855756 0.00852752
 0.00862861 0.00858521 0.00861907 0.00860906]

mean value: 0.008848118782043456

key: test_mcc
value: [0.66143783 0.76376262 0.8660254  0.63245553 0.74535599 0.74535599
 1.         0.42857143 0.74535599 0.42857143]

mean value: 0.7016892214052882

key: train_mcc
value: [0.87447286 0.88988095 0.85947992 0.875      0.87542756 0.90669283
 0.87542756 0.84375    0.89073374 0.89073374]

mean value: 0.8781599163560809

key: test_accuracy
value: [0.8        0.86666667 0.92857143 0.78571429 0.85714286 0.85714286
 1.         0.71428571 0.85714286 0.71428571]

mean value: 0.8380952380952381

key: train_accuracy
value: [0.93700787 0.94488189 0.9296875  0.9375     0.9375     0.953125
 0.9375     0.921875   0.9453125  0.9453125 ]

mean value: 0.9389702263779528

key: test_fscore
value: [0.82352941 0.85714286 0.92307692 0.72727273 0.83333333 0.83333333
 1.         0.71428571 0.875      0.71428571]

mean value: 0.8301260014495309

key: train_fscore
value: [0.93650794 0.94488189 0.92913386 0.9375     0.93846154 0.95384615
 0.93846154 0.921875   0.94573643 0.94573643]

mean value: 0.9392140783525718

key: test_precision
value: [0.7        1.         1.         1.         1.         1.
 1.         0.71428571 0.77777778 0.71428571]

mean value: 0.8906349206349207

key: train_precision
value: [0.9516129  0.9375     0.93650794 0.9375     0.92424242 0.93939394
 0.92424242 0.921875   0.93846154 0.93846154]

mean value: 0.9349797704535607

key: test_recall
value: [1.         0.75       0.85714286 0.57142857 0.71428571 0.71428571
 1.         0.71428571 1.         0.71428571]

mean value: 0.8035714285714286

key: train_recall
value: [0.921875   0.95238095 0.921875   0.9375     0.953125   0.96875
 0.953125   0.921875   0.953125   0.953125  ]

mean value: 0.9436755952380952

key: test_roc_auc
value: [0.8125     0.875      0.92857143 0.78571429 0.85714286 0.85714286
 1.         0.71428571 0.85714286 0.71428571]

mean value: 0.8401785714285714

key: train_roc_auc
value: [0.93712798 0.94494048 0.9296875  0.9375     0.9375     0.953125
 0.9375     0.921875   0.9453125  0.9453125 ]

mean value: 0.9389880952380952

key: test_jcc
value: [0.7        0.75       0.85714286 0.57142857 0.71428571 0.71428571
 1.         0.55555556 0.77777778 0.55555556]

mean value: 0.7196031746031746

key: train_jcc
value: [0.88059701 0.89552239 0.86764706 0.88235294 0.88405797 0.91176471
 0.88405797 0.85507246 0.89705882 0.89705882]

mean value: 0.8855190161723353

MCC on Blind test: 0.21

Accuracy on Blind test: 0.6

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.07449079 0.06415701 0.06411934 0.06436348 0.06481528 0.06425691
 0.06450534 0.06399703 0.0644269  0.06456614]

mean value: 0.0653698205947876

key: score_time
value: [0.00915241 0.00884914 0.0087738  0.00880289 0.00888062 0.00877213
 0.00884771 0.00880075 0.00886726 0.00878549]

mean value: 0.00885322093963623

key: test_mcc
value: [0.66143783 0.76376262 0.8660254  0.63245553 0.63245553 0.74535599
 1.         0.42857143 0.74535599 0.42857143]

mean value: 0.6903991753586628

key: train_mcc
value: [0.87447286 0.88988095 0.85947992 0.875      0.92198755 0.95417386
 0.87542756 0.84375    0.89073374 0.89073374]

mean value: 0.887564019240932

key: test_accuracy
value: [0.8        0.86666667 0.92857143 0.78571429 0.78571429 0.85714286
 1.         0.71428571 0.85714286 0.71428571]

mean value: 0.830952380952381

key: train_accuracy
value: [0.93700787 0.94488189 0.9296875  0.9375     0.9609375  0.9765625
 0.9375     0.921875   0.9453125  0.9453125 ]

mean value: 0.9436577263779528

/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:183: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:186: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: test_fscore
value: [0.82352941 0.85714286 0.92307692 0.72727273 0.72727273 0.83333333
 1.         0.71428571 0.875      0.71428571]

mean value: 0.8195199408434702

key: train_fscore
value: [0.93650794 0.94488189 0.92913386 0.9375     0.96124031 0.97709924
 0.93846154 0.921875   0.94573643 0.94573643]

mean value: 0.9438172637936766

key: test_precision
value: [0.7        1.         1.         1.         1.         1.
 1.         0.71428571 0.77777778 0.71428571]

mean value: 0.8906349206349207

key: train_precision
value: [0.9516129  0.9375     0.93650794 0.9375     0.95384615 0.95522388
 0.92424242 0.921875   0.93846154 0.93846154]

mean value: 0.9395231375342413

key: test_recall
value: [1.         0.75       0.85714286 0.57142857 0.57142857 0.71428571
 1.         0.71428571 1.         0.71428571]

mean value: 0.7892857142857143

key: train_recall
value: [0.921875   0.95238095 0.921875   0.9375     0.96875    1.
 0.953125   0.921875   0.953125   0.953125  ]

mean value: 0.9483630952380953

key: test_roc_auc
value: [0.8125     0.875      0.92857143 0.78571429 0.78571429 0.85714286
 1.         0.71428571 0.85714286 0.71428571]

mean value: 0.8330357142857143

key: train_roc_auc
value: [0.93712798 0.94494048 0.9296875  0.9375     0.9609375  0.9765625
 0.9375     0.921875   0.9453125  0.9453125 ]

mean value: 0.9436755952380952

key: test_jcc
value: [0.7        0.75       0.85714286 0.57142857 0.57142857 0.71428571
 1.         0.55555556 0.77777778 0.55555556]

mean value: 0.7053174603174603

key: train_jcc
value: [0.88059701 0.89552239 0.86764706 0.88235294 0.92537313 0.95522388
 0.88405797 0.85507246 0.89705882 0.89705882]

mean value: 0.893996449975188

MCC on Blind test: 0.08

Accuracy on Blind test: 0.54

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.02492595 0.01881099 0.02069116 0.02081108 0.0185163  0.02902484
 0.02047849 0.03224087 0.03128767 0.02006626]

mean value: 0.023685359954833986

key: score_time
value: [0.0105257  0.0105207  0.01081491 0.01050901 0.01055932 0.01063681
 0.01051402 0.01091695 0.01072741 0.010885  ]

mean value: 0.010660982131958008

key: test_mcc
value: [0.48075018 0.56818182 0.56490196 0.47727273 0.91605722 0.74242424
 0.83743579 0.82575758 0.91287093 0.54772256]

mean value: 0.6873374995187688

key: train_mcc
value: [0.82452636 0.74645342 0.7859188  0.7954287  0.73693234 0.78548989
 0.75613935 0.78536075 0.76756932 0.79615403]

mean value: 0.777997295379433

key: test_accuracy
value: [0.73913043 0.7826087  0.7826087  0.73913043 0.95652174 0.86956522
 0.91304348 0.91304348 0.95454545 0.77272727]

mean value: 0.8422924901185771

key: train_accuracy
value: [0.91219512 0.87317073 0.89268293 0.89756098 0.86829268 0.89268293
 0.87804878 0.89268293 0.88349515 0.89805825]

mean value: 0.8888870471228985

key: test_fscore
value: [0.7        0.7826087  0.76190476 0.72727273 0.96       0.86956522
 0.92307692 0.91666667 0.95652174 0.76190476]

mean value: 0.8359521492999754

key: train_fscore
value: [0.91346154 0.875      0.8952381  0.89952153 0.86956522 0.89108911
 0.87804878 0.89215686 0.88571429 0.89855072]

mean value: 0.8898346144687178

key: test_precision
value: [0.77777778 0.75       0.8        0.72727273 0.92307692 0.90909091
 0.85714286 0.91666667 0.91666667 0.8       ]

mean value: 0.8377694527694528

key: train_precision
value: [0.9047619  0.86666667 0.87850467 0.88679245 0.85714286 0.9
 0.87378641 0.89215686 0.86915888 0.89423077]

mean value: 0.8823201472546344

key: test_recall
value: [0.63636364 0.81818182 0.72727273 0.72727273 1.         0.83333333
 1.         0.91666667 1.         0.72727273]

mean value: 0.8386363636363636

key: train_recall
value: [0.9223301  0.88349515 0.91262136 0.91262136 0.88235294 0.88235294
 0.88235294 0.89215686 0.90291262 0.90291262]

mean value: 0.8976108890158006

key: test_roc_auc
value: [0.73484848 0.78409091 0.78030303 0.73863636 0.95454545 0.87121212
 0.90909091 0.91287879 0.95454545 0.77272727]

mean value: 0.8412878787878788

key: train_roc_auc
value: [0.91214544 0.87312012 0.89258519 0.89748715 0.86836094 0.89263278
 0.87806967 0.89268037 0.88349515 0.89805825]

mean value: 0.8888635065676757

key: test_jcc
value: [0.53846154 0.64285714 0.61538462 0.57142857 0.92307692 0.76923077
 0.85714286 0.84615385 0.91666667 0.61538462]

mean value: 0.7295787545787545

key: train_jcc
value: [0.84070796 0.77777778 0.81034483 0.8173913  0.76923077 0.80357143
 0.7826087  0.80530973 0.79487179 0.81578947]

mean value: 0.8017603770837232

MCC on Blind test: 0.35

Accuracy on Blind test: 0.67
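
Note: the per-fold arrays above can be checked against their reported "mean value" lines. The following is a minimal sketch, assuming the log keeps the "key:" / "value: [...]" layout shown here; the helper name parse_cv_blocks is hypothetical and is not part of the original scripts.

```python
import re
from statistics import fmean


def parse_cv_blocks(log_text):
    """Return {key: [fold values]} for each 'key:' / 'value: [...]' pair.

    Hypothetical helper: assumes the bracketed array directly follows
    the key line, as in the log excerpt above.
    """
    pattern = re.compile(r"key: (\w+)\s*\nvalue: \[([^\]]*)\]")
    return {
        key: [float(tok) for tok in body.split()]
        for key, body in pattern.findall(log_text)
    }


# Sample copied from the Logistic Regression test_recall block above.
sample = """key: test_recall
value: [0.63636364 0.81818182 0.72727273 0.72727273 1.         0.83333333
 1.         0.91666667 1.         0.72727273]

mean value: 0.8386363636363636
"""

scores = parse_cv_blocks(sample)
recall_mean = fmean(scores["test_recall"])
print(round(recall_mean, 4))
```

If the same key appears for several models, this sketch keeps only the last occurrence, so the log would need to be split per "Model_name:" section first.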

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.64007044 0.59848499 0.60545135 0.81422853 0.62158179 0.65589404
 0.78231335 0.678689   0.63352537 0.76259565]

mean value: 0.6792834520339965

key: score_time
value: [0.01374793 0.01383209 0.01083827 0.01381516 0.01120448 0.01409101
 0.01422381 0.01409006 0.01428652 0.01422524]

mean value: 0.013435459136962891

key: test_mcc
value: [0.65909298 0.74242424 0.74047959 0.56818182 0.83971912 0.82575758
 0.65151515 0.74242424 0.68313005 0.73029674]

mean value: 0.7183021519902297

key: train_mcc
value: [0.90310636 1.         0.88308106 1.         0.88292404 0.88361919
 0.86358877 0.94146202 0.99033794 0.89358299]

mean value: 0.9241702374683562

key: test_accuracy
value: [0.82608696 0.86956522 0.86956522 0.7826087  0.91304348 0.91304348
 0.82608696 0.86956522 0.81818182 0.86363636]

mean value: 0.8551383399209486

key: train_accuracy
value: [0.95121951 1.         0.94146341 1.         0.94146341 0.94146341
 0.93170732 0.97073171 0.99514563 0.94660194]

mean value: 0.9619796353303338

key: test_fscore
value: [0.8        0.86956522 0.85714286 0.7826087  0.90909091 0.91666667
 0.83333333 0.86956522 0.77777778 0.86956522]

mean value: 0.848531589183763

key: train_fscore
value: [0.95238095 1.         0.94230769 1.         0.94117647 0.94230769
 0.93203883 0.97058824 0.99512195 0.94736842]

mean value: 0.962329025010229

key: test_precision
value: [0.88888889 0.83333333 0.9        0.75       1.         0.91666667
 0.83333333 0.90909091 1.         0.83333333]

mean value: 0.8864646464646465

key: train_precision
value: [0.93457944 1.         0.93333333 1.         0.94117647 0.9245283
 0.92307692 0.97058824 1.         0.93396226]

mean value: 0.9561244967582682

key: test_recall
value: [0.72727273 0.90909091 0.81818182 0.81818182 0.83333333 0.91666667
 0.83333333 0.83333333 0.63636364 0.90909091]

mean value: 0.8234848484848485

key: train_recall
value: [0.97087379 1.         0.95145631 1.         0.94117647 0.96078431
 0.94117647 0.97058824 0.99029126 0.96116505]

mean value: 0.9687511897963069

key: test_roc_auc
value: [0.8219697  0.87121212 0.86742424 0.78409091 0.91666667 0.91287879
 0.82575758 0.87121212 0.81818182 0.86363636]

mean value: 0.8553030303030303

key: train_roc_auc
value: [0.95112317 1.         0.94141443 1.         0.94146202 0.94155721
 0.93175328 0.97073101 0.99514563 0.94660194]

mean value: 0.96197886921759

key: test_jcc
value: [0.66666667 0.76923077 0.75       0.64285714 0.83333333 0.84615385
 0.71428571 0.76923077 0.63636364 0.76923077]

mean value: 0.7397352647352647

key: train_jcc
value: [0.90909091 1.         0.89090909 1.         0.88888889 0.89090909
 0.87272727 0.94285714 0.99029126 0.9       ]

mean value: 0.9285673657518317

MCC on Blind test: 0.2

Accuracy on Blind test: 0.59

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.00964332 0.00924993 0.00782824 0.00768685 0.00750494 0.00749516
 0.007514   0.00766039 0.00757623 0.00769448]

mean value: 0.007985353469848633

key: score_time
value: [0.01074767 0.00942302 0.00893474 0.0085392  0.00857997 0.00853729
 0.00858474 0.00858116 0.0085609  0.00856519]

mean value: 0.008905386924743653

key: test_mcc
value: [0.44411739 0.50460839 0.2096648  0.23262105 0.40451992 0.65909298
 0.47923384 0.62050523 0.39735971 0.20412415]

mean value: 0.4155847461790167

key: train_mcc
value: [0.39137259 0.44043936 0.45968386 0.4798642  0.4267072  0.43504485
 0.45392287 0.42888555 0.44151079 0.46358632]

mean value: 0.4421017579498864

key: test_accuracy
value: [0.69565217 0.69565217 0.56521739 0.60869565 0.65217391 0.82608696
 0.69565217 0.7826087  0.63636364 0.59090909]

mean value: 0.674901185770751

key: train_accuracy
value: [0.63414634 0.68780488 0.70243902 0.70731707 0.67804878 0.68292683
 0.69756098 0.68292683 0.69417476 0.7038835 ]

mean value: 0.6871228984134502

key: test_fscore
value: [0.74074074 0.75862069 0.66666667 0.64       0.75       0.84615385
 0.77419355 0.82758621 0.73333333 0.66666667]

mean value: 0.7403961698500074

key: train_fscore
value: [0.73309609 0.75384615 0.76078431 0.76744186 0.74615385 0.74903475
 0.75590551 0.74708171 0.75294118 0.76078431]

mean value: 0.7527069722703967

key: test_precision
value: [0.625      0.61111111 0.52631579 0.57142857 0.6        0.78571429
 0.63157895 0.70588235 0.57894737 0.5625    ]

mean value: 0.6198478426458303

key: train_precision
value: [0.57865169 0.62420382 0.63815789 0.63870968 0.61392405 0.61783439
 0.63157895 0.61935484 0.63157895 0.63815789]

mean value: 0.6232152152926238

key: test_recall
value: [0.90909091 1.         0.90909091 0.72727273 1.         0.91666667
 1.         1.         1.         0.81818182]

mean value: 0.928030303030303

key: train_recall
value: [1.         0.95145631 0.94174757 0.96116505 0.95098039 0.95098039
 0.94117647 0.94117647 0.93203883 0.94174757]

mean value: 0.9512469065296021

key: test_roc_auc
value: [0.70454545 0.70833333 0.57954545 0.61363636 0.63636364 0.8219697
 0.68181818 0.77272727 0.63636364 0.59090909]

mean value: 0.6746212121212122

key: train_roc_auc
value: [0.63235294 0.68651247 0.70126594 0.70607272 0.67937369 0.68422806
 0.69874358 0.68418047 0.69417476 0.7038835 ]

mean value: 0.6870788121073672

key: test_jcc
value: [0.58823529 0.61111111 0.5        0.47058824 0.6        0.73333333
 0.63157895 0.70588235 0.57894737 0.5       ]

mean value: 0.591967664258686

key: train_jcc
value: [0.57865169 0.60493827 0.61392405 0.62264151 0.59509202 0.59876543
 0.60759494 0.59627329 0.60377358 0.61392405]

mean value: 0.6035578837876612

MCC on Blind test: 0.48

Accuracy on Blind test: 0.71

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00798321 0.00766301 0.00776434 0.00779152 0.0078249  0.00776362
 0.0078249  0.00781822 0.00786686 0.00770688]

mean value: 0.007800745964050293

key: score_time
value: [0.00858808 0.00862932 0.00854731 0.00855184 0.00859928 0.00858569
 0.00869465 0.0087173  0.0086596  0.00857878]

mean value: 0.00861518383026123

key: test_mcc
value: [ 0.3030303   0.15096491 -0.03816905  0.3030303   0.39727608  0.56818182
  0.39727608  0.31252706  0.29277002  0.09245003]

mean value: 0.27793375505710965

key: train_mcc
value: [0.37046449 0.38910743 0.39476736 0.38236392 0.38354703 0.35891522
 0.35302365 0.36367161 0.37290762 0.39345795]

mean value: 0.37622262820367175

key: test_accuracy
value: [0.65217391 0.56521739 0.47826087 0.65217391 0.69565217 0.7826087
 0.69565217 0.65217391 0.63636364 0.54545455]

mean value: 0.6355731225296443

key: train_accuracy
value: [0.68292683 0.69268293 0.69268293 0.68780488 0.68780488 0.67804878
 0.67317073 0.67804878 0.68446602 0.68932039]

mean value: 0.6846957139474308

key: test_fscore
value: [0.63636364 0.61538462 0.5        0.63636364 0.74074074 0.7826087
 0.74074074 0.71428571 0.69230769 0.58333333]

mean value: 0.6642128805172284

key: train_fscore
value: [0.70852018 0.71493213 0.72489083 0.71681416 0.71428571 0.69444444
 0.69955157 0.70535714 0.70588235 0.72649573]

mean value: 0.7111174245586319

key: test_precision
value: [0.63636364 0.53333333 0.46153846 0.63636364 0.66666667 0.81818182
 0.66666667 0.625      0.6        0.53846154]

mean value: 0.6182575757575758

key: train_precision
value: [0.65833333 0.66949153 0.65873016 0.65853659 0.6557377  0.65789474
 0.6446281  0.64754098 0.66101695 0.64885496]

mean value: 0.6560765038377927

key: test_recall
value: [0.63636364 0.72727273 0.54545455 0.63636364 0.83333333 0.75
 0.83333333 0.83333333 0.81818182 0.63636364]

mean value: 0.725

key: train_recall
value: [0.76699029 0.76699029 0.80582524 0.78640777 0.78431373 0.73529412
 0.76470588 0.7745098  0.75728155 0.82524272]

mean value: 0.7767561393489435

key: test_roc_auc
value: [0.65151515 0.5719697  0.48106061 0.65151515 0.68939394 0.78409091
 0.68939394 0.64393939 0.63636364 0.54545455]

mean value: 0.634469696969697

key: train_roc_auc
value: [0.68251475 0.69231868 0.69212831 0.68732153 0.68827337 0.67832667
 0.67361508 0.67851704 0.68446602 0.68932039]

mean value: 0.6846801827527127

key: test_jcc
value: [0.46666667 0.44444444 0.33333333 0.46666667 0.58823529 0.64285714
 0.58823529 0.55555556 0.52941176 0.41176471]

mean value: 0.5027170868347339

key: train_jcc
value: [0.54861111 0.55633803 0.56849315 0.55862069 0.55555556 0.53191489
 0.53793103 0.54482759 0.54545455 0.5704698 ]

mean value: 0.5518216393594725

MCC on Blind test: 0.47

Accuracy on Blind test: 0.73

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00768375 0.00718713 0.0075891  0.00747037 0.00748372 0.00737071
 0.00746584 0.00747824 0.00745106 0.00744772]

mean value: 0.007462763786315918

key: score_time
value: [0.00988078 0.00973558 0.00988078 0.00987172 0.01486492 0.00991106
 0.00977206 0.00987005 0.00975847 0.00983   ]

mean value: 0.010337543487548829

key: test_mcc
value: [-0.05427825  0.13740858  0.56818182  0.56490196  0.31252706  0.58930667
  0.31298622  0.74242424  0.2773501   0.09245003]

mean value: 0.35432584250263544

key: train_mcc
value: [0.66217798 0.6392382  0.62934402 0.59038553 0.67133261 0.6310448
 0.68889027 0.65854355 0.59234469 0.68222103]

mean value: 0.6445522695261332

key: test_accuracy
value: [0.47826087 0.56521739 0.7826087  0.7826087  0.65217391 0.7826087
 0.65217391 0.86956522 0.63636364 0.54545455]

mean value: 0.674703557312253

key: train_accuracy
value: [0.82926829 0.8195122  0.81463415 0.79512195 0.83414634 0.81463415
 0.84390244 0.82926829 0.7961165  0.83980583]

mean value: 0.8216410134975136

key: test_fscore
value: [0.4        0.58333333 0.7826087  0.76190476 0.71428571 0.76190476
 0.63636364 0.86956522 0.6        0.5       ]

mean value: 0.6609966120835686

key: train_fscore
value: [0.83870968 0.82296651 0.81730769 0.79411765 0.82474227 0.80612245
 0.83838384 0.82758621 0.79411765 0.84651163]

mean value: 0.8210565561229923

key: test_precision
value: [0.44444444 0.53846154 0.75       0.8        0.625      0.88888889
 0.7        0.90909091 0.66666667 0.55555556]

mean value: 0.6878108003108003

key: train_precision
value: [0.79824561 0.81132075 0.80952381 0.8019802  0.86956522 0.84042553
 0.86458333 0.83168317 0.8019802  0.8125    ]

mean value: 0.8241807825271845

key: test_recall
value: [0.36363636 0.63636364 0.81818182 0.72727273 0.83333333 0.66666667
 0.58333333 0.83333333 0.54545455 0.45454545]

mean value: 0.6462121212121212

key: train_recall
value: [0.88349515 0.83495146 0.82524272 0.78640777 0.78431373 0.7745098
 0.81372549 0.82352941 0.78640777 0.88349515]

mean value: 0.8196078431372549

key: test_roc_auc
value: [0.47348485 0.56818182 0.78409091 0.78030303 0.64393939 0.78787879
 0.65530303 0.87121212 0.63636364 0.54545455]

mean value: 0.6746212121212121

key: train_roc_auc
value: [0.82900247 0.81943651 0.81458214 0.79516467 0.83390444 0.81443937
 0.84375595 0.82924043 0.7961165  0.83980583]

mean value: 0.8215448315248429

key: test_jcc
value: [0.25       0.41176471 0.64285714 0.61538462 0.55555556 0.61538462
 0.46666667 0.76923077 0.42857143 0.33333333]

mean value: 0.508874883286648

key: train_jcc
value: [0.72222222 0.69918699 0.69105691 0.65853659 0.70175439 0.67521368
 0.72173913 0.70588235 0.65853659 0.73387097]

mean value: 0.6967999807689436

MCC on Blind test: 0.3

Accuracy on Blind test: 0.65

Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])
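
The pipeline repr above and the `fit_time` / `test_*` / `train_*` keys that follow look like the output of scikit-learn's `cross_validate` run with a scoring dictionary and `return_train_score=True`. The following is a minimal sketch of that setup, not the author's script: the data is synthetic, only four of the logged column names are used, and the scorer names (`mcc`, `fscore`, `jcc`, ...) are inferred from the logged keys. The use of `StratifiedKFold` and of `roc_auc_score` on hard predictions are also assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): two numeric and two categorical features.
rng = np.random.default_rng(42)
n = 120
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 30, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),
    "active_site": rng.choice(["0", "1"], n),
})
y = (X["ligand_distance"] + rng.normal(0, 5, n) < 15).astype(int)

num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class", "active_site"]

# Same shape as the logged pipeline: scale numeric columns, one-hot encode the rest.
prep = ColumnTransformer(
    transformers=[("num", MinMaxScaler(), num_cols),
                  ("cat", OneHotEncoder(), cat_cols)],
    remainder="passthrough",
)
pipe = Pipeline(steps=[("prep", prep), ("model", SVC(random_state=42))])

# Scorer names chosen so the result keys match the log (test_mcc, train_jcc, ...).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),  # assumption: AUC on hard predictions
    "jcc": "jaccard",
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring, return_train_score=True)

for key, value in scores.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {value.mean()}\n")
```

Each entry of `scores` is an array with one value per fold, which is why every `value:` line in this log holds ten numbers.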

key: fit_time
value: [0.01029658 0.01014757 0.01019192 0.01032376 0.01028013 0.01028109
 0.01010323 0.01022696 0.01013231 0.01031971]

mean value: 0.010230326652526855

key: score_time
value: [0.00922775 0.00909877 0.0091269  0.00924611 0.00912356 0.00912118
 0.00901008 0.00927448 0.00904131 0.00910091]

mean value: 0.009137105941772462

key: test_mcc
value: [0.56490196 0.65151515 0.38932432 0.38932432 0.42228828 0.66414149
 0.65909298 0.82575758 0.64715023 0.46225016]

mean value: 0.5675746466667577

key: train_mcc
value: [0.80487341 0.72698715 0.75693529 0.76584809 0.67808871 0.71711403
 0.70747264 0.72814868 0.73789886 0.74884444]

mean value: 0.7372211285541519

key: test_accuracy
value: [0.7826087  0.82608696 0.69565217 0.69565217 0.69565217 0.82608696
 0.82608696 0.91304348 0.81818182 0.72727273]

mean value: 0.7806324110671937

key: train_accuracy
value: [0.90243902 0.86341463 0.87804878 0.88292683 0.83902439 0.85853659
 0.85365854 0.86341463 0.86893204 0.87378641]

mean value: 0.8684181861236088

key: test_fscore
value: [0.76190476 0.81818182 0.66666667 0.66666667 0.75862069 0.81818182
 0.84615385 0.91666667 0.8        0.7       ]

mean value: 0.7753042934077417

key: train_fscore
value: [0.90291262 0.8627451  0.88151659 0.88349515 0.83902439 0.85853659
 0.85436893 0.86666667 0.86956522 0.87735849]

mean value: 0.8696189734979832

key: test_precision
value: [0.8        0.81818182 0.7        0.7        0.64705882 0.9
 0.78571429 0.91666667 0.88888889 0.77777778]

mean value: 0.7934288260758849

key: train_precision
value: [0.90291262 0.87128713 0.86111111 0.88349515 0.83495146 0.85436893
 0.84615385 0.84259259 0.86538462 0.85321101]

mean value: 0.8615468458469154

key: test_recall
value: [0.72727273 0.81818182 0.63636364 0.63636364 0.91666667 0.75
 0.91666667 0.91666667 0.72727273 0.63636364]

mean value: 0.7681818181818182

key: train_recall
value: [0.90291262 0.85436893 0.90291262 0.88349515 0.84313725 0.8627451
 0.8627451  0.89215686 0.87378641 0.90291262]

mean value: 0.8781172663240053

key: test_roc_auc
value: [0.78030303 0.82575758 0.69318182 0.69318182 0.68560606 0.82954545
 0.8219697  0.91287879 0.81818182 0.72727273]

mean value: 0.7787878787878787

key: train_roc_auc
value: [0.9024367  0.86345898 0.8779269  0.88292404 0.83904436 0.85855702
 0.85370265 0.86355416 0.86893204 0.87378641]

mean value: 0.8684323243860652

key: test_jcc
value: [0.61538462 0.69230769 0.5        0.5        0.61111111 0.69230769
 0.73333333 0.84615385 0.66666667 0.53846154]

mean value: 0.6395726495726496

key: train_jcc
value: [0.82300885 0.75862069 0.78813559 0.79130435 0.72268908 0.75213675
 0.74576271 0.76470588 0.76923077 0.78151261]

mean value: 0.7697107276516258

MCC on Blind test: 0.45

Accuracy on Blind test: 0.72
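
The per-fold metrics are all functions of one confusion matrix, so they can be cross-checked against each other. As a worked example, the first SVM fold above (test_mcc 0.56490196, test_accuracy 0.7826087) is consistent with a 23-sample test fold containing 8 true positives, 2 false positives, 3 false negatives and 10 true negatives. These counts are not printed in the log. They are an inference from the logged values, and the sketch below only shows that they reproduce them.

```python
from math import sqrt

# Hypothetical confusion counts for SVM fold 1 (inferred, not logged).
tp, fp, fn, tn = 8, 2, 3, 10

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
fscore = 2 * precision * recall / (precision + recall)
jcc = tp / (tp + fp + fn)  # Jaccard index of the positive class
mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
# ROC AUC computed from hard 0/1 predictions reduces to balanced accuracy.
roc_auc_hard = (recall + specificity) / 2

for name, value in [("test_mcc", mcc), ("test_accuracy", accuracy),
                    ("test_fscore", fscore), ("test_precision", precision),
                    ("test_recall", recall), ("test_roc_auc", roc_auc_hard),
                    ("test_jcc", jcc)]:
    print(f"{name}: {value:.8f}")
```

The match on `test_roc_auc` suggests the AUC in this log was computed from predicted labels rather than from decision scores, which would explain why it tracks accuracy so closely.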

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.3924427  0.61908412 0.74449182 0.91650462 0.63582444 0.65240908
 0.77556181 0.61309528 0.7518034  0.7737174 ]

mean value: 0.6874934673309326

key: score_time
value: [0.0111208  0.01100135 0.01523304 0.01314092 0.01097083 0.01089597
 0.01093626 0.01093769 0.01482415 0.01095867]

mean value: 0.012001967430114746

key: test_mcc
value: [0.31252706 0.58930667 0.69084928 0.56818182 0.65909298 0.74047959
 0.83743579 0.82575758 0.81818182 0.46225016]

mean value: 0.6504062738835646

key: train_mcc
value: [0.7606076  0.7863314  0.86610349 0.91330072 0.82498132 0.87660499
 0.79068188 0.88440807 0.88499797 0.91300871]

mean value: 0.8501026166060419

key: test_accuracy
value: [0.65217391 0.7826087  0.82608696 0.7826087  0.82608696 0.86956522
 0.91304348 0.91304348 0.90909091 0.72727273]

mean value: 0.8201581027667985

key: train_accuracy
value: [0.87804878 0.88780488 0.93170732 0.95609756 0.91219512 0.93658537
 0.89268293 0.94146341 0.94174757 0.95631068]

mean value: 0.9234643618280843

key: test_fscore
value: [0.55555556 0.8        0.77777778 0.7826087  0.84615385 0.88
 0.92307692 0.91666667 0.90909091 0.7       ]

mean value: 0.8090930373973853

key: train_fscore
value: [0.87179487 0.89686099 0.92929293 0.95522388 0.91       0.93896714
 0.88541667 0.93939394 0.94339623 0.9569378 ]

mean value: 0.9227284435900899

key: test_precision
value: [0.71428571 0.71428571 1.         0.75       0.78571429 0.84615385
 0.85714286 0.91666667 0.90909091 0.77777778]

mean value: 0.8271117771117771

key: train_precision
value: [0.92391304 0.83333333 0.96842105 0.97959184 0.92857143 0.9009009
 0.94444444 0.96875    0.91743119 0.94339623]

mean value: 0.9308753459170286

key: test_recall
value: [0.45454545 0.90909091 0.63636364 0.81818182 0.91666667 0.91666667
 1.         0.91666667 0.90909091 0.63636364]

mean value: 0.8113636363636364

key: train_recall
value: [0.82524272 0.97087379 0.89320388 0.93203883 0.89215686 0.98039216
 0.83333333 0.91176471 0.97087379 0.97087379]

mean value: 0.9180753854940035

key: test_roc_auc
value: [0.64393939 0.78787879 0.81818182 0.78409091 0.8219697  0.86742424
 0.90909091 0.91287879 0.90909091 0.72727273]

mean value: 0.8181818181818181

key: train_roc_auc
value: [0.87830763 0.88739768 0.93189606 0.9562155  0.91209785 0.93679802
 0.89239482 0.94131925 0.94174757 0.95631068]

mean value: 0.9234485056158386

key: test_jcc
value: [0.38461538 0.66666667 0.63636364 0.64285714 0.73333333 0.78571429
 0.85714286 0.84615385 0.83333333 0.53846154]

mean value: 0.6924642024642025

key: train_jcc
value: [0.77272727 0.81300813 0.86792453 0.91428571 0.83486239 0.88495575
 0.79439252 0.88571429 0.89285714 0.91743119]

mean value: 0.8578158927526129

MCC on Blind test: 0.28

Accuracy on Blind test: 0.64

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01156855 0.00892901 0.00856614 0.0077455  0.00787854 0.00857687
 0.00829124 0.00852704 0.00854969 0.00859594]

mean value: 0.008722853660583497

key: score_time
value: [0.01051092 0.00886154 0.00785089 0.00785041 0.008111   0.00851727
 0.00845528 0.00843167 0.0078907  0.00845909]

mean value: 0.008493876457214356

key: test_mcc
value: [0.74242424 1.         0.91605722 0.66414149 0.83971912 0.74242424
 0.74047959 0.74242424 0.83205029 0.73029674]

mean value: 0.7950017190609769

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.86956522 1.         0.95652174 0.82608696 0.91304348 0.86956522
 0.86956522 0.86956522 0.90909091 0.86363636]

mean value: 0.8946640316205533

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.86956522 1.         0.95238095 0.83333333 0.90909091 0.86956522
 0.88       0.86956522 0.9        0.85714286]

mean value: 0.8940643704121964

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.83333333 1.         1.         0.76923077 1.         0.90909091
 0.84615385 0.90909091 1.         0.9       ]

mean value: 0.9166899766899766

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 1.         0.90909091 0.90909091 0.83333333 0.83333333
 0.91666667 0.83333333 0.81818182 0.81818182]

mean value: 0.878030303030303

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.87121212 1.         0.95454545 0.82954545 0.91666667 0.87121212
 0.86742424 0.87121212 0.90909091 0.86363636]

mean value: 0.8954545454545455

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.76923077 1.         0.90909091 0.71428571 0.83333333 0.76923077
 0.78571429 0.76923077 0.81818182 0.75      ]

mean value: 0.8118298368298369

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.04

Accuracy on Blind test: 0.51
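
The Decision Tree block above shows a common overfitting pattern: a perfect training score, a cross-validated test score near 0.80, and a blind-test MCC of 0.04. A small parser makes such gaps easy to pull out of a log like this one. The sketch below is a hypothetical helper (`parse_blocks` is not part of the original scripts) and assumes the `key:` / `value:` / `mean value:` layout used throughout this file. The embedded sample is copied from the Decision Tree section.

```python
import re
from statistics import mean

LOG = """\
key: test_mcc
value: [0.74242424 1.         0.91605722 0.66414149 0.83971912 0.74242424
 0.74047959 0.74242424 0.83205029 0.73029674]

mean value: 0.7950017190609769

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.04
"""

# One block = key line, bracketed fold values (may wrap), blank line, mean line.
BLOCK = re.compile(r"key: (\w+)\nvalue: \[([^\]]*)\]\n\nmean value: (\S+)")


def parse_blocks(text):
    """Return {key: {"folds": [...], "mean": float}} for each logged metric."""
    parsed = {}
    for key, values, logged_mean in BLOCK.findall(text):
        parsed[key] = {
            "folds": [float(v) for v in values.split()],
            "mean": float(logged_mean),
        }
    return parsed


metrics = parse_blocks(LOG)
blind_mcc = float(re.search(r"MCC on Blind test: (\S+)", LOG).group(1))

# Gap between training and cross-validated score, and between CV and blind test.
cv_gap = metrics["train_mcc"]["mean"] - metrics["test_mcc"]["mean"]
blind_gap = metrics["test_mcc"]["mean"] - blind_mcc

print(f"recomputed test_mcc mean: {mean(metrics['test_mcc']['folds']):.6f}")
print(f"train - CV gap: {cv_gap:.3f}")
print(f"CV - blind gap: {blind_gap:.3f}")
```

A large CV-to-blind gap, as here, suggests that the cross-validation folds are not representative of the blind set. The log alone does not say why.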

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.08520198 0.08628082 0.08537316 0.08609509 0.08660245 0.08801889
 0.08706832 0.08594346 0.08755255 0.08668995]

mean value: 0.08648266792297363

key: score_time
value: [0.01721334 0.01664853 0.01649141 0.0179987  0.01672387 0.01804256
 0.016927   0.01826859 0.01752901 0.01691222]

mean value: 0.017275524139404298

key: test_mcc
value: [0.56490196 1.         0.83743579 0.6992059  0.69084928 0.82575758
 0.83743579 0.91605722 1.         0.81818182]

mean value: 0.8189825331284214

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.7826087  1.         0.91304348 0.82608696 0.82608696 0.91304348
 0.91304348 0.95652174 1.         0.90909091]

mean value: 0.9039525691699605

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.76190476 1.         0.9        0.84615385 0.85714286 0.91666667
 0.92307692 0.96       1.         0.90909091]

mean value: 0.9074035964035964

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.8        1.         1.         0.73333333 0.75       0.91666667
 0.85714286 0.92307692 1.         0.90909091]

mean value: 0.8889310689310689

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.72727273 1.         0.81818182 1.         1.         0.91666667
 1.         1.         1.         0.90909091]

mean value: 0.9371212121212121

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.78030303 1.         0.90909091 0.83333333 0.81818182 0.91287879
 0.90909091 0.95454545 1.         0.90909091]

mean value: 0.9026515151515151

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.61538462 1.         0.81818182 0.73333333 0.75       0.84615385
 0.85714286 0.92307692 1.         0.83333333]

mean value: 0.8376606726606727

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.33

Accuracy on Blind test: 0.63

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.0074172  0.00699353 0.00702    0.0070374  0.00713778 0.00709534
 0.00710583 0.00709867 0.00728655 0.00706387]

mean value: 0.0071256160736083984

key: score_time
value: [0.00813842 0.00795412 0.00787568 0.00795722 0.00794506 0.00791264
 0.0079174  0.00833249 0.00816393 0.00792503]

mean value: 0.008012199401855468

key: test_mcc
value: [0.48075018 0.39727608 0.65909298 0.48856385 0.56490196 0.82575758
 0.56818182 0.56818182 0.20412415 0.54772256]

mean value: 0.5304552960001759

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.73913043 0.69565217 0.82608696 0.73913043 0.7826087  0.91304348
 0.7826087  0.7826087  0.59090909 0.77272727]

mean value: 0.7624505928853755

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.7        0.63157895 0.8        0.75       0.8        0.91666667
 0.7826087  0.7826087  0.47058824 0.76190476]

mean value: 0.7395956002538315

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.77777778 0.75       0.88888889 0.69230769 0.76923077 0.91666667
 0.81818182 0.81818182 0.66666667 0.8       ]

mean value: 0.7897902097902098

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.63636364 0.54545455 0.72727273 0.81818182 0.83333333 0.91666667
 0.75       0.75       0.36363636 0.72727273]

mean value: 0.7068181818181818

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.73484848 0.68939394 0.8219697  0.74242424 0.78030303 0.91287879
 0.78409091 0.78409091 0.59090909 0.77272727]

mean value: 0.7613636363636364

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.53846154 0.46153846 0.66666667 0.6        0.66666667 0.84615385
 0.64285714 0.64285714 0.30769231 0.61538462]

mean value: 0.5988278388278389

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.24

Accuracy on Blind test: 0.62

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.15525389 1.24308133 1.07740855 1.09325242 1.08529234 1.07614017
 1.08734107 1.08477783 1.07814384 1.07799172]

mean value: 1.1058683156967164

key: score_time
value: [0.09643054 0.09530497 0.09550691 0.09580946 0.09401822 0.09396243
 0.09032536 0.09210563 0.08807588 0.09230471]

mean value: 0.09338440895080566

key: test_mcc
value: [0.56490196 1.         0.91605722 0.76764947 0.91666667 0.82575758
 0.83743579 0.91605722 1.         0.91287093]

mean value: 0.8657396839505284

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.7826087  1.         0.95652174 0.86956522 0.95652174 0.91304348
 0.91304348 0.95652174 1.         0.95454545]

mean value: 0.9302371541501976

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.76190476 1.         0.95238095 0.88       0.95652174 0.91666667
 0.92307692 0.96       1.         0.95652174]

mean value: 0.9307072782290173

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.8        1.         1.         0.78571429 1.         0.91666667
 0.85714286 0.92307692 1.         0.91666667]

mean value: 0.9199267399267399

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.72727273 1.         0.90909091 1.         0.91666667 0.91666667
 1.         1.         1.         1.        ]

mean value: 0.946969696969697

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.78030303 1.         0.95454545 0.875      0.95833333 0.91287879
 0.90909091 0.95454545 1.         0.95454545]

mean value: 0.9299242424242424

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.61538462 1.         0.90909091 0.78571429 0.91666667 0.84615385
 0.85714286 0.92307692 1.         0.91666667]

mean value: 0.876989676989677

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.16

Accuracy on Blind test: 0.56

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.86088943 0.88546109 0.83543372 0.87413931 0.92852831 0.89648938
 0.97815275 0.92712474 0.90053868 0.94448614]

mean value: 0.9031243562698364

key: score_time
value: [0.25403523 0.22465062 0.24028587 0.24344826 0.16241813 0.23810482
 0.23798776 0.16599536 0.18922591 0.23433256]

mean value: 0.21904845237731935

key: test_mcc
value: [0.48075018 0.91666667 0.82575758 0.47727273 1.         0.74242424
 0.83743579 0.91605722 1.         0.64715023]

mean value: 0.7843514630610502

key: train_mcc
value: [0.90516294 0.89609853 0.89781488 0.91325992 0.91435567 0.92355447
 0.91435567 0.92355447 0.89663335 0.89663335]

mean value: 0.9081423249606505

key: test_accuracy
value: [0.73913043 0.95652174 0.91304348 0.73913043 1.         0.86956522
 0.91304348 0.95652174 1.         0.81818182]

mean value: 0.8905138339920948

key: train_accuracy
value: [0.95121951 0.94634146 0.94634146 0.95609756 0.95609756 0.96097561
 0.95609756 0.96097561 0.94660194 0.94660194]

mean value: 0.952735022495856

key: test_fscore
value: [0.7        0.95652174 0.90909091 0.72727273 1.         0.86956522
 0.92307692 0.96       1.         0.83333333]

mean value: 0.8878860849295632

key: train_fscore
value: [0.95327103 0.94883721 0.94930876 0.95734597 0.95734597 0.96190476
 0.95734597 0.96190476 0.94883721 0.94883721]

mean value: 0.9544938850206197

key: test_precision
value: [0.77777778 0.91666667 0.90909091 0.72727273 1.         0.90909091
 0.85714286 0.92307692 1.         0.76923077]

mean value: 0.8789349539349539

key: train_precision
value: [0.91891892 0.91071429 0.90350877 0.93518519 0.9266055  0.93518519
 0.9266055  0.93518519 0.91071429 0.91071429]

mean value: 0.9213337112721468

key: test_recall
value: [0.63636364 1.         0.90909091 0.72727273 1.         0.83333333
 1.         1.         1.         0.90909091]

mean value: 0.9015151515151515

key: train_recall
value: [0.99029126 0.99029126 1.         0.98058252 0.99019608 0.99019608
 0.99019608 0.99019608 0.99029126 0.99029126]

mean value: 0.9902531886541024

key: test_roc_auc
value: [0.73484848 0.95833333 0.91287879 0.73863636 1.         0.87121212
 0.90909091 0.95454545 1.         0.81818182]

mean value: 0.8897727272727273

key: train_roc_auc
value: [0.95102798 0.94612602 0.94607843 0.95597754 0.95626309 0.96111746
 0.95626309 0.96111746 0.94660194 0.94660194]

mean value: 0.9527174947648962

key: test_jcc
value: [0.53846154 0.91666667 0.83333333 0.57142857 1.         0.76923077
 0.85714286 0.92307692 1.         0.71428571]

mean value: 0.8123626373626374

key: train_jcc
value: [0.91071429 0.90265487 0.90350877 0.91818182 0.91818182 0.9266055
 0.91818182 0.9266055  0.90265487 0.90265487]

mean value: 0.9129944123133789

MCC on Blind test: 0.34

Accuracy on Blind test: 0.64

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01725674 0.00764561 0.00769782 0.00771165 0.00768232 0.00783229
 0.00767946 0.00749207 0.00788879 0.00778747]

mean value: 0.008667421340942384

key: score_time
value: [0.01506519 0.00864601 0.00875545 0.00867391 0.00863385 0.00854945
 0.00867081 0.00864387 0.00859499 0.00872564]

mean value: 0.009295916557312012

key: test_mcc
value: [ 0.3030303   0.15096491 -0.03816905  0.3030303   0.39727608  0.56818182
  0.39727608  0.31252706  0.29277002  0.09245003]

mean value: 0.27793375505710965

key: train_mcc
value: [0.37046449 0.38910743 0.39476736 0.38236392 0.38354703 0.35891522
 0.35302365 0.36367161 0.37290762 0.39345795]

mean value: 0.37622262820367175

key: test_accuracy
value: [0.65217391 0.56521739 0.47826087 0.65217391 0.69565217 0.7826087
 0.69565217 0.65217391 0.63636364 0.54545455]

mean value: 0.6355731225296443

key: train_accuracy
value: [0.68292683 0.69268293 0.69268293 0.68780488 0.68780488 0.67804878
 0.67317073 0.67804878 0.68446602 0.68932039]

mean value: 0.6846957139474308

key: test_fscore
value: [0.63636364 0.61538462 0.5        0.63636364 0.74074074 0.7826087
 0.74074074 0.71428571 0.69230769 0.58333333]

mean value: 0.6642128805172284

key: train_fscore
value: [0.70852018 0.71493213 0.72489083 0.71681416 0.71428571 0.69444444
 0.69955157 0.70535714 0.70588235 0.72649573]

mean value: 0.7111174245586319

key: test_precision
value: [0.63636364 0.53333333 0.46153846 0.63636364 0.66666667 0.81818182
 0.66666667 0.625      0.6        0.53846154]

mean value: 0.6182575757575758

key: train_precision
value: [0.65833333 0.66949153 0.65873016 0.65853659 0.6557377  0.65789474
 0.6446281  0.64754098 0.66101695 0.64885496]

mean value: 0.6560765038377927

key: test_recall
value: [0.63636364 0.72727273 0.54545455 0.63636364 0.83333333 0.75
 0.83333333 0.83333333 0.81818182 0.63636364]

mean value: 0.725

key: train_recall
value: [0.76699029 0.76699029 0.80582524 0.78640777 0.78431373 0.73529412
 0.76470588 0.7745098  0.75728155 0.82524272]

mean value: 0.7767561393489435

key: test_roc_auc
value: [0.65151515 0.5719697  0.48106061 0.65151515 0.68939394 0.78409091
 0.68939394 0.64393939 0.63636364 0.54545455]

mean value: 0.634469696969697

key: train_roc_auc
value: [0.68251475 0.69231868 0.69212831 0.68732153 0.68827337 0.67832667
 0.67361508 0.67851704 0.68446602 0.68932039]

mean value: 0.6846801827527127

key: test_jcc
value: [0.46666667 0.44444444 0.33333333 0.46666667 0.58823529 0.64285714
 0.58823529 0.55555556 0.52941176 0.41176471]

mean value: 0.5027170868347339

key: train_jcc
value: [0.54861111 0.55633803 0.56849315 0.55862069 0.55555556 0.53191489
 0.53793103 0.54482759 0.54545455 0.5704698 ]

mean value: 0.5518216393594725

MCC on Blind test: 0.47

Accuracy on Blind test: 0.73

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.11559343 0.039253   0.03851128 0.17964649 0.04505944 0.04687715
 0.03830266 0.04024267 0.03952813 0.04023385]

mean value: 0.062324810028076175

key: score_time
value: [0.0100162  0.01036406 0.01036692 0.01054835 0.00991821 0.00990653
 0.00961637 0.00992942 0.00987625 0.01000285]

mean value: 0.010054516792297363

key: test_mcc
value: [0.65151515 1.         0.91605722 0.76764947 0.83971912 0.83971912
 0.76277007 0.91605722 1.         0.81818182]

mean value: 0.8511669209848822

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.82608696 1.         0.95652174 0.86956522 0.91304348 0.91304348
 0.86956522 0.95652174 1.         0.90909091]

mean value: 0.9213438735177866

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.81818182 1.         0.95238095 0.88       0.90909091 0.90909091
 0.88888889 0.96       1.         0.90909091]

mean value: 0.9226724386724386

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.81818182 1.         1.         0.78571429 1.         1.
 0.8        0.92307692 1.         0.90909091]

mean value: 0.9236063936063936

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81818182 1.         0.90909091 1.         0.83333333 0.83333333
 1.         1.         1.         0.90909091]

mean value: 0.9303030303030303

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.82575758 1.         0.95454545 0.875      0.91666667 0.91666667
 0.86363636 0.95454545 1.         0.90909091]

mean value: 0.9215909090909091

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.69230769 1.         0.90909091 0.78571429 0.83333333 0.83333333
 0.8        0.92307692 1.         0.83333333]

mean value: 0.861018981018981

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.07

Accuracy on Blind test: 0.52

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.01018047 0.03178215 0.03211904 0.0325737  0.03071642 0.03224421
 0.03228498 0.0325532  0.03256488 0.03023434]

mean value: 0.02972533702850342

key: score_time
value: [0.01017213 0.02084589 0.02090144 0.01349664 0.02145767 0.02162623
 0.01061773 0.01060104 0.01897764 0.02124476]

mean value: 0.016994118690490723

key: test_mcc
value: [0.58002308 0.65151515 0.56490196 0.83971912 0.83971912 0.91666667
 0.74047959 0.82575758 0.83205029 0.73029674]

mean value: 0.7521129297642005

key: train_mcc
value: [0.87352395 0.87320324 0.86356283 0.83418999 0.88310329 0.83418999
 0.84389872 0.85370265 0.83499081 0.84481947]

mean value: 0.8539184935656506

key: test_accuracy
value: [0.7826087  0.82608696 0.7826087  0.91304348 0.91304348 0.95652174
 0.86956522 0.91304348 0.90909091 0.86363636]

mean value: 0.8729249011857707

key: train_accuracy
value: [0.93658537 0.93658537 0.93170732 0.91707317 0.94146341 0.91707317
 0.92195122 0.92682927 0.91747573 0.9223301 ]

mean value: 0.9269074117925645

key: test_fscore
value: [0.73684211 0.81818182 0.76190476 0.91666667 0.90909091 0.95652174
 0.88       0.91666667 0.9        0.86956522]

mean value: 0.8665439884295719

key: train_fscore
value: [0.93779904 0.93719807 0.93269231 0.91707317 0.94174757 0.91707317
 0.92156863 0.92682927 0.9178744  0.92156863]

mean value: 0.9271424251996218

key: test_precision
value: [0.875      0.81818182 0.8        0.84615385 1.         1.
 0.84615385 0.91666667 1.         0.83333333]

mean value: 0.893548951048951

key: train_precision
value: [0.9245283  0.93269231 0.92380952 0.92156863 0.93269231 0.91262136
 0.92156863 0.9223301  0.91346154 0.93069307]

mean value: 0.9235965760062042

key: test_recall
value: [0.63636364 0.81818182 0.72727273 1.         0.83333333 0.91666667
 0.91666667 0.91666667 0.81818182 0.90909091]

mean value: 0.8492424242424242

key: train_recall
value: [0.95145631 0.94174757 0.94174757 0.91262136 0.95098039 0.92156863
 0.92156863 0.93137255 0.9223301  0.91262136]

mean value: 0.9308014467923091

key: test_roc_auc
value: [0.77651515 0.82575758 0.78030303 0.91666667 0.91666667 0.95833333
 0.86742424 0.91287879 0.90909091 0.86363636]

mean value: 0.8727272727272727

key: train_roc_auc
value: [0.93651247 0.93656006 0.9316581  0.91709499 0.94150961 0.91709499
 0.92194936 0.92685132 0.91747573 0.9223301 ]

mean value: 0.9269036740909956

key: test_jcc
value: [0.58333333 0.69230769 0.61538462 0.84615385 0.83333333 0.91666667
 0.78571429 0.84615385 0.81818182 0.76923077]

mean value: 0.7706460206460206

key: train_jcc
value: [0.88288288 0.88181818 0.87387387 0.84684685 0.88990826 0.84684685
 0.85454545 0.86363636 0.84821429 0.85454545]

mean value: 0.8643118447590925

MCC on Blind test: 0.1

Accuracy on Blind test: 0.55

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.01748133 0.00712228 0.00695634 0.00682926 0.00673771 0.00671387
 0.00680852 0.0069313  0.00677037 0.00677538]

mean value: 0.007912635803222656

key: score_time
value: [0.00848055 0.0081079  0.0079267  0.0077877  0.00768209 0.00775528
 0.00768805 0.00777531 0.00776768 0.00765419]

mean value: 0.007862544059753418

key: test_mcc
value: [0.39727608 0.33371191 0.39393939 0.21969697 0.21452908 0.66414149
 0.39727608 0.48856385 0.63636364 0.27272727]

mean value: 0.40182257604909155

key: train_mcc
value: [0.47440586 0.49337247 0.49337247 0.46430782 0.47361912 0.44415883
 0.46367706 0.44784529 0.43062816 0.4882291 ]

mean value: 0.46736161990058567

key: test_accuracy
value: [0.69565217 0.65217391 0.69565217 0.60869565 0.60869565 0.82608696
 0.69565217 0.73913043 0.81818182 0.63636364]

mean value: 0.6976284584980237

key: train_accuracy
value: [0.73658537 0.74634146 0.74634146 0.73170732 0.73658537 0.72195122
 0.73170732 0.72195122 0.71359223 0.74271845]

mean value: 0.7329481411318968

key: test_fscore
value: [0.63157895 0.69230769 0.69565217 0.60869565 0.66666667 0.81818182
 0.74074074 0.72727273 0.81818182 0.63636364]

mean value: 0.7035641873170477

key: train_fscore
value: [0.74766355 0.75471698 0.75471698 0.74178404 0.74038462 0.72463768
 0.73429952 0.73732719 0.73059361 0.75576037]

mean value: 0.7421884529586577

key: test_precision
value: [0.75       0.6        0.66666667 0.58333333 0.6        0.9
 0.66666667 0.8        0.81818182 0.63636364]

mean value: 0.7021212121212121

key: train_precision
value: [0.72072072 0.73394495 0.73394495 0.71818182 0.72641509 0.71428571
 0.72380952 0.69565217 0.68965517 0.71929825]

mean value: 0.7175908371535152

key: test_recall
value: [0.54545455 0.81818182 0.72727273 0.63636364 0.75       0.75
 0.83333333 0.66666667 0.81818182 0.63636364]

mean value: 0.7181818181818181

key: train_recall
value: [0.77669903 0.77669903 0.77669903 0.76699029 0.75490196 0.73529412
 0.74509804 0.78431373 0.77669903 0.7961165 ]

mean value: 0.7689510755758614

key: test_roc_auc
value: [0.68939394 0.65909091 0.6969697  0.60984848 0.60227273 0.82954545
 0.68939394 0.74242424 0.81818182 0.63636364]

mean value: 0.6973484848484848

key: train_roc_auc
value: [0.73638873 0.74619265 0.74619265 0.73153436 0.73667428 0.72201599
 0.73177232 0.72225395 0.71359223 0.74271845]

mean value: 0.7329335617742243

key: test_jcc
value: [0.46153846 0.52941176 0.53333333 0.4375     0.5        0.69230769
 0.58823529 0.57142857 0.69230769 0.46666667]

mean value: 0.5472729476405946

key: train_jcc
value: [0.59701493 0.60606061 0.60606061 0.58955224 0.58778626 0.56818182
 0.58015267 0.58394161 0.57553957 0.60740741]

mean value: 0.5901697707371992

MCC on Blind test: 0.43

Accuracy on Blind test: 0.71

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.0074048  0.00984526 0.01072073 0.00961113 0.01026011 0.01042223
 0.01001883 0.01076102 0.01011777 0.01003718]

mean value: 0.00991990566253662

key: score_time
value: [0.00778842 0.00981927 0.00983167 0.01007533 0.01036739 0.01042461
 0.01043558 0.01044655 0.01036429 0.01048994]

mean value: 0.010004305839538574

key: test_mcc
value: [0.56490196 0.66414149 0.65151515 0.63327851 0.91666667 0.74047959
 0.83743579 0.91605722 0.91287093 0.54772256]

mean value: 0.7385069858830032

key: train_mcc
value: [0.8345235  0.8345235  0.86600321 0.61725542 0.82136935 0.84332727
 0.82455974 0.85570033 0.78655606 0.79179983]

mean value: 0.8075618225770705

key: test_accuracy
value: [0.7826087  0.82608696 0.82608696 0.7826087  0.95652174 0.86956522
 0.91304348 0.95652174 0.95454545 0.77272727]

mean value: 0.8640316205533597

key: train_accuracy
value: [0.91707317 0.91707317 0.93170732 0.7804878  0.90731707 0.91707317
 0.91219512 0.92682927 0.89320388 0.89320388]

mean value: 0.8996163864551268

key: test_fscore
value: [0.76190476 0.83333333 0.81818182 0.81481481 0.95652174 0.88
 0.92307692 0.96       0.95652174 0.76190476]

mean value: 0.8666259891477283

key: train_fscore
value: [0.91625616 0.91625616 0.93457944 0.81927711 0.9124424  0.92237443
 0.91262136 0.92890995 0.89215686 0.88659794]

mean value: 0.904147180121348

key: test_precision
value: [0.8        0.76923077 0.81818182 0.6875     1.         0.84615385
 0.85714286 0.92307692 0.91666667 0.8       ]

mean value: 0.841795288045288

key: train_precision
value: [0.93       0.93       0.9009009  0.69863014 0.86086957 0.86324786
 0.90384615 0.89908257 0.9009901  0.94505495]

mean value: 0.8832622233070796

key: test_recall
value: [0.72727273 0.90909091 0.81818182 1.         0.91666667 0.91666667
 1.         1.         1.         0.72727273]

mean value: 0.9015151515151515

key: train_recall
value: [0.90291262 0.90291262 0.97087379 0.99029126 0.97058824 0.99019608
 0.92156863 0.96078431 0.88349515 0.83495146]

mean value: 0.9328574148105845

key: test_roc_auc
value: [0.78030303 0.82954545 0.82575758 0.79166667 0.95833333 0.86742424
 0.90909091 0.95454545 0.95454545 0.77272727]

mean value: 0.8643939393939394

key: train_roc_auc
value: [0.91714259 0.91714259 0.93151532 0.77945936 0.90762421 0.91742814
 0.91224062 0.9269941  0.89320388 0.89320388]

mean value: 0.8995954692556635

key: test_jcc
value: [0.61538462 0.71428571 0.69230769 0.6875     0.91666667 0.78571429
 0.85714286 0.92307692 0.91666667 0.61538462]

mean value: 0.7724130036630037

key: train_jcc
value: [0.84545455 0.84545455 0.87719298 0.69387755 0.83898305 0.8559322
 0.83928571 0.86725664 0.80530973 0.7962963 ]

mean value: 0.8265043260886354

MCC on Blind test: 0.2

Accuracy on Blind test: 0.57

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01033568 0.01053357 0.01022339 0.01038384 0.01062822 0.01051664
 0.01035213 0.01098609 0.01057267 0.01157546]

mean value: 0.01061077117919922

key: score_time
value: [0.01053452 0.01041985 0.01034975 0.01045704 0.01039529 0.01041269
 0.01073003 0.01061511 0.0108068  0.0110898 ]

mean value: 0.010581088066101075

key: test_mcc
value: [0.33946383 0.39727608 0.65909298 0.76764947 0.83971912 0.76277007
 0.76277007 0.82575758 0.83205029 0.63636364]

mean value: 0.6822913134388915

key: train_mcc
value: [0.74004127 0.85570033 0.72342586 0.7674294  0.80545006 0.55024014
 0.7696264  0.91224062 0.81572728 0.85473156]

mean value: 0.7794612920006722

key: test_accuracy
value: [0.65217391 0.69565217 0.82608696 0.86956522 0.91304348 0.86956522
 0.86956522 0.91304348 0.90909091 0.81818182]

mean value: 0.833596837944664

key: train_accuracy
value: [0.85365854 0.92682927 0.84878049 0.87317073 0.90243902 0.73170732
 0.87804878 0.95609756 0.90291262 0.92718447]

mean value: 0.8800828794695714

key: test_fscore
value: [0.5        0.63157895 0.8        0.88       0.90909091 0.88888889
 0.88888889 0.91666667 0.91666667 0.81818182]

mean value: 0.8149962785752259

key: train_fscore
value: [0.82954545 0.92462312 0.82681564 0.88695652 0.9        0.78764479
 0.88789238 0.95609756 0.90990991 0.92610837]

mean value: 0.8835593743916733

key: test_precision
value: [0.8        0.75       0.88888889 0.78571429 1.         0.8
 0.8        0.91666667 0.84615385 0.81818182]

mean value: 0.8405605505605506

key: train_precision
value: [1.         0.95833333 0.97368421 0.80314961 0.91836735 0.64968153
 0.81818182 0.95145631 0.8487395  0.94      ]

mean value: 0.8861593650419807

key: test_recall
value: [0.36363636 0.54545455 0.72727273 1.         0.83333333 1.
 1.         0.91666667 1.         0.81818182]

mean value: 0.8204545454545454

key: train_recall
value: [0.70873786 0.89320388 0.7184466  0.99029126 0.88235294 1.
 0.97058824 0.96078431 0.98058252 0.91262136]

mean value: 0.901760898534171

key: test_roc_auc
value: [0.64015152 0.68939394 0.8219697  0.875      0.91666667 0.86363636
 0.86363636 0.91287879 0.90909091 0.81818182]

mean value: 0.831060606060606

key: train_roc_auc
value: [0.85436893 0.9269941  0.84941938 0.87259661 0.90234152 0.73300971
 0.878498   0.95612031 0.90291262 0.92718447]

mean value: 0.8803445650104702

key: test_jcc
value: [0.33333333 0.46153846 0.66666667 0.78571429 0.83333333 0.8
 0.8        0.84615385 0.84615385 0.69230769]

mean value: 0.7065201465201465

key: train_jcc
value: [0.70873786 0.85981308 0.7047619  0.796875   0.81818182 0.64968153
 0.7983871  0.91588785 0.83471074 0.86238532]

mean value: 0.7949422211940016

MCC on Blind test: 0.11

Accuracy on Blind test: 0.54

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.08366251 0.06921864 0.0717814  0.07228327 0.07036138 0.0708344
 0.07182002 0.0701704  0.07077861 0.07223129]

mean value: 0.07231419086456299

key: score_time
value: [0.01484537 0.01417661 0.01449132 0.01528001 0.01412058 0.01532507
 0.01464701 0.01492405 0.01425171 0.01426148]

mean value: 0.014632320404052735

key: test_mcc
value: [0.83971912 0.91605722 0.91605722 0.58930667 0.83971912 0.91666667
 0.76277007 0.82575758 0.91287093 0.81818182]

mean value: 0.833710642294015

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91304348 0.95652174 0.95652174 0.7826087  0.91304348 0.95652174
 0.86956522 0.91304348 0.95454545 0.90909091]

mean value: 0.9124505928853754

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.91666667 0.95238095 0.95238095 0.8        0.90909091 0.95652174
 0.88888889 0.91666667 0.95238095 0.90909091]

mean value: 0.9154068636677333

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.84615385 1.         1.         0.71428571 1.         1.
 0.8        0.91666667 1.         0.90909091]

mean value: 0.9186197136197136

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.90909091 0.90909091 0.90909091 0.83333333 0.91666667
 1.         0.91666667 0.90909091 0.90909091]

mean value: 0.9212121212121211

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.91666667 0.95454545 0.95454545 0.78787879 0.91666667 0.95833333
 0.86363636 0.91287879 0.95454545 0.90909091]

mean value: 0.9128787878787878

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.84615385 0.90909091 0.90909091 0.66666667 0.83333333 0.91666667
 0.8        0.84615385 0.90909091 0.83333333]

mean value: 0.8469580419580419

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.05

Accuracy on Blind test: 0.52

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.03033328 0.03085208 0.03747296 0.03314781 0.02555704 0.03062606
 0.02538133 0.02366614 0.04046106 0.03656936]

mean value: 0.031406712532043454

key: score_time
value: [0.02738023 0.02815413 0.01663828 0.01588154 0.01602888 0.01523161
 0.02233171 0.02261209 0.01840067 0.01769853]

mean value: 0.020035767555236818

key: test_mcc
value: [0.83971912 0.83743579 0.76277007 0.66414149 0.91666667 0.82575758
 0.83743579 0.91605722 1.         0.91287093]

mean value: 0.851285465730236

key: train_mcc
value: [0.99029126 0.98067587 0.99029126 1.         1.         1.
 1.         0.99029126 0.99033794 0.99033794]

mean value: 0.9932225534805602

key: test_accuracy
value: [0.91304348 0.91304348 0.86956522 0.82608696 0.95652174 0.91304348
 0.91304348 0.95652174 1.         0.95454545]

mean value: 0.9215415019762846

key: train_accuracy
value: [0.99512195 0.9902439  0.99512195 1.         1.         1.
 1.         0.99512195 0.99514563 0.99514563]

mean value: 0.9965901018233483

key: test_fscore
value: [0.91666667 0.9        0.84210526 0.83333333 0.95652174 0.91666667
 0.92307692 0.96       1.         0.95652174]

mean value: 0.9204892331162354

key: train_fscore
value: [0.99512195 0.99019608 0.99512195 1.         1.         1.
 1.         0.99512195 0.99512195 0.99512195]

mean value: 0.9965805834528934

key: test_precision
value: [0.84615385 1.         1.         0.76923077 1.         0.91666667
 0.85714286 0.92307692 1.         0.91666667]

mean value: 0.922893772893773

key: train_precision
value: [1.         1.         1.         1.         1.         1.
 1.         0.99029126 1.         1.        ]

mean value: 0.9990291262135922

key: test_recall
value: [1.         0.81818182 0.72727273 0.90909091 0.91666667 0.91666667
 1.         1.         1.         1.        ]

mean value: 0.9287878787878788

key: train_recall
value: [0.99029126 0.98058252 0.99029126 1.         1.         1.
 1.         1.         0.99029126 0.99029126]

mean value: 0.9941747572815534

key: test_roc_auc
value: [0.91666667 0.90909091 0.86363636 0.82954545 0.95833333 0.91287879
 0.90909091 0.95454545 1.         0.95454545]

mean value: 0.9208333333333333

key: train_roc_auc
value: [0.99514563 0.99029126 0.99514563 1.         1.         1.
 1.         0.99514563 0.99514563 0.99514563]

mean value: 0.9966019417475728

key: test_jcc
value: [0.84615385 0.81818182 0.72727273 0.71428571 0.91666667 0.84615385
 0.85714286 0.92307692 1.         0.91666667]

mean value: 0.8565601065601065

key: train_jcc
value: [0.99029126 0.98058252 0.99029126 1.         1.         1.
 1.         0.99029126 0.99029126 0.99029126]

mean value: 0.9932038834951457

MCC on Blind test: 0.13

Accuracy on Blind test: 0.55

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.04991055 0.09063339 0.06661296 0.04403114 0.03577352 0.03729105
 0.02286935 0.02291799 0.02340531 0.04758835]

mean value: 0.044103360176086424

key: score_time
value: [0.02371454 0.02476144 0.03154182 0.01158547 0.02154064 0.01150608
 0.01171994 0.01175761 0.01140141 0.01663899]

mean value: 0.017616796493530273

key: test_mcc
value: [0.39727608 0.56818182 0.56490196 0.38932432 0.31252706 0.6992059
 0.65151515 0.74242424 0.64715023 0.61237244]

mean value: 0.558487918534798

key: train_mcc
value: [0.93174679 0.94146202 0.9024367  0.92194936 0.91224062 0.89272796
 0.9024367  0.94146202 0.91266437 0.93243443]

mean value: 0.9191560993974353

key: test_accuracy
value: [0.69565217 0.7826087  0.7826087  0.69565217 0.65217391 0.82608696
 0.82608696 0.86956522 0.81818182 0.77272727]

mean value: 0.7721343873517786

key: train_accuracy
value: [0.96585366 0.97073171 0.95121951 0.96097561 0.95609756 0.94634146
 0.95121951 0.97073171 0.95631068 0.96601942]

mean value: 0.9595500828794695

key: test_fscore
value: [0.63157895 0.7826087  0.76190476 0.66666667 0.71428571 0.8
 0.83333333 0.86956522 0.8        0.70588235]

mean value: 0.7565825689543552

key: train_fscore
value: [0.96618357 0.97087379 0.95145631 0.96116505 0.95609756 0.94634146
 0.95098039 0.97058824 0.95652174 0.96650718]

mean value: 0.9596715288515447

key: test_precision
value: [0.75       0.75       0.8        0.7        0.625      1.
 0.83333333 0.90909091 0.88888889 1.        ]

mean value: 0.8256313131313131

key: train_precision
value: [0.96153846 0.97087379 0.95145631 0.96116505 0.95145631 0.94174757
 0.95098039 0.97058824 0.95192308 0.95283019]

mean value: 0.9564559383717978

key: test_recall
value: [0.54545455 0.81818182 0.72727273 0.63636364 0.83333333 0.66666667
 0.83333333 0.83333333 0.72727273 0.54545455]

mean value: 0.7166666666666667

key: train_recall
value: [0.97087379 0.97087379 0.95145631 0.96116505 0.96078431 0.95098039
 0.95098039 0.97058824 0.96116505 0.98058252]

mean value: 0.9629449838187703

key: test_roc_auc
value: [0.68939394 0.78409091 0.78030303 0.69318182 0.64393939 0.83333333
 0.82575758 0.87121212 0.81818182 0.77272727]

mean value: 0.7712121212121212

key: train_roc_auc
value: [0.96582905 0.97073101 0.95121835 0.96097468 0.95612031 0.94636398
 0.95121835 0.97073101 0.95631068 0.96601942]

mean value: 0.9595516847515705

key: test_jcc
value: [0.46153846 0.64285714 0.61538462 0.5        0.55555556 0.66666667
 0.71428571 0.76923077 0.66666667 0.54545455]

mean value: 0.6137640137640138

key: train_jcc
value: [0.93457944 0.94339623 0.90740741 0.92523364 0.91588785 0.89814815
 0.90654206 0.94285714 0.91666667 0.93518519]

mean value: 0.9225903767333851

MCC on Blind test: 0.34

Accuracy on Blind test: 0.67

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.1293633  0.12412262 0.12290406 0.12451863 0.12319613 0.12431335
 0.12317395 0.12272644 0.12358046 0.12428999]

mean value: 0.12421889305114746

key: score_time
value: [0.00877905 0.00823951 0.00833726 0.00852513 0.00845313 0.00828242
 0.00824165 0.0083313  0.00858903 0.00842285]

mean value: 0.008420133590698242

key: test_mcc
value: [0.74242424 0.91605722 0.91605722 0.76764947 0.83971912 0.91605722
 0.83743579 0.91605722 1.         0.81818182]

mean value: 0.866963934561781

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.86956522 0.95652174 0.95652174 0.86956522 0.91304348 0.95652174
 0.91304348 0.95652174 1.         0.90909091]

mean value: 0.9300395256916996

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.86956522 0.95238095 0.95238095 0.88       0.90909091 0.96
 0.92307692 0.96       1.         0.90909091]

mean value: 0.931558586341195

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.83333333 1.         1.         0.78571429 1.         0.92307692
 0.85714286 0.92307692 1.         0.90909091]

mean value: 0.9231435231435231

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 0.90909091 0.90909091 1.         0.83333333 1.
 1.         1.         1.         0.90909091]

mean value: 0.946969696969697

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.87121212 0.95454545 0.95454545 0.875      0.91666667 0.95454545
 0.90909091 0.95454545 1.         0.90909091]

mean value: 0.9299242424242424

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.76923077 0.90909091 0.90909091 0.78571429 0.83333333 0.92307692
 0.85714286 0.92307692 1.         0.83333333]

mean value: 0.8743090243090244

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.14

Accuracy on Blind test: 0.55

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.00908375 0.01190424 0.01377177 0.01136947 0.01182413 0.0117712
 0.01173353 0.01190329 0.0147233  0.01190543]

mean value: 0.011999011039733887

key: score_time
value: [0.01050138 0.01059628 0.01065016 0.01060724 0.01079369 0.01067948
 0.01063824 0.01067424 0.01086879 0.0129354 ]

mean value: 0.010894489288330079

key: test_mcc
value: [0.47727273 0.66414149 0.17236256 0.37057951 0.55048188 0.56490196
 0.40451992 0.40451992 0.13245324 0.48795004]

mean value: 0.42291832347791636

key: train_mcc
value: [0.5185658  0.60463182 0.61253896 0.61919584 0.62634721 0.57825573
 0.42798979 0.54305523 0.59064979 0.61850654]

mean value: 0.5739736713126434

key: test_accuracy
value: [0.73913043 0.82608696 0.56521739 0.65217391 0.73913043 0.7826087
 0.65217391 0.65217391 0.54545455 0.72727273]

mean value: 0.6881422924901186

key: train_accuracy
value: [0.72682927 0.8        0.7902439  0.78536585 0.78536585 0.76585366
 0.65365854 0.73658537 0.77669903 0.77669903]

mean value: 0.7597300497276818

key: test_fscore
value: [0.72727273 0.83333333 0.64285714 0.71428571 0.8        0.8
 0.75       0.75       0.66666667 0.76923077]

mean value: 0.7453646353646354

key: train_fscore
value: [0.78125    0.81278539 0.82008368 0.82113821 0.82113821 0.80327869
 0.74181818 0.78740157 0.80991736 0.81746032]

mean value: 0.8016271610878589

key: test_precision
value: [0.72727273 0.76923077 0.52941176 0.58823529 0.66666667 0.76923077
 0.6        0.6        0.52631579 0.66666667]

mean value: 0.6443030447364813

key: train_precision
value: [0.65359477 0.76724138 0.72058824 0.70629371 0.70138889 0.69014085
 0.58959538 0.65789474 0.70503597 0.69127517]

mean value: 0.6883049077672215

key: test_recall
value: [0.72727273 0.90909091 0.81818182 0.90909091 1.         0.83333333
 1.         1.         0.90909091 0.90909091]

mean value: 0.9015151515151515

key: train_recall
value: [0.97087379 0.86407767 0.95145631 0.98058252 0.99019608 0.96078431
 1.         0.98039216 0.95145631 1.        ]

mean value: 0.9649819150961355

key: test_roc_auc
value: [0.73863636 0.82954545 0.57575758 0.66287879 0.72727273 0.78030303
 0.63636364 0.63636364 0.54545455 0.72727273]

mean value: 0.6859848484848485

key: train_roc_auc
value: [0.72563297 0.79968589 0.78945365 0.78440891 0.78636018 0.76679992
 0.65533981 0.73776889 0.77669903 0.77669903]

mean value: 0.7598848277174948

key: test_jcc
value: [0.57142857 0.71428571 0.47368421 0.55555556 0.66666667 0.66666667
 0.6        0.6        0.5        0.625     ]

mean value: 0.597328738512949

key: train_jcc
value: [0.64102564 0.68461538 0.69503546 0.69655172 0.69655172 0.67123288
 0.58959538 0.64935065 0.68055556 0.69127517]

mean value: 0.6695789560036107

MCC on Blind test: 0.35

Accuracy on Blind test: 0.62

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.01302195 0.01023006 0.01021552 0.01021671 0.01030731 0.01027513
 0.01022935 0.01022243 0.01024985 0.01030946]

mean value: 0.010527777671813964

key: score_time
value: [0.01048541 0.0103786  0.01038289 0.01038122 0.01036906 0.01034164
 0.01037383 0.01037812 0.01037979 0.01041484]

mean value: 0.010388541221618652

key: test_mcc
value: [0.58002308 0.74242424 0.65909298 0.74242424 0.83971912 0.91666667
 0.83743579 0.82575758 0.91287093 0.63636364]

mean value: 0.7692778262419881

key: train_mcc
value: [0.85368872 0.84407425 0.86341138 0.84407425 0.82438607 0.81495251
 0.84389872 0.81495251 0.83499081 0.85473156]

mean value: 0.8393160802573247

key: test_accuracy
value: [0.7826087  0.86956522 0.82608696 0.86956522 0.91304348 0.95652174
 0.91304348 0.91304348 0.95454545 0.81818182]

mean value: 0.8816205533596838

key: train_accuracy
value: [0.92682927 0.92195122 0.93170732 0.92195122 0.91219512 0.90731707
 0.92195122 0.90731707 0.91747573 0.92718447]

mean value: 0.9195879706369879

key: test_fscore
value: [0.73684211 0.86956522 0.8        0.86956522 0.90909091 0.95652174
 0.92307692 0.91666667 0.95238095 0.81818182]

mean value: 0.875189154857347

key: train_fscore
value: [0.92753623 0.92156863 0.93203883 0.92156863 0.91176471 0.90547264
 0.92156863 0.90547264 0.9178744  0.92610837]

mean value: 0.9190973699222151

key: test_precision
value: [0.875      0.83333333 0.88888889 0.83333333 1.         1.
 0.85714286 0.91666667 1.         0.81818182]

mean value: 0.9022546897546897

key: train_precision
value: [0.92307692 0.93069307 0.93203883 0.93069307 0.91176471 0.91919192
 0.92156863 0.91919192 0.91346154 0.94      ]

mean value: 0.9241680606820951

key: test_recall
value: [0.63636364 0.90909091 0.72727273 0.90909091 0.83333333 0.91666667
 1.         0.91666667 0.90909091 0.81818182]

mean value: 0.8575757575757575

key: train_recall
value: [0.93203883 0.91262136 0.93203883 0.91262136 0.91176471 0.89215686
 0.92156863 0.89215686 0.9223301  0.91262136]

mean value: 0.9141918903483723

key: test_roc_auc
value: [0.77651515 0.87121212 0.8219697  0.87121212 0.91666667 0.95833333
 0.90909091 0.91287879 0.95454545 0.81818182]

mean value: 0.8810606060606061

key: train_roc_auc
value: [0.92680373 0.92199695 0.93170569 0.92199695 0.91219303 0.90724348
 0.92194936 0.90724348 0.91747573 0.92718447]

mean value: 0.91957928802589

key: test_jcc
value: [0.58333333 0.76923077 0.66666667 0.76923077 0.83333333 0.91666667
 0.85714286 0.84615385 0.90909091 0.69230769]

mean value: 0.7843156843156843

key: train_jcc
value: [0.86486486 0.85454545 0.87272727 0.85454545 0.83783784 0.82727273
 0.85454545 0.82727273 0.84821429 0.86238532]

mean value: 0.8504211400426996

MCC on Blind test: 0.19

Accuracy on Blind test: 0.59

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:203: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:206: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
       'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.08454299 0.08204246 0.08172989 0.0863173  0.09503531 0.08147788
 0.08158469 0.08152223 0.09179831 0.08183026]

mean value: 0.08478813171386719

key: score_time
value: [0.01065302 0.01066089 0.01067209 0.01064348 0.01065207 0.010607
 0.010638   0.01059103 0.01064205 0.01067996]

mean value: 0.010643959045410156

key: test_mcc
value: [0.58002308 0.65151515 0.65909298 0.74242424 0.83971912 0.91666667
 0.83743579 0.82575758 0.91287093 0.73029674]

mean value: 0.7695802278487375

key: train_mcc
value: [0.85368872 0.87320324 0.86356283 0.84407425 0.87321531 0.83417421
 0.84389872 0.85370265 0.83499081 0.86407767]

mean value: 0.8538588407839809

key: test_accuracy
value: [0.7826087  0.82608696 0.82608696 0.86956522 0.91304348 0.95652174
 0.91304348 0.91304348 0.95454545 0.86363636]

mean value: 0.8818181818181818

key: train_accuracy
value: [0.92682927 0.93658537 0.93170732 0.92195122 0.93658537 0.91707317
 0.92195122 0.92682927 0.91747573 0.93203883]

mean value: 0.9269026758228748

key: test_fscore
value: [0.73684211 0.81818182 0.8        0.86956522 0.90909091 0.95652174
 0.92307692 0.91666667 0.95238095 0.86956522]

mean value: 0.875189154857347

key: train_fscore
value: [0.92753623 0.93719807 0.93269231 0.92156863 0.93658537 0.91625616
 0.92156863 0.92682927 0.9178744  0.93203883]

mean value: 0.9270147884979708

key: test_precision
value: [0.875      0.81818182 0.88888889 0.83333333 1.         1.
 0.85714286 0.91666667 1.         0.83333333]

mean value: 0.9022546897546897

key: train_precision
value: [0.92307692 0.93269231 0.92380952 0.93069307 0.93203883 0.92079208
 0.92156863 0.9223301  0.91346154 0.93203883]

mean value: 0.9252501835996416

key: test_recall
value: [0.63636364 0.81818182 0.72727273 0.90909091 0.83333333 0.91666667
 1.         0.91666667 0.90909091 0.90909091]

mean value: 0.8575757575757575

key: train_recall
value: [0.93203883 0.94174757 0.94174757 0.91262136 0.94117647 0.91176471
 0.92156863 0.93137255 0.9223301  0.93203883]

mean value: 0.9288406624785837

key: test_roc_auc
value: [0.77651515 0.82575758 0.8219697  0.87121212 0.91666667 0.95833333
 0.90909091 0.91287879 0.95454545 0.86363636]

mean value: 0.8810606060606061

key: train_roc_auc
value: [0.92680373 0.93656006 0.9316581  0.92199695 0.93660765 0.9170474
 0.92194936 0.92685132 0.91747573 0.93203883]

mean value: 0.9268989149057681

key: test_jcc
value: [0.58333333 0.69230769 0.66666667 0.76923077 0.83333333 0.91666667
 0.85714286 0.84615385 0.90909091 0.76923077]

mean value: 0.7843156843156843

key: train_jcc
value: [0.86486486 0.88181818 0.87387387 0.85454545 0.88073394 0.84545455
 0.85454545 0.86363636 0.84821429 0.87272727]

mean value: 0.8640414242134425

MCC on Blind test: 0.12

Accuracy on Blind test: 0.56