LSHTM_analysis/scripts/ml/log_rpob_7030.txt

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_7030.py:548: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
1.22.4
1.4.1

aaindex_df contains non-numerical data

Total no. of non-numerical columns: 2

Selecting numerical data only

PASS: successfully selected numerical columns only for aaindex_df

Now checking for NA in the remaining aaindex_cols

Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127

Revised df ncols: 123

Checking NA in revised df...

PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df

PASS: ncols match
Expected ncols: 123
Got: 123

Total no. of columns in clean aa_df: 123

Proceeding to merge, expected nrows in merged_df: 1133

PASS: my_features_df and aa_df successfully combined
nrows: 1133
ncols: 274
count of NULL values before imputation

or_mychisq          339
log10_or_mychisq    339
dtype: int64
count of NULL values AFTER imputation

mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64

PASS: OR values imputed, data ready for ML

Total no. of features for aaindex: 123

No. of numerical features: 169
No. of categorical features: 7

PASS: x_features has no target variable

No. of columns for x_features: 176

-------------------------------------------------------------
Successfully split data with stratification: 70/30
Input features data size: (557, 176)
Train data size: (373, 176)
Test data size: (184, 176)
y_train numbers: Counter({0: 189, 1: 184})
y_train ratio: 1.0271739130434783

y_test_numbers: Counter({0: 93, 1: 91})
y_test ratio: 1.021978021978022
-------------------------------------------------------------
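
The ratios printed above are consistent with dividing the class 0 count by the class 1 count. A minimal sketch under that assumption (the helper name class_ratio is hypothetical and is not taken from the pipeline):

```python
from collections import Counter


def class_ratio(counts: Counter) -> float:
    """Ratio of class-0 count to class-1 count (assumed definition)."""
    return counts[0] / counts[1]


# Class counts copied from the log lines above.
y_train_counts = Counter({0: 189, 1: 184})
y_test_counts = Counter({0: 93, 1: 91})

train_ratio = class_ratio(y_train_counts)
test_ratio = class_ratio(y_test_counts)
print(train_ratio, test_ratio)
```

A ratio close to 1 in both partitions suggests the stratified split preserved the class balance.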

index: 0
ind: 1

Mask count check: True

index: 1
ind: 2

Mask count check: True

index: 2
ind: 3

Mask count check: True
Original Data
 Counter({0: 189, 1: 184}) Data dim: (373, 176)

Simple Random OverSampling
 Counter({1: 189, 0: 189})
(378, 176)

Simple Random UnderSampling
 Counter({0: 184, 1: 184})
(368, 176)

Simple Combined Over and UnderSampling
 Counter({0: 189, 1: 189})
(378, 176)

SMOTE_NC OverSampling
 Counter({1: 189, 0: 189})
(378, 176)
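
The "Simple Random OverSampling" counts above (189 per class, 378 rows) match what plain random oversampling of the minority class would give. The log does not show which sampler produced them, so the following is only a standard-library stand-in with a hypothetical helper name, not the pipeline's code:

```python
import random
from collections import Counter


def random_oversample(labels, seed=42):
    """Return row indices that balance classes by resampling
    minority-class rows with replacement (illustrative only)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    target = max(len(ix) for ix in by_class.values())
    picked = []
    for ix in by_class.values():
        # No extra rows are drawn for the majority class (k == 0).
        extra = rng.choices(ix, k=target - len(ix))
        picked.extend(ix + extra)
    return picked


# Training labels with the class counts reported in the log.
labels = [0] * 189 + [1] * 184
resampled_idx = random_oversample(labels)
resampled_counts = Counter(labels[i] for i in resampled_idx)
print(resampled_counts, len(resampled_idx))
```

Undersampling is the mirror image: rows are dropped from the majority class until both classes have 184 rows, giving the 368 rows reported above.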

#####################################################################

Running ML analysis: 70/30 split
Gene name: rpoB
Drug name: rifampicin

Output directory: /home/tanu/git/Data/rifampicin/output/ml/tts_7030/

Sanity checks:
Total input features: 176

Training data size: (373, 176)
Test data size: (184, 176)

Target feature numbers (training data): Counter({0: 189, 1: 184})
Target features ratio (training data): 1.0271739130434783

Target feature numbers (test data): Counter({0: 93, 1: 91})
Target features ratio (test data): 1.021978021978022

#####################################################################


================================================================

Structural features (n): 37
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================

AAindex features (n): 123
These are:
 ['ALTS910101', 'AZAE970101', 'AZAE970102', 'BASU010101', 'BENS940101', 'BENS940102', 'BENS940103', 'BENS940104', 'BETM990101', 'BLAJ010101', 'BONM030101', 'BONM030102', 'BONM030103', 'BONM030104', 'BONM030105', 'BONM030106', 'BRYS930101', 'CROG050101', 'CSEM940101', 'DAYM780301', 'DAYM780302', 'DOSZ010101', 'DOSZ010102', 'DOSZ010103', 'DOSZ010104', 'FEND850101', 'FITW660101', 'GEOD900101', 'GIAG010101', 'GONG920101', 'GRAR740104', 'HENS920101', 'HENS920102', 'HENS920103', 'HENS920104', 'JOHM930101', 'JOND920103', 'JOND940101', 'KANM000101', 'KAPO950101', 'KESO980101', 'KESO980102', 'KOLA920101', 'KOLA930101', 'KOSJ950100_RSA_SST', 'KOSJ950100_SST', 'KOSJ950110_RSA', 'KOSJ950115', 'LEVJ860101', 'LINK010101', 'LIWA970101', 'LUTR910101', 'LUTR910102', 'LUTR910103', 'LUTR910104', 'LUTR910105', 'LUTR910106', 'LUTR910107', 'LUTR910108', 'LUTR910109', 'MCLA710101', 'MCLA720101', 'MEHP950102', 'MICC010101', 'MIRL960101', 'MIYS850102', 'MIYS850103', 'MIYS930101', 'MIYS960101', 'MIYS960102', 'MIYS960103', 'MIYS990106', 'MIYS990107', 'MIYT790101', 'MOHR870101', 'MOOG990101', 'MUET010101', 'MUET020101', 'MUET020102', 'NAOD960101', 'NGPC000101', 'NIEK910101', 'NIEK910102', 'OGAK980101', 'OVEJ920100_RSA', 'OVEJ920101', 'OVEJ920102', 'OVEJ920103', 'PRLA000101', 'PRLA000102', 'QUIB020101', 'QU_C930101', 'QU_C930102', 'QU_C930103', 'RIER950101', 'RISJ880101', 'RUSR970101', 'RUSR970102', 'RUSR970103', 'SIMK990101', 'SIMK990102', 'SIMK990103', 'SIMK990104', 'SIMK990105', 'SKOJ000101', 'SKOJ000102', 'SKOJ970101', 'TANS760101', 'TANS760102', 'THOP960101', 'TOBD000101', 'TOBD000102', 'TUDE900101', 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106']
================================================================

Evolutionary features (n): 3
These are:
 ['consurf_score', 'snap2_score', 'provean_score']
================================================================

Genomic features (n): 6
These are:
 ['maf', 'logorI']
 ['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================

Categorical features (n): 7
These are:
 ['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================


Pass: No. of features match

#####################################################################


Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.06178975 0.03049135 0.03225613 0.03168249 0.03428817 0.03472066
 0.0346334  0.02963018 0.0661819  0.06939864]

mean value: 0.042507266998291014

key: score_time
value: [0.02421689 0.01209807 0.01218534 0.01204538 0.01496863 0.01508284
 0.01212668 0.01215744 0.01237869 0.01236558]

mean value: 0.013962554931640624

key: test_mcc
value: [0.89973541 0.57894737 0.68803296 0.73099415 0.83918129 0.68035483
 0.83918129 0.89181287 0.94736842 0.84834956]

mean value: 0.7943958140147007

key: train_mcc
value: [0.86265911 0.8687128  0.88086411 0.87498893 0.87500665 0.88101481
 0.87500665 0.87500665 0.86910921 0.86324256]

mean value: 0.8725611472917782

key: test_accuracy
value: [0.94736842 0.78947368 0.84210526 0.86486486 0.91891892 0.83783784
 0.91891892 0.94594595 0.97297297 0.91891892]

mean value: 0.8957325746799432

key: train_accuracy
value: [0.93134328 0.93432836 0.94029851 0.9375     0.9375     0.94047619
 0.9375     0.9375     0.93452381 0.93154762]

mean value: 0.936251776830135

key: test_fscore
value: [0.95       0.78947368 0.83333333 0.86486486 0.91891892 0.84210526
 0.91891892 0.94444444 0.97297297 0.90909091]

mean value: 0.8944123309912784

key: train_fscore
value: [0.93009119 0.93373494 0.94011976 0.93655589 0.93693694 0.94011976
 0.93693694 0.93693694 0.93413174 0.93134328]

mean value: 0.9356907368285973

key: test_precision
value: [0.9047619  0.78947368 0.88235294 0.88888889 0.89473684 0.8
 0.89473684 0.94444444 0.94736842 1.        ]

mean value: 0.8946763968745393

key: train_precision
value: [0.93292683 0.92814371 0.92899408 0.93373494 0.93413174 0.93452381
 0.93413174 0.93413174 0.92857143 0.92307692]

mean value: 0.9312366935195415

key: test_recall
value: [1.         0.78947368 0.78947368 0.84210526 0.94444444 0.88888889
 0.94444444 0.94444444 1.         0.83333333]

mean value: 0.8976608187134503

key: train_recall
value: [0.92727273 0.93939394 0.95151515 0.93939394 0.93975904 0.94578313
 0.93975904 0.93975904 0.93975904 0.93975904]

mean value: 0.940215407082877

key: test_roc_auc
value: [0.94736842 0.78947368 0.84210526 0.86549708 0.91959064 0.83918129
 0.91959064 0.94590643 0.97368421 0.91666667]

mean value: 0.895906432748538

key: train_roc_auc
value: [0.93128342 0.93440285 0.94046346 0.93753323 0.93752658 0.94053863
 0.93752658 0.93752658 0.9345854  0.93164422]

mean value: 0.936303093978315

key: test_jcc
value: [0.9047619  0.65217391 0.71428571 0.76190476 0.85       0.72727273
 0.85       0.89473684 0.94736842 0.83333333]

mean value: 0.8135837617759815

key: train_jcc
value: [0.86931818 0.87570621 0.88700565 0.88068182 0.88135593 0.88700565
 0.88135593 0.88135593 0.87640449 0.87150838]

mean value: 0.8791698185004754

MCC on Blind test: 0.84

Accuracy on Blind test: 0.92
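
Each "mean value" line above is the arithmetic mean of the ten per-fold scores printed before it. The sketch below recomputes it for test_mcc from the rounded fold values in the log, so it agrees with the logged mean only to about eight decimal places:

```python
from statistics import fmean

# Per-fold test_mcc values copied from the log (rounded to 8 decimals there).
test_mcc = [0.89973541, 0.57894737, 0.68803296, 0.73099415, 0.83918129,
            0.68035483, 0.83918129, 0.89181287, 0.94736842, 0.84834956]

mean_mcc = fmean(test_mcc)
# Range across folds, as a rough indication of fold-to-fold variability.
fold_range = max(test_mcc) - min(test_mcc)
print(round(mean_mcc, 4), round(fold_range, 4))
```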

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [1.24725127 1.25279379 1.20516229 1.37949181 1.13146496 1.56774902
 1.27143836 1.19698691 1.58842921 1.32122326]

mean value: 1.3161990880966186

key: score_time
value: [0.0150485  0.01297903 0.01233244 0.0124383  0.01545596 0.01286793
 0.01544952 0.01620698 0.01299453 0.0154202 ]

mean value: 0.014119338989257813

key: test_mcc
value: [0.89973541 0.68803296 0.68803296 0.68035483 0.83918129 0.68035483
 0.83918129 0.78764146 0.94736842 0.73020842]

mean value: 0.7780091871199141

key: train_mcc
value: [0.88065448 1.         0.83879937 0.90473153 0.89284196 0.83334517
 0.88691246 1.         1.         0.98809355]

mean value: 0.9225378515989221

key: test_accuracy
value: [0.94736842 0.84210526 0.84210526 0.83783784 0.91891892 0.83783784
 0.91891892 0.89189189 0.97297297 0.86486486]

mean value: 0.8874822190611664

key: train_accuracy
value: [0.94029851 1.         0.91940299 0.95238095 0.94642857 0.91666667
 0.94345238 1.         1.         0.99404762]

mean value: 0.9612677683013504

key: test_fscore
value: [0.95       0.85       0.83333333 0.83333333 0.91891892 0.84210526
 0.91891892 0.88235294 0.97297297 0.85714286]

mean value: 0.88590785389547

key: train_fscore
value: [0.93975904 1.         0.918429   0.95151515 0.94578313 0.91515152
 0.94294294 1.         1.         0.9939759 ]

mean value: 0.9607556684919915

key: test_precision
value: [0.9047619  0.80952381 0.88235294 0.88235294 0.89473684 0.8
 0.89473684 0.9375     0.94736842 0.88235294]

mean value: 0.8835686643078284

key: train_precision
value: [0.93413174 1.         0.91566265 0.95151515 0.94578313 0.92073171
 0.94011976 1.         1.         0.9939759 ]

mean value: 0.96019200425852

key: test_recall
value: [1.         0.89473684 0.78947368 0.78947368 0.94444444 0.88888889
 0.94444444 0.83333333 1.         0.83333333]

mean value: 0.8918128654970761

key: train_recall
value: [0.94545455 1.         0.92121212 0.95151515 0.94578313 0.90963855
 0.94578313 1.         1.         0.9939759 ]

mean value: 0.9613362541073385

key: test_roc_auc
value: [0.94736842 0.84210526 0.84210526 0.83918129 0.91959064 0.83918129
 0.91959064 0.89035088 0.97368421 0.86403509]

mean value: 0.887719298245614

key: train_roc_auc
value: [0.94037433 1.         0.91942959 0.95236576 0.94642098 0.91658398
 0.9434798  1.         1.         0.99404678]

mean value: 0.9612701222377078

key: test_jcc
value: [0.9047619  0.73913043 0.71428571 0.71428571 0.85       0.72727273
 0.85       0.78947368 0.94736842 0.75      ]

mean value: 0.7986578600651827

key: train_jcc
value: [0.88636364 1.         0.84916201 0.90751445 0.89714286 0.84357542
 0.89204545 1.         1.         0.98802395]

mean value: 0.9263827781182407

MCC on Blind test: 0.76

Accuracy on Blind test: 0.88
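
Note on the test_mcc / train_mcc keys above: these are Matthews correlation coefficients. The snippet below is a minimal sketch of how that metric is computed from confusion-matrix counts. It is not the scoring code used by the pipeline that produced this log. The function name `mcc_from_counts` is hypothetical, and the example counts (TP=19, TN=17, FP=2, FN=0) are an assumption: they were inferred from the first fold's reported test_precision, test_recall and test_accuracy, not read from the log.

```python
import math


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, which is the usual
    convention for the undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Assumed counts, inferred from fold 1 of the LogisticRegressionCV block
# (precision 19/21, recall 19/19, accuracy 36/38); not taken from the log.
example = mcc_from_counts(tp=19, tn=17, fp=2, fn=0)
print(round(example, 4))  # 0.8997
```

With those assumed counts the result agrees with the first test_mcc entry reported for LogisticRegressionCV (0.89973541).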

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01527596 0.01070619 0.01155663 0.01507545 0.0173049  0.00961065
 0.00960159 0.01027203 0.01683927 0.00966716]

mean value: 0.012590980529785157

key: score_time
value: [0.01228333 0.00963926 0.01033974 0.01431751 0.01128507 0.00904703
 0.00902438 0.01120901 0.01366067 0.00890756]

mean value: 0.010971355438232421

key: test_mcc
value: [0.74620251 0.42640143 0.61017022 0.47328975 0.57857577 0.53638795
 0.83918129 0.73821295 0.73821295 0.51319869]

mean value: 0.6199833494826468

key: train_mcc
value: [0.63213973 0.67077671 0.66818514 0.67469654 0.63279874 0.65000993
 0.65436967 0.65987564 0.64691443 0.63336739]

mean value: 0.6523133932014681

key: test_accuracy
value: [0.86842105 0.71052632 0.78947368 0.72972973 0.78378378 0.75675676
 0.91891892 0.86486486 0.86486486 0.75675676]

mean value: 0.8044096728307255

key: train_accuracy
value: [0.81492537 0.83283582 0.83283582 0.83630952 0.80952381 0.82142857
 0.82440476 0.82738095 0.82142857 0.81547619]

mean value: 0.8236549395877754

key: test_fscore
value: [0.85714286 0.68571429 0.75       0.70588235 0.75       0.7804878
 0.91891892 0.84848485 0.84848485 0.74285714]

mean value: 0.7887973059422126

key: train_fscore
value: [0.80254777 0.81818182 0.82165605 0.82539683 0.78378378 0.80392157
 0.80906149 0.81290323 0.80769231 0.80379747]

mean value: 0.8088942308172258

key: test_precision
value: [0.9375     0.75       0.92307692 0.8        0.85714286 0.69565217
 0.89473684 0.93333333 0.93333333 0.76470588]

mean value: 0.8489481345257694

key: train_precision
value: [0.84563758 0.88111888 0.86577181 0.86666667 0.89230769 0.87857143
 0.87412587 0.875      0.8630137  0.84666667]

mean value: 0.8688880304060501

key: test_recall
value: [0.78947368 0.63157895 0.63157895 0.63157895 0.66666667 0.88888889
 0.94444444 0.77777778 0.77777778 0.72222222]

mean value: 0.7461988304093568

key: train_recall
value: [0.76363636 0.76363636 0.78181818 0.78787879 0.69879518 0.74096386
 0.75301205 0.75903614 0.75903614 0.76506024]

mean value: 0.7572873311427528

key: test_roc_auc
value: [0.86842105 0.71052632 0.78947368 0.73245614 0.78070175 0.76023392
 0.91959064 0.8625731  0.8625731  0.75584795]

mean value: 0.8042397660818713

key: train_roc_auc
value: [0.81417112 0.83181818 0.83208556 0.83545986 0.80822112 0.82048193
 0.82356485 0.8265769  0.82069454 0.81488306]

mean value: 0.8227957123550022

key: test_jcc
value: [0.75       0.52173913 0.6        0.54545455 0.6        0.64
 0.85       0.73684211 0.73684211 0.59090909]

mean value: 0.6571786977324735

key: train_jcc
value: [0.67021277 0.69230769 0.6972973  0.7027027  0.64444444 0.67213115
 0.67934783 0.68478261 0.67741935 0.67195767]

mean value: 0.6792603511829558

MCC on Blind test: 0.7

Accuracy on Blind test: 0.85

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01008868 0.0128963  0.02115774 0.00986743 0.00986052 0.01429033
 0.01309276 0.01284504 0.00995731 0.01076698]

mean value: 0.012482309341430664

key: score_time
value: [0.00907111 0.01345086 0.00931144 0.00973558 0.00988913 0.0149653
 0.01284695 0.00896931 0.0095892  0.0089035 ]

mean value: 0.010673236846923829

key: test_mcc
value: [0.74620251 0.42640143 0.65465367 0.56725146 0.68035483 0.56725146
 0.56725146 0.67849265 0.7888597  0.83918129]

mean value: 0.6515900457032924

key: train_mcc
value: [0.73755882 0.7792393  0.74367201 0.73209888 0.70905196 0.7441844
 0.76402212 0.73287373 0.70870914 0.71482244]

mean value: 0.7366232811135077

key: test_accuracy
value: [0.86842105 0.71052632 0.81578947 0.78378378 0.83783784 0.78378378
 0.78378378 0.83783784 0.89189189 0.91891892]

mean value: 0.8232574679943101

key: train_accuracy
value: [0.86865672 0.88955224 0.87164179 0.86607143 0.85416667 0.87202381
 0.88095238 0.86607143 0.85416667 0.85714286]

mean value: 0.8680445984363895

key: test_fscore
value: [0.87804878 0.73170732 0.78787879 0.78947368 0.84210526 0.77777778
 0.77777778 0.82352941 0.89473684 0.91891892]

mean value: 0.8221954561152628

key: train_fscore
value: [0.86826347 0.88888889 0.87164179 0.86404834 0.85545723 0.87164179
 0.88372093 0.86725664 0.85459941 0.85798817]

mean value: 0.868350664914892

key: test_precision
value: [0.81818182 0.68181818 0.92857143 0.78947368 0.8        0.77777778
 0.77777778 0.875      0.85       0.89473684]

mean value: 0.8193337510442774

key: train_precision
value: [0.85798817 0.88095238 0.85882353 0.86144578 0.83815029 0.86390533
 0.85393258 0.84971098 0.84210526 0.84302326]

mean value: 0.8550037559538748

key: test_recall
value: [0.94736842 0.78947368 0.68421053 0.78947368 0.88888889 0.77777778
 0.77777778 0.77777778 0.94444444 0.94444444]

mean value: 0.8321637426900584

key: train_recall
value: [0.87878788 0.8969697  0.88484848 0.86666667 0.87349398 0.87951807
 0.91566265 0.88554217 0.86746988 0.87349398]

mean value: 0.8822453450164294

key: test_roc_auc
value: [0.86842105 0.71052632 0.81578947 0.78362573 0.83918129 0.78362573
 0.78362573 0.83625731 0.89327485 0.91959064]

mean value: 0.8233918128654971

key: train_roc_auc
value: [0.8688057  0.88966132 0.87183601 0.86608187 0.85439405 0.87211198
 0.88136074 0.8663005  0.85432318 0.85733522]

mean value: 0.8682210557211489

key: test_jcc
value: [0.7826087  0.57692308 0.65       0.65217391 0.72727273 0.63636364
 0.63636364 0.7        0.80952381 0.85      ]

mean value: 0.7021229495142538

key: train_jcc
value: [0.76719577 0.8        0.77248677 0.7606383  0.74742268 0.77248677
 0.79166667 0.765625   0.74611399 0.75129534]

mean value: 0.7674931283545561

MCC on Blind test: 0.73

Accuracy on Blind test: 0.86
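
Note: the keys printed for each model (fit_time, score_time, then test_* and train_* pairs) follow the dictionary layout that scikit-learn's cross_validate returns when it is given a scoring dict and return_train_score=True. The sketch below reproduces that layout under assumptions: the synthetic data, the scoring dict, and the fold setup are illustrative and are not taken from ml_data_7030.py.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real feature matrix (assumption: binary target).
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Hypothetical scoring dict; the names are chosen so the result keys
# match the ones seen in the log (test_mcc, train_jcc, ...).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

pipe = Pipeline(steps=[("prep", MinMaxScaler()), ("model", BernoulliNB())])
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

# return_train_score=True adds the train_* arrays next to the test_* ones.
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring, return_train_score=True)

for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Each array holds one entry per fold, so "mean value" is the average over the 10 folds.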

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00998735 0.01429224 0.01066971 0.01056051 0.01671553 0.00953698
 0.01070189 0.0094161  0.0092299  0.01051044]

mean value: 0.011162066459655761

key: score_time
value: [0.06381631 0.04476571 0.01716495 0.01765943 0.0184772  0.0145154
 0.01202798 0.0113976  0.01206398 0.01199079]

mean value: 0.022387933731079102

key: test_mcc
value: [0.42163702 0.15877684 0.63960215 0.24633537 0.7888597  0.08554907
 0.4163404  0.30307132 0.45906433 0.62170355]

mean value: 0.41409397538354425

key: train_mcc
value: [0.64781471 0.70758921 0.65960709 0.67249172 0.63089248 0.68489413
 0.65480084 0.69054046 0.66065385 0.65480084]

mean value: 0.6664085336282927

key: test_accuracy
value: [0.71052632 0.57894737 0.81578947 0.62162162 0.89189189 0.54054054
 0.7027027  0.64864865 0.72972973 0.81081081]

mean value: 0.7051209103840683

key: train_accuracy
value: [0.8238806  0.85373134 0.82985075 0.83630952 0.81547619 0.8422619
 0.82738095 0.8452381  0.83035714 0.82738095]

mean value: 0.8331867448471926

key: test_fscore
value: [0.7027027  0.6        0.8        0.61111111 0.89473684 0.56410256
 0.64516129 0.58064516 0.72222222 0.8       ]

mean value: 0.6920681893856767

key: train_fscore
value: [0.81846154 0.85285285 0.82674772 0.83282675 0.81212121 0.84272997
 0.82317073 0.84146341 0.82779456 0.82317073]

mean value: 0.8301339481829435

key: test_precision
value: [0.72222222 0.57142857 0.875      0.64705882 0.85       0.52380952
 0.76923077 0.69230769 0.72222222 0.82352941]

mean value: 0.7196809236515119

key: train_precision
value: [0.83125    0.8452381  0.82926829 0.83536585 0.81707317 0.83040936
 0.83333333 0.85185185 0.83030303 0.83333333]

mean value: 0.8337426317857961

key: test_recall
value: [0.68421053 0.63157895 0.73684211 0.57894737 0.94444444 0.61111111
 0.55555556 0.5        0.72222222 0.77777778]

mean value: 0.6742690058479532

key: train_recall
value: [0.80606061 0.86060606 0.82424242 0.83030303 0.80722892 0.85542169
 0.81325301 0.8313253  0.8253012  0.81325301]

mean value: 0.8266995253742242

key: test_roc_auc
value: [0.71052632 0.57894737 0.81578947 0.62280702 0.89327485 0.54239766
 0.69883041 0.64473684 0.72953216 0.80994152]

mean value: 0.7046783625730995

key: train_roc_auc
value: [0.82361854 0.85383244 0.82976827 0.83620415 0.81537916 0.84241673
 0.82721474 0.84507442 0.83029766 0.82721474]

mean value: 0.8331020846685362

key: test_jcc
value: [0.54166667 0.42857143 0.66666667 0.44       0.80952381 0.39285714
 0.47619048 0.40909091 0.56521739 0.66666667]

mean value: 0.5396451157538114

key: train_jcc
value: [0.69270833 0.7434555  0.70466321 0.71354167 0.68367347 0.72820513
 0.69948187 0.72631579 0.70618557 0.69948187]

mean value: 0.7097712394464257

MCC on Blind test: 0.49

Accuracy on Blind test: 0.74

Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01723957 0.01586294 0.01570201 0.01627302 0.01574993 0.0172441
 0.01658297 0.01831293 0.01824284 0.01881599]

mean value: 0.017002630233764648

key: score_time
value: [0.0108459  0.01086402 0.010607   0.01053047 0.01058245 0.01136756
 0.01049376 0.01160789 0.01160383 0.01154995]

mean value: 0.011005282402038574

key: test_mcc
value: [0.89473684 0.47633051 0.63960215 0.7888597  0.84959079 0.68035483
 0.78362573 0.83871328 1.         0.83871328]

mean value: 0.7790527120618065

key: train_mcc
value: [0.79118098 0.83279857 0.81532977 0.80977356 0.78568391 0.80949681
 0.80371348 0.8097803  0.79787385 0.77976011]

mean value: 0.8035391354542314

key: test_accuracy
value: [0.94736842 0.73684211 0.81578947 0.89189189 0.91891892 0.83783784
 0.89189189 0.91891892 1.         0.91891892]

mean value: 0.8878378378378379

key: train_accuracy
value: [0.89552239 0.91641791 0.90746269 0.9047619  0.89285714 0.9047619
 0.90178571 0.9047619  0.89880952 0.88988095]

mean value: 0.9017022032693675

key: test_fscore
value: [0.94736842 0.75       0.8        0.88888889 0.92307692 0.84210526
 0.88888889 0.91428571 1.         0.91428571]

mean value: 0.8868899813636656

key: train_fscore
value: [0.89489489 0.91515152 0.90746269 0.90419162 0.89156627 0.90361446
 0.90149254 0.9047619  0.89880952 0.88888889]

mean value: 0.9010834291045358

key: test_precision
value: [0.94736842 0.71428571 0.875      0.94117647 0.85714286 0.8
 0.88888889 0.94117647 1.         0.94117647]

mean value: 0.8906215293134798

key: train_precision
value: [0.88690476 0.91515152 0.89411765 0.89349112 0.89156627 0.90361446
 0.89349112 0.89411765 0.88823529 0.88622754]

mean value: 0.8946917381614027

key: test_recall
value: [0.94736842 0.78947368 0.73684211 0.84210526 1.         0.88888889
 0.88888889 0.88888889 1.         0.88888889]

mean value: 0.8871345029239766

key: train_recall
value: [0.9030303  0.91515152 0.92121212 0.91515152 0.89156627 0.90361446
 0.90963855 0.91566265 0.90963855 0.89156627]

mean value: 0.9076232201533406

key: test_roc_auc
value: [0.94736842 0.73684211 0.81578947 0.89327485 0.92105263 0.83918129
 0.89181287 0.91812865 1.         0.91812865]

mean value: 0.8881578947368421

key: train_roc_auc
value: [0.8956328  0.91639929 0.90766488 0.90494418 0.89284196 0.90474841
 0.9018781  0.90489015 0.89893692 0.88990078]

mean value: 0.9017837462995806

key: test_jcc
value: [0.9        0.6        0.66666667 0.8        0.85714286 0.72727273
 0.8        0.84210526 1.         0.84210526]

mean value: 0.8035292777398041

key: train_jcc
value: [0.80978261 0.84357542 0.83060109 0.82513661 0.80434783 0.82417582
 0.82065217 0.82608696 0.81621622 0.8       ]

mean value: 0.8200574729521878

MCC on Blind test: 0.78

Accuracy on Blind test: 0.89
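
Note: the "Running model pipeline" repr shows a ColumnTransformer that applies MinMaxScaler to the numerical columns and OneHotEncoder to the categorical ones, followed by the model. A minimal sketch of building such a pipeline is given below. The four column names are borrowed from the log, but the values, the category labels, and the target are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.svm import SVC

rng = np.random.default_rng(42)
n = 60

# Toy frame: column names come from the log, values are synthetic.
X = pd.DataFrame({
    "ligand_distance": rng.uniform(2.0, 30.0, n),
    "ddg_foldx": rng.normal(0.0, 1.5, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),
    "active_site": rng.choice(["yes", "no"], n),
})
# Hypothetical binary target, only here so the model has something to fit.
y = (X["ligand_distance"] < 15).astype(int)

num_cols = ["ligand_distance", "ddg_foldx"]
cat_cols = ["ss_class", "active_site"]

prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), num_cols),   # scale numerical features to [0, 1]
        ("cat", OneHotEncoder(), cat_cols),  # one indicator column per category
    ],
)

pipe = Pipeline(steps=[("prep", prep), ("model", SVC(random_state=42))])
pipe.fit(X, y)

Xt = pipe.named_steps["prep"].transform(X)
pred = pipe.predict(X)
print("transformed shape:", Xt.shape)
```

Keeping the preprocessing inside the Pipeline means the scaler and encoder are refit on the training part of every CV fold, so no information from the held-out fold reaches them.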

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [2.2372601  2.04340816 1.6573236  1.768543   1.91152811 1.71026349
 1.82718539 2.05708623 2.5108068  2.36679077]

mean value: 2.0090195655822756

key: score_time
value: [0.01652408 0.01711798 0.01492739 0.02228522 0.02000928 0.01484942
 0.02607751 0.01266646 0.02013946 0.01371574]

mean value: 0.017831254005432128

key: test_mcc
value: [0.89973541 0.58218174 0.63960215 0.74044197 0.7888597  0.6754386
 0.80369958 0.78362573 0.94736842 0.73020842]

mean value: 0.7591161713668702

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.94736842 0.78947368 0.81578947 0.86486486 0.89189189 0.83783784
 0.89189189 0.89189189 0.97297297 0.86486486]

mean value: 0.8768847795163585

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94444444 0.8        0.8        0.85714286 0.89473684 0.83333333
 0.9        0.88888889 0.97297297 0.85714286]

mean value: 0.8748662196030617

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.76190476 0.875      0.9375     0.85       0.83333333
 0.81818182 0.88888889 0.94736842 0.88235294]

mean value: 0.8794530164537905

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.89473684 0.84210526 0.73684211 0.78947368 0.94444444 0.83333333
 1.         0.88888889 1.         0.83333333]

mean value: 0.8763157894736842

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.94736842 0.78947368 0.81578947 0.86695906 0.89327485 0.8377193
 0.89473684 0.89181287 0.97368421 0.86403509]

mean value: 0.8774853801169591

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.89473684 0.66666667 0.66666667 0.75       0.80952381 0.71428571
 0.81818182 0.8        0.94736842 0.75      ]

mean value: 0.781742993848257

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.74

Accuracy on Blind test: 0.87
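
Note: the ConvergenceWarning logged for this model means the optimizer stopped at max_iter=500 without meeting its tolerance. Train scores of exactly 1.0 in every fold, against a mean test_mcc of about 0.76, may also point to overfitting. One possible response, sketched here on synthetic data and not taken from the original script, is to raise max_iter and enable early_stopping, then check whether the warning is still raised.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption); scaled as in the logged pipeline.
X, y = make_classification(n_samples=300, n_features=20, random_state=42)
X = MinMaxScaler().fit_transform(X)

# Illustrative settings: a higher iteration cap plus early stopping on an
# internal validation split (10% of the training data by default).
model = MLPClassifier(
    max_iter=2000,
    early_stopping=True,
    n_iter_no_change=10,
    random_state=42,
)

# Record warnings so a ConvergenceWarning can be detected programmatically.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    model.fit(X, y)

hit_limit = any(issubclass(w.category, ConvergenceWarning) for w in caught)
print("iterations run:", model.n_iter_)
print("ConvergenceWarning raised:", hit_limit)
```

Whether this helps on the real data is an open question; the sketch only shows how to test for the warning instead of reading it off stderr.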

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.02188754 0.01587939 0.01625609 0.01523471 0.01606822 0.01897764
 0.02061105 0.01994038 0.01674199 0.01598382]

mean value: 0.01775808334350586

key: score_time
value: [0.01280594 0.00951886 0.00900126 0.00924301 0.01010299 0.01299381
 0.01148295 0.0133841  0.0088644  0.00901771]

mean value: 0.01064150333404541

key: test_mcc
value: [0.9486833  0.79388419 0.89973541 0.83918129 0.7888597  0.83918129
 0.94736842 0.89181287 1.         0.83918129]

mean value: 0.8787887738162246

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97368421 0.89473684 0.94736842 0.91891892 0.89189189 0.91891892
 0.97297297 0.94594595 1.         0.91891892]

mean value: 0.9383357041251779

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97435897 0.88888889 0.94444444 0.91891892 0.89473684 0.91891892
 0.97297297 0.94444444 1.         0.91891892]

mean value: 0.9376603323971745

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.95       0.94117647 1.         0.94444444 0.85       0.89473684
 0.94736842 0.94444444 1.         0.89473684]

mean value: 0.9366907464740282

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.84210526 0.89473684 0.89473684 0.94444444 0.94444444
 1.         0.94444444 1.         0.94444444]

mean value: 0.9409356725146198

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97368421 0.89473684 0.94736842 0.91959064 0.89327485 0.91959064
 0.97368421 0.94590643 1.         0.91959064]

mean value: 0.9387426900584795

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.95       0.8        0.89473684 0.85       0.80952381 0.85
 0.94736842 0.89473684 1.         0.85      ]

mean value: 0.8846365914786968

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.89

Accuracy on Blind test: 0.95

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.10830688 0.10529113 0.10499811 0.10781646 0.10907078 0.10594296
 0.10623217 0.10774302 0.12120008 0.11080527]

mean value: 0.10874068737030029

key: score_time
value: [0.01754594 0.01745701 0.01749945 0.01769519 0.01761103 0.01773334
 0.01761365 0.01932955 0.01826262 0.0196104 ]

mean value: 0.01803581714630127

key: test_mcc
value: [1.         0.52704628 0.63960215 0.7888597  0.7888597  0.56725146
 0.6754386  0.83918129 1.         0.84834956]

mean value: 0.7674588721170138

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.76315789 0.81578947 0.89189189 0.89189189 0.78378378
 0.83783784 0.91891892 1.         0.91891892]

mean value: 0.8822190611664296

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.76923077 0.8        0.88888889 0.89473684 0.77777778
 0.83333333 0.91891892 1.         0.90909091]

mean value: 0.879197743934586

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.75       0.875      0.94117647 0.85       0.77777778
 0.83333333 0.89473684 1.         1.        ]

mean value: 0.892202442380461

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.78947368 0.73684211 0.84210526 0.94444444 0.77777778
 0.83333333 0.94444444 1.         0.83333333]

mean value: 0.8701754385964913

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.76315789 0.81578947 0.89327485 0.89327485 0.78362573
 0.8377193  0.91959064 1.         0.91666667]

mean value: 0.8823099415204678

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.625      0.66666667 0.8        0.80952381 0.63636364
 0.71428571 0.85       1.         0.83333333]

mean value: 0.793517316017316

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.82

Accuracy on Blind test: 0.91

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.01310754 0.01101828 0.01016736 0.01003575 0.01700163 0.01136065
 0.01075435 0.01067233 0.01063561 0.0164597 ]

mean value: 0.012121319770812988

key: score_time
value: [0.00934529 0.00985003 0.00910783 0.01338744 0.01081181 0.0099082
 0.00986314 0.00971031 0.01279759 0.01062942]

mean value: 0.010541105270385742

key: test_mcc
value: [0.73786479 0.32732684 0.42163702 0.56725146 0.41299552 0.24269006
 0.35087719 0.73099415 0.35104619 0.52214434]

mean value: 0.466482756739817

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.86842105 0.65789474 0.71052632 0.78378378 0.7027027  0.62162162
 0.67567568 0.86486486 0.67567568 0.75675676]

mean value: 0.731792318634424

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.87179487 0.69767442 0.71794872 0.78947368 0.71794872 0.61111111
 0.66666667 0.86486486 0.64705882 0.76923077]

mean value: 0.7353772645910309

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.85       0.625      0.7        0.78947368 0.66666667 0.61111111
 0.66666667 0.84210526 0.6875     0.71428571]

mean value: 0.715280910609858

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.89473684 0.78947368 0.73684211 0.78947368 0.77777778 0.61111111
 0.66666667 0.88888889 0.61111111 0.83333333]

mean value: 0.7599415204678363

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.86842105 0.65789474 0.71052632 0.78362573 0.70467836 0.62134503
 0.6754386  0.86549708 0.67397661 0.75877193]

mean value: 0.7320175438596491

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.77272727 0.53571429 0.56       0.65217391 0.56       0.44
 0.5        0.76190476 0.47826087 0.625     ]

mean value: 0.5885781102955016

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.37

Accuracy on Blind test: 0.68
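
Note on the per-fold blocks above: the key names (fit_time, score_time, test_*, train_*) have the shape that scikit-learn's cross_validate returns when given a dict of scorers and return_train_score=True. The sketch below is a minimal illustration under stated assumptions, not the script that produced this log. The data is synthetic, the scorer names in `scoring` are chosen only to mirror the log keys, and the `roc_auc` and `jcc` entries use scikit-learn's built-in scorer strings, which may differ from the script's own definitions.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): the real feature table is not reproduced here.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Scorer names are hypothetical and chosen to mirror the keys seen in the log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

# Scale every (numerical) column, then fit the classifier.
pipe = Pipeline(steps=[
    ("prep", ColumnTransformer(
        remainder="passthrough",
        transformers=[("num", MinMaxScaler(), list(range(X.shape[1])))],
    )),
    ("model", ExtraTreesClassifier(random_state=42)),
])

# 10 stratified folds, matching the 10 values printed per key in the log.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print each metric in the same key / value / mean layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Each entry of `scores` is an array with one value per fold, so the "mean value" lines are simply the arithmetic mean over the 10 folds.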

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.74289036 1.53044105 1.49827647 1.53210521 1.52824092 1.50557423
 1.51571012 1.51636791 1.51605439 1.5078156 ]

mean value: 1.5393476247787476

key: score_time
value: [0.09816289 0.09600377 0.09313393 0.09704614 0.09628916 0.09311295
 0.09352589 0.09830904 0.09392619 0.09237742]

mean value: 0.09518873691558838

key: test_mcc
value: [0.9486833  0.79388419 0.80757285 0.94736842 0.89736456 0.89181287
 0.94736842 0.83918129 1.         1.        ]

mean value: 0.9073235893751431

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97368421 0.89473684 0.89473684 0.97297297 0.94594595 0.94594595
 0.97297297 0.91891892 1.         1.        ]

mean value: 0.95199146514936

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97435897 0.88888889 0.88235294 0.97297297 0.94736842 0.94444444
 0.97297297 0.91891892 1.         1.        ]

mean value: 0.9502278534786275

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.95       0.94117647 1.         1.         0.9        0.94444444
 0.94736842 0.89473684 1.         1.        ]

mean value: 0.9577726178190574

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.84210526 0.78947368 0.94736842 1.         0.94444444
 1.         0.94444444 1.         1.        ]

mean value: 0.9467836257309942

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97368421 0.89473684 0.89473684 0.97368421 0.94736842 0.94590643
 0.97368421 0.91959064 1.         1.        ]

mean value: 0.9523391812865497

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.95       0.8        0.78947368 0.94736842 0.9        0.89473684
 0.94736842 0.85       1.         1.        ]

mean value: 0.9078947368421053

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.89

Accuracy on Blind test: 0.95

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC0...05', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(

key: fit_time
value: [1.79304814 0.91387272 1.02754641 0.95623207 0.91034627 0.89128447
 0.93726492 0.88634753 0.91595483 0.94292164]

mean value: 1.0174818992614747

key: score_time
value: [0.26749158 0.23623896 0.25596142 0.2053082  0.22877908 0.22374535
 0.21946526 0.25390697 0.22047591 0.22399354]

mean value: 0.23353662490844726

key: test_mcc
value: [1.         0.68803296 0.76376262 0.84959079 0.89736456 0.7888597
 0.89181287 0.83918129 1.         0.94721815]

mean value: 0.8665822928273674

key: train_mcc
value: [0.94639427 0.97016256 0.96423353 0.96427432 0.96434396 0.95243498
 0.95243498 0.96428065 0.94656062 0.95834146]

mean value: 0.9583461336790984

key: test_accuracy
value: [1.         0.84210526 0.86842105 0.91891892 0.94594595 0.89189189
 0.94594595 0.91891892 1.         0.97297297]

mean value: 0.9305120910384068

key: train_accuracy
value: [0.97313433 0.98507463 0.98208955 0.98214286 0.98214286 0.97619048
 0.97619048 0.98214286 0.97321429 0.97916667]

mean value: 0.9791488983653163

key: test_fscore
value: [1.         0.83333333 0.84848485 0.91428571 0.94736842 0.89473684
 0.94444444 0.91891892 1.         0.97142857]

mean value: 0.9273001094053726

key: train_fscore
value: [0.97247706 0.98489426 0.98170732 0.98181818 0.98181818 0.97575758
 0.97575758 0.98192771 0.97264438 0.97885196]

mean value: 0.9787654207752894

key: test_precision
value: [1.         0.88235294 1.         1.         0.9        0.85
 0.94444444 0.89473684 1.         1.        ]

mean value: 0.9471534227726178

key: train_precision
value: [0.98148148 0.98192771 0.98773006 0.98181818 0.98780488 0.98170732
 0.98170732 0.98192771 0.98159509 0.98181818]

mean value: 0.9829517932373947

key: test_recall
value: [1.         0.78947368 0.73684211 0.84210526 1.         0.94444444
 0.94444444 0.94444444 1.         0.94444444]

mean value: 0.9146198830409357

key: train_recall
value: [0.96363636 0.98787879 0.97575758 0.98181818 0.97590361 0.96987952
 0.96987952 0.98192771 0.96385542 0.97590361]

mean value: 0.974644030668127

key: test_roc_auc
value: [1.         0.84210526 0.86842105 0.92105263 0.94736842 0.89327485
 0.94590643 0.91959064 1.         0.97222222]

mean value: 0.9309941520467836

key: train_roc_auc
value: [0.97299465 0.98511586 0.98199643 0.98213716 0.98206945 0.97611623
 0.97611623 0.98214033 0.97310418 0.97912828]

mean value: 0.9790918811751368

key: test_jcc
value: [1.         0.71428571 0.73684211 0.84210526 0.9        0.80952381
 0.89473684 0.85       1.         0.94444444]

mean value: 0.8691938178780284

key: train_jcc
value: [0.94642857 0.9702381  0.96407186 0.96428571 0.96428571 0.95266272
 0.95266272 0.96449704 0.94674556 0.95857988]

mean value: 0.9584457880519603

MCC on Blind test: 0.85

Accuracy on Blind test: 0.92

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.02368402 0.00988269 0.00996494 0.01014757 0.00983524 0.0103724
 0.00982785 0.00976944 0.010041   0.01028967]

mean value: 0.01138148307800293

key: score_time
value: [0.01074362 0.00889611 0.06786227 0.00892687 0.00902653 0.00950575
 0.00921893 0.00906777 0.00902295 0.00916648]

mean value: 0.015143728256225586

key: test_mcc
value: [0.74620251 0.42640143 0.65465367 0.56725146 0.68035483 0.56725146
 0.56725146 0.67849265 0.7888597  0.83918129]

mean value: 0.6515900457032924

key: train_mcc
value: [0.73755882 0.7792393  0.74367201 0.73209888 0.70905196 0.7441844
 0.76402212 0.73287373 0.70870914 0.71482244]

mean value: 0.7366232811135077

key: test_accuracy
value: [0.86842105 0.71052632 0.81578947 0.78378378 0.83783784 0.78378378
 0.78378378 0.83783784 0.89189189 0.91891892]

mean value: 0.8232574679943101

key: train_accuracy
value: [0.86865672 0.88955224 0.87164179 0.86607143 0.85416667 0.87202381
 0.88095238 0.86607143 0.85416667 0.85714286]

mean value: 0.8680445984363895

key: test_fscore
value: [0.87804878 0.73170732 0.78787879 0.78947368 0.84210526 0.77777778
 0.77777778 0.82352941 0.89473684 0.91891892]

mean value: 0.8221954561152628

key: train_fscore
value: [0.86826347 0.88888889 0.87164179 0.86404834 0.85545723 0.87164179
 0.88372093 0.86725664 0.85459941 0.85798817]

mean value: 0.868350664914892

key: test_precision
value: [0.81818182 0.68181818 0.92857143 0.78947368 0.8        0.77777778
 0.77777778 0.875      0.85       0.89473684]

mean value: 0.8193337510442774

key: train_precision
value: [0.85798817 0.88095238 0.85882353 0.86144578 0.83815029 0.86390533
 0.85393258 0.84971098 0.84210526 0.84302326]

mean value: 0.8550037559538748

key: test_recall
value: [0.94736842 0.78947368 0.68421053 0.78947368 0.88888889 0.77777778
 0.77777778 0.77777778 0.94444444 0.94444444]

mean value: 0.8321637426900584

key: train_recall
value: [0.87878788 0.8969697  0.88484848 0.86666667 0.87349398 0.87951807
 0.91566265 0.88554217 0.86746988 0.87349398]

mean value: 0.8822453450164294

key: test_roc_auc
value: [0.86842105 0.71052632 0.81578947 0.78362573 0.83918129 0.78362573
 0.78362573 0.83625731 0.89327485 0.91959064]

mean value: 0.8233918128654971

key: train_roc_auc
value: [0.8688057  0.88966132 0.87183601 0.86608187 0.85439405 0.87211198
 0.88136074 0.8663005  0.85432318 0.85733522]

mean value: 0.8682210557211489

key: test_jcc
value: [0.7826087  0.57692308 0.65       0.65217391 0.72727273 0.63636364
 0.63636364 0.7        0.80952381 0.85      ]

mean value: 0.7021229495142538

key: train_jcc
value: [0.76719577 0.8        0.77248677 0.7606383  0.74742268 0.77248677
 0.79166667 0.765625   0.74611399 0.75129534]

mean value: 0.7674931283545561

MCC on Blind test: 0.73

Accuracy on Blind test: 0.86

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC0...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.24965024 0.05168438 0.06280875 0.06240273 0.07617354 0.05706716
 0.05821943 0.06454206 0.06267881 0.09799004]

mean value: 0.0843217134475708

key: score_time
value: [0.01142573 0.01089931 0.01093245 0.01113415 0.01153183 0.01123571
 0.01077247 0.01092672 0.01053596 0.01156497]

mean value: 0.011095929145812988

key: test_mcc
value: [0.9486833  0.84327404 0.89973541 0.83918129 0.94736842 0.94736842
 0.94736842 0.89181287 1.         0.94736842]

mean value: 0.9212160587861828

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97368421 0.92105263 0.94736842 0.91891892 0.97297297 0.97297297
 0.97297297 0.94594595 1.         0.97297297]

mean value: 0.9598862019914651

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97435897 0.91891892 0.94444444 0.91891892 0.97297297 0.97297297
 0.97297297 0.94444444 1.         0.97297297]

mean value: 0.9592977592977593

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.95       0.94444444 1.         0.94444444 0.94736842 0.94736842
 0.94736842 0.94444444 1.         0.94736842]

mean value: 0.957280701754386

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.89473684 0.89473684 0.89473684 1.         1.
 1.         0.94444444 1.         1.        ]

mean value: 0.9628654970760234

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97368421 0.92105263 0.94736842 0.91959064 0.97368421 0.97368421
 0.97368421 0.94590643 1.         0.97368421]

mean value: 0.960233918128655

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.95       0.85       0.89473684 0.85       0.94736842 0.94736842
 0.94736842 0.89473684 1.         0.94736842]

mean value: 0.9228947368421052

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.92

Accuracy on Blind test: 0.96
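
Note on reading the per-fold values above: the first-fold test scores (precision 0.95, recall 1.0, accuracy 0.97368421, MCC 0.9486833, JCC 0.95) are consistent with a 38-sample validation fold containing a single false positive. The sketch below recomputes them from a confusion matrix. The counts `tp=19, fp=1, fn=0, tn=18` are an assumption inferred from the printed scores, not something the log states, and `binary_metrics` is a hypothetical helper, not a function from the original script.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Binary classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration; it does not guard against
    zero denominators (e.g. a fold with no predicted positives).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * precision * recall / (precision + recall),
        # Jaccard index of the positive class (reported as "jcc" in the log)
        "jcc": tp / (tp + fp + fn),
        # Matthews correlation coefficient
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        # ROC AUC computed on hard 0/1 predictions reduces to this quantity
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Assumed counts for fold 1 (inferred, not taken from the log)
fold1 = binary_metrics(tp=19, fp=1, fn=0, tn=18)
for name, value in fold1.items():
    print(f"{name}: {value:.8f}")
```

With these assumed counts the balanced accuracy also equals the printed `test_roc_auc` of 0.97368421, which would be expected if the AUC scorer was given class predictions rather than probabilities. That reading is a guess from the numbers alone.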

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.03995848 0.03733993 0.03714538 0.03793836 0.08239031 0.03703666
 0.03627944 0.07113647 0.07073593 0.04976606]

mean value: 0.049972701072692874

key: score_time
value: [0.01242304 0.01249933 0.01246953 0.01262975 0.01240516 0.01247954
 0.01251364 0.02486706 0.02162528 0.01249456]

mean value: 0.0146406888961792

key: test_mcc
value: [0.89473684 0.53300179 0.59222009 0.56725146 0.89736456 0.68035483
 0.94736842 0.89679028 0.84834956 0.62170355]

mean value: 0.7479141390576655

key: train_mcc
value: [0.95248307 0.95222816 0.95822045 0.94051126 0.940526   0.9285613
 0.9285613  0.9523742  0.94643395 0.94656062]

mean value: 0.9446460333429583

key: test_accuracy
value: [0.94736842 0.76315789 0.78947368 0.78378378 0.94594595 0.83783784
 0.97297297 0.94594595 0.91891892 0.81081081]

mean value: 0.8716216216216216

key: train_accuracy
value: [0.9761194  0.9761194  0.97910448 0.9702381  0.9702381  0.96428571
 0.96428571 0.97619048 0.97321429 0.97321429]

mean value: 0.9723009950248757

key: test_fscore
value: [0.94736842 0.7804878  0.76470588 0.78947368 0.94736842 0.84210526
 0.97297297 0.94117647 0.90909091 0.8       ]

mean value: 0.8694749829356792

key: train_fscore
value: [0.97546012 0.97575758 0.97885196 0.9695122  0.96969697 0.96385542
 0.96385542 0.97590361 0.97280967 0.97264438]

mean value: 0.9718347329426844

key: test_precision
value: [0.94736842 0.72727273 0.86666667 0.78947368 0.9        0.8
 0.94736842 1.         1.         0.82352941]

mean value: 0.8801679332019889

key: train_precision
value: [0.98757764 0.97575758 0.97590361 0.97546012 0.97560976 0.96385542
 0.96385542 0.97590361 0.97575758 0.98159509]

mean value: 0.9751275834377349

key: test_recall
value: [0.94736842 0.84210526 0.68421053 0.78947368 1.         0.88888889
 1.         0.88888889 0.83333333 0.77777778]

mean value: 0.8652046783625731

key: train_recall
value: [0.96363636 0.97575758 0.98181818 0.96363636 0.96385542 0.96385542
 0.96385542 0.97590361 0.96987952 0.96385542]

mean value: 0.9686053304125594

key: test_roc_auc
value: [0.94736842 0.76315789 0.78947368 0.78362573 0.94736842 0.83918129
 0.97368421 0.94444444 0.91666667 0.80994152]

mean value: 0.8714912280701754

key: train_roc_auc
value: [0.97593583 0.97611408 0.97914439 0.97012228 0.970163   0.96428065
 0.96428065 0.9761871  0.97317505 0.97310418]

mean value: 0.9722507216218284

key: test_jcc
value: [0.9        0.64       0.61904762 0.65217391 0.9        0.72727273
 0.94736842 0.88888889 0.83333333 0.66666667]

mean value: 0.7774751569305345

key: train_jcc
value: [0.95209581 0.95266272 0.95857988 0.9408284  0.94117647 0.93023256
 0.93023256 0.95294118 0.94705882 0.94674556]

mean value: 0.9452553963297876

MCC on Blind test: 0.74

Accuracy on Blind test: 0.87
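
Note on tabulating this log: every model block repeats the same `key:` / `value:` / `mean value:` layout, so the scores can be pulled into a dictionary for comparison. The following is a minimal sketch using only the standard library. `parse_cv_block` and `SAMPLE` are hypothetical names introduced here, and the sample text is copied from the LDA block above.

```python
SAMPLE = """\
key: test_mcc
value: [0.89473684 0.53300179 0.59222009 0.56725146 0.89736456 0.68035483
 0.94736842 0.89679028 0.84834956 0.62170355]

mean value: 0.7479141390576655

key: train_mcc
value: [0.95248307 0.95222816 0.95822045 0.94051126 0.940526   0.9285613
 0.9285613  0.9523742  0.94643395 0.94656062]

mean value: 0.9446460333429583

MCC on Blind test: 0.74
"""


def parse_cv_block(text):
    """Parse 'key / value / mean value' groups into a dict.

    Returns {key: {"values": [floats], "mean": float or None}}.
    Lines that are not part of a group are ignored.
    """
    results = {}
    key = None
    in_values = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("key:"):
            key = line.split(":", 1)[1].strip()
            results[key] = {"values": [], "mean": None}
            in_values = False
            continue
        if key is None or not line:
            continue
        if line.startswith("mean value:"):
            results[key]["mean"] = float(line.split(":", 1)[1])
            in_values = False
            continue
        if line.startswith("value:"):
            in_values = True
            line = line.split(":", 1)[1]
        if in_values:
            # The array may wrap over several lines; brackets mark its ends.
            tokens = line.replace("[", " ").replace("]", " ").split()
            results[key]["values"].extend(float(t) for t in tokens)
            if "]" in raw:
                in_values = False
    return results


parsed = parse_cv_block(SAMPLE)
for name, entry in parsed.items():
    print(name, len(entry["values"]), entry["mean"])
```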

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.02484918 0.00989151 0.01043844 0.01072907 0.01075554 0.01006603
 0.00961328 0.0097363  0.00997925 0.00970602]

mean value: 0.011576461791992187

key: score_time
value: [0.01031947 0.00934625 0.00986648 0.00878239 0.00917149 0.00902915
 0.00898862 0.00896931 0.00876212 0.0088017 ]

mean value: 0.009203696250915527

key: test_mcc
value: [0.78947368 0.31622777 0.61017022 0.63129316 0.67849265 0.63129316
 0.73020842 0.73821295 0.62170355 0.78362573]

mean value: 0.6530701274630709

key: train_mcc
value: [0.67195163 0.71367434 0.64874079 0.72056751 0.63176039 0.66756867
 0.75591389 0.64914987 0.66690353 0.71425535]

mean value: 0.6840485961335292

key: test_accuracy
value: [0.89473684 0.65789474 0.78947368 0.81081081 0.83783784 0.81081081
 0.86486486 0.86486486 0.81081081 0.89189189]

mean value: 0.8233997155049787

key: train_accuracy
value: [0.8358209  0.85671642 0.8238806  0.86011905 0.81547619 0.83333333
 0.87797619 0.82440476 0.83333333 0.85714286]

mean value: 0.8418203624733476

key: test_fscore
value: [0.89473684 0.64864865 0.75       0.8        0.82352941 0.82051282
 0.85714286 0.84848485 0.8        0.88888889]

mean value: 0.8131944317548032

key: train_fscore
value: [0.82972136 0.85185185 0.81504702 0.85448916 0.80745342 0.82608696
 0.87613293 0.81846154 0.82822086 0.85454545]

mean value: 0.8362010555198316

key: test_precision
value: [0.89473684 0.66666667 0.92307692 0.875      0.875      0.76190476
 0.88235294 0.93333333 0.82352941 0.88888889]

mean value: 0.8524489768917013

key: train_precision
value: [0.84810127 0.86792453 0.84415584 0.87341772 0.83333333 0.8525641
 0.87878788 0.83647799 0.84375    0.8597561 ]

mean value: 0.8538268759467177

key: test_recall
value: [0.89473684 0.63157895 0.63157895 0.73684211 0.77777778 0.88888889
 0.83333333 0.77777778 0.77777778 0.88888889]

mean value: 0.7839181286549708

key: train_recall
value: [0.81212121 0.83636364 0.78787879 0.83636364 0.78313253 0.80120482
 0.87349398 0.80120482 0.81325301 0.84939759]

mean value: 0.8194414019715224

key: test_roc_auc
value: [0.89473684 0.65789474 0.78947368 0.8128655  0.83625731 0.8128655
 0.86403509 0.8625731  0.80994152 0.89181287]

mean value: 0.8232456140350878

key: train_roc_auc
value: [0.83547237 0.85641711 0.82335116 0.85970229 0.81509568 0.83295535
 0.87792346 0.82413182 0.83309709 0.85705174]

mean value: 0.8415198065929164

key: test_jcc
value: [0.80952381 0.48       0.6        0.66666667 0.7        0.69565217
 0.75       0.73684211 0.66666667 0.8       ]

mean value: 0.6905351422033345

key: train_jcc
value: [0.70899471 0.74193548 0.68783069 0.74594595 0.67708333 0.7037037
 0.77956989 0.69270833 0.70680628 0.74603175]

mean value: 0.7190610118240058

MCC on Blind test: 0.78

Accuracy on Blind test: 0.89
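
Note on the `mean value` lines: each one is the arithmetic mean of the ten per-fold scores printed just above it. A small sketch, using the MultinomialNB `test_mcc` and `train_mcc` arrays from the block above, recomputes both means and the train-minus-test gap, one rough indicator of overfitting. The variable names are illustrative only, and because the inputs are the rounded values printed in the log, results agree with the logged means only to about eight decimal places.

```python
from statistics import fmean, stdev

# Per-fold scores copied from the MultinomialNB block (rounded as printed)
test_mcc = [0.78947368, 0.31622777, 0.61017022, 0.63129316, 0.67849265,
            0.63129316, 0.73020842, 0.73821295, 0.62170355, 0.78362573]
train_mcc = [0.67195163, 0.71367434, 0.64874079, 0.72056751, 0.63176039,
             0.66756867, 0.75591389, 0.64914987, 0.66690353, 0.71425535]

mean_test = fmean(test_mcc)
mean_train = fmean(train_mcc)
gap = mean_train - mean_test      # positive gap: train scores exceed test
spread = stdev(test_mcc)          # fold-to-fold variability of the test score

print(f"mean test_mcc : {mean_test:.6f}")
print(f"mean train_mcc: {mean_train:.6f}")
print(f"train-test gap: {gap:.6f}")
print(f"test_mcc stdev: {spread:.6f}")
```

The same comparison applied to the XGBoost and AdaBoost blocks, where every train score is 1.0, gives a noticeably larger gap than for MultinomialNB.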

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01237154 0.01653957 0.0177362  0.02279115 0.01850939 0.01929617
 0.01568437 0.01943159 0.01927876 0.01795244]

mean value: 0.017959117889404297

key: score_time
value: [0.00886655 0.01132083 0.01133919 0.01192641 0.01196933 0.01196837
 0.01189017 0.01204634 0.01206517 0.01196551]

mean value: 0.01153578758239746

key: test_mcc
value: [0.89473684 0.63960215 0.4330127  0.74044197 0.78362573 0.63129316
 0.7888597  0.83918129 0.84959079 0.84834956]

mean value: 0.7448693884912015

key: train_mcc
value: [0.88773584 0.90033348 0.49718308 0.92909689 0.85527622 0.91673163
 0.86308142 0.91673163 0.85029687 0.87836587]

mean value: 0.8494832930737864

key: test_accuracy
value: [0.94736842 0.81578947 0.65789474 0.86486486 0.89189189 0.81081081
 0.89189189 0.91891892 0.91891892 0.91891892]

mean value: 0.8637268847795164

key: train_accuracy
value: [0.94328358 0.94925373 0.69552239 0.96428571 0.92261905 0.95833333
 0.93154762 0.95833333 0.92261905 0.9375    ]

mean value: 0.9183297796730633

key: test_fscore
value: [0.94736842 0.8        0.74509804 0.85714286 0.88888889 0.82051282
 0.89473684 0.91891892 0.92307692 0.90909091]

mean value: 0.8704834620004899

key: train_fscore
value: [0.94080997 0.94670846 0.76388889 0.96296296 0.91503268 0.95808383
 0.9305136  0.95808383 0.92571429 0.93375394]

mean value: 0.9235552453156383

key: test_precision
value: [0.94736842 0.875      0.59375    0.9375     0.88888889 0.76190476
 0.85       0.89473684 0.85714286 1.        ]

mean value: 0.8606291771094402

key: train_precision
value: [0.96794872 0.98051948 0.61797753 0.98113208 1.         0.95238095
 0.93333333 0.95238095 0.88043478 0.98013245]

mean value: 0.9246240273064844

key: test_recall
value: [0.94736842 0.73684211 1.         0.78947368 0.88888889 0.88888889
 0.94444444 0.94444444 1.         0.83333333]

mean value: 0.8973684210526316

key: train_recall
value: [0.91515152 0.91515152 1.         0.94545455 0.84337349 0.96385542
 0.92771084 0.96385542 0.97590361 0.89156627]

mean value: 0.9342022635998539

key: test_roc_auc
value: [0.94736842 0.81578947 0.65789474 0.86695906 0.89181287 0.8128655
 0.89327485 0.91959064 0.92105263 0.91666667]

mean value: 0.8643274853801169

key: train_roc_auc
value: [0.94286988 0.94875223 0.7        0.96395534 0.92168675 0.9583983
 0.93150248 0.9583983  0.92324592 0.9369596 ]

mean value: 0.9185768799939413

key: test_jcc
value: [0.9        0.66666667 0.59375    0.75       0.8        0.69565217
 0.80952381 0.85       0.85714286 0.83333333]

mean value: 0.7756068840579711

key: train_jcc
value: [0.88823529 0.89880952 0.61797753 0.92857143 0.84337349 0.91954023
 0.8700565  0.91954023 0.86170213 0.87573964]

mean value: 0.8623545998139636

MCC on Blind test: 0.83

Accuracy on Blind test: 0.91

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01700234 0.01797867 0.01679993 0.01786995 0.0184319  0.01814771
 0.01903105 0.01888227 0.0194633  0.01629138]

mean value: 0.017989850044250487

key: score_time
value: [0.01207399 0.01205277 0.012187   0.01223421 0.01236439 0.01218557
 0.01222467 0.01221824 0.01213694 0.01256537]

mean value: 0.012224316596984863

key: test_mcc
value: [0.78947368 0.16439899 0.73786479 0.7163504  0.84959079 0.68035483
 0.83871328 0.67849265 0.94721815 0.40611643]

mean value: 0.6808573989700671

key: train_mcc
value: [0.89255789 0.36277429 0.87738561 0.86960067 0.89521641 0.92337258
 0.87750371 0.90781863 0.79887733 0.48613777]

mean value: 0.7891244887522015

key: test_accuracy
value: [0.89473684 0.52631579 0.86842105 0.83783784 0.91891892 0.83783784
 0.91891892 0.83783784 0.97297297 0.64864865]

mean value: 0.82624466571835

key: train_accuracy
value: [0.94626866 0.6119403  0.93731343 0.93154762 0.94642857 0.96130952
 0.9375     0.95238095 0.88988095 0.69345238]

mean value: 0.8808022388059702

key: test_fscore
value: [0.89473684 0.67857143 0.86486486 0.8125     0.92307692 0.84210526
 0.91428571 0.82352941 0.97142857 0.43478261]

mean value: 0.8159881627951018

key: train_fscore
value: [0.94512195 0.7173913  0.93877551 0.92556634 0.94767442 0.96
 0.93416928 0.94968553 0.87457627 0.55021834]

mean value: 0.8743178952803997

key: test_precision
value: [0.89473684 0.51351351 0.88888889 1.         0.85714286 0.8
 0.94117647 0.875      1.         1.        ]

mean value: 0.8770458572238757

key: train_precision
value: [0.95092025 0.55932203 0.90449438 0.99305556 0.91573034 0.98113208
 0.97385621 0.99342105 1.         1.        ]

mean value: 0.9271931891207361

key: test_recall
value: [0.89473684 1.         0.84210526 0.68421053 1.         0.88888889
 0.88888889 0.77777778 0.94444444 0.27777778]

mean value: 0.8198830409356725

key: train_recall
value: [0.93939394 1.         0.97575758 0.86666667 0.98192771 0.93975904
 0.89759036 0.90963855 0.77710843 0.37951807]

mean value: 0.8667360350492881

key: test_roc_auc
value: [0.89473684 0.52631579 0.86842105 0.84210526 0.92105263 0.83918129
 0.91812865 0.83625731 0.97222222 0.63888889]

mean value: 0.8257309941520468

key: train_roc_auc
value: [0.94616756 0.61764706 0.93787879 0.93040936 0.94684621 0.96105599
 0.93703047 0.9518781  0.88855422 0.68975904]

mean value: 0.8807226786873548

key: test_jcc
value: [0.80952381 0.51351351 0.76190476 0.68421053 0.85714286 0.72727273
 0.84210526 0.7        0.94444444 0.27777778]

mean value: 0.7117895681053575

key: train_jcc
value: [0.89595376 0.55932203 0.88461538 0.86144578 0.90055249 0.92307692
 0.87647059 0.90419162 0.77710843 0.37951807]

mean value: 0.7962255079162279

MCC on Blind test: 0.7

Accuracy on Blind test: 0.84

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.16781831 0.1466465  0.15688443 0.15079045 0.14964795 0.15363979
 0.15089941 0.15201402 0.14735866 0.15484333]

mean value: 0.15305428504943847

key: score_time
value: [0.01549721 0.01608682 0.01628304 0.01581788 0.01523232 0.01664662
 0.01632547 0.01588321 0.01541471 0.01603723]

mean value: 0.01592245101928711

key: test_mcc
value: [0.9486833  0.89473684 0.85280287 0.83918129 0.89736456 0.89181287
 0.94736842 0.94721815 1.         0.94736842]

mean value: 0.9166536711379966

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97368421 0.94736842 0.92105263 0.91891892 0.94594595 0.94594595
 0.97297297 0.97297297 1.         0.97297297]

mean value: 0.9571834992887625

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97435897 0.94736842 0.91428571 0.91891892 0.94736842 0.94444444
 0.97297297 0.97142857 1.         0.97297297]

mean value: 0.9564119411487833

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.95       0.94736842 1.         0.94444444 0.9        0.94444444
 0.94736842 1.         1.         0.94736842]

mean value: 0.9580994152046783

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.94736842 0.84210526 0.89473684 1.         0.94444444
 1.         0.94444444 1.         1.        ]

mean value: 0.9573099415204678

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97368421 0.94736842 0.92105263 0.91959064 0.94736842 0.94590643
 0.97368421 0.97222222 1.         0.97368421]

mean value: 0.9574561403508772

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.95       0.9        0.84210526 0.85       0.9        0.89473684
 0.94736842 0.94444444 1.         0.94736842]

mean value: 0.9176023391812865

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.96

Accuracy on Blind test: 0.98

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.05195355 0.04780698 0.07163262 0.05937195 0.06207514 0.06070542
 0.06475472 0.06434655 0.0617311  0.04681015]

mean value: 0.05911881923675537

key: score_time
value: [0.02426767 0.02480149 0.030478   0.02772665 0.03134751 0.02338552
 0.03266764 0.02413774 0.01961756 0.02972317]

mean value: 0.026815295219421387

key: test_mcc
value: [0.9486833  0.89973541 0.76376262 0.94736842 0.89736456 0.89181287
 0.94736842 0.89181287 1.         0.94736842]

mean value: 0.9135276881295149
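
Note: the third fold's test_mcc (0.76376262) is consistent with a 38-sample test fold in which 14 of 19 positives are recovered and there are no false positives. The log does not print these counts. They are inferred here from that fold's test_recall (0.73684211), test_precision (1.0) and test_accuracy (0.86842105), so treat them as an assumption. A minimal sketch of the standard MCC formula applied to those inferred counts:

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when a marginal is empty and the denominator is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Inferred (not logged) counts for fold 3 of the Bagging Classifier run.
tp, fn, fp, tn = 14, 5, 0, 19
fold3_mcc = mcc(tp, tn, fp, fn)
print(round(fold3_mcc, 8))
```

The helper name `mcc` is illustrative only and is not taken from the script that produced this log.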

key: train_mcc
value: [0.99404571 1.         0.99404571 1.         1.         0.99406397
 0.99406397 0.98816193 0.98229327 0.98229327]

mean value: 0.9928967834016768

key: test_accuracy
value: [0.97368421 0.94736842 0.86842105 0.97297297 0.94594595 0.94594595
 0.97297297 0.94594595 1.         0.97297297]

mean value: 0.9546230440967283

key: train_accuracy
value: [0.99701493 1.         0.99701493 1.         1.         0.99702381
 0.99702381 0.99404762 0.99107143 0.99107143]

mean value: 0.9964267945984364

key: test_fscore
value: [0.97435897 0.94444444 0.84848485 0.97297297 0.94736842 0.94444444
 0.97297297 0.94444444 1.         0.97297297]

mean value: 0.9522464496148707

key: train_fscore
value: [0.99696049 1.         0.99696049 1.         1.         0.99697885
 0.99697885 0.99393939 0.99088146 0.99088146]

mean value: 0.9963580988444394

key: test_precision
value: [0.95       1.         1.         1.         0.9        0.94444444
 0.94736842 0.94444444 1.         0.94736842]

mean value: 0.9633625730994152

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.89473684 0.73684211 0.94736842 1.         0.94444444
 1.         0.94444444 1.         1.        ]

mean value: 0.9467836257309942

key: train_recall
value: [0.99393939 1.         0.99393939 1.         1.         0.9939759
 0.9939759  0.98795181 0.98192771 0.98192771]

mean value: 0.9927637824023366

key: test_roc_auc
value: [0.97368421 0.94736842 0.86842105 0.97368421 0.94736842 0.94590643
 0.97368421 0.94590643 1.         0.97368421]

mean value: 0.9549707602339181

key: train_roc_auc
value: [0.9969697  1.         0.9969697  1.         1.         0.99698795
 0.99698795 0.9939759  0.99096386 0.99096386]

mean value: 0.9963818912011683

key: test_jcc
value: [0.95       0.89473684 0.73684211 0.94736842 0.9        0.89473684
 0.94736842 0.89473684 1.         0.94736842]

mean value: 0.9113157894736842
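
Note: for a binary classifier the Jaccard index J and the F1 score F of the positive class are linked by J = F / (2 - F), so the test_jcc values can be cross-checked against the test_fscore values logged for the same folds. A minimal sketch using the Bagging Classifier numbers copied from this log (the variable names are illustrative):

```python
# Per-fold scores copied from the "test_fscore" and "test_jcc" entries above.
test_fscore = [0.97435897, 0.94444444, 0.84848485, 0.97297297, 0.94736842,
               0.94444444, 0.97297297, 0.94444444, 1.0, 0.97297297]
test_jcc = [0.95, 0.89473684, 0.73684211, 0.94736842, 0.9,
            0.89473684, 0.94736842, 0.89473684, 1.0, 0.94736842]

# Jaccard index recomputed from F1 for each fold.
jcc_from_f1 = [f / (2.0 - f) for f in test_fscore]

# Largest absolute disagreement between logged and recomputed values.
max_diff = max(abs(a - b) for a, b in zip(jcc_from_f1, test_jcc))
print(f"max |J_logged - J_from_F1| = {max_diff:.2e}")
```

Any disagreement should be at the level of the eight-decimal rounding used in the log.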

key: train_jcc
value: [0.99393939 1.         0.99393939 1.         1.         0.9939759
 0.9939759  0.98795181 0.98192771 0.98192771]

mean value: 0.9927637824023366

MCC on Blind test: 0.9

Accuracy on Blind test: 0.95

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.07039738 0.0819912  0.04745841 0.07536864 0.0871973  0.08062267
 0.09268117 0.08872008 0.07291102 0.04991651]

mean value: 0.07472643852233887

key: score_time
value: [0.02195954 0.01393104 0.01386929 0.02197075 0.02660823 0.0219326
 0.02215648 0.02231789 0.01488566 0.02564073]

mean value: 0.020527219772338866

key: test_mcc
value: [0.79388419 0.31622777 0.69989647 0.42489158 0.7888597  0.29766651
 0.56934383 0.56725146 0.62170355 0.4670794 ]

mean value: 0.554680445041403
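
Note: each "mean value" line is the arithmetic mean of the ten per-fold scores printed just above it. For this model the mean hides a wide fold-to-fold spread. A minimal sketch that recomputes both from the logged fold scores (the variable names are illustrative):

```python
import math

# Fold scores copied from the Gaussian Process "test_mcc" entry above.
test_mcc = [0.79388419, 0.31622777, 0.69989647, 0.42489158, 0.7888597,
            0.29766651, 0.56934383, 0.56725146, 0.62170355, 0.4670794]

# fsum limits floating-point accumulation error when summing.
mean_test_mcc = math.fsum(test_mcc) / len(test_mcc)

# Range between the best and worst fold.
spread = max(test_mcc) - min(test_mcc)
print(f"mean={mean_test_mcc:.6f} range={spread:.6f}")
```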

key: train_mcc
value: [0.99404571 0.99404571 0.99404571 0.99406271 1.         1.
 0.99406397 0.99406397 0.99406397 0.99406397]

mean value: 0.9952455741790717

key: test_accuracy
value: [0.89473684 0.65789474 0.84210526 0.7027027  0.89189189 0.64864865
 0.78378378 0.78378378 0.81081081 0.72972973]

mean value: 0.7746088193456615

key: train_accuracy
value: [0.99701493 0.99701493 0.99701493 0.99702381 1.         1.
 0.99702381 0.99702381 0.99702381 0.99702381]

mean value: 0.9976163823738451

key: test_fscore
value: [0.88888889 0.66666667 0.82352941 0.66666667 0.89473684 0.60606061
 0.76470588 0.77777778 0.8        0.6875    ]

mean value: 0.7576532742283516

key: train_fscore
value: [0.99696049 0.99696049 0.99696049 0.99696049 1.         1.
 0.99697885 0.99697885 0.99697885 0.99697885]

mean value: 0.9975757353143739

key: test_precision
value: [0.94117647 0.65       0.93333333 0.78571429 0.85       0.66666667
 0.8125     0.77777778 0.82352941 0.78571429]

mean value: 0.802641223155929

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.84210526 0.68421053 0.73684211 0.57894737 0.94444444 0.55555556
 0.72222222 0.77777778 0.77777778 0.61111111]

mean value: 0.7230994152046784

key: train_recall
value: [0.99393939 0.99393939 0.99393939 0.99393939 1.         1.
 0.9939759  0.9939759  0.9939759  0.9939759 ]

mean value: 0.9951661190215407

key: test_roc_auc
value: [0.89473684 0.65789474 0.84210526 0.70614035 0.89327485 0.64619883
 0.78216374 0.78362573 0.80994152 0.72660819]

mean value: 0.7742690058479532

key: train_roc_auc
value: [0.9969697  0.9969697  0.9969697  0.9969697  1.         1.
 0.99698795 0.99698795 0.99698795 0.99698795]

mean value: 0.9975830595107703

key: test_jcc
value: [0.8        0.5        0.7        0.5        0.80952381 0.43478261
 0.61904762 0.63636364 0.66666667 0.52380952]

mean value: 0.6190193864106908

key: train_jcc
value: [0.99393939 0.99393939 0.99393939 0.99393939 1.         1.
 0.9939759  0.9939759  0.9939759  0.9939759 ]

mean value: 0.9951661190215407

MCC on Blind test: 0.54

Accuracy on Blind test: 0.77

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.56135869 0.54637766 0.55692697 0.5550313  0.54440451 0.53952956
 0.54246831 0.550493   0.56960869 0.54170847]

mean value: 0.5507907152175904

key: score_time
value: [0.00987864 0.00982213 0.00943971 0.00938654 0.00945401 0.00924611
 0.0097692  0.01039886 0.00971866 0.00918841]

mean value: 0.009630227088928222

key: test_mcc
value: [0.9486833  0.84327404 0.89973541 0.89181287 0.89736456 0.89181287
 0.94736842 0.89181287 1.         0.89736456]

mean value: 0.9109228893996735

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97368421 0.92105263 0.94736842 0.94594595 0.94594595 0.94594595
 0.97297297 0.94594595 1.         0.94594595]

mean value: 0.9544807965860598

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97435897 0.91891892 0.94444444 0.94736842 0.94736842 0.94444444
 0.97297297 0.94444444 1.         0.94736842]

mean value: 0.9541689462742095

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.95       0.94444444 1.         0.94736842 0.9        0.94444444
 0.94736842 0.94444444 1.         0.9       ]

mean value: 0.9478070175438597

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.89473684 0.89473684 0.94736842 1.         0.94444444
 1.         0.94444444 1.         1.        ]

mean value: 0.9625730994152046

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97368421 0.92105263 0.94736842 0.94590643 0.94736842 0.94590643
 0.97368421 0.94590643 1.         0.94736842]

mean value: 0.9548245614035088

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.95       0.85       0.89473684 0.9        0.9        0.89473684
 0.94736842 0.89473684 1.         0.9       ]

mean value: 0.9131578947368421

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.91

Accuracy on Blind test: 0.96

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.02600288 0.02697515 0.02713609 0.02771711 0.0268352  0.02741385
 0.02773118 0.04674792 0.04764318 0.03253341]

mean value: 0.03167359828948975

key: score_time
value: [0.01562953 0.01249695 0.01259661 0.01253057 0.01317787 0.04645109
 0.0191071  0.01679397 0.01876235 0.01601005]

mean value: 0.018355607986450195

key: test_mcc
value: [0.48454371 0.31622777 0.45291081 0.26327408 0.51319869 0.24408665
 0.13259028 0.35104619 0.52960948 0.18768409]

mean value: 0.3475171758801787

key: train_mcc
value: [0.88134724 0.68582485 0.76325259 0.70543403 0.91426696 0.9285613
 0.70124655 0.74453167 0.9253171  0.7690121 ]

mean value: 0.8018794389354479

key: test_accuracy
value: [0.73684211 0.65789474 0.71052632 0.62162162 0.75675676 0.62162162
 0.56756757 0.67567568 0.75675676 0.59459459]

mean value: 0.6699857752489331

key: train_accuracy
value: [0.93731343 0.82089552 0.86865672 0.83333333 0.95535714 0.96428571
 0.83035714 0.85714286 0.96130952 0.87202381]

mean value: 0.8900675195451315

key: test_fscore
value: [0.70588235 0.64864865 0.64516129 0.5625     0.74285714 0.5625
 0.5        0.64705882 0.70967742 0.57142857]

mean value: 0.629571424908237

key: train_fscore
value: [0.93203883 0.77777778 0.84615385 0.79562044 0.95268139 0.96385542
 0.79272727 0.83098592 0.95924765 0.85121107]

mean value: 0.8702299616326061

key: test_precision
value: [0.8        0.66666667 0.83333333 0.69230769 0.76470588 0.64285714
 0.57142857 0.6875     0.84615385 0.58823529]

mean value: 0.709318842921784

key: train_precision
value: [1.         1.         1.         1.         1.         0.96385542
 1.         1.         1.         1.        ]

mean value: 0.9963855421686747

key: test_recall
value: [0.63157895 0.63157895 0.52631579 0.47368421 0.72222222 0.5
 0.44444444 0.61111111 0.61111111 0.55555556]

mean value: 0.5707602339181287

key: train_recall
value: [0.87272727 0.63636364 0.73333333 0.66060606 0.90963855 0.96385542
 0.65662651 0.71084337 0.92168675 0.74096386]

mean value: 0.7806644760861629

key: test_roc_auc
value: [0.73684211 0.65789474 0.71052632 0.62573099 0.75584795 0.61842105
 0.56432749 0.67397661 0.75292398 0.59356725]

mean value: 0.6690058479532164

key: train_roc_auc
value: [0.93636364 0.81818182 0.86666667 0.83030303 0.95481928 0.96428065
 0.82831325 0.85542169 0.96084337 0.87048193]

mean value: 0.8885675321607285

key: test_jcc
value: [0.54545455 0.48       0.47619048 0.39130435 0.59090909 0.39130435
 0.33333333 0.47826087 0.55       0.4       ]

mean value: 0.46367570111048373

key: train_jcc
value: [0.87272727 0.63636364 0.73333333 0.66060606 0.90963855 0.93023256
 0.65662651 0.71084337 0.92168675 0.74096386]

mean value: 0.7773021897314416

MCC on Blind test: 0.45

Accuracy on Blind test: 0.72

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.02517295 0.03688502 0.04392886 0.03298044 0.03695798 0.03685045
 0.03669524 0.03654885 0.03679013 0.0368619 ]

mean value: 0.03596718311309814

key: score_time
value: [0.02065396 0.02096987 0.02294397 0.02419424 0.02160645 0.01831365
 0.0233593  0.02103519 0.02230525 0.02347684]

mean value: 0.02188587188720703

key: test_mcc
value: [0.89473684 0.47633051 0.68803296 0.68035483 0.89736456 0.62280702
 0.89736456 0.89181287 1.         0.89679028]

mean value: 0.7945594433337401

key: train_mcc
value: [0.89251337 0.89259616 0.88671444 0.89290904 0.88691246 0.88700621
 0.88101481 0.8870542  0.88691246 0.88691246]

mean value: 0.8880545619279835

key: test_accuracy
value: [0.94736842 0.73684211 0.84210526 0.83783784 0.94594595 0.81081081
 0.94594595 0.94594595 1.         0.94594595]

mean value: 0.8958748221906117

key: train_accuracy
value: [0.94626866 0.94626866 0.94328358 0.94642857 0.94345238 0.94345238
 0.94047619 0.94345238 0.94345238 0.94345238]

mean value: 0.9439987562189055

key: test_fscore
value: [0.94736842 0.75       0.83333333 0.83333333 0.94736842 0.81081081
 0.94736842 0.94444444 1.         0.94117647]

mean value: 0.8955203655668051

key: train_fscore
value: [0.94545455 0.94578313 0.94294294 0.94578313 0.94294294 0.94224924
 0.94011976 0.94328358 0.94294294 0.94294294]

mean value: 0.9434445164976734

key: test_precision
value: [0.94736842 0.71428571 0.88235294 0.88235294 0.9        0.78947368
 0.9        0.94444444 1.         1.        ]

mean value: 0.8960278146346258

key: train_precision
value: [0.94545455 0.94011976 0.93452381 0.94011976 0.94011976 0.95092025
 0.93452381 0.93491124 0.94011976 0.94011976]

mean value: 0.9400932454899698

key: test_recall
value: [0.94736842 0.78947368 0.78947368 0.78947368 1.         0.83333333
 1.         0.94444444 1.         0.88888889]

mean value: 0.8982456140350877

key: train_recall
value: [0.94545455 0.95151515 0.95151515 0.95151515 0.94578313 0.93373494
 0.94578313 0.95180723 0.94578313 0.94578313]

mean value: 0.9468674698795181

key: test_roc_auc
value: [0.94736842 0.73684211 0.84210526 0.83918129 0.94736842 0.81140351
 0.94736842 0.94590643 1.         0.94444444]

mean value: 0.8961988304093568

key: train_roc_auc
value: [0.94625668 0.94634581 0.94340463 0.94651781 0.9434798  0.94333806
 0.94053863 0.94355067 0.9434798  0.9434798 ]

mean value: 0.9440391700962778

key: test_jcc
value: [0.9        0.6        0.71428571 0.71428571 0.9        0.68181818
 0.9        0.89473684 1.         0.88888889]

mean value: 0.8194015341383762

key: train_jcc
value: [0.89655172 0.89714286 0.89204545 0.89714286 0.89204545 0.8908046
 0.88700565 0.89265537 0.89204545 0.89204545]

mean value: 0.8929484871255766

MCC on Blind test: 0.79

Accuracy on Blind test: 0.9

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])
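
For readers unfamiliar with the repr above, the following is a minimal sketch of how a pipeline of this shape could be assembled. The step names (`prep`, `num`, `cat`, `model`) and estimators are taken from the repr; the toy DataFrame, its values, and the variable names `num_cols` and `cat_cols` are assumptions made purely for illustration and are not the script's actual data or code.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import RidgeClassifierCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data: three column names borrowed from the log,
# with made-up values (the real run has 169 numerical and 7 categorical columns).
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "ligand_distance": np.linspace(1.0, 30.0, 40),
    "ddg_foldx": rng.normal(0.0, 2.0, size=40),
    "ss_class": np.tile(["helix", "sheet", "loop", "helix"], 10),
})
y = (X["ligand_distance"] < 15).astype(int)  # made-up binary target

num_cols = ["ligand_distance", "ddg_foldx"]  # assumed numerical columns
cat_cols = ["ss_class"]                      # assumed categorical columns

# Scale numerical columns to [0, 1], one-hot encode categorical columns,
# and pass any remaining columns through unchanged.
prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ],
    remainder="passthrough",
)

pipe = Pipeline(steps=[("prep", prep), ("model", RidgeClassifierCV(cv=10))])
pipe.fit(X, y)
predictions = pipe.predict(X)
```

Keeping the scaler and encoder inside the pipeline means they are refit on each training fold during cross-validation, so no information from the held-out fold leaks into preprocessing.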

key: fit_time
value: [0.25039506 0.26338816 0.26077509 0.28984904 0.34190536 0.20279026
 0.25511289 0.26039338 0.27494764 0.29335022]

mean value: 0.26929070949554446

key: score_time
value: [0.02418995 0.02259231 0.02381563 0.0246079  0.02679062 0.02062678
 0.02431607 0.02038789 0.02436209 0.03717709]

mean value: 0.02488663196563721

key: test_mcc
value: [0.89473684 0.47633051 0.65465367 0.68035483 0.89736456 0.62280702
 0.78362573 0.89181287 1.         0.89679028]

mean value: 0.7798476311382899

key: train_mcc
value: [0.89251337 0.89259616 0.95822045 0.89290904 0.88691246 0.88700621
 0.80949681 0.8870542  0.88691246 0.88691246]

mean value: 0.8880533628410842

key: test_accuracy
value: [0.94736842 0.73684211 0.81578947 0.83783784 0.94594595 0.81081081
 0.89189189 0.94594595 1.         0.94594595]

mean value: 0.8878378378378379

key: train_accuracy
value: [0.94626866 0.94626866 0.97910448 0.94642857 0.94345238 0.94345238
 0.9047619  0.94345238 0.94345238 0.94345238]

mean value: 0.9440094171997157

key: test_fscore
value: [0.94736842 0.75       0.78787879 0.83333333 0.94736842 0.81081081
 0.88888889 0.94444444 1.         0.94117647]

mean value: 0.8851269578049764

/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_7030.py:115: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_7030.py:118: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

key: train_fscore
value: [0.94545455 0.94578313 0.97885196 0.94578313 0.94294294 0.94224924
 0.90361446 0.94328358 0.94294294 0.94294294]

mean value: 0.9433848883132298

key: test_precision
value: [0.94736842 0.71428571 0.92857143 0.88235294 0.9        0.78947368
 0.88888889 0.94444444 1.         1.        ]

mean value: 0.8995385522630105

key: train_precision
value: [0.94545455 0.94011976 0.97590361 0.94011976 0.94011976 0.95092025
 0.90361446 0.93491124 0.94011976 0.94011976]

mean value: 0.9411402908141235

key: test_recall
value: [0.94736842 0.78947368 0.68421053 0.78947368 1.         0.83333333
 0.88888889 0.94444444 1.         0.88888889]

mean value: 0.876608187134503

key: train_recall
value: [0.94545455 0.95151515 0.98181818 0.95151515 0.94578313 0.93373494
 0.90361446 0.95180723 0.94578313 0.94578313]

mean value: 0.9456809054399415

key: test_roc_auc
value: [0.94736842 0.73684211 0.81578947 0.83918129 0.94736842 0.81140351
 0.89181287 0.94590643 1.         0.94444444]

mean value: 0.8880116959064328

key: train_roc_auc
value: [0.94625668 0.94634581 0.97914439 0.94651781 0.9434798  0.94333806
 0.90474841 0.94355067 0.9434798  0.9434798 ]

mean value: 0.9440341231706072

key: test_jcc
value: [0.9        0.6        0.65       0.71428571 0.9        0.68181818
 0.8        0.89473684 1.         0.88888889]

mean value: 0.8029729627098048

key: train_jcc
value: [0.89655172 0.89714286 0.95857988 0.89714286 0.89204545 0.8908046
 0.82417582 0.89265537 0.89204545 0.89204545]

mean value: 0.8933189472825426
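
The `key` / `value` / `mean value` blocks above have the layout of a multi-metric, 10-fold cross-validation result. A hedged sketch of how output of this form could be produced with scikit-learn's `cross_validate` follows. The synthetic dataset, the `scoring` dictionary, and the fold settings are assumptions for illustration; the log does not show how the script actually defines its scorers or folds.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in data (assumption); the real features come from the merged df.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Assumed scorer names chosen to reproduce the keys seen in the log
# (test_mcc, test_accuracy, test_fscore, ..., test_jcc).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(
    LogisticRegression(max_iter=1000),
    X,
    y,
    cv=cv,
    scoring=scoring,
    return_train_score=True,  # needed for the train_* keys
)

# Print each metric in the same key / value / mean layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("\nmean value:", np.mean(value), "\n")
```

`cross_validate` returns one array per key with one entry per fold, which is why every `value` in the log holds ten numbers.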

MCC on Blind test: 0.79

Accuracy on Blind test: 0.9
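
The `SettingWithCopyWarning` entries in this log point at `sort_values(..., inplace = True)` calls on `baseline_CT` and `baseline_BT`. The usual cause is that the DataFrame was derived from a slice of another DataFrame. The sketch below shows one common way to avoid the warning: take an explicit copy and sort by reassignment. The `scores_df` table and its values are hypothetical; only the names `baseline_CT`, `test_mcc`, and `bts_mcc` come from the log.

```python
import pandas as pd

# Hypothetical results table; model names and numbers are made up.
scores_df = pd.DataFrame({
    "model": ["A", "B", "C"],
    "test_mcc": [0.78, 0.80, 0.61],
    "bts_mcc": [0.79, 0.84, 0.55],
})

# Take an explicit copy of the column subset so pandas knows this is a
# new object, then sort by reassignment instead of inplace=True.
baseline_CT = scores_df[["model", "test_mcc"]].copy()
baseline_CT = baseline_CT.sort_values(by=["test_mcc"], ascending=False)

print(baseline_CT)
```

The original `scores_df` keeps its row order, and the sorted result lives only in `baseline_CT`.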

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.03251314 0.03675842 0.04648995 0.03551435 0.03649688 0.03578806
 0.03548098 0.0361886  0.02459049 0.03698349]

mean value: 0.0356804370880127

key: score_time
value: [0.01281786 0.0131588  0.01396585 0.01327658 0.01287436 0.01276183
 0.0128417  0.01410079 0.01277828 0.01403594]

mean value: 0.013261198997497559

key: test_mcc
value: [0.9486833  0.9486833  0.73786479 0.84327404 0.73786479 0.63245553
 0.89473684 0.79388419 0.62807634 0.78764146]

mean value: 0.7953164573929877

key: train_mcc
value: [0.85307402 0.87648575 0.87660709 0.85906136 0.86472084 0.87660709
 0.88241401 0.87648575 0.88269694 0.88275364]

mean value: 0.8730906512039277

key: test_accuracy
value: [0.97368421 0.97368421 0.86842105 0.92105263 0.86842105 0.81578947
 0.94736842 0.89473684 0.81081081 0.89189189]

mean value: 0.8965860597439544

key: train_accuracy
value: [0.92647059 0.93823529 0.93823529 0.92941176 0.93235294 0.93823529
 0.94117647 0.93823529 0.94134897 0.94134897]

mean value: 0.9365050888390547

key: test_fscore
value: [0.97435897 0.97435897 0.86486486 0.92307692 0.86486486 0.81081081
 0.94736842 0.9        0.82926829 0.88235294]

mean value: 0.8971325067247442

key: train_fscore
value: [0.9271137  0.93841642 0.93877551 0.93023256 0.93255132 0.93877551
 0.94152047 0.9380531  0.94117647 0.94186047]

mean value: 0.9368475523992993

key: test_precision
value: [0.95       0.95       0.88888889 0.9        0.88888889 0.83333333
 0.94736842 0.85714286 0.77272727 0.9375    ]

mean value: 0.8925849662033872

key: train_precision
value: [0.91907514 0.93567251 0.93063584 0.91954023 0.92982456 0.93063584
 0.93604651 0.9408284  0.94117647 0.93641618]

mean value: 0.9319851696271803

key: test_recall
value: [1.         1.         0.84210526 0.94736842 0.84210526 0.78947368
 0.94736842 0.94736842 0.89473684 0.83333333]

mean value: 0.9043859649122807

key: train_recall
value: [0.93529412 0.94117647 0.94705882 0.94117647 0.93529412 0.94705882
 0.94705882 0.93529412 0.94117647 0.94736842]

mean value: 0.9417956656346749

key: test_roc_auc
value: [0.97368421 0.97368421 0.86842105 0.92105263 0.86842105 0.81578947
 0.94736842 0.89473684 0.80847953 0.89035088]

mean value: 0.8961988304093567

key: train_roc_auc
value: [0.92647059 0.93823529 0.93823529 0.92941176 0.93235294 0.93823529
 0.94117647 0.93823529 0.94134847 0.94133127]

mean value: 0.9365032679738562

key: test_jcc
value: [0.95       0.95       0.76190476 0.85714286 0.76190476 0.68181818
 0.9        0.81818182 0.70833333 0.78947368]

mean value: 0.817875939849624

key: train_jcc
value: [0.86413043 0.8839779  0.88461538 0.86956522 0.87362637 0.88461538
 0.88950276 0.88333333 0.88888889 0.89010989]

mean value: 0.8812365570346593

MCC on Blind test: 0.84

Accuracy on Blind test: 0.92
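
The lbfgs `ConvergenceWarning` in this log suggests either scaling the data or increasing `max_iter`. The logged pipeline already applies `MinMaxScaler` to the numerical columns, so raising `max_iter` is the remaining remedy to try. The sketch below, on synthetic data, counts convergence warnings at a deliberately low and a generous iteration limit. The dataset, the helper name `count_convergence_warnings`, and both `max_iter` values are assumptions for illustration, not settings from the script.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=300, n_features=20, random_state=42)


def count_convergence_warnings(max_iter):
    """Fit a scaled logistic regression and count ConvergenceWarnings raised."""
    model = make_pipeline(
        MinMaxScaler(),
        LogisticRegression(max_iter=max_iter, random_state=42),
    )
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        model.fit(X, y)
    return sum(issubclass(w.category, ConvergenceWarning) for w in caught)


n_low = count_convergence_warnings(max_iter=2)      # deliberately too few iterations
n_high = count_convergence_warnings(max_iter=5000)  # generous limit

print("warnings with max_iter=2:", n_low)
print("warnings with max_iter=5000:", n_high)
```

If raising the limit is not enough on the real data, the warning text also points to the solver options documented for logistic regression.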

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.75315642 1.01591659 0.78028083 0.772331   0.9129324  0.76234174
 0.9127357  0.87172246 0.7669878  0.96696448]

mean value: 0.8515369415283203

key: score_time
value: [0.01336694 0.01260972 0.01373863 0.01248026 0.01332521 0.01331091
 0.01295376 0.01237464 0.01341295 0.01255274]

mean value: 0.013012576103210449

key: test_mcc
value: [0.9486833  0.89973541 0.73786479 0.79388419 0.78947368 0.63245553
 0.84327404 0.79388419 0.68035483 0.78764146]

mean value: 0.790725142095092

key: train_mcc
value: [0.88825066 0.89417953 1.         0.8177744  0.90014017 0.91178048
 0.97647059 0.8058963  1.         0.90030617]

mean value: 0.9094798300569955

key: test_accuracy
value: [0.97368421 0.94736842 0.86842105 0.89473684 0.89473684 0.81578947
 0.92105263 0.89473684 0.83783784 0.89189189]

mean value: 0.8940256045519204

key: train_accuracy
value: [0.94411765 0.94705882 1.         0.90882353 0.95       0.95588235
 0.98823529 0.90294118 1.         0.95014663]

mean value: 0.9547205451095394

key: test_fscore
value: [0.97435897 0.95       0.87179487 0.9        0.89473684 0.81081081
 0.92307692 0.9        0.83333333 0.88235294]

mean value: 0.8940464696656647

key: train_fscore
value: [0.94428152 0.94736842 1.         0.90962099 0.95043732 0.95601173
 0.98823529 0.90265487 1.         0.95043732]

mean value: 0.9549047464381037

key: test_precision
value: [0.95       0.9047619  0.85       0.85714286 0.89473684 0.83333333
 0.9        0.85714286 0.88235294 0.9375    ]

mean value: 0.8866970735662686

key: train_precision
value: [0.94152047 0.94186047 1.         0.9017341  0.94219653 0.95321637
 0.98823529 0.90532544 1.         0.94767442]

mean value: 0.9521763099568973

key: test_recall
value: [1.         1.         0.89473684 0.94736842 0.89473684 0.78947368
 0.94736842 0.94736842 0.78947368 0.83333333]

mean value: 0.9043859649122807

key: train_recall
value: [0.94705882 0.95294118 1.         0.91764706 0.95882353 0.95882353
 0.98823529 0.9        1.         0.95321637]

mean value: 0.9576745786033711

key: test_roc_auc
value: [0.97368421 0.94736842 0.86842105 0.89473684 0.89473684 0.81578947
 0.92105263 0.89473684 0.83918129 0.89035088]

mean value: 0.8940058479532164

key: train_roc_auc
value: [0.94411765 0.94705882 1.         0.90882353 0.95       0.95588235
 0.98823529 0.90294118 1.         0.9501376 ]

mean value: 0.9547196422428621

key: test_jcc
value: [0.95       0.9047619  0.77272727 0.81818182 0.80952381 0.68181818
 0.85714286 0.81818182 0.71428571 0.78947368]

mean value: 0.8116097060833903

key: train_jcc
value: [0.89444444 0.9        1.         0.8342246  0.90555556 0.91573034
 0.97674419 0.82258065 1.         0.90555556]

mean value: 0.9154835322772491

MCC on Blind test: 0.76

Accuracy on Blind test: 0.88

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01436734 0.01217151 0.01070666 0.00985885 0.00983238 0.01114702
 0.01055717 0.01079178 0.0106504  0.01036644]

mean value: 0.011044955253601075

key: score_time
value: [0.01219034 0.00957274 0.00930095 0.00910378 0.00889158 0.00985646
 0.00977993 0.00970864 0.00965691 0.00958467]

mean value: 0.009764599800109863

key: test_mcc
value: [0.68803296 0.69989647 0.69989647 0.79388419 0.57894737 0.63960215
 0.47633051 0.73786479 0.57184997 0.69007214]

mean value: 0.6576377014238246

key: train_mcc
value: [0.66106903 0.63812671 0.63133581 0.66254793 0.63888551 0.6871247
 0.63426969 0.68813955 0.67443892 0.63456594]

mean value: 0.6550503783024093

key: test_accuracy
value: [0.84210526 0.84210526 0.84210526 0.89473684 0.78947368 0.81578947
 0.73684211 0.86842105 0.78378378 0.83783784]

mean value: 0.8253200568990042

key: train_accuracy
value: [0.82941176 0.81764706 0.80588235 0.82941176 0.81764706 0.84117647
 0.81470588 0.84117647 0.83577713 0.81524927]

mean value: 0.8248085216491289

key: test_fscore
value: [0.83333333 0.82352941 0.82352941 0.88888889 0.78947368 0.8
 0.72222222 0.87179487 0.77777778 0.8125    ]

mean value: 0.8143049601757032

key: train_fscore
value: [0.82208589 0.80864198 0.77852349 0.81987578 0.80745342 0.83125
 0.80250784 0.83018868 0.82716049 0.80495356]

mean value: 0.813264111779322

key: test_precision
value: [0.88235294 0.93333333 0.93333333 0.94117647 0.78947368 0.875
 0.76470588 0.85       0.82352941 0.92857143]

mean value: 0.8721476485330975

key: train_precision
value: [0.85897436 0.85064935 0.90625    0.86842105 0.85526316 0.88666667
 0.8590604  0.89189189 0.87012987 0.85526316]

mean value: 0.8702569909417754

key: test_recall
value: [0.78947368 0.73684211 0.73684211 0.84210526 0.78947368 0.73684211
 0.68421053 0.89473684 0.73684211 0.72222222]

mean value: 0.7669590643274854

key: train_recall
value: [0.78823529 0.77058824 0.68235294 0.77647059 0.76470588 0.78235294
 0.75294118 0.77647059 0.78823529 0.76023392]

mean value: 0.7642586859305125

key: test_roc_auc
value: [0.84210526 0.84210526 0.84210526 0.89473684 0.78947368 0.81578947
 0.73684211 0.86842105 0.78508772 0.83479532]

mean value: 0.8251461988304094

key: train_roc_auc
value: [0.82941176 0.81764706 0.80588235 0.82941176 0.81764706 0.84117647
 0.81470588 0.84117647 0.83563811 0.81541108]

mean value: 0.8248108015135879

key: test_jcc
value: [0.71428571 0.7        0.7        0.8        0.65217391 0.66666667
 0.56521739 0.77272727 0.63636364 0.68421053]

mean value: 0.6891645120706905

key: train_jcc
value: [0.69791667 0.67875648 0.63736264 0.69473684 0.67708333 0.71122995
 0.67015707 0.70967742 0.70526316 0.67357513]

mean value: 0.6855758677521984

MCC on Blind test: 0.71

Accuracy on Blind test: 0.85

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01090002 0.01093769 0.0109024  0.01084542 0.010324   0.01106477
 0.01059437 0.01046681 0.00997615 0.01118135]

mean value: 0.01071929931640625

key: score_time
value: [0.0093627  0.0100379  0.00968313 0.00984097 0.01014543 0.0097847
 0.00955224 0.00931573 0.00902176 0.00988913]

mean value: 0.00966336727142334

key: test_mcc
value: [0.52704628 0.68421053 0.68421053 0.79388419 0.78947368 0.52704628
 0.78947368 0.63960215 0.48078072 0.57857577]

mean value: 0.6494303793257087

key: train_mcc
value: [0.74738216 0.73530684 0.77092175 0.74199852 0.72941176 0.74163853
 0.72354193 0.77092175 0.75366357 0.73705515]

mean value: 0.745184196259884

key: test_accuracy
value: [0.76315789 0.84210526 0.84210526 0.89473684 0.89473684 0.76315789
 0.89473684 0.81578947 0.72972973 0.78378378]

mean value: 0.8224039829302987

key: train_accuracy
value: [0.87352941 0.86764706 0.88529412 0.87058824 0.86470588 0.87058824
 0.86176471 0.88529412 0.87683284 0.86803519]

mean value: 0.8724279799896498

key: test_fscore
value: [0.76923077 0.84210526 0.84210526 0.9        0.89473684 0.76923077
 0.89473684 0.82926829 0.77272727 0.75      ]

mean value: 0.8264141314398054

key: train_fscore
value: [0.87536232 0.86803519 0.88695652 0.87356322 0.86470588 0.87283237
 0.86135693 0.88695652 0.87647059 0.87179487]

mean value: 0.8738034415804177

key: test_precision
value: [0.75       0.84210526 0.84210526 0.85714286 0.89473684 0.75
 0.89473684 0.77272727 0.68       0.85714286]

mean value: 0.8140697197539303

key: train_precision
value: [0.86285714 0.86549708 0.87428571 0.85393258 0.86470588 0.85795455
 0.86390533 0.87428571 0.87647059 0.85      ]

mean value: 0.8643894573208194

key: test_recall
value: [0.78947368 0.84210526 0.84210526 0.94736842 0.89473684 0.78947368
 0.89473684 0.89473684 0.89473684 0.66666667]

mean value: 0.8456140350877193

key: train_recall
value: [0.88823529 0.87058824 0.9        0.89411765 0.86470588 0.88823529
 0.85882353 0.9        0.87647059 0.89473684]

mean value: 0.8835913312693499

key: test_roc_auc
value: [0.76315789 0.84210526 0.84210526 0.89473684 0.89473684 0.76315789
 0.89473684 0.81578947 0.7251462  0.78070175]

mean value: 0.8216374269005848

key: train_roc_auc
value: [0.87352941 0.86764706 0.88529412 0.87058824 0.86470588 0.87058824
 0.86176471 0.88529412 0.87683179 0.86795666]

mean value: 0.8724200206398349

key: test_jcc
value: [0.625      0.72727273 0.72727273 0.81818182 0.80952381 0.625
 0.80952381 0.70833333 0.62962963 0.6       ]

mean value: 0.7079737854737855

key: train_jcc
value: [0.77835052 0.76683938 0.796875   0.7755102  0.76165803 0.77435897
 0.75647668 0.796875   0.78010471 0.77272727]

mean value: 0.7759775771937931

MCC on Blind test: 0.74

Accuracy on Blind test: 0.87

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.010535   0.01050043 0.01047206 0.01034617 0.01016498 0.01028228
 0.01025391 0.01106501 0.01017475 0.00908756]

mean value: 0.010288214683532715

key: score_time
value: [0.01397395 0.01574492 0.01688623 0.01823497 0.0147984  0.01664305
 0.01726961 0.01742172 0.01480031 0.01511502]

mean value: 0.01608881950378418

key: test_mcc
value: [0.57894737 0.57894737 0.31622777 0.57894737 0.43643578 0.42163702
 0.42640143 0.21821789 0.40780312 0.75614764]

mean value: 0.4719712759930457

key: train_mcc
value: [0.65304287 0.6882472  0.68254191 0.65322377 0.69416569 0.69455037
 0.65886913 0.67657595 0.73636217 0.607149  ]

mean value: 0.6744728059621842

key: test_accuracy
value: [0.78947368 0.78947368 0.65789474 0.78947368 0.71052632 0.71052632
 0.71052632 0.60526316 0.7027027  0.86486486]

mean value: 0.7330725462304409

key: train_accuracy
value: [0.82647059 0.84411765 0.84117647 0.82647059 0.84705882 0.84705882
 0.82941176 0.83823529 0.86803519 0.80351906]

mean value: 0.8371554252199414

key: test_fscore
value: [0.78947368 0.78947368 0.64864865 0.78947368 0.74418605 0.71794872
 0.68571429 0.54545455 0.73170732 0.83870968]

mean value: 0.7280790291401931

key: train_fscore
value: [0.82798834 0.84457478 0.84302326 0.8238806  0.84615385 0.84971098
 0.83040936 0.83965015 0.86567164 0.80235988]

mean value: 0.8373422826187441

key: test_precision
value: [0.78947368 0.78947368 0.66666667 0.78947368 0.66666667 0.7
 0.75       0.64285714 0.68181818 1.        ]

mean value: 0.7476429710640237

key: train_precision
value: [0.82080925 0.84210526 0.83333333 0.83636364 0.85119048 0.83522727
 0.8255814  0.83236994 0.87878788 0.80952381]

mean value: 0.8365292256184584

key: test_recall
value: [0.78947368 0.78947368 0.63157895 0.78947368 0.84210526 0.73684211
 0.63157895 0.47368421 0.78947368 0.72222222]

mean value: 0.7195906432748538

key: train_recall
value: [0.83529412 0.84705882 0.85294118 0.81176471 0.84117647 0.86470588
 0.83529412 0.84705882 0.85294118 0.79532164]

mean value: 0.8383556931544548

key: test_roc_auc
value: [0.78947368 0.78947368 0.65789474 0.78947368 0.71052632 0.71052632
 0.71052632 0.60526316 0.7002924  0.86111111]

mean value: 0.7324561403508772

key: train_roc_auc
value: [0.82647059 0.84411765 0.84117647 0.82647059 0.84705882 0.84705882
 0.82941176 0.83823529 0.86799106 0.80354317]

mean value: 0.8371534227726178

key: test_jcc
value: [0.65217391 0.65217391 0.48       0.65217391 0.59259259 0.56
 0.52173913 0.375      0.57692308 0.72222222]

mean value: 0.578499876130311

key: train_jcc
value: [0.70646766 0.73096447 0.72864322 0.70050761 0.73333333 0.73869347
 0.71       0.72361809 0.76315789 0.66995074]

mean value: 0.7205336483765594

MCC on Blind test: 0.49

Accuracy on Blind test: 0.74

Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01583338 0.0182128  0.0161047  0.01597238 0.0159862  0.01671791
 0.01628447 0.01583791 0.01580572 0.01597381]

mean value: 0.016272926330566408

key: score_time
value: [0.01147938 0.01161909 0.01118112 0.01065826 0.01088691 0.01061678
 0.01073647 0.01067019 0.01060247 0.01049829]

mean value: 0.01089489459991455

key: test_mcc
value: [0.89473684 0.9486833  0.78947368 0.78947368 0.78947368 0.58218174
 0.89473684 0.73786479 0.51793973 0.78764146]

mean value: 0.773220574982312

key: train_mcc
value: [0.80022155 0.79424133 0.80600787 0.80600787 0.80600787 0.81182089
 0.78828985 0.8        0.82410816 0.80678035]

mean value: 0.8043485716732236

key: test_accuracy
value: [0.94736842 0.97368421 0.89473684 0.89473684 0.89473684 0.78947368
 0.94736842 0.86842105 0.75675676 0.89189189]

mean value: 0.8859174964438122

key: train_accuracy
value: [0.9        0.89705882 0.90294118 0.90294118 0.90294118 0.90588235
 0.89411765 0.9        0.91202346 0.90322581]

mean value: 0.9021131619803346

key: test_fscore
value: [0.94736842 0.97435897 0.89473684 0.89473684 0.89473684 0.77777778
 0.94736842 0.87179487 0.7804878  0.88235294]

mean value: 0.8865719738407196

key: train_fscore
value: [0.90116279 0.89795918 0.90379009 0.90379009 0.90379009 0.90643275
 0.89473684 0.9        0.9122807  0.90489914]

mean value: 0.9028841664606161

key: test_precision
value: [0.94736842 0.95       0.89473684 0.89473684 0.89473684 0.82352941
 0.94736842 0.85       0.72727273 0.9375    ]

mean value: 0.8867249507458486

key: train_precision
value: [0.8908046  0.89017341 0.89595376 0.89595376 0.89595376 0.90116279
 0.88953488 0.9        0.90697674 0.89204545]

mean value: 0.8958559152932181

key: test_recall
value: [0.94736842 1.         0.89473684 0.89473684 0.89473684 0.73684211
 0.94736842 0.89473684 0.84210526 0.83333333]

mean value: 0.8885964912280702

key: train_recall
value: [0.91176471 0.90588235 0.91176471 0.91176471 0.91176471 0.91176471
 0.9        0.9        0.91764706 0.91812865]

mean value: 0.9100481596147231

key: test_roc_auc
value: [0.94736842 0.97368421 0.89473684 0.89473684 0.89473684 0.78947368
 0.94736842 0.86842105 0.75438596 0.89035088]

mean value: 0.8855263157894737

key: train_roc_auc
value: [0.9        0.89705882 0.90294118 0.90294118 0.90294118 0.90588235
 0.89411765 0.9        0.9120399  0.90318197]

mean value: 0.902110423116615

key: test_jcc
value: [0.9        0.95       0.80952381 0.80952381 0.80952381 0.63636364
 0.9        0.77272727 0.64       0.78947368]

mean value: 0.8017136021872864

key: train_jcc
value: [0.82010582 0.81481481 0.82446809 0.82446809 0.82446809 0.82887701
 0.80952381 0.81818182 0.83870968 0.82631579]

mean value: 0.8229932990186044

MCC on Blind test: 0.78

Accuracy on Blind test: 0.89

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [1.32342815 1.41316032 1.3804853  1.2588563  1.50634933 1.53654742
 1.28589988 1.45925713 1.40028787 1.36803317]

mean value: 1.3932304859161377

key: score_time
value: [0.01891804 0.01499844 0.01266909 0.01278591 0.01561332 0.01240611
 0.01493168 0.01475978 0.02181697 0.01519823]

mean value: 0.015409755706787109

key: test_mcc
value: [0.89473684 0.89973541 0.78947368 0.89973541 0.73786479 0.63245553
 0.84327404 0.73786479 0.56725146 0.78764146]

mean value: 0.7790033421492283

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.94736842 0.94736842 0.89473684 0.94736842 0.86842105 0.81578947
 0.92105263 0.86842105 0.78378378 0.89189189]

mean value: 0.8886201991465149

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94736842 0.95       0.89473684 0.94444444 0.87179487 0.81081081
 0.91891892 0.87179487 0.78947368 0.88235294]

mean value: 0.8881695806308809

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.94736842 0.9047619  0.89473684 1.         0.85       0.83333333
 0.94444444 0.85       0.78947368 0.9375    ]

mean value: 0.8951618629908104

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.94736842 1.         0.89473684 0.89473684 0.89473684 0.78947368
 0.89473684 0.89473684 0.78947368 0.83333333]

mean value: 0.8833333333333333

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.94736842 0.94736842 0.89473684 0.94736842 0.86842105 0.81578947
 0.92105263 0.86842105 0.78362573 0.89035088]

mean value: 0.8884502923976608

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.9        0.9047619  0.80952381 0.89473684 0.77272727 0.68181818
 0.85       0.77272727 0.65217391 0.78947368]

mean value: 0.8027942880917709

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.73

Accuracy on Blind test: 0.86

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.02541423 0.01783824 0.01625419 0.01690006 0.01592183 0.01577139
 0.01574349 0.01575923 0.01588821 0.01514101]

mean value: 0.017063188552856445

key: score_time
value: [0.01225829 0.00914264 0.00886273 0.00880837 0.00879955 0.00871849
 0.00876212 0.0086875  0.00877857 0.00877881]

mean value: 0.009159708023071289

key: test_mcc
value: [1.         0.84327404 0.79388419 0.9486833  0.84327404 0.89973541
 0.85280287 0.89973541 0.68035483 0.89181287]

mean value: 0.8653556953757348

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.92105263 0.89473684 0.97368421 0.92105263 0.94736842
 0.92105263 0.94736842 0.83783784 0.94594595]

mean value: 0.9310099573257468

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.92307692 0.88888889 0.97435897 0.92307692 0.94444444
 0.92682927 0.95       0.83333333 0.94444444]

mean value: 0.9308453199916614

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.9        0.94117647 0.95       0.9        1.
 0.86363636 0.9047619  0.88235294 0.94444444]

mean value: 0.9286372124607418

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.94736842 0.84210526 1.         0.94736842 0.89473684
 1.         1.         0.78947368 0.94444444]

mean value: 0.9365497076023391

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.92105263 0.89473684 0.97368421 0.92105263 0.94736842
 0.92105263 0.94736842 0.83918129 0.94590643]

mean value: 0.9311403508771929

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.85714286 0.8        0.95       0.85714286 0.89473684
 0.86363636 0.9047619  0.71428571 0.89473684]

mean value: 0.8736443381180223

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.89

Accuracy on Blind test: 0.95

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.10692024 0.10610008 0.10551596 0.10557318 0.1064024  0.10624623
 0.10674787 0.10648775 0.10734653 0.10737967]

mean value: 0.10647199153900147

key: score_time
value: [0.01746225 0.01743841 0.01741266 0.01759815 0.01770043 0.01763558
 0.01764989 0.0177474  0.01797533 0.01771569]

mean value: 0.017633581161499025

key: test_mcc
value: [0.9486833  0.89473684 0.63960215 0.79388419 0.73786479 0.73786479
 0.84327404 0.79388419 0.56934383 0.83871328]

mean value: 0.7797851390935022

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97368421 0.94736842 0.81578947 0.89473684 0.86842105 0.86842105
 0.92105263 0.89473684 0.78378378 0.91891892]

mean value: 0.8886913229018493

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97435897 0.94736842 0.8        0.9        0.86486486 0.87179487
 0.92307692 0.9        0.8        0.91428571]

mean value: 0.889574976943398

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.95       0.94736842 0.875      0.85714286 0.88888889 0.85
 0.9        0.85714286 0.76190476 0.94117647]

mean value: 0.8828624256720232

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.94736842 0.73684211 0.94736842 0.84210526 0.89473684
 0.94736842 0.94736842 0.84210526 0.88888889]

mean value: 0.8994152046783626

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97368421 0.94736842 0.81578947 0.89473684 0.86842105 0.86842105
 0.92105263 0.89473684 0.78216374 0.91812865]

mean value: 0.8884502923976607

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.95       0.9        0.66666667 0.81818182 0.76190476 0.77272727
 0.85714286 0.81818182 0.66666667 0.84210526]

mean value: 0.8053577124629756

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.82

Accuracy on Blind test: 0.91

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00968361 0.00963068 0.00972104 0.00968266 0.00966477 0.00967169
 0.00958872 0.00959301 0.0098207  0.00972056]

mean value: 0.009677743911743164

key: score_time
value: [0.00886869 0.0086751  0.00869846 0.00869846 0.00874829 0.00864029
 0.00864649 0.00878    0.00873041 0.00866818]

mean value: 0.008715438842773437

key: test_mcc
value: [0.37047929 0.78947368 0.42640143 0.68803296 0.47633051 0.58218174
 0.42640143 0.47633051 0.29618896 0.62280702]

mean value: 0.5154627531777716

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.68421053 0.89473684 0.71052632 0.84210526 0.73684211 0.78947368
 0.71052632 0.73684211 0.64864865 0.81081081]

mean value: 0.7564722617354196

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.66666667 0.89473684 0.68571429 0.83333333 0.72222222 0.8
 0.68571429 0.75       0.66666667 0.81081081]

mean value: 0.7515865113233534

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.70588235 0.89473684 0.75       0.88235294 0.76470588 0.76190476
 0.75       0.71428571 0.65       0.78947368]

mean value: 0.7663342178976854

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.63157895 0.89473684 0.63157895 0.78947368 0.68421053 0.84210526
 0.63157895 0.78947368 0.68421053 0.83333333]

mean value: 0.7412280701754386

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.68421053 0.89473684 0.71052632 0.84210526 0.73684211 0.78947368
 0.71052632 0.73684211 0.64766082 0.81140351]

mean value: 0.7564327485380117

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.5        0.80952381 0.52173913 0.71428571 0.56521739 0.66666667
 0.52173913 0.6        0.5        0.68181818]

mean value: 0.6080990024468285

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.43

Accuracy on Blind test: 0.71

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.52656865 1.51507688 1.49541473 1.58690763 1.57518911 1.50592113
 1.52353764 1.52296948 1.51906967 1.57698035]

mean value: 1.5347635269165039

key: score_time
value: [0.09216976 0.09152102 0.09123611 0.09885144 0.09371042 0.09946156
 0.09564042 0.09658599 0.09777331 0.09974384]

mean value: 0.09566938877105713

key: test_mcc
value: [1.         0.89973541 0.78947368 0.89473684 0.84327404 0.84327404
 1.         0.9486833  0.78362573 0.89181287]

mean value: 0.8894615917123104

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.94736842 0.89473684 0.94736842 0.92105263 0.92105263
 1.         0.97368421 0.89189189 0.94594595]

mean value: 0.9443100995732574

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.94444444 0.89473684 0.94736842 0.92307692 0.91891892
 1.         0.97435897 0.89473684 0.94444444]

mean value: 0.9442085810506863

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         0.89473684 0.94736842 0.9        0.94444444
 1.         0.95       0.89473684 0.94444444]

mean value: 0.9475730994152046

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.89473684 0.89473684 0.94736842 0.94736842 0.89473684
 1.         1.         0.89473684 0.94444444]

mean value: 0.941812865497076

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.94736842 0.89473684 0.94736842 0.92105263 0.92105263
 1.         0.97368421 0.89181287 0.94590643]

mean value: 0.9442982456140351

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.89473684 0.80952381 0.9        0.85714286 0.85
 1.         0.95       0.80952381 0.89473684]

mean value: 0.8965664160401002

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.9

Accuracy on Blind test: 0.95

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC0...05', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.948488   0.93976998 0.88425088 0.91976881 0.97527862 0.93516684
 0.89031529 0.95844722 0.9312284  0.9260869 ]

mean value: 0.930880093574524

key: score_time
value: [0.17965937 0.26049066 0.24061608 0.1728282  0.16420078 0.18590951
 0.26020312 0.22205067 0.26080751 0.14662576]

mean value: 0.20933916568756103

key: test_mcc
value: [1.         0.89973541 0.73786479 0.89473684 0.84327404 0.89973541
 0.89973541 0.89973541 0.73020842 0.89181287]

mean value: 0.8696838599499482

key: train_mcc
value: [0.95300713 0.94720632 0.95884012 0.95294118 0.95897286 0.95884012
 0.95300713 0.96477265 0.97653939 0.95896113]

mean value: 0.9583088027940004

key: test_accuracy
value: [1.         0.94736842 0.86842105 0.94736842 0.92105263 0.94736842
 0.94736842 0.94736842 0.86486486 0.94594595]

mean value: 0.9337126600284494

key: train_accuracy
value: [0.97647059 0.97352941 0.97941176 0.97647059 0.97941176 0.97941176
 0.97647059 0.98235294 0.98826979 0.97947214]

mean value: 0.9791271347248577

key: test_fscore
value: [1.         0.94444444 0.86486486 0.94736842 0.92307692 0.94444444
 0.94444444 0.95       0.87179487 0.94444444]

mean value: 0.933488285856707

key: train_fscore
value: [0.97633136 0.97329377 0.97935103 0.97647059 0.97922849 0.97947214
 0.97633136 0.98224852 0.98823529 0.97947214]

mean value: 0.9790434694122674

key: test_precision
value: [1.         1.         0.88888889 0.94736842 0.9        1.
 1.         0.9047619  0.85       0.94444444]

mean value: 0.9435463659147869

key: train_precision
value: [0.98214286 0.98203593 0.98224852 0.97647059 0.98802395 0.97660819
 0.98214286 0.98809524 0.98823529 0.98235294]

mean value: 0.9828356363994447

key: test_recall
value: [1.         0.89473684 0.84210526 0.94736842 0.94736842 0.89473684
 0.89473684 1.         0.89473684 0.94444444]

mean value: 0.9260233918128655

key: train_recall
value: [0.97058824 0.96470588 0.97647059 0.97647059 0.97058824 0.98235294
 0.97058824 0.97647059 0.98823529 0.97660819]

mean value: 0.9753078775369797

key: test_roc_auc
value: [1.         0.94736842 0.86842105 0.94736842 0.92105263 0.94736842
 0.94736842 0.94736842 0.86403509 0.94590643]

mean value: 0.933625730994152

key: train_roc_auc
value: [0.97647059 0.97352941 0.97941176 0.97647059 0.97941176 0.97941176
 0.97647059 0.98235294 0.98826969 0.97948056]

mean value: 0.9791279669762643

key: test_jcc
value: [1.         0.89473684 0.76190476 0.9        0.85714286 0.89473684
 0.89473684 0.9047619  0.77272727 0.89473684]

mean value: 0.877548416495785

key: train_jcc
value: [0.95375723 0.94797688 0.95953757 0.95402299 0.95930233 0.95977011
 0.95375723 0.96511628 0.97674419 0.95977011]

mean value: 0.9589754910822583

MCC on Blind test: 0.85

Accuracy on Blind test: 0.92

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.02467036 0.00961518 0.00966334 0.00970984 0.00982165 0.00965047
 0.00962496 0.00963306 0.00963211 0.00960302]

mean value: 0.011162400245666504

key: score_time
value: [0.01448631 0.00879812 0.00893068 0.0087347  0.0088098  0.0087955
 0.00874925 0.00880694 0.00874519 0.00879836]

mean value: 0.009365487098693847

key: test_mcc
value: [0.52704628 0.68421053 0.68421053 0.79388419 0.78947368 0.52704628
 0.78947368 0.63960215 0.48078072 0.57857577]

mean value: 0.6494303793257087

key: train_mcc
value: [0.74738216 0.73530684 0.77092175 0.74199852 0.72941176 0.74163853
 0.72354193 0.77092175 0.75366357 0.73705515]

mean value: 0.745184196259884

key: test_accuracy
value: [0.76315789 0.84210526 0.84210526 0.89473684 0.89473684 0.76315789
 0.89473684 0.81578947 0.72972973 0.78378378]

mean value: 0.8224039829302987

key: train_accuracy
value: [0.87352941 0.86764706 0.88529412 0.87058824 0.86470588 0.87058824
 0.86176471 0.88529412 0.87683284 0.86803519]

mean value: 0.8724279799896498

key: test_fscore
value: [0.76923077 0.84210526 0.84210526 0.9        0.89473684 0.76923077
 0.89473684 0.82926829 0.77272727 0.75      ]

mean value: 0.8264141314398054

key: train_fscore
value: [0.87536232 0.86803519 0.88695652 0.87356322 0.86470588 0.87283237
 0.86135693 0.88695652 0.87647059 0.87179487]

mean value: 0.8738034415804177

key: test_precision
value: [0.75       0.84210526 0.84210526 0.85714286 0.89473684 0.75
 0.89473684 0.77272727 0.68       0.85714286]

mean value: 0.8140697197539303

key: train_precision
value: [0.86285714 0.86549708 0.87428571 0.85393258 0.86470588 0.85795455
 0.86390533 0.87428571 0.87647059 0.85      ]

mean value: 0.8643894573208194

key: test_recall
value: [0.78947368 0.84210526 0.84210526 0.94736842 0.89473684 0.78947368
 0.89473684 0.89473684 0.89473684 0.66666667]

mean value: 0.8456140350877193

key: train_recall
value: [0.88823529 0.87058824 0.9        0.89411765 0.86470588 0.88823529
 0.85882353 0.9        0.87647059 0.89473684]

mean value: 0.8835913312693499

key: test_roc_auc
value: [0.76315789 0.84210526 0.84210526 0.89473684 0.89473684 0.76315789
 0.89473684 0.81578947 0.7251462  0.78070175]

mean value: 0.8216374269005848

key: train_roc_auc
value: [0.87352941 0.86764706 0.88529412 0.87058824 0.86470588 0.87058824
 0.86176471 0.88529412 0.87683179 0.86795666]

mean value: 0.8724200206398349

key: test_jcc
value: [0.625      0.72727273 0.72727273 0.81818182 0.80952381 0.625
 0.80952381 0.70833333 0.62962963 0.6       ]

mean value: 0.7079737854737855

key: train_jcc
value: [0.77835052 0.76683938 0.796875   0.7755102  0.76165803 0.77435897
 0.75647668 0.796875   0.78010471 0.77272727]

mean value: 0.7759775771937931

MCC on Blind test: 0.74

Accuracy on Blind test: 0.87

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC0...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.2068522  0.1708672  0.06113815 0.06603909 0.06120896 0.05981898
 0.05868864 0.29174948 0.05496788 0.05988955]

mean value: 0.10912201404571534

key: score_time
value: [0.01325941 0.01139426 0.01159883 0.01139307 0.01082778 0.01058769
 0.01076388 0.01153779 0.01074386 0.01062775]

mean value: 0.011273431777954101

key: test_mcc
value: [1.         1.         1.         0.9486833  0.89473684 0.9486833
 0.84327404 0.9486833  0.78362573 0.89181287]

mean value: 0.92594993754596

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         1.         1.         0.97368421 0.94736842 0.97368421
 0.92105263 0.97368421 0.89189189 0.94594595]

mean value: 0.9627311522048364

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         1.         1.         0.97435897 0.94736842 0.97297297
 0.92307692 0.97435897 0.89473684 0.94444444]

mean value: 0.9631317552370184

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         1.         0.95       0.94736842 1.
 0.9        0.95       0.89473684 0.94444444]

mean value: 0.9586549707602339

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         0.94736842 0.94736842
 0.94736842 1.         0.89473684 0.94444444]

mean value: 0.9681286549707602

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         1.         1.         0.97368421 0.94736842 0.97368421
 0.92105263 0.97368421 0.89181287 0.94590643]

mean value: 0.962719298245614

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         1.         1.         0.95       0.9        0.94736842
 0.85714286 0.95       0.80952381 0.89473684]

mean value: 0.9308771929824561

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.94

Accuracy on Blind test: 0.97
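
Note: the per-fold scores in the block above can be cross-checked by hand. The sketch below is a minimal illustration and is not taken from ml_data_7030.py. It assumes a 38-sample test fold with 19 samples per class, and confusion counts (TP=19, FP=1, FN=0, TN=18) inferred from the fourth fold's reported precision (0.95), recall (1.0) and accuracy (0.97368421). The function name `fold_metrics` is hypothetical.

```python
import math

def fold_metrics(tp, fp, fn, tn):
    """Binary-classification metrics computed from confusion-matrix counts."""
    total = tp + fp + fn + tn
    recall = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)       # true negative rate
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": (tp * tn - fp * fn) / mcc_denom,
        "accuracy": (tp + tn) / total,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "precision": tp / (tp + fp),
        "recall": recall,
        "balanced_accuracy": (recall + specificity) / 2,
        "jcc": tp / (tp + fp + fn),    # Jaccard index of the positive class
    }

# Assumption: counts inferred from the fourth fold's reported scores.
metrics = fold_metrics(tp=19, fp=1, fn=0, tn=18)
for key, value in metrics.items():
    print(f"{key}: {value:.8f}")
```

With these counts the MCC, accuracy, F-score, precision, recall and Jaccard values agree with the fourth entries of the corresponding test_* arrays. The balanced accuracy also equals the fourth test_roc_auc entry, which would be consistent with the AUC having been computed from predicted labels rather than from scores, although the log itself does not confirm this.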

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.04102659 0.08542848 0.07878828 0.07048774 0.07999039 0.06906962
 0.06940484 0.05672765 0.0792861  0.05792046]

mean value: 0.06881301403045655

key: score_time
value: [0.0229919  0.02294731 0.02151322 0.01808548 0.02362418 0.02200913
 0.01912975 0.01265907 0.01236296 0.01703215]

mean value: 0.019235515594482423

key: test_mcc
value: [0.80757285 0.9486833  0.79388419 0.84327404 0.73786479 0.48454371
 0.73786479 0.73786479 0.51461988 0.78362573]

mean value: 0.7389798067892077

key: train_mcc
value: [0.92941176 0.94124161 0.95294118 0.92941176 0.95294118 0.94707521
 0.94124161 0.9353103  0.94722901 0.95896113]

mean value: 0.9435764748091761

key: test_accuracy
value: [0.89473684 0.97368421 0.89473684 0.92105263 0.86842105 0.73684211
 0.86842105 0.86842105 0.75675676 0.89189189]

mean value: 0.8674964438122333

key: train_accuracy
value: [0.96470588 0.97058824 0.97647059 0.96470588 0.97647059 0.97352941
 0.97058824 0.96764706 0.97360704 0.97947214]

mean value: 0.9717785061238572

key: test_fscore
value: [0.9047619  0.97435897 0.88888889 0.92307692 0.86486486 0.70588235
 0.87179487 0.87179487 0.75675676 0.88888889]

mean value: 0.8651069298128121

key: train_fscore
value: [0.96470588 0.9704142  0.97647059 0.96470588 0.97647059 0.97360704
 0.9704142  0.96755162 0.97345133 0.97947214]

mean value: 0.9717263472281472

key: test_precision
value: [0.82608696 0.95       0.94117647 0.9        0.88888889 0.8
 0.85       0.85       0.77777778 0.88888889]

mean value: 0.867281898266553

key: train_precision
value: [0.96470588 0.97619048 0.97647059 0.96470588 0.97647059 0.97076023
 0.97619048 0.9704142  0.97633136 0.98235294]

mean value: 0.97345926307822

key: test_recall
value: [1.         1.         0.84210526 0.94736842 0.84210526 0.63157895
 0.89473684 0.89473684 0.73684211 0.88888889]

mean value: 0.8678362573099415

key: train_recall
value: [0.96470588 0.96470588 0.97647059 0.96470588 0.97647059 0.97647059
 0.96470588 0.96470588 0.97058824 0.97660819]

mean value: 0.9700137598899209

key: test_roc_auc
value: [0.89473684 0.97368421 0.89473684 0.92105263 0.86842105 0.73684211
 0.86842105 0.86842105 0.75730994 0.89181287]

mean value: 0.8675438596491228

key: train_roc_auc
value: [0.96470588 0.97058824 0.97647059 0.96470588 0.97647059 0.97352941
 0.97058824 0.96764706 0.97359821 0.97948056]

mean value: 0.9717784657722739

key: test_jcc
value: [0.82608696 0.95       0.8        0.85714286 0.76190476 0.54545455
 0.77272727 0.77272727 0.60869565 0.8       ]

mean value: 0.7694739318652362

key: train_jcc
value: [0.93181818 0.94252874 0.95402299 0.93181818 0.95402299 0.94857143
 0.94252874 0.93714286 0.94827586 0.95977011]

mean value: 0.9450500074638005

MCC on Blind test: 0.75

Accuracy on Blind test: 0.88
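
Note: the gap between the train and test scores is one way to read the block above. The following sketch is an illustration only, not part of the original script. It recomputes the mean MCC for LDA from the per-fold values copied from the log, then reports the train-minus-test gap and the spread across folds. The variable names are hypothetical.

```python
from statistics import mean, stdev

# Per-fold MCC values copied from the LDA block of the log.
test_mcc = [0.80757285, 0.9486833, 0.79388419, 0.84327404, 0.73786479,
            0.48454371, 0.73786479, 0.73786479, 0.51461988, 0.78362573]
train_mcc = [0.92941176, 0.94124161, 0.95294118, 0.92941176, 0.95294118,
             0.94707521, 0.94124161, 0.9353103, 0.94722901, 0.95896113]

test_mean = mean(test_mcc)
train_mean = mean(train_mcc)
gap = train_mean - test_mean      # larger gap suggests more overfitting
spread = stdev(test_mcc)          # sample standard deviation across folds

print(f"train mean MCC: {train_mean:.4f}")
print(f"test mean MCC:  {test_mean:.4f}")
print(f"train-test gap: {gap:.4f}")
print(f"test fold std:  {spread:.4f}")
```

The recomputed means match the "mean value" lines reported for train_mcc and test_mcc, so the reported means are plain arithmetic averages over the ten folds.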

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.01010537 0.01241589 0.01016307 0.00971317 0.01059246 0.01026273
 0.0100174  0.00931168 0.00948501 0.00963616]

mean value: 0.010170292854309083

key: score_time
value: [0.00914979 0.00916171 0.00918078 0.00903225 0.00939584 0.00868344
 0.009166   0.00891471 0.00897503 0.00866079]

mean value: 0.009032034873962402

key: test_mcc
value: [0.59222009 0.63960215 0.63245553 0.63960215 0.68421053 0.63245553
 0.59222009 0.73786479 0.4670794  0.69007214]

mean value: 0.6307782394614956

key: train_mcc
value: [0.66563935 0.61817134 0.70059418 0.65349541 0.66610178 0.72946225
 0.60715472 0.75894171 0.71355814 0.67276567]

mean value: 0.6785884546131096

key: test_accuracy
value: [0.78947368 0.81578947 0.81578947 0.81578947 0.84210526 0.81578947
 0.78947368 0.86842105 0.72972973 0.83783784]

mean value: 0.8120199146514936

key: train_accuracy
value: [0.83235294 0.80882353 0.85       0.82647059 0.83235294 0.86470588
 0.80294118 0.87941176 0.85630499 0.83577713]

mean value: 0.8389140934966361

key: test_fscore
value: [0.76470588 0.8        0.81081081 0.8        0.84210526 0.81081081
 0.76470588 0.87179487 0.76190476 0.8125    ]

mean value: 0.8039338283185032

key: train_fscore
value: [0.82779456 0.8048048  0.84684685 0.82282282 0.82674772 0.86390533
 0.79635258 0.87833828 0.85196375 0.8313253 ]

mean value: 0.8350901992163299

key: test_precision
value: [0.86666667 0.875      0.83333333 0.875      0.84210526 0.83333333
 0.86666667 0.85       0.69565217 0.92857143]

mean value: 0.8466328865642367

key: train_precision
value: [0.85093168 0.82208589 0.86503067 0.8404908  0.85534591 0.86904762
 0.82389937 0.88622754 0.8757764  0.85714286]

mean value: 0.8545978740616875

key: test_recall
value: [0.68421053 0.73684211 0.78947368 0.73684211 0.84210526 0.78947368
 0.68421053 0.89473684 0.84210526 0.72222222]

mean value: 0.7722222222222223

key: train_recall
value: [0.80588235 0.78823529 0.82941176 0.80588235 0.8        0.85882353
 0.77058824 0.87058824 0.82941176 0.80701754]

mean value: 0.8165841073271414

key: test_roc_auc
value: [0.78947368 0.81578947 0.81578947 0.81578947 0.84210526 0.81578947
 0.78947368 0.86842105 0.72660819 0.83479532]

mean value: 0.8114035087719298

key: train_roc_auc
value: [0.83235294 0.80882353 0.85       0.82647059 0.83235294 0.86470588
 0.80294118 0.87941176 0.85622635 0.83586171]

mean value: 0.8389146886824905

key: test_jcc
value: [0.61904762 0.66666667 0.68181818 0.66666667 0.72727273 0.68181818
 0.61904762 0.77272727 0.61538462 0.68421053]

mean value: 0.673466007676534

key: train_jcc
value: [0.70618557 0.67336683 0.734375   0.69897959 0.70466321 0.76041667
 0.66161616 0.78306878 0.74210526 0.71134021]

mean value: 0.7176117286148205

MCC on Blind test: 0.77

Accuracy on Blind test: 0.89
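
Note: the "key / value / mean value" layout used throughout this log is regular enough to parse. The sketch below is a hypothetical helper, not part of the original pipeline. It assumes every block follows exactly the layout of the excerpt embedded in the code, which is copied from the Multinomial test_mcc block. The names `BLOCK` and `parse_scores` are illustrative.

```python
import re

# Excerpt in the same layout as the log (Multinomial, test_mcc).
sample = """key: test_mcc
value: [0.59222009 0.63960215 0.63245553 0.63960215 0.68421053 0.63245553
 0.59222009 0.73786479 0.4670794  0.69007214]

mean value: 0.6307782394614956
"""

# key name, bracketed values (may wrap over lines), then the reported mean.
BLOCK = re.compile(r"key: (\w+)\nvalue: \[([^\]]*)\]\s*mean value: ([\d.eE+-]+)")

def parse_scores(text):
    """Return {key: (per-fold values, reported mean)} for each block found."""
    parsed = {}
    for key, values, reported_mean in BLOCK.findall(text):
        folds = [float(v) for v in values.split()]
        parsed[key] = (folds, float(reported_mean))
    return parsed

scores = parse_scores(sample)
folds, reported = scores["test_mcc"]
print(f"{len(folds)} folds, reported mean {reported:.6f}, "
      f"recomputed mean {sum(folds) / len(folds):.6f}")
```

Collecting every block this way would make it straightforward to tabulate the models side by side instead of reading the raw arrays.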

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01501966 0.02042222 0.01662898 0.02123141 0.01934409 0.0200212
 0.01753616 0.01804042 0.01837301 0.01642108]

mean value: 0.018303823471069337

key: score_time
value: [0.0092628  0.0112505  0.01110458 0.01176238 0.01192117 0.01195502
 0.01184368 0.01198912 0.01202178 0.01190639]

mean value: 0.011501741409301759

key: test_mcc
value: [0.9486833  0.89973541 0.38729833 0.84327404 0.67936622 0.56613852
 0.29277002 0.79388419 0.56934383 0.62525715]

mean value: 0.6605751008798381

key: train_mcc
value: [0.87660709 0.8617507  0.57735027 0.91190671 0.81150267 0.73854895
 0.42491829 0.83159022 0.93562485 0.84815135]

mean value: 0.7817951100491364

key: test_accuracy
value: [0.97368421 0.94736842 0.65789474 0.92105263 0.81578947 0.76315789
 0.57894737 0.89473684 0.78378378 0.78378378]

mean value: 0.8120199146514936

key: train_accuracy
value: [0.93823529 0.92941176 0.75       0.95588235 0.89705882 0.85294118
 0.65294118 0.90882353 0.96774194 0.92082111]

mean value: 0.8773857167500432

key: test_fscore
value: [0.97435897 0.95       0.73469388 0.92307692 0.84444444 0.70967742
 0.7037037  0.9        0.8        0.71428571]

mean value: 0.825424105677562

key: train_fscore
value: [0.93877551 0.93220339 0.8        0.95626822 0.90666667 0.82758621
 0.74235808 0.89967638 0.96735905 0.91588785]

mean value: 0.8886781350091697

key: test_precision
value: [0.95       0.9047619  0.6        0.9        0.73076923 0.91666667
 0.54285714 0.85714286 0.76190476 1.        ]

mean value: 0.8164102564102564

key: train_precision
value: [0.93063584 0.89673913 0.66666667 0.94797688 0.82926829 1.
 0.59027778 1.         0.9760479  0.98      ]

mean value: 0.8817612488516776

key: test_recall
value: [1.         1.         0.94736842 0.94736842 1.         0.57894737
 1.         0.94736842 0.84210526 0.55555556]

mean value: 0.8818713450292397

key: train_recall
value: [0.94705882 0.97058824 1.         0.96470588 1.         0.70588235
 1.         0.81764706 0.95882353 0.85964912]

mean value: 0.9224355005159959

key: test_roc_auc
value: [0.97368421 0.94736842 0.65789474 0.92105263 0.81578947 0.76315789
 0.57894737 0.89473684 0.78216374 0.77777778]

mean value: 0.8112573099415205

key: train_roc_auc
value: [0.93823529 0.92941176 0.75       0.95588235 0.89705882 0.85294118
 0.65294118 0.90882353 0.96771586 0.92100103]

mean value: 0.8774011007911937

key: test_jcc
value: [0.95       0.9047619  0.58064516 0.85714286 0.73076923 0.55
 0.54285714 0.81818182 0.66666667 0.55555556]

mean value: 0.7156580337225499

key: train_jcc
value: [0.88461538 0.87301587 0.66666667 0.91620112 0.82926829 0.70588235
 0.59027778 0.81764706 0.93678161 0.84482759]

mean value: 0.8065183719244069

MCC on Blind test: 0.79

Accuracy on Blind test: 0.89

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01763225 0.0185833  0.01747012 0.01765466 0.01662111 0.01753116
 0.01757503 0.01850629 0.01743722 0.01835084]

mean value: 0.017736196517944336

key: score_time
value: [0.01220918 0.01224709 0.0121901  0.01216316 0.01189256 0.01195359
 0.01198053 0.01190829 0.01229215 0.01211333]

mean value: 0.012094998359680175

key: test_mcc
value: [0.80757285 0.89973541 0.61017022 0.84327404 0.79388419 0.63960215
 0.79388419 0.78947368 0.62807634 0.78764146]

mean value: 0.7593314528036907

key: train_mcc
value: [0.8028464  0.8452381  0.67431767 0.90076395 0.87721456 0.90688708
 0.91178048 0.82150888 0.87394751 0.91280274]

mean value: 0.8527307362889015

key: test_accuracy
value: [0.89473684 0.94736842 0.78947368 0.92105263 0.89473684 0.81578947
 0.89473684 0.89473684 0.81081081 0.89189189]

mean value: 0.8755334281650071

key: train_accuracy
value: [0.89705882 0.91764706 0.81470588 0.95       0.93823529 0.95294118
 0.95588235 0.90294118 0.93548387 0.95601173]

mean value: 0.9220907365878903

key: test_fscore
value: [0.9047619  0.94444444 0.81818182 0.92307692 0.88888889 0.8
 0.88888889 0.89473684 0.82926829 0.88235294]

mean value: 0.8774600944207529

key: train_fscore
value: [0.90410959 0.91082803 0.84289277 0.94894895 0.93693694 0.95180723
 0.95575221 0.89250814 0.93785311 0.95522388]

mean value: 0.9236860841053656

key: test_precision
value: [0.82608696 1.         0.72       0.9        0.94117647 0.875
 0.94117647 0.89473684 0.77272727 0.9375    ]

mean value: 0.8808404012530746

key: train_precision
value: [0.84615385 0.99305556 0.73160173 0.96932515 0.95705521 0.97530864
 0.95857988 1.         0.90217391 0.97560976]

mean value: 0.9308863694182445

key: test_recall
value: [1.         0.89473684 0.94736842 0.94736842 0.84210526 0.73684211
 0.84210526 0.89473684 0.89473684 0.83333333]

mean value: 0.8833333333333333

key: train_recall
value: [0.97058824 0.84117647 0.99411765 0.92941176 0.91764706 0.92941176
 0.95294118 0.80588235 0.97647059 0.93567251]

mean value: 0.9253319573443413

key: test_roc_auc
value: [0.89473684 0.94736842 0.78947368 0.92105263 0.89473684 0.81578947
 0.89473684 0.89473684 0.80847953 0.89035088]

mean value: 0.8751461988304093

key: train_roc_auc
value: [0.89705882 0.91764706 0.81470588 0.95       0.93823529 0.95294118
 0.95588235 0.90294118 0.93560372 0.95607155]

mean value: 0.922108703130375

key: test_jcc
value: [0.82608696 0.89473684 0.69230769 0.85714286 0.8        0.66666667
 0.8        0.80952381 0.70833333 0.78947368]

mean value: 0.7844271841811887

key: train_jcc
value: [0.825      0.83625731 0.72844828 0.90285714 0.88135593 0.90804598
 0.91525424 0.80588235 0.88297872 0.91428571]

mean value: 0.8600365665794898

MCC on Blind test: 0.81

Accuracy on Blind test: 0.9

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.16849089 0.14875078 0.15005994 0.15339446 0.15740991 0.15145826
 0.14945006 0.15879703 0.16036916 0.15662646]

mean value: 0.155480694770813

key: score_time
value: [0.0154891  0.01526761 0.01554847 0.01697516 0.01604891 0.01547098
 0.01667619 0.01671481 0.01670885 0.01569343]

mean value: 0.016059350967407227

key: test_mcc
value: [1.         1.         1.         0.9486833  0.9486833  0.9486833
 0.89973541 0.89473684 0.73099415 0.89181287]

mean value: 0.9263329164643102

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         1.         1.         0.97368421 0.97368421 0.97368421
 0.94736842 0.94736842 0.86486486 0.94594595]

mean value: 0.9626600284495022

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         1.         1.         0.97435897 0.97297297 0.97297297
 0.95       0.94736842 0.86486486 0.94444444]

mean value: 0.9626982650666861

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         1.         0.95       1.         1.
 0.9047619  0.94736842 0.88888889 0.94444444]

mean value: 0.9635463659147869

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         0.94736842 0.94736842
 1.         0.94736842 0.84210526 0.94444444]

mean value: 0.9628654970760233

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         1.         1.         0.97368421 0.97368421 0.97368421
 0.94736842 0.94736842 0.86549708 0.94590643]

mean value: 0.9627192982456141

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         1.         1.         0.95       0.94736842 0.94736842
 0.9047619  0.9        0.76190476 0.89473684]

mean value: 0.9306140350877192

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.95

Accuracy on Blind test: 0.97

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.05852747 0.04814076 0.04541254 0.04895949 0.05033207 0.04772234
 0.06385469 0.04760981 0.04737306 0.05188274]

mean value: 0.0509814977645874

key: score_time
value: [0.02652955 0.02458501 0.02486563 0.02379942 0.02537274 0.02120161
 0.02789068 0.02503252 0.02294707 0.02493882]

mean value: 0.02471630573272705

key: test_mcc
value: [1.         1.         1.         0.9486833  0.89473684 0.89973541
 0.9486833  0.9486833  0.73099415 0.94736842]

mean value: 0.9318884720198657

key: train_mcc
value: [1.         0.99413485 1.         1.         0.98830369 0.99413485
 0.98823529 0.98830369 0.99415185 0.98833809]

mean value: 0.9935602308976983

key: test_accuracy
value: [1.         1.         1.         0.97368421 0.94736842 0.94736842
 0.97368421 0.97368421 0.86486486 0.97297297]

mean value: 0.9653627311522048

key: train_accuracy
value: [1.         0.99705882 1.         1.         0.99411765 0.99705882
 0.99411765 0.99411765 0.99706745 0.9941349 ]

mean value: 0.996767293427635

key: test_fscore
value: [1.         1.         1.         0.97435897 0.94736842 0.94444444
 0.97435897 0.97435897 0.86486486 0.97297297]

mean value: 0.9652727626411837

key: train_fscore
value: [1.         0.99705015 1.         1.         0.99408284 0.99705015
 0.99411765 0.99408284 0.99705015 0.99411765]

mean value: 0.9967551417068896

key: test_precision
value: [1.         1.         1.         0.95       0.94736842 1.
 0.95       0.95       0.88888889 0.94736842]

mean value: 0.9633625730994152

key: train_precision
value: [1.         1.         1.         1.         1.         1.
 0.99411765 1.         1.         1.        ]

mean value: 0.9994117647058823

key: test_recall
value: [1.         1.         1.         1.         0.94736842 0.89473684
 1.         1.         0.84210526 1.        ]

mean value: 0.968421052631579

key: train_recall
value: [1.         0.99411765 1.         1.         0.98823529 0.99411765
 0.99411765 0.98823529 0.99411765 0.98830409]

mean value: 0.994124527003784

key: test_roc_auc
value: [1.         1.         1.         0.97368421 0.94736842 0.94736842
 0.97368421 0.97368421 0.86549708 0.97368421]

mean value: 0.9654970760233919

key: train_roc_auc
value: [1.         0.99705882 1.         1.         0.99411765 0.99705882
 0.99411765 0.99411765 0.99705882 0.99415205]

mean value: 0.9967681458548332

key: test_jcc
value: [1.         1.         1.         0.95       0.9        0.89473684
 0.95       0.95       0.76190476 0.94736842]

mean value: 0.9354010025062657

key: train_jcc
value: [1.         0.99411765 1.         1.         0.98823529 0.99411765
 0.98830409 0.98823529 0.99411765 0.98830409]

mean value: 0.9935431716546268

MCC on Blind test: 0.9

Accuracy on Blind test: 0.95
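
The per-fold scores above can be cross-checked by hand. The sketch below is not taken from the pipeline script; it assumes a 38-sample test fold with 19 positives and 19 negatives and a single false positive (counts inferred from the reported fractions, not printed in the log) and recomputes the metrics from those confusion counts. The function name `binary_metrics` is hypothetical.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Return common binary-classification metrics from confusion counts.

    Assumes at least one predicted positive and one actual positive,
    so the precision and recall denominators are non-zero.
    """
    total = tp + fp + fn + tn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),  # Jaccard index of the positive class
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# Assumed counts for one fold: 19 positives, 19 negatives, one false positive.
fold = binary_metrics(tp=19, fp=1, fn=0, tn=18)
for name, value in fold.items():
    print(f"{name}: {value:.8f}")
```

Under these assumed counts the results agree with the fold that reports test_accuracy 0.97368421, test_precision 0.95, test_recall 1.0, test_fscore 0.97435897, test_jcc 0.95 and test_mcc 0.9486833.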

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.08551073 0.10424566 0.10746527 0.11186576 0.08251929 0.10721231
 0.0956862  0.1061132  0.16514015 0.17688584]

mean value: 0.11426444053649902

key: score_time
value: [0.02205372 0.02442288 0.02611351 0.02609515 0.02227306 0.02218318
 0.022053   0.02277589 0.03644013 0.04053092]

mean value: 0.026494145393371582

key: test_mcc
value: [0.58218174 0.68421053 0.42640143 0.68803296 0.59222009 0.68803296
 0.68803296 0.68803296 0.35558302 0.69007214]

mean value: 0.6082800795061165

key: train_mcc
value: [0.99413485 0.99413485 0.99413485 1.         1.         0.99413485
 0.99413485 0.99413485 0.99415185 0.99415205]

mean value: 0.9953112973615207

key: test_accuracy
value: [0.78947368 0.84210526 0.71052632 0.84210526 0.78947368 0.84210526
 0.84210526 0.84210526 0.67567568 0.83783784]

mean value: 0.8013513513513513

key: train_accuracy
value: [0.99705882 0.99705882 0.99705882 1.         1.         0.99705882
 0.99705882 0.99705882 0.99706745 0.99706745]

mean value: 0.9976487838537175

key: test_fscore
value: [0.77777778 0.84210526 0.68571429 0.83333333 0.80952381 0.83333333
 0.85       0.83333333 0.71428571 0.8125    ]

mean value: 0.7991906850459483

key: train_fscore
value: [0.99705015 0.99705015 0.99705015 1.         1.         0.99705015
 0.99705015 0.99705015 0.99705015 0.99706745]

mean value: 0.9976418481128729

key: test_precision
value: [0.82352941 0.84210526 0.75       0.88235294 0.73913043 0.88235294
 0.80952381 0.88235294 0.65217391 0.92857143]

mean value: 0.8192093084373337

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.73684211 0.84210526 0.63157895 0.78947368 0.89473684 0.78947368
 0.89473684 0.78947368 0.78947368 0.72222222]

mean value: 0.7880116959064327

key: train_recall
value: [0.99411765 0.99411765 0.99411765 1.         1.         0.99411765
 0.99411765 0.99411765 0.99411765 0.99415205]

mean value: 0.9952975576195391

key: test_roc_auc
value: [0.78947368 0.84210526 0.71052632 0.84210526 0.78947368 0.84210526
 0.84210526 0.84210526 0.67251462 0.83479532]

mean value: 0.8007309941520468

key: train_roc_auc
value: [0.99705882 0.99705882 0.99705882 1.         1.         0.99705882
 0.99705882 0.99705882 0.99705882 0.99707602]

mean value: 0.9976487788097695

key: test_jcc
value: [0.63636364 0.72727273 0.52173913 0.71428571 0.68       0.71428571
 0.73913043 0.71428571 0.55555556 0.68421053]

mean value: 0.6687129153582243

key: train_jcc
value: [0.99411765 0.99411765 0.99411765 1.         1.         0.99411765
 0.99411765 0.99411765 0.99411765 0.99415205]

mean value: 0.9952975576195391

MCC on Blind test: 0.57

Accuracy on Blind test: 0.78
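
The repeating fractions in the per-fold arrays hint at the fold sizes. As a rough sketch, assuming 378 cross-validated samples (a figure inferred from the fractions, not printed in this part of the log) split into 10 folds as evenly as possible with the larger folds first, the sizes work out as follows. The helper `kfold_test_sizes` is hypothetical and only mimics an even split; it is not the splitter used by the script.

```python
def kfold_test_sizes(n_samples, n_splits):
    """Test-fold sizes when n_samples is split as evenly as possible."""
    base, extra = divmod(n_samples, n_splits)
    # The first `extra` folds receive one additional sample.
    return [base + 1 if i < extra else base for i in range(n_splits)]

n_samples = 378  # assumption inferred from the fold fractions
test_sizes = kfold_test_sizes(n_samples, 10)
train_sizes = [n_samples - t for t in test_sizes]
print(test_sizes)
print(train_sizes)

# One misclassified sample in a 340-row or 341-row training fold:
print(round(339 / 340, 8), round(340 / 341, 8))
```

This is consistent with train_accuracy values such as 0.99705882 in the first eight folds and 0.99706745 in the last two, and with test scores that are multiples of 1/38 or 1/37.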

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.57054996 0.5652597  0.55597973 0.57494354 0.56049299 0.54836679
 0.56327939 0.55440116 0.55499816 0.55580854]

mean value: 0.5604079961776733

key: score_time
value: [0.00960946 0.01006937 0.0094645  0.00990939 0.00955105 0.00955939
 0.00990939 0.00954437 0.0094986  0.00949216]

mean value: 0.009660768508911132

key: test_mcc
value: [1.         1.         0.9486833  0.9486833  0.89473684 0.89973541
 0.89973541 0.9486833  0.78362573 0.89181287]

mean value: 0.9215696154432907

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         1.         0.97368421 0.97368421 0.94736842 0.94736842
 0.94736842 0.97368421 0.89189189 0.94594595]

mean value: 0.960099573257468

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         1.         0.97435897 0.97435897 0.94736842 0.94444444
 0.95       0.97435897 0.89473684 0.94444444]

mean value: 0.9604071075123707

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         0.95       0.95       0.94736842 1.
 0.9047619  0.95       0.89473684 0.94444444]

mean value: 0.9541311612364244

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         0.94736842 0.89473684
 1.         1.         0.89473684 0.94444444]

mean value: 0.9681286549707602

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         1.         0.97368421 0.97368421 0.94736842 0.94736842
 0.94736842 0.97368421 0.89181287 0.94590643]

mean value: 0.9600877192982457

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         1.         0.95       0.95       0.9        0.89473684
 0.9047619  0.95       0.80952381 0.89473684]

mean value: 0.9253759398496241

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.94

Accuracy on Blind test: 0.97

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")

key: fit_time
value: [0.02796578 0.02674651 0.02773929 0.02755022 0.02779508 0.02704573
 0.0277493  0.03268075 0.02809811 0.02789998]

mean value: 0.028127074241638184

key: score_time
value: [0.01260114 0.0186615  0.01520371 0.01710987 0.01517677 0.01661873
 0.01516271 0.02039289 0.01575422 0.01550627]

mean value: 0.01621878147125244

key: test_mcc
value: [0.52704628 0.26462806 0.21320072 0.68803296 0.63245553 0.26462806
 0.16151457 0.37686733 0.18980224 0.6754386 ]

mean value: 0.3993614351469908

key: train_mcc
value: [0.94838881 0.95884012 0.70087664 0.9653073  0.88852332 0.86751214
 0.86751214 0.79170339 0.87096663 0.95366475]

mean value: 0.8813295248742146

key: test_accuracy
value: [0.76315789 0.63157895 0.60526316 0.84210526 0.81578947 0.63157895
 0.57894737 0.68421053 0.59459459 0.83783784]

mean value: 0.6985064011379801

key: train_accuracy
value: [0.97352941 0.97941176 0.82941176 0.98235294 0.94117647 0.92941176
 0.92941176 0.88529412 0.93548387 0.97653959]

mean value: 0.9362023460410557

key: test_fscore
value: [0.75675676 0.65       0.57142857 0.83333333 0.81081081 0.61111111
 0.52941176 0.64705882 0.65116279 0.83333333]

mean value: 0.6894407295706886

key: train_fscore
value: [0.97280967 0.97935103 0.79432624 0.98203593 0.9375     0.92405063
 0.92405063 0.87043189 0.93529412 0.97701149]

mean value: 0.9296861640810983

key: test_precision
value: [0.77777778 0.61904762 0.625      0.88235294 0.83333333 0.64705882
 0.6        0.73333333 0.58333333 0.83333333]

mean value: 0.7134570494864613

key: train_precision
value: [1.         0.98224852 1.         1.         1.         1.
 1.         1.         0.93529412 0.96045198]

mean value: 0.9877994615758248

key: test_recall
value: [0.73684211 0.68421053 0.52631579 0.78947368 0.78947368 0.57894737
 0.47368421 0.57894737 0.73684211 0.83333333]

mean value: 0.6728070175438596

key: train_recall
value: [0.94705882 0.97647059 0.65882353 0.96470588 0.88235294 0.85882353
 0.85882353 0.77058824 0.93529412 0.99415205]

mean value: 0.8847093223254214

key: test_roc_auc
value: [0.76315789 0.63157895 0.60526316 0.84210526 0.81578947 0.63157895
 0.57894737 0.68421053 0.59064327 0.8377193 ]

mean value: 0.6980994152046783

key: train_roc_auc
value: [0.97352941 0.97941176 0.82941176 0.98235294 0.94117647 0.92941176
 0.92941176 0.88529412 0.93548332 0.97648779]

mean value: 0.9361971104231166

key: test_jcc
value: [0.60869565 0.48148148 0.4        0.71428571 0.68181818 0.44
 0.36       0.47826087 0.48275862 0.71428571]

mean value: 0.5361586234299878

key: train_jcc
value: [0.94705882 0.95953757 0.65882353 0.96470588 0.88235294 0.85882353
 0.85882353 0.77058824 0.87845304 0.95505618]

mean value: 0.8734223261291885

MCC on Blind test: 0.41

Accuracy on Blind test: 0.71
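
The "Variables are collinear" warning for QDA indicates that the class-wise feature matrix is rank deficient. One plausible contributor, stated here as an assumption since the log does not say which columns are involved, is the one-hot block: every row of a full one-hot encoding sums to 1, so after centering the columns sum to zero and one of them is a linear combination of the others. The sketch below shows this in plain Python with hypothetical labels; it does not use the project's data or scikit-learn.

```python
def one_hot(values):
    """One-hot encode a list of labels into rows of 0/1 indicators."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]

def center(rows):
    """Subtract each column's mean, as a covariance estimate does."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

# Hypothetical category labels, for illustration only.
labels = ["helix", "sheet", "loop", "helix", "loop", "sheet", "helix"]
centered = center(one_hot(labels))

# Every centered row sums to (numerically) zero, so the columns are
# linearly dependent: any one column equals minus the sum of the rest.
row_sums = [sum(row) for row in centered]
print(max(abs(s) for s in row_sums))
```

Dropping one level per categorical feature, or using a regularised covariance estimate, are common ways to reduce this kind of dependency; whether either is appropriate here depends on the modelling goals.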

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.01517224 0.0148356  0.01910758 0.02715397 0.03688359 0.02013373
 0.03677201 0.03099537 0.03356433 0.03359127]

mean value: 0.026820969581604005

key: score_time
value: [0.01220894 0.01221657 0.01221132 0.02378225 0.02123022 0.02143621
 0.02273417 0.02497792 0.022928   0.02345252]

mean value: 0.019717812538146973

key: test_mcc
value: [0.89473684 0.9486833  0.78947368 0.84327404 0.84327404 0.58218174
 0.84327404 0.79388419 0.56725146 0.78764146]

mean value: 0.7893674798967042

key: train_mcc
value: [0.87064849 0.89411765 0.90588235 0.87684993 0.89417953 0.90014017
 0.87660709 0.89417953 0.90043693 0.90030617]

mean value: 0.8913347841716839

key: test_accuracy
value: [0.94736842 0.97368421 0.89473684 0.92105263 0.92105263 0.78947368
 0.92105263 0.89473684 0.78378378 0.89189189]

mean value: 0.8938833570412518

key: train_accuracy
value: [0.93529412 0.94705882 0.95294118 0.93823529 0.94705882 0.95
 0.93823529 0.94705882 0.95014663 0.95014663]

mean value: 0.9456175608073141

key: test_fscore
value: [0.94736842 0.97435897 0.89473684 0.92307692 0.92307692 0.77777778
 0.91891892 0.9        0.78947368 0.88235294]

mean value: 0.8931141405754409

/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_7030.py:136: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_7030.py:139: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)

key: train_fscore
value: [0.93567251 0.94705882 0.95294118 0.93913043 0.94736842 0.95043732
 0.93877551 0.94736842 0.95043732 0.95043732]

mean value: 0.9459627255064605

key: test_precision
value: [0.94736842 0.95       0.89473684 0.9        0.9        0.82352941
 0.94444444 0.85714286 0.78947368 0.9375    ]

mean value: 0.8944195660720429

key: train_precision
value: [0.93023256 0.94705882 0.95294118 0.92571429 0.94186047 0.94219653
 0.93063584 0.94186047 0.94219653 0.94767442]

mean value: 0.9402371094425134

key: test_recall
value: [0.94736842 1.         0.89473684 0.94736842 0.94736842 0.73684211
 0.89473684 0.94736842 0.78947368 0.83333333]

mean value: 0.893859649122807

key: train_recall
value: [0.94117647 0.94705882 0.95294118 0.95294118 0.95294118 0.95882353
 0.94705882 0.95294118 0.95882353 0.95321637]

mean value: 0.9517922256621947

key: test_roc_auc
value: [0.94736842 0.97368421 0.89473684 0.92105263 0.92105263 0.78947368
 0.92105263 0.89473684 0.78362573 0.89035088]

mean value: 0.8937134502923977

key: train_roc_auc
value: [0.93529412 0.94705882 0.95294118 0.93823529 0.94705882 0.95
 0.93823529 0.94705882 0.950172   0.9501376 ]

mean value: 0.9456191950464397

key: test_jcc
value: [0.9        0.95       0.80952381 0.85714286 0.85714286 0.63636364
 0.85       0.81818182 0.65217391 0.78947368]

mean value: 0.8120002575608983

key: train_jcc
value: [0.87912088 0.89944134 0.91011236 0.8852459  0.9        0.90555556
 0.88461538 0.9        0.90555556 0.90555556]

mean value: 0.897520253237496

MCC on Blind test: 0.79

Accuracy on Blind test: 0.9

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.1303339  0.28677583 0.31174827 0.25746155 0.25486231 0.268327
 0.14300919 0.26858115 0.38862586 0.40423918]

mean value: 0.27139642238616946

key: score_time
value: [0.01270962 0.0216949  0.02374864 0.02220631 0.02106261 0.03258085
 0.01293516 0.02181482 0.02210927 0.02351737]

mean value: 0.021437954902648926

key: test_mcc
value: [0.89473684 0.9486833  0.78947368 0.84327404 0.84327404 0.63245553
 0.84327404 0.79388419 0.56725146 0.78764146]

mean value: 0.7943948594573259

key: train_mcc
value: [0.87064849 0.89411765 0.90588235 0.87684993 0.89417953 0.9353103
 0.87660709 0.92947609 0.90043693 0.90030617]

mean value: 0.898381453059501

key: test_accuracy
value: [0.94736842 0.97368421 0.89473684 0.92105263 0.92105263 0.81578947
 0.92105263 0.89473684 0.78378378 0.89189189]

mean value: 0.8965149359886202

key: train_accuracy
value: [0.93529412 0.94705882 0.95294118 0.93823529 0.94705882 0.96764706
 0.93823529 0.96470588 0.95014663 0.95014663]

mean value: 0.94914697257202

key: test_fscore
value: [0.94736842 0.97435897 0.89473684 0.92307692 0.92307692 0.81081081
 0.91891892 0.9        0.78947368 0.88235294]

mean value: 0.8964174438787442

key: train_fscore
value: [0.93567251 0.94705882 0.95294118 0.93913043 0.94736842 0.96755162
 0.93877551 0.96449704 0.95043732 0.95043732]

mean value: 0.9493870180066716

key: test_precision
value: [0.94736842 0.95       0.89473684 0.9        0.9        0.83333333
 0.94444444 0.85714286 0.78947368 0.9375    ]

mean value: 0.8953999582289056

key: train_precision
value: [0.93023256 0.94705882 0.95294118 0.92571429 0.94186047 0.9704142
 0.93063584 0.9702381  0.94219653 0.94767442]

mean value: 0.9458966393938475

key: test_recall
value: [0.94736842 1.         0.89473684 0.94736842 0.94736842 0.78947368
 0.89473684 0.94736842 0.78947368 0.83333333]

mean value: 0.8991228070175439

key: train_recall
value: [0.94117647 0.94705882 0.95294118 0.95294118 0.95294118 0.96470588
 0.94705882 0.95882353 0.95882353 0.95321637]

mean value: 0.95296869625043

key: test_roc_auc
value: [0.94736842 0.97368421 0.89473684 0.92105263 0.92105263 0.81578947
 0.92105263 0.89473684 0.78362573 0.89035088]

mean value: 0.896345029239766

key: train_roc_auc
value: [0.93529412 0.94705882 0.95294118 0.93823529 0.94705882 0.96764706
 0.93823529 0.96470588 0.950172   0.9501376 ]

mean value: 0.9491486068111455

key: test_jcc
value: [0.9        0.95       0.80952381 0.85714286 0.85714286 0.68181818
 0.85       0.81818182 0.65217391 0.78947368]

mean value: 0.8165457121063529

key: train_jcc
value: [0.87912088 0.89944134 0.91011236 0.8852459  0.9        0.93714286
 0.88461538 0.93142857 0.90555556 0.90555556]

mean value: 0.9038218405390832

MCC on Blind test: 0.79

Accuracy on Blind test: 0.9
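
Note on the fold metrics above: the first-fold test values for the RidgeClassifierCV pipeline (test_mcc 0.89473684, test_accuracy 0.94736842, test_fscore 0.94736842, test_jcc 0.9) are all consistent with one confusion matrix of 18 true positives, 18 true negatives, 1 false positive and 1 false negative on a 38-sample fold. These counts are inferred from the reported values and are not printed by the log, so treat them as an assumption. The sketch below shows how the logged quantities relate to such a matrix. It uses only the standard library, and the function name binary_metrics is hypothetical rather than taken from the logged script.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Return the metrics named in the log for one binary confusion matrix.

    No guard against empty classes: a zero denominator raises ZeroDivisionError.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": (tp * tn - fp * fn) / mcc_den,
        "accuracy": (tp + tn) / total,
        "fscore": 2 * precision * recall / (precision + recall),
        "precision": precision,
        "recall": recall,
        # With hard 0/1 predictions, ROC AUC reduces to balanced accuracy.
        "roc_auc": (recall + specificity) / 2,
        # Jaccard index of the positive class.
        "jcc": tp / (tp + fp + fn),
    }


# Counts inferred from the first fold (an assumption, not printed in the log).
fold1 = binary_metrics(tp=18, fp=1, fn=1, tn=18)
for name, value in fold1.items():
    print(f"test_{name}: {value:.8f}")
```

On a balanced fold with a symmetric error pattern like this one, accuracy, F-score, precision, recall and ROC AUC all coincide, which may explain why several arrays in the log share the same entries.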

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.027807   0.0355041  0.06297231 0.05780268 0.05370164 0.03470182
 0.03526163 0.03333521 0.0357399  0.03369379]

mean value: 0.0410520076751709

key: score_time
value: [0.01207423 0.01532435 0.01223469 0.01207042 0.0144453  0.01446915
 0.0145359  0.01214051 0.01207852 0.01209593]

mean value: 0.01314690113067627

key: test_mcc
value: [0.89473684 0.9486833  0.73786479 0.84327404 0.73786479 0.48454371
 0.89473684 0.79388419 0.62807634 0.78764146]

mean value: 0.77513062978033

key: train_mcc
value: [0.85295593 0.86472084 0.87660709 0.85888297 0.85882353 0.86472084
 0.87058824 0.87064849 0.86511868 0.86511404]

mean value: 0.8648180657385829

key: test_accuracy
value: [0.94736842 0.97368421 0.86842105 0.92105263 0.86842105 0.73684211
 0.94736842 0.89473684 0.81081081 0.89189189]

mean value: 0.8860597439544808

key: train_accuracy
value: [0.92647059 0.93235294 0.93823529 0.92941176 0.92941176 0.93235294
 0.93529412 0.93529412 0.93255132 0.93255132]

mean value: 0.9323926168707952

key: test_fscore
value: [0.94736842 0.97435897 0.86486486 0.92307692 0.86486486 0.70588235
 0.94736842 0.9        0.82926829 0.88235294]

mean value: 0.8839406056071464

key: train_fscore
value: [0.92668622 0.93255132 0.93877551 0.92982456 0.92941176 0.93255132
 0.93529412 0.93491124 0.93255132 0.93294461]

mean value: 0.9325501978931156

key: test_precision
value: [0.94736842 0.95       0.88888889 0.9        0.88888889 0.8
 0.94736842 0.85714286 0.77272727 0.9375    ]

mean value: 0.8889884749753171

key: train_precision
value: [0.92397661 0.92982456 0.93063584 0.9244186  0.92941176 0.92982456
 0.93529412 0.94047619 0.92982456 0.93023256]

mean value: 0.9303919366167779

key: test_recall
value: [0.94736842 1.         0.84210526 0.94736842 0.84210526 0.63157895
 0.94736842 0.94736842 0.89473684 0.83333333]

mean value: 0.8833333333333333

key: train_recall
value: [0.92941176 0.93529412 0.94705882 0.93529412 0.92941176 0.93529412
 0.93529412 0.92941176 0.93529412 0.93567251]

mean value: 0.9347437220502236

key: test_roc_auc
value: [0.94736842 0.97368421 0.86842105 0.92105263 0.86842105 0.73684211
 0.94736842 0.89473684 0.80847953 0.89035088]

mean value: 0.8856725146198831

key: train_roc_auc
value: [0.92647059 0.93235294 0.93823529 0.92941176 0.92941176 0.93235294
 0.93529412 0.93529412 0.93255934 0.93254214]

mean value: 0.9323925008599931

key: test_jcc
value: [0.9        0.95       0.76190476 0.85714286 0.76190476 0.54545455
 0.9        0.81818182 0.70833333 0.78947368]

mean value: 0.7992395762132605

key: train_jcc
value: [0.86338798 0.87362637 0.88461538 0.86885246 0.86813187 0.87362637
 0.87845304 0.87777778 0.87362637 0.87431694]

mean value: 0.8736414567127365

MCC on Blind test: 0.83

Accuracy on Blind test: 0.91
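
Note on the "mean value" lines above: each one can be checked against its per-fold array. The sketch below assumes the "value: [...]" text is copied out of the log as a string. parse_folds is a hypothetical helper, not part of the logged script. Because the log prints each fold rounded to eight decimals, the recomputed mean is expected to agree with the logged mean only to roughly that precision.

```python
from statistics import fmean


def parse_folds(text):
    """Parse a NumPy-style printed array (possibly wrapped over lines) into floats."""
    cleaned = text.replace("[", " ").replace("]", " ")
    return [float(token) for token in cleaned.split()]


# test_mcc and train_mcc for the LogisticRegression pipeline, copied from the log.
test_mcc = parse_folds("""[0.89473684 0.9486833  0.73786479 0.84327404 0.73786479 0.48454371
 0.89473684 0.79388419 0.62807634 0.78764146]""")
train_mcc = parse_folds("""[0.85295593 0.86472084 0.87660709 0.85888297 0.85882353 0.86472084
 0.87058824 0.87064849 0.86511868 0.86511404]""")

test_mean = fmean(test_mcc)
train_mean = fmean(train_mcc)

print(f"mean test_mcc:    {test_mean:.8f}")
print(f"mean train_mcc:   {train_mean:.8f}")
print(f"train - test gap: {train_mean - test_mean:.4f}")
print(f"worst test fold:  {min(test_mcc)}")
```

Printing the lowest test fold next to the mean shows how much one weak fold pulls the average down. The gap between the train and test means gives a rough indication of overfitting.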

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [1.12819099 0.96782017 0.88282394 1.06483197 0.87350464 1.21616435
 1.23634672 1.01481581 1.04923725 1.17672253]

mean value: 1.0610458374023437

key: score_time
value: [0.01469541 0.01223922 0.01531339 0.01517987 0.01223707 0.01528811
 0.01213527 0.01546454 0.0202651  0.015342  ]

mean value: 0.014815998077392579

key: test_mcc
value: [0.89973541 0.89973541 0.73786479 0.9486833  0.84327404 0.53300179
 0.84327404 0.79388419 0.51461988 0.78362573]

mean value: 0.7797698583492705

key: train_mcc
value: [0.97653817 0.89417953 1.         0.976741   0.88235294 0.90588235
 0.98236994 0.98250594 0.99415185 0.98833809]

mean value: 0.9583059813818373

key: test_accuracy
value: [0.94736842 0.94736842 0.86842105 0.97368421 0.92105263 0.76315789
 0.92105263 0.89473684 0.75675676 0.89189189]

mean value: 0.8885490753911807

key: train_accuracy
value: [0.98823529 0.94705882 1.         0.98823529 0.94117647 0.95294118
 0.99117647 0.99117647 0.99706745 0.9941349 ]

mean value: 0.9791202346041056

key: test_fscore
value: [0.95       0.95       0.87179487 0.97297297 0.92307692 0.74285714
 0.92307692 0.9        0.75675676 0.88888889]

mean value: 0.887942447942448

key: train_fscore
value: [0.98816568 0.94674556 1.         0.98809524 0.94117647 0.95294118
 0.99115044 0.99109792 0.99705015 0.99411765]

mean value: 0.9790540287635601

key: test_precision
value: [0.9047619  0.9047619  0.85       1.         0.9        0.8125
 0.9        0.85714286 0.77777778 0.88888889]

mean value: 0.8795833333333334

key: train_precision
value: [0.99404762 0.95238095 1.         1.         0.94117647 0.95294118
 0.99408284 1.         1.         1.        ]

mean value: 0.9834629058724081

key: test_recall
value: [1.         1.         0.89473684 0.94736842 0.94736842 0.68421053
 0.94736842 0.94736842 0.73684211 0.88888889]

mean value: 0.8994152046783626

key: train_recall
value: [0.98235294 0.94117647 1.         0.97647059 0.94117647 0.95294118
 0.98823529 0.98235294 0.99411765 0.98830409]

mean value: 0.9747127622979016

key: test_roc_auc
value: [0.94736842 0.94736842 0.86842105 0.97368421 0.92105263 0.76315789
 0.92105263 0.89473684 0.75730994 0.89181287]

mean value: 0.8885964912280702

key: train_roc_auc
value: [0.98823529 0.94705882 1.         0.98823529 0.94117647 0.95294118
 0.99117647 0.99117647 0.99705882 0.99415205]

mean value: 0.9791210870313037

key: test_jcc
value: [0.9047619  0.9047619  0.77272727 0.94736842 0.85714286 0.59090909
 0.85714286 0.81818182 0.60869565 0.8       ]

mean value: 0.806169177885425

key: train_jcc
value: [0.97660819 0.8988764  1.         0.97647059 0.88888889 0.91011236
 0.98245614 0.98235294 0.99411765 0.98830409]

mean value: 0.9598187250457052

MCC on Blind test: 0.76

Accuracy on Blind test: 0.88
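
The ConvergenceWarning messages logged for this LogisticRegressionCV run suggest raising max_iter or scaling the data. The pipeline printed above already applies MinMaxScaler, so raising max_iter is the remaining option the warning names. The following is a minimal sketch of that change, not the project's actual code: the synthetic data, the simplified single-scaler pipeline, and the value max_iter=5000 are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption); the real feature matrix is not reproduced here.
X, y = make_classification(n_samples=200, n_features=20, random_state=42)

pipe = Pipeline(steps=[
    # Scale every feature to [0, 1], as the logged pipeline does for numerical columns.
    ("prep", MinMaxScaler()),
    # max_iter=5000 is an assumed value, not one taken from the log.
    ("model", LogisticRegressionCV(max_iter=5000, random_state=42)),
])
pipe.fit(X, y)

# Largest number of lbfgs iterations used across folds and regularisation strengths.
n_iter_max = int(pipe.named_steps["model"].n_iter_.max())
print("max iterations used:", n_iter_max)
```

If n_iter_max stays below the configured max_iter, the solver stopped on its own tolerance rather than on the iteration limit, and the warning should no longer appear.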

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01373482 0.01149917 0.01022553 0.00962138 0.00941873 0.01020408
 0.00948644 0.00971484 0.0105443  0.00974584]

mean value: 0.010419511795043945

key: score_time
value: [0.01228786 0.00936007 0.00903177 0.00898194 0.00892377 0.00910449
 0.00882721 0.0090754  0.00997901 0.00893044]

mean value: 0.009450197219848633

key: test_mcc
value: [0.68803296 0.69989647 0.69989647 0.79388419 0.57894737 0.59222009
 0.47633051 0.73786479 0.57184997 0.64287856]

mean value: 0.6481801382126267

key: train_mcc
value: [0.64994387 0.6215412  0.63334622 0.65705784 0.64172131 0.6871247
 0.65360504 0.66133552 0.66975134 0.65909576]

mean value: 0.6534522792161372

key: test_accuracy
value: [0.84210526 0.84210526 0.84210526 0.89473684 0.78947368 0.78947368
 0.73684211 0.86842105 0.78378378 0.81081081]

mean value: 0.8199857752489331

key: train_accuracy
value: [0.82352941 0.80882353 0.80588235 0.82647059 0.81764706 0.84117647
 0.82352941 0.82647059 0.83284457 0.82697947]

mean value: 0.8233353458685527

key: test_fscore
value: [0.83333333 0.82352941 0.82352941 0.88888889 0.78947368 0.76470588
 0.72222222 0.87179487 0.77777778 0.77419355]

mean value: 0.806944903249707

key: train_fscore
value: [0.81481481 0.79750779 0.77702703 0.81619938 0.80379747 0.83125
 0.81012658 0.8115016  0.82242991 0.81619938]

mean value: 0.8100853938516973

key: test_precision
value: [0.88235294 0.93333333 0.93333333 0.94117647 0.78947368 0.86666667
 0.76470588 0.85       0.82352941 0.92307692]

mean value: 0.8707648646503136

key: train_precision
value: [0.85714286 0.84768212 0.91269841 0.86754967 0.86986301 0.88666667
 0.87671233 0.88811189 0.87417219 0.87333333]

mean value: 0.8753932473928845

key: test_recall
value: [0.78947368 0.73684211 0.73684211 0.84210526 0.78947368 0.68421053
 0.68421053 0.89473684 0.73684211 0.66666667]

mean value: 0.756140350877193

key: train_recall
value: [0.77647059 0.75294118 0.67647059 0.77058824 0.74705882 0.78235294
 0.75294118 0.74705882 0.77647059 0.76608187]

mean value: 0.75484348125215

key: test_roc_auc
value: [0.84210526 0.84210526 0.84210526 0.89473684 0.78947368 0.78947368
 0.73684211 0.86842105 0.78508772 0.80701754]

mean value: 0.8197368421052632

key: train_roc_auc
value: [0.82352941 0.80882353 0.80588235 0.82647059 0.81764706 0.84117647
 0.82352941 0.82647059 0.83267974 0.82715858]

mean value: 0.8233367733058135

key: test_jcc
value: [0.71428571 0.7        0.7        0.8        0.65217391 0.61904762
 0.56521739 0.77272727 0.63636364 0.63157895]

mean value: 0.6791394494140489

key: train_jcc
value: [0.6875     0.66321244 0.63535912 0.68947368 0.67195767 0.71122995
 0.68085106 0.6827957  0.6984127  0.68947368]

mean value: 0.6810265999325266

MCC on Blind test: 0.68

Accuracy on Blind test: 0.84
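
The key/value blocks above (fit_time, score_time, test_mcc, train_mcc, ..., each holding ten values) have the shape of the dictionary returned by scikit-learn's cross_validate with a multi-metric scoring dict, cv=10 and return_train_score=True. The sketch below reproduces that output shape under those assumptions. The scorer names are inferred from the logged keys, the data is synthetic, and none of it is taken from the project's script. The logged roc_auc values coincide with accuracy on the balanced folds, which would be consistent with scoring hard label predictions, so the sketch uses make_scorer(roc_auc_score) for that metric.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (assumption); not the data behind the log.
X, y = make_classification(n_samples=200, n_features=20, random_state=42)

# Scorer names chosen to match the logged keys (assumption).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),  # computed on predicted labels
    "jcc": "jaccard",
}

# 10-fold cross-validation returning both train and test scores per fold.
scores = cross_validate(
    GaussianNB(), X, y, cv=10, scoring=scoring, return_train_score=True
)

for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Comparing the mean of each train_* key with its test_* counterpart gives a quick view of how much a model overfits across folds.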

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01158476 0.01068711 0.0107584  0.00968242 0.01254201 0.01016307
 0.01010203 0.00989175 0.00978208 0.01023674]

mean value: 0.010543036460876464

key: score_time
value: [0.01002502 0.00914407 0.00887299 0.00956154 0.01029301 0.0097959
 0.00908971 0.0090239  0.00896478 0.00972867]

mean value: 0.009449958801269531

key: test_mcc
value: [0.47368421 0.68421053 0.68421053 0.79388419 0.78947368 0.47368421
 0.78947368 0.63960215 0.48078072 0.62807634]

mean value: 0.6437080232004303

key: train_mcc
value: [0.73561236 0.70588235 0.75314969 0.72986649 0.70593121 0.73561236
 0.71769673 0.75314969 0.73607623 0.71966354]

mean value: 0.7292640648226028

key: test_accuracy
value: [0.73684211 0.84210526 0.84210526 0.89473684 0.89473684 0.73684211
 0.89473684 0.81578947 0.72972973 0.81081081]

mean value: 0.8198435277382645

key: train_accuracy
value: [0.86764706 0.85294118 0.87647059 0.86470588 0.85294118 0.86764706
 0.85882353 0.87647059 0.86803519 0.85923754]

mean value: 0.8644919786096257

key: test_fscore
value: [0.73684211 0.84210526 0.84210526 0.9        0.89473684 0.73684211
 0.89473684 0.82926829 0.77272727 0.78787879]

mean value: 0.8237242774341619

key: train_fscore
value: [0.86956522 0.85294118 0.87790698 0.86705202 0.85380117 0.86956522
 0.85964912 0.87790698 0.86725664 0.86363636]

mean value: 0.8659280881065122

key: test_precision
value: [0.73684211 0.84210526 0.84210526 0.85714286 0.89473684 0.73684211
 0.89473684 0.77272727 0.68       0.86666667]

mean value: 0.8123905217589428

key: train_precision
value: [0.85714286 0.85294118 0.86781609 0.85227273 0.84883721 0.85714286
 0.85465116 0.86781609 0.86982249 0.83977901]

mean value: 0.8568221664762061

key: test_recall
value: [0.73684211 0.84210526 0.84210526 0.94736842 0.89473684 0.73684211
 0.89473684 0.89473684 0.89473684 0.72222222]

mean value: 0.8406432748538012

key: train_recall
value: [0.88235294 0.85294118 0.88823529 0.88235294 0.85882353 0.88235294
 0.86470588 0.88823529 0.86470588 0.88888889]

mean value: 0.8753594771241829

key: test_roc_auc
value: [0.73684211 0.84210526 0.84210526 0.89473684 0.89473684 0.73684211
 0.89473684 0.81578947 0.7251462  0.80847953]

mean value: 0.8191520467836257

key: train_roc_auc
value: [0.86764706 0.85294118 0.87647059 0.86470588 0.85294118 0.86764706
 0.85882353 0.87647059 0.86802546 0.85915033]

mean value: 0.8644822841417269

key: test_jcc
value: [0.58333333 0.72727273 0.72727273 0.81818182 0.80952381 0.58333333
 0.80952381 0.70833333 0.62962963 0.65      ]

mean value: 0.7046404521404521

key: train_jcc
value: [0.76923077 0.74358974 0.78238342 0.76530612 0.74489796 0.76923077
 0.75384615 0.78238342 0.765625   0.76      ]

mean value: 0.7636493356908327

MCC on Blind test: 0.72

Accuracy on Blind test: 0.86

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00920463 0.01011729 0.01018023 0.01018381 0.0103085  0.01037335
 0.01010799 0.01031637 0.01011348 0.01027107]

mean value: 0.010117673873901367

key: score_time
value: [0.01739001 0.01198483 0.01202416 0.01202321 0.0119853  0.01478839
 0.01201344 0.01181197 0.01210189 0.01525283]

mean value: 0.013137602806091308

key: test_mcc
value: [0.52704628 0.57894737 0.31622777 0.52704628 0.43643578 0.36842105
 0.37047929 0.21821789 0.40780312 0.75614764]

mean value: 0.450677246185986

key: train_mcc
value: [0.63533809 0.67063465 0.67657595 0.63547005 0.71207276 0.68853317
 0.64710361 0.64723801 0.72491598 0.58359133]

mean value: 0.6621473590091169

key: test_accuracy
value: [0.76315789 0.78947368 0.65789474 0.76315789 0.71052632 0.68421053
 0.68421053 0.60526316 0.7027027  0.86486486]

mean value: 0.7225462304409673

key: train_accuracy
value: [0.81764706 0.83529412 0.83823529 0.81764706 0.85588235 0.84411765
 0.82352941 0.82352941 0.86217009 0.79178886]

mean value: 0.8309841297222701

key: test_fscore
value: [0.75675676 0.78947368 0.64864865 0.75675676 0.74418605 0.68421053
 0.66666667 0.54545455 0.73170732 0.83870968]

mean value: 0.7162570625813843

key: train_fscore
value: [0.81871345 0.83625731 0.83965015 0.81547619 0.85373134 0.84637681
 0.8245614  0.8255814  0.85885886 0.79178886]

mean value: 0.8310995765381942

key: test_precision
value: [0.77777778 0.78947368 0.66666667 0.77777778 0.66666667 0.68421053
 0.70588235 0.64285714 0.68181818 1.        ]

mean value: 0.7393130777031706

key: train_precision
value: [0.81395349 0.83139535 0.83236994 0.8253012  0.86666667 0.83428571
 0.81976744 0.81609195 0.87730061 0.79411765]

mean value: 0.8311250021616702

key: test_recall
value: [0.73684211 0.78947368 0.63157895 0.73684211 0.84210526 0.68421053
 0.63157895 0.47368421 0.78947368 0.72222222]

mean value: 0.7038011695906432

key: train_recall
value: [0.82352941 0.84117647 0.84705882 0.80588235 0.84117647 0.85882353
 0.82941176 0.83529412 0.84117647 0.78947368]

mean value: 0.8313003095975232

key: test_roc_auc
value: [0.76315789 0.78947368 0.65789474 0.76315789 0.71052632 0.68421053
 0.68421053 0.60526316 0.7002924  0.86111111]

mean value: 0.7219298245614035

key: train_roc_auc
value: [0.81764706 0.83529412 0.83823529 0.81764706 0.85588235 0.84411765
 0.82352941 0.82352941 0.8621087  0.79179567]

mean value: 0.8309786721706227

key: test_jcc
value: [0.60869565 0.65217391 0.48       0.60869565 0.59259259 0.52
 0.5        0.375      0.57692308 0.72222222]

mean value: 0.5636303109129196

key: train_jcc
value: [0.69306931 0.71859296 0.72361809 0.68844221 0.74479167 0.73366834
 0.70149254 0.7029703  0.75263158 0.65533981]

mean value: 0.7114616800753307

MCC on Blind test: 0.5

Accuracy on Blind test: 0.75

Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01786804 0.01848149 0.01594734 0.01577377 0.0179441  0.01872945
 0.01715374 0.01642799 0.01686239 0.01881289]

mean value: 0.017400121688842772

key: score_time
value: [0.01160026 0.01157236 0.01054382 0.01053405 0.01060677 0.01129889
 0.01175284 0.01054502 0.01147771 0.01128459]

mean value: 0.011121630668640137

key: test_mcc
value: [0.84327404 0.9486833  0.78947368 0.78947368 0.78947368 0.48454371
 0.89473684 0.73786479 0.51793973 0.78764146]

mean value: 0.7583104925854314

key: train_mcc
value: [0.80005537 0.78823529 0.79413139 0.81766121 0.80005537 0.8058963
 0.78236648 0.80005537 0.82404541 0.79483211]

mean value: 0.8007334285346163

key: test_accuracy
value: [0.92105263 0.97368421 0.89473684 0.89473684 0.89473684 0.73684211
 0.94736842 0.86842105 0.75675676 0.89189189]

mean value: 0.878022759601707

key: train_accuracy
value: [0.9        0.89411765 0.89705882 0.90882353 0.9        0.90294118
 0.89117647 0.9        0.91202346 0.8973607 ]

mean value: 0.9003501811281698

key: test_fscore
value: [0.91891892 0.97435897 0.89473684 0.89473684 0.89473684 0.70588235
 0.94736842 0.87179487 0.7804878  0.88235294]

mean value: 0.8765374811436882

key: train_fscore
value: [0.9005848  0.89411765 0.8973607  0.90909091 0.9005848  0.90322581
 0.89085546 0.89940828 0.91176471 0.89855072]

mean value: 0.9005543828827779

key: test_precision
value: [0.94444444 0.95       0.89473684 0.89473684 0.89473684 0.8
 0.94736842 0.85       0.72727273 0.9375    ]

mean value: 0.8840796119085592

key: train_precision
value: [0.89534884 0.89411765 0.89473684 0.90643275 0.89534884 0.9005848
 0.89349112 0.9047619  0.91176471 0.8908046 ]

mean value: 0.8987392040048102

key: test_recall
value: [0.89473684 1.         0.89473684 0.89473684 0.89473684 0.63157895
 0.94736842 0.89473684 0.84210526 0.83333333]

mean value: 0.8728070175438596

key: train_recall
value: [0.90588235 0.89411765 0.9        0.91176471 0.90588235 0.90588235
 0.88823529 0.89411765 0.91176471 0.90643275]

mean value: 0.9024079807361541

key: test_roc_auc
value: [0.92105263 0.97368421 0.89473684 0.89473684 0.89473684 0.73684211
 0.94736842 0.86842105 0.75438596 0.89035088]

mean value: 0.8776315789473684

key: train_roc_auc
value: [0.9        0.89411765 0.89705882 0.90882353 0.9        0.90294118
 0.89117647 0.9        0.9120227  0.89733402]

mean value: 0.9003474372205023

key: test_jcc
value: [0.85       0.95       0.80952381 0.80952381 0.80952381 0.54545455
 0.9        0.77272727 0.64       0.78947368]

mean value: 0.7876226930963773

key: train_jcc
value: [0.81914894 0.80851064 0.81382979 0.83333333 0.81914894 0.82352941
 0.80319149 0.8172043  0.83783784 0.81578947]

mean value: 0.8191524144929399

MCC on Blind test: 0.78

Accuracy on Blind test: 0.89

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [1.49616385 1.30514312 1.44180202 1.37484336 1.37929106 1.37126446
 1.30904698 1.39595342 1.31129622 1.29941964]

mean value: 1.3684224128723144

key: score_time
value: [0.01476836 0.01247644 0.01286674 0.02777433 0.01502228 0.01509333
 0.01474261 0.01487732 0.0184629  0.01495028]

mean value: 0.016103458404541016

key: test_mcc
value: [0.89473684 0.89973541 0.68421053 0.89973541 0.85280287 0.58218174
 0.89473684 0.79388419 0.51319869 0.94721815]

mean value: 0.7962440656984944

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.94736842 0.94736842 0.84210526 0.94736842 0.92105263 0.78947368
 0.94736842 0.89473684 0.75675676 0.97297297]

mean value: 0.8966571834992887

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94736842 0.95       0.84210526 0.94444444 0.92682927 0.77777778
 0.94736842 0.9        0.76923077 0.97142857]

mean value: 0.8976552936437403

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.94736842 0.9047619  0.84210526 1.         0.86363636 0.82352941
 0.94736842 0.85714286 0.75       1.        ]

mean value: 0.8935912642568989

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.94736842 1.         0.84210526 0.89473684 1.         0.73684211
 0.94736842 0.94736842 0.78947368 0.94444444]

mean value: 0.9049707602339181

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.94736842 0.94736842 0.84210526 0.94736842 0.92105263 0.78947368
 0.94736842 0.89473684 0.75584795 0.97222222]

mean value: 0.8964912280701754

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.9        0.9047619  0.72727273 0.89473684 0.86363636 0.63636364
 0.9        0.81818182 0.625      0.94444444]

mean value: 0.8214397736766158

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.74

Accuracy on Blind test: 0.87

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.02944374 0.0182209  0.01535916 0.01744986 0.01732898 0.01718378
 0.01695347 0.01524591 0.01629186 0.01514053]

mean value: 0.017861819267272948

key: score_time
value: [0.01158214 0.00925207 0.00901937 0.0089376  0.00885868 0.00885081
 0.00914764 0.00883889 0.00884986 0.00873947]

mean value: 0.009207653999328613

key: test_mcc
value: [0.9486833  0.9486833  0.84327404 0.9486833  0.9486833  0.89973541
 0.85280287 0.84327404 0.73099415 0.94736842]

mean value: 0.8912182126989485

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97368421 0.97368421 0.92105263 0.97368421 0.97368421 0.94736842
 0.92105263 0.92105263 0.86486486 0.97297297]

mean value: 0.9443100995732575

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97435897 0.97297297 0.92307692 0.97435897 0.97435897 0.94444444
 0.92682927 0.91891892 0.86486486 0.97297297]

mean value: 0.9447157288620703

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.95       1.         0.9        0.95       0.95       1.
 0.86363636 0.94444444 0.88888889 0.94736842]

mean value: 0.9394338118022328

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.94736842 0.94736842 1.         1.         0.89473684
 1.         0.89473684 0.84210526 1.        ]

mean value: 0.9526315789473684

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97368421 0.97368421 0.92105263 0.97368421 0.97368421 0.94736842
 0.92105263 0.92105263 0.86549708 0.97368421]

mean value: 0.9444444444444444

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.95       0.94736842 0.85714286 0.95       0.95       0.89473684
 0.86363636 0.85       0.76190476 0.94736842]

mean value: 0.8972157666894509

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.87

Accuracy on Blind test: 0.93

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.11228299 0.11245561 0.10948372 0.11234665 0.10924101 0.1089797
 0.11292195 0.11369872 0.11021519 0.10727549]

mean value: 0.1108901023864746

key: score_time
value: [0.01883864 0.01856804 0.01872277 0.01774263 0.01919746 0.01752329
 0.01925826 0.01753545 0.01749849 0.0175736 ]

mean value: 0.018245863914489745

key: test_mcc
value: [0.9486833  0.89473684 0.63960215 0.84327404 0.78947368 0.73786479
 0.89473684 0.79388419 0.62170355 1.        ]

mean value: 0.8163959384712077

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97368421 0.94736842 0.81578947 0.92105263 0.89473684 0.86842105
 0.94736842 0.89473684 0.81081081 1.        ]

mean value: 0.9073968705547653

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97435897 0.94736842 0.8        0.92307692 0.89473684 0.87179487
 0.94736842 0.9        0.82051282 1.        ]

mean value: 0.9079217273954115

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.95       0.94736842 0.875      0.9        0.89473684 0.85
 0.94736842 0.85714286 0.8        1.        ]

mean value: 0.9021616541353383

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.94736842 0.73684211 0.94736842 0.89473684 0.89473684
 0.94736842 0.94736842 0.84210526 1.        ]

mean value: 0.9157894736842105

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97368421 0.94736842 0.81578947 0.92105263 0.89473684 0.86842105
 0.94736842 0.89473684 0.80994152 1.        ]

mean value: 0.9073099415204678

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.95       0.9        0.66666667 0.85714286 0.80952381 0.77272727
 0.9        0.81818182 0.69565217 1.        ]

mean value: 0.8369894598155467

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.78

Accuracy on Blind test: 0.89

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.01023006 0.0107708  0.00978875 0.01090908 0.00982141 0.00973773
 0.01041436 0.009727   0.01096225 0.0110054 ]

mean value: 0.010336685180664062

key: score_time
value: [0.00909853 0.00951576 0.00937533 0.00973797 0.00959563 0.0088625
 0.00879812 0.0088625  0.00882602 0.00961709]

mean value: 0.009228944778442383

key: test_mcc
value: [0.57894737 0.63245553 0.21821789 0.47368421 0.58218174 0.68803296
 0.47633051 0.42640143 0.19005848 0.62170355]

mean value: 0.48880136757948445

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.78947368 0.81578947 0.60526316 0.73684211 0.78947368 0.84210526
 0.73684211 0.71052632 0.59459459 0.81081081]

mean value: 0.743172119487909

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.78947368 0.81081081 0.54545455 0.73684211 0.8        0.85
 0.72222222 0.73170732 0.59459459 0.8       ]

mean value: 0.7381105279629028

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.78947368 0.83333333 0.64285714 0.73684211 0.76190476 0.80952381
 0.76470588 0.68181818 0.61111111 0.82352941]

mean value: 0.7455099424139672

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.78947368 0.78947368 0.47368421 0.73684211 0.84210526 0.89473684
 0.68421053 0.78947368 0.57894737 0.77777778]

mean value: 0.735672514619883

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.78947368 0.81578947 0.60526316 0.73684211 0.78947368 0.84210526
 0.73684211 0.71052632 0.59502924 0.80994152]

mean value: 0.7431286549707602

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.65217391 0.68181818 0.375      0.58333333 0.66666667 0.73913043
 0.56521739 0.57692308 0.42307692 0.66666667]

mean value: 0.5930006587615283

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.52

Accuracy on Blind test: 0.76

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.55051184 1.54603195 1.53799772 1.54854393 1.5016973  1.51683092
 1.5347023  1.53684664 1.5649302  1.55764413]

mean value: 1.5395736932754516

key: score_time
value: [0.09356093 0.09390473 0.09295797 0.09251022 0.09202743 0.09703946
 0.09734011 0.09730768 0.09886241 0.09727573]

mean value: 0.09527866840362549

key: test_mcc
value: [1.         1.         0.78947368 0.89473684 0.89973541 0.89973541
 1.         0.9486833  0.78362573 0.94736842]

mean value: 0.9163358798097961

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         1.         0.89473684 0.94736842 0.94736842 0.94736842
 1.         0.97368421 0.89189189 0.97297297]

mean value: 0.9575391180654338

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         1.         0.89473684 0.94736842 0.95       0.94444444
 1.         0.97435897 0.89473684 0.97297297]

mean value: 0.9578618497039549

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         0.89473684 0.94736842 0.9047619  1.
 1.         0.95       0.89473684 0.94736842]

mean value: 0.9538972431077695

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         0.89473684 0.94736842 1.         0.89473684
 1.         1.         0.89473684 1.        ]

mean value: 0.9631578947368421

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         1.         0.89473684 0.94736842 0.94736842 0.94736842
 1.         0.97368421 0.89181287 0.97368421]

mean value: 0.9576023391812866

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         1.         0.80952381 0.9        0.9047619  0.89473684
 1.         0.95       0.80952381 0.94736842]

mean value: 0.9215914786967419

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.89

Accuracy on Blind test: 0.95

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC0...05', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.91693258 0.89162326 0.93540263 0.97587872 0.90822935 0.87675786
 0.95299554 0.92298007 1.00643253 0.8825717 ]

mean value: 0.9269804239273072

key: score_time
value: [0.22907305 0.24806905 0.17145753 0.21838999 0.27262688 0.16964602
 0.24580407 0.27468348 0.24008393 0.2604115 ]

mean value: 0.23302454948425294

key: test_mcc
value: [1.         0.89973541 0.68803296 0.89473684 0.84327404 0.85280287
 0.9486833  0.89973541 0.67849265 0.94736842]

mean value: 0.8652861897439335

key: train_mcc
value: [0.95884012 0.95884012 0.95884012 0.95294118 0.95300713 0.95884012
 0.95884012 0.96477265 0.97653939 0.95896113]

mean value: 0.9600422066629911

key: test_accuracy
value: [1.         0.94736842 0.84210526 0.94736842 0.92105263 0.92105263
 0.97368421 0.94736842 0.83783784 0.97297297]

mean value: 0.931081081081081

key: train_accuracy
value: [0.97941176 0.97941176 0.97941176 0.97647059 0.97647059 0.97941176
 0.97941176 0.98235294 0.98826979 0.97947214]

mean value: 0.9800094876660341

key: test_fscore
value: [1.         0.94444444 0.83333333 0.94736842 0.92307692 0.91428571
 0.97297297 0.95       0.85       0.97297297]

mean value: 0.9308454782138993

key: train_fscore
value: [0.97935103 0.97935103 0.97935103 0.97647059 0.97633136 0.97947214
 0.97935103 0.98224852 0.98823529 0.97947214]

mean value: 0.9799634175328182

key: test_precision
value: [1.         1.         0.88235294 0.94736842 0.9        1.
 1.         0.9047619  0.80952381 0.94736842]

mean value: 0.9391375497567448

key: train_precision
value: [0.98224852 0.98224852 0.98224852 0.97647059 0.98214286 0.97660819
 0.98224852 0.98809524 0.98823529 0.98235294]

mean value: 0.9822899188742247

key: test_recall
value: [1.         0.89473684 0.78947368 0.94736842 0.94736842 0.84210526
 0.94736842 1.         0.89473684 1.        ]

mean value: 0.9263157894736842

key: train_recall
value: [0.97647059 0.97647059 0.97647059 0.97647059 0.97058824 0.98235294
 0.97647059 0.97647059 0.98823529 0.97660819]

mean value: 0.9776608187134502

key: test_roc_auc
value: [1.         0.94736842 0.84210526 0.94736842 0.92105263 0.92105263
 0.97368421 0.94736842 0.83625731 0.97368421]

mean value: 0.9309941520467836

key: train_roc_auc
value: [0.97941176 0.97941176 0.97941176 0.97647059 0.97647059 0.97941176
 0.97941176 0.98235294 0.98826969 0.97948056]

mean value: 0.9800103199174407

key: test_jcc
value: [1.         0.89473684 0.71428571 0.9        0.85714286 0.84210526
 0.94736842 0.9047619  0.73913043 0.94736842]

mean value: 0.8746899858341506

key: train_jcc
value: [0.95953757 0.95953757 0.95953757 0.95402299 0.95375723 0.95977011
 0.95953757 0.96511628 0.97674419 0.95977011]

mean value: 0.960733119795795

MCC on Blind test: 0.86

Accuracy on Blind test: 0.93

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.010741   0.00969005 0.01051641 0.00985312 0.00964284 0.010144
 0.00974083 0.0098412  0.00984383 0.01043248]

mean value: 0.010044574737548828

key: score_time
value: [0.0090158  0.00899315 0.00946307 0.00896311 0.0089345  0.01059484
 0.00890207 0.00879717 0.00891232 0.00898099]

mean value: 0.009155702590942384

key: test_mcc
value: [0.47368421 0.68421053 0.68421053 0.79388419 0.78947368 0.47368421
 0.78947368 0.63960215 0.48078072 0.62807634]

mean value: 0.6437080232004303

key: train_mcc
value: [0.73561236 0.70588235 0.75314969 0.72986649 0.70593121 0.73561236
 0.71769673 0.75314969 0.73607623 0.71966354]

mean value: 0.7292640648226028

key: test_accuracy
value: [0.73684211 0.84210526 0.84210526 0.89473684 0.89473684 0.73684211
 0.89473684 0.81578947 0.72972973 0.81081081]

mean value: 0.8198435277382645

key: train_accuracy
value: [0.86764706 0.85294118 0.87647059 0.86470588 0.85294118 0.86764706
 0.85882353 0.87647059 0.86803519 0.85923754]

mean value: 0.8644919786096257

key: test_fscore
value: [0.73684211 0.84210526 0.84210526 0.9        0.89473684 0.73684211
 0.89473684 0.82926829 0.77272727 0.78787879]

mean value: 0.8237242774341619

key: train_fscore
value: [0.86956522 0.85294118 0.87790698 0.86705202 0.85380117 0.86956522
 0.85964912 0.87790698 0.86725664 0.86363636]

mean value: 0.8659280881065122

key: test_precision
value: [0.73684211 0.84210526 0.84210526 0.85714286 0.89473684 0.73684211
 0.89473684 0.77272727 0.68       0.86666667]

mean value: 0.8123905217589428

key: train_precision
value: [0.85714286 0.85294118 0.86781609 0.85227273 0.84883721 0.85714286
 0.85465116 0.86781609 0.86982249 0.83977901]

mean value: 0.8568221664762061

key: test_recall
value: [0.73684211 0.84210526 0.84210526 0.94736842 0.89473684 0.73684211
 0.89473684 0.89473684 0.89473684 0.72222222]

mean value: 0.8406432748538012

key: train_recall
value: [0.88235294 0.85294118 0.88823529 0.88235294 0.85882353 0.88235294
 0.86470588 0.88823529 0.86470588 0.88888889]

mean value: 0.8753594771241829

key: test_roc_auc
value: [0.73684211 0.84210526 0.84210526 0.89473684 0.89473684 0.73684211
 0.89473684 0.81578947 0.7251462  0.80847953]

mean value: 0.8191520467836257

key: train_roc_auc
value: [0.86764706 0.85294118 0.87647059 0.86470588 0.85294118 0.86764706
 0.85882353 0.87647059 0.86802546 0.85915033]

mean value: 0.8644822841417269

key: test_jcc
value: [0.58333333 0.72727273 0.72727273 0.81818182 0.80952381 0.58333333
 0.80952381 0.70833333 0.62962963 0.65      ]

mean value: 0.7046404521404521

key: train_jcc
value: [0.76923077 0.74358974 0.78238342 0.76530612 0.74489796 0.76923077
 0.75384615 0.78238342 0.765625   0.76      ]

mean value: 0.7636493356908327

MCC on Blind test: 0.72

Accuracy on Blind test: 0.86

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC0...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.10662818 0.08921003 0.0561254  0.17810702 0.05234981 0.05486059
 0.06721592 0.05890155 0.26667285 0.05537868]

mean value: 0.0985450029373169

key: score_time
value: [0.01744914 0.01164341 0.01065397 0.01203752 0.01097751 0.01060772
 0.01059103 0.01130843 0.01121664 0.01066351]

mean value: 0.011714887619018555

key: test_mcc
value: [1.         1.         1.         0.9486833  0.9486833  0.89973541
 0.84327404 0.89473684 0.78362573 0.94736842]

mean value: 0.9266107043807079

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         1.         1.         0.97368421 0.97368421 0.94736842
 0.92105263 0.94736842 0.89189189 0.97297297]

mean value: 0.9628022759601707

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         1.         1.         0.97435897 0.97435897 0.94444444
 0.92307692 0.94736842 0.89473684 0.97297297]

mean value: 0.9631317552370184

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         1.         0.95       0.95       1.
 0.9        0.94736842 0.89473684 0.94736842]

mean value: 0.9589473684210527

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         1.         0.89473684
 0.94736842 0.94736842 0.89473684 1.        ]

mean value: 0.968421052631579

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         1.         1.         0.97368421 0.97368421 0.94736842
 0.92105263 0.94736842 0.89181287 0.97368421]

mean value: 0.9628654970760234

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         1.         1.         0.95       0.95       0.89473684
 0.85714286 0.9        0.80952381 0.94736842]

mean value: 0.9308771929824561

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.94

Accuracy on Blind test: 0.97

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.04259491 0.05594087 0.03498626 0.03416085 0.03477049 0.06916142
 0.06280589 0.03609562 0.05616331 0.07135487]

mean value: 0.04980344772338867

key: score_time
value: [0.02427268 0.01239538 0.01218939 0.01224589 0.01225781 0.02066946
 0.01230192 0.01249409 0.02214265 0.01604795]

mean value: 0.015701723098754884

key: test_mcc
value: [0.74620251 1.         0.73786479 0.84327404 0.78947368 0.38829014
 0.68421053 0.80757285 0.51461988 0.83918129]

mean value: 0.7350689707890694

key: train_mcc
value: [0.92354539 0.92966915 0.95884012 0.92947609 0.95294118 0.95300713
 0.94124161 0.9353103  0.95314596 0.94762566]

mean value: 0.9424802586910812

key: test_accuracy
value: [0.86842105 1.         0.86842105 0.92105263 0.89473684 0.68421053
 0.84210526 0.89473684 0.75675676 0.91891892]

mean value: 0.8649359886201992

key: train_accuracy
value: [0.96176471 0.96470588 0.97941176 0.96470588 0.97647059 0.97647059
 0.97058824 0.96764706 0.97653959 0.97360704]

mean value: 0.9711911333448335

key: test_fscore
value: [0.87804878 1.         0.86486486 0.92307692 0.89473684 0.625
 0.84210526 0.9047619  0.75675676 0.91891892]

mean value: 0.8608270254130331

key: train_fscore
value: [0.96165192 0.96428571 0.97935103 0.96449704 0.97647059 0.97660819
 0.9704142  0.96755162 0.97660819 0.97329377]

mean value: 0.9710732260210945

key: test_precision
value: [0.81818182 1.         0.88888889 0.9        0.89473684 0.76923077
 0.84210526 0.82608696 0.77777778 0.89473684]

mean value: 0.8611745157969415

key: train_precision
value: [0.96449704 0.97590361 0.98224852 0.9702381  0.97647059 0.97093023
 0.97619048 0.9704142  0.97093023 0.98795181]

mean value: 0.9745774809780501

key: test_recall
value: [0.94736842 1.         0.84210526 0.94736842 0.89473684 0.52631579
 0.84210526 1.         0.73684211 0.94444444]

mean value: 0.8681286549707602

key: train_recall
value: [0.95882353 0.95294118 0.97647059 0.95882353 0.97647059 0.98235294
 0.96470588 0.96470588 0.98235294 0.95906433]

mean value: 0.9676711386308909

key: test_roc_auc
value: [0.86842105 1.         0.86842105 0.92105263 0.89473684 0.68421053
 0.84210526 0.89473684 0.75730994 0.91959064]

mean value: 0.8650584795321637

key: train_roc_auc
value: [0.96176471 0.96470588 0.97941176 0.96470588 0.97647059 0.97647059
 0.97058824 0.96764706 0.97655659 0.97364981]

mean value: 0.9711971104231166

key: test_jcc
value: [0.7826087  1.         0.76190476 0.85714286 0.80952381 0.45454545
 0.72727273 0.82608696 0.60869565 0.85      ]

mean value: 0.7677780914737437

key: train_jcc
value: [0.92613636 0.93103448 0.95953757 0.93142857 0.95402299 0.95428571
 0.94252874 0.93714286 0.95428571 0.94797688]

mean value: 0.9438379878542824

MCC on Blind test: 0.72

Accuracy on Blind test: 0.86

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.0131712  0.01293206 0.00980425 0.01048255 0.01042271 0.01054835
 0.01047897 0.01062369 0.0106225  0.01053381]

mean value: 0.01096200942993164

key: score_time
value: [0.01183963 0.00915504 0.00976515 0.00946379 0.00945783 0.00950193
 0.00948477 0.00947523 0.00913143 0.00954247]

mean value: 0.00968172550201416

key: test_mcc
value: [0.59222009 0.68803296 0.63245553 0.63960215 0.68421053 0.58218174
 0.63960215 0.73786479 0.52960948 0.73821295]

mean value: 0.6463992364353213

key: train_mcc
value: [0.65923425 0.60639664 0.70632241 0.65322377 0.67175144 0.7236421
 0.60766169 0.74707175 0.71317436 0.69522435]

mean value: 0.6783702762219075

key: test_accuracy
value: [0.78947368 0.84210526 0.81578947 0.81578947 0.84210526 0.78947368
 0.81578947 0.86842105 0.75675676 0.86486486]

mean value: 0.8200568990042674

key: train_accuracy
value: [0.82941176 0.80294118 0.85294118 0.82647059 0.83529412 0.86176471
 0.80294118 0.87352941 0.85630499 0.84750733]

mean value: 0.8389106434362601

key: test_fscore
value: [0.76470588 0.83333333 0.81081081 0.8        0.84210526 0.77777778
 0.8        0.87179487 0.79069767 0.84848485]

mean value: 0.8139710462131082

key: train_fscore
value: [0.82634731 0.7987988  0.8502994  0.8238806  0.83030303 0.86053412
 0.79510703 0.87315634 0.85285285 0.84615385]

mean value: 0.8357433332161395

key: test_precision
value: [0.86666667 0.88235294 0.83333333 0.875      0.84210526 0.82352941
 0.875      0.85       0.70833333 0.93333333]

mean value: 0.8489654282765737

key: train_precision
value: [0.84146341 0.81595092 0.86585366 0.83636364 0.85625    0.86826347
 0.82802548 0.87573964 0.87116564 0.85628743]

mean value: 0.8515363294832559

key: test_recall
value: [0.68421053 0.78947368 0.78947368 0.73684211 0.84210526 0.73684211
 0.73684211 0.89473684 0.89473684 0.77777778]

mean value: 0.7883040935672514

key: train_recall
value: [0.81176471 0.78235294 0.83529412 0.81176471 0.80588235 0.85294118
 0.76470588 0.87058824 0.83529412 0.83625731]

mean value: 0.8206845545235638

key: test_roc_auc
value: [0.78947368 0.84210526 0.81578947 0.81578947 0.84210526 0.78947368
 0.81578947 0.86842105 0.75292398 0.8625731 ]

mean value: 0.8194444444444444

key: train_roc_auc
value: [0.82941176 0.80294118 0.85294118 0.82647059 0.83529412 0.86176471
 0.80294118 0.87352941 0.85624355 0.84754042]

mean value: 0.8389078087375301

key: test_jcc
value: [0.61904762 0.71428571 0.68181818 0.66666667 0.72727273 0.63636364
 0.66666667 0.77272727 0.65384615 0.73684211]

mean value: 0.6875536743957796

key: train_jcc
value: [0.70408163 0.665      0.73958333 0.70050761 0.70984456 0.75520833
 0.65989848 0.77486911 0.7434555  0.73333333]

mean value: 0.7185781890938955

MCC on Blind test: 0.77

Accuracy on Blind test: 0.89

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01748848 0.01892996 0.01658511 0.01925921 0.01871228 0.01994371
 0.01729989 0.01780486 0.0200069  0.01862764]

mean value: 0.018465805053710937

key: score_time
value: [0.0090909  0.01126218 0.01121306 0.01176643 0.01179862 0.01175475
 0.01167011 0.01188803 0.01181984 0.0126853 ]

mean value: 0.011494922637939452

key: test_mcc
value: [0.76376262 0.89973541 0.38729833 0.78947368 0.84327404 0.52704628
 0.29277002 0.79388419 0.62807634 0.75614764]

mean value: 0.6681468554833931

key: train_mcc
value: [0.8452381  0.79448906 0.5864073  0.88333157 0.89010061 0.90189002
 0.40544243 0.84174979 0.88932517 0.89043758]

mean value: 0.7928411614629753

key: test_accuracy
value: [0.86842105 0.94736842 0.65789474 0.89473684 0.92105263 0.76315789
 0.57894737 0.89473684 0.81081081 0.86486486]

mean value: 0.820199146514936

key: train_accuracy
value: [0.91764706 0.88823529 0.75588235 0.94117647 0.94411765 0.95
 0.64117647 0.91470588 0.94428152 0.94428152]

mean value: 0.8841504226323961

key: test_fscore
value: [0.84848485 0.95       0.73469388 0.89473684 0.91891892 0.76923077
 0.7037037  0.9        0.82926829 0.83870968]

mean value: 0.8387746930096805

key: train_fscore
value: [0.91082803 0.89893617 0.80378251 0.94252874 0.94224924 0.95156695
 0.73593074 0.90675241 0.94524496 0.94259819]

mean value: 0.8980417920511166

key: test_precision
value: [1.         0.9047619  0.6        0.89473684 0.94444444 0.75
 0.54285714 0.85714286 0.77272727 1.        ]

mean value: 0.8266670464038886

key: train_precision
value: [0.99305556 0.82038835 0.67193676 0.92134831 0.97484277 0.92265193
 0.58219178 1.         0.92655367 0.975     ]

mean value: 0.8787969132705697

key: test_recall
value: [0.73684211 1.         0.94736842 0.89473684 0.89473684 0.78947368
 1.         0.94736842 0.89473684 0.72222222]

mean value: 0.882748538011696

key: train_recall
value: [0.84117647 0.99411765 1.         0.96470588 0.91176471 0.98235294
 1.         0.82941176 0.96470588 0.9122807 ]

mean value: 0.9400515995872033

key: test_roc_auc
value: [0.86842105 0.94736842 0.65789474 0.89473684 0.92105263 0.76315789
 0.57894737 0.89473684 0.80847953 0.86111111]

mean value: 0.8195906432748539

key: train_roc_auc
value: [0.91764706 0.88823529 0.75588235 0.94117647 0.94411765 0.95
 0.64117647 0.91470588 0.94434125 0.94437564]

mean value: 0.8841658066735466

key: test_jcc
value: [0.73684211 0.9047619  0.58064516 0.80952381 0.85       0.625
 0.54285714 0.81818182 0.70833333 0.72222222]

mean value: 0.7298367497433711

key: train_jcc
value: [0.83625731 0.81642512 0.67193676 0.89130435 0.8908046  0.9076087
 0.58219178 0.82941176 0.89617486 0.89142857]

mean value: 0.8213543811131508

MCC on Blind test: 0.79

Accuracy on Blind test: 0.89

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01583433 0.01828694 0.01696563 0.02924252 0.03335357 0.01711559
 0.01663041 0.01640534 0.01945257 0.01540542]

mean value: 0.019869232177734376

key: score_time
value: [0.01194024 0.01182508 0.02836466 0.01295829 0.02824974 0.0117662
 0.01195216 0.01181483 0.01190305 0.01181436]

mean value: 0.015258860588073731

key: test_mcc
value: [0.74620251 0.85280287 0.57894737 0.84327404 0.84327404 0.69989647
 0.76376262 0.63828474 0.63129316 0.69356297]

mean value: 0.7291300780936345

key: train_mcc
value: [0.78047467 0.86610667 0.91771057 0.87209836 0.90594505 0.81649658
 0.67766324 0.74545617 0.88351945 0.72157164]

mean value: 0.8187042409466225

key: test_accuracy
value: [0.86842105 0.92105263 0.78947368 0.92105263 0.92105263 0.84210526
 0.86842105 0.78947368 0.81081081 0.83783784]

mean value: 0.8569701280227596

key: train_accuracy
value: [0.88235294 0.92941176 0.95882353 0.93529412 0.95294118 0.9
 0.81470588 0.85882353 0.93841642 0.84750733]

mean value: 0.901827669484216

key: test_fscore
value: [0.87804878 0.91428571 0.78947368 0.92307692 0.92307692 0.85714286
 0.88372093 0.82608696 0.8        0.85      ]

mean value: 0.8644912769035046

key: train_fscore
value: [0.89304813 0.9245283  0.95857988 0.93714286 0.95266272 0.90909091
 0.84367246 0.87564767 0.93416928 0.86597938]

mean value: 0.909452158542273

key: test_precision
value: [0.81818182 1.         0.78947368 0.9        0.9        0.7826087
 0.79166667 0.7037037  0.875      0.77272727]

mean value: 0.8333361841142162

key: train_precision
value: [0.81862745 0.99324324 0.96428571 0.91111111 0.95833333 0.83333333
 0.72961373 0.78240741 1.         0.77419355]

mean value: 0.8765148875987211

key: test_recall
value: [0.94736842 0.84210526 0.78947368 0.94736842 0.94736842 0.94736842
 1.         1.         0.73684211 0.94444444]

mean value: 0.910233918128655

key: train_recall
value: [0.98235294 0.86470588 0.95294118 0.96470588 0.94705882 1.
 1.         0.99411765 0.87647059 0.98245614]

mean value: 0.9564809081527348

key: test_roc_auc
value: [0.86842105 0.92105263 0.78947368 0.92105263 0.92105263 0.84210526
 0.86842105 0.78947368 0.8128655  0.84064327]

mean value: 0.8574561403508771

key: train_roc_auc
value: [0.88235294 0.92941176 0.95882353 0.93529412 0.95294118 0.9
 0.81470588 0.85882353 0.93823529 0.84711042]

mean value: 0.9017698658410733

key: test_jcc
value: [0.7826087  0.84210526 0.65217391 0.85714286 0.85714286 0.75
 0.79166667 0.7037037  0.66666667 0.73913043]

mean value: 0.7642341057958907

key: train_jcc
value: [0.80676329 0.85964912 0.92045455 0.88172043 0.90960452 0.83333333
 0.72961373 0.77880184 0.87647059 0.76363636]

mean value: 0.8360047765595798

MCC on Blind test: 0.75

Accuracy on Blind test: 0.88

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.17080402 0.1569531  0.15340972 0.15698004 0.15308666 0.15458584
 0.14575839 0.14861679 0.15207219 0.14971685]

mean value: 0.15419836044311525

key: score_time
value: [0.01653957 0.0166285  0.01610899 0.01642632 0.0170877  0.0152936
 0.01553082 0.01582837 0.01660395 0.01644444]

mean value: 0.01624922752380371

key: test_mcc
value: [1.         1.         1.         0.9486833  1.         0.89973541
 0.84327404 0.89473684 0.78362573 0.94736842]

mean value: 0.9317423745756566

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         1.         1.         0.97368421 1.         0.94736842
 0.92105263 0.94736842 0.89189189 0.97297297]

mean value: 0.9654338549075391

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         1.         1.         0.97435897 1.         0.94444444
 0.92307692 0.94736842 0.89473684 0.97297297]

mean value: 0.965695857801121

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         1.         0.95       1.         1.
 0.9        0.94736842 0.89473684 0.94736842]

mean value: 0.9639473684210527

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         1.         0.89473684
 0.94736842 0.94736842 0.89473684 1.        ]

mean value: 0.968421052631579

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         1.         1.         0.97368421 1.         0.94736842
 0.92105263 0.94736842 0.89181287 0.97368421]

mean value: 0.9654970760233919

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         1.         1.         0.95       1.         0.89473684
 0.85714286 0.9        0.80952381 0.94736842]

mean value: 0.9358771929824561

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.92

Accuracy on Blind test: 0.96

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.05442548 0.06178308 0.06514335 0.05190825 0.05278492 0.05515146
 0.04277182 0.05043507 0.06439567 0.04802108]

mean value: 0.054682016372680664

key: score_time
value: [0.03232479 0.03214812 0.02032709 0.02793241 0.03441763 0.01812482
 0.02434039 0.01886606 0.02517366 0.03107882]

mean value: 0.026473379135131835

key: test_mcc
value: [1.         1.         1.         0.9486833  0.9486833  0.89973541
 0.84327404 0.9486833  0.73099415 0.94736842]

mean value: 0.9267421920804961

key: train_mcc
value: [1.         0.99413485 1.         1.         0.98830369 0.99413485
 0.98823529 0.99413485 1.         0.98833809]

mean value: 0.9947281618348918

key: test_accuracy
value: [1.         1.         1.         0.97368421 0.97368421 0.94736842
 0.92105263 0.97368421 0.86486486 0.97297297]

mean value: 0.9627311522048364

key: train_accuracy
value: [1.         0.99705882 1.         1.         0.99411765 0.99705882
 0.99411765 0.99705882 1.         0.9941349 ]

mean value: 0.9973546662066586

key: test_fscore
value: [1.         1.         1.         0.97435897 0.97435897 0.94444444
 0.92307692 0.97435897 0.86486486 0.97297297]

mean value: 0.9628436128436129

key: train_fscore
value: [1.         0.99705015 1.         1.         0.99408284 0.99705015
 0.99411765 0.99705015 1.         0.99411765]

mean value: 0.997346857683221

key: test_precision
value: [1.         1.         1.         0.95       0.95       1.
 0.9        0.95       0.88888889 0.94736842]

mean value: 0.958625730994152

key: train_precision
value: [1.         1.         1.         1.         1.         1.
 0.99411765 1.         1.         1.        ]

mean value: 0.9994117647058823

key: test_recall
value: [1.         1.         1.         1.         1.         0.89473684
 0.94736842 1.         0.84210526 1.        ]

mean value: 0.968421052631579

key: train_recall
value: [1.         0.99411765 1.         1.         0.98823529 0.99411765
 0.99411765 0.99411765 1.         0.98830409]

mean value: 0.9953009975920193

key: test_roc_auc
value: [1.         1.         1.         0.97368421 0.97368421 0.94736842
 0.92105263 0.97368421 0.86549708 0.97368421]

mean value: 0.9628654970760234

key: train_roc_auc
value: [1.         0.99705882 1.         1.         0.99411765 0.99705882
 0.99411765 0.99705882 1.         0.99415205]

mean value: 0.9973563811489509

key: test_jcc
value: [1.         1.         1.         0.95       0.95       0.89473684
 0.85714286 0.95       0.76190476 0.94736842]

mean value: 0.9311152882205513

key: train_jcc
value: [1.         0.99411765 1.         1.         0.98823529 0.99411765
 0.98830409 0.99411765 1.         0.98830409]

mean value: 0.994719642242862

MCC on Blind test: 0.91

Accuracy on Blind test: 0.96

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.08240032 0.13941526 0.11900902 0.09115481 0.0813446  0.11244035
 0.08711672 0.18356252 0.14831948 0.12165046]

mean value: 0.11664135456085205

key: score_time
value: [0.02674294 0.03760934 0.0219245  0.01437283 0.02252603 0.02183795
 0.02192736 0.04869914 0.02245545 0.02846289]

mean value: 0.02665584087371826

key: test_mcc
value: [0.58218174 0.68421053 0.37047929 0.68803296 0.65465367 0.63960215
 0.68803296 0.68803296 0.24189738 0.73821295]

mean value: 0.5975336578127587

key: train_mcc
value: [0.99413485 0.99413485 0.99413485 1.         1.         0.99413485
 0.99413485 0.99413485 0.99415185 0.99415205]

mean value: 0.9953112973615207

key: test_accuracy
value: [0.78947368 0.84210526 0.68421053 0.84210526 0.81578947 0.81578947
 0.84210526 0.84210526 0.62162162 0.86486486]

mean value: 0.7960170697012802

key: train_accuracy
value: [0.99705882 0.99705882 0.99705882 1.         1.         0.99705882
 0.99705882 0.99705882 0.99706745 0.99706745]

mean value: 0.9976487838537175

key: test_fscore
value: [0.77777778 0.84210526 0.66666667 0.83333333 0.8372093  0.8
 0.85       0.83333333 0.65       0.84848485]

mean value: 0.7938910525079436

key: train_fscore
value: [0.99705015 0.99705015 0.99705015 1.         1.         0.99705015
 0.99705015 0.99705015 0.99705015 0.99706745]

mean value: 0.9976418481128729

key: test_precision
value: [0.82352941 0.84210526 0.70588235 0.88235294 0.75       0.875
 0.80952381 0.88235294 0.61904762 0.93333333]

mean value: 0.812312767212148

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.73684211 0.84210526 0.63157895 0.78947368 0.94736842 0.73684211
 0.89473684 0.78947368 0.68421053 0.77777778]

mean value: 0.7830409356725146

key: train_recall
value: [0.99411765 0.99411765 0.99411765 1.         1.         0.99411765
 0.99411765 0.99411765 0.99411765 0.99415205]

mean value: 0.9952975576195391

key: test_roc_auc
value: [0.78947368 0.84210526 0.68421053 0.84210526 0.81578947 0.81578947
 0.84210526 0.84210526 0.61988304 0.8625731 ]

mean value: 0.7956140350877193

key: train_roc_auc
value: [0.99705882 0.99705882 0.99705882 1.         1.         0.99705882
 0.99705882 0.99705882 0.99705882 0.99707602]

mean value: 0.9976487788097695

key: test_jcc
value: [0.63636364 0.72727273 0.5        0.71428571 0.72       0.66666667
 0.73913043 0.71428571 0.48148148 0.73684211]

mean value: 0.6636328480401706

key: train_jcc
value: [0.99411765 0.99411765 0.99411765 1.         1.         0.99411765
 0.99411765 0.99411765 0.99411765 0.99415205]

mean value: 0.9952975576195391

MCC on Blind test: 0.54

Accuracy on Blind test: 0.77

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.56569266 0.5649147  0.54991865 0.56822872 0.55563831 0.56164074
 0.56107974 0.54614925 0.56099558 0.56145477]

mean value: 0.5595713138580323

key: score_time
value: [0.0103898  0.00958323 0.00938058 0.00951958 0.00956798 0.01010776
 0.00926852 0.00950861 0.01079369 0.01012635]

mean value: 0.009824609756469727

key: test_mcc
value: [1.         1.         0.9486833  0.9486833  0.9486833  0.89973541
 0.89973541 0.9486833  0.78362573 0.94736842]

mean value: 0.9325198165933714

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         1.         0.97368421 0.97368421 0.97368421 0.94736842
 0.94736842 0.97368421 0.89189189 0.97297297]

mean value: 0.9654338549075391

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         1.         0.97435897 0.97435897 0.97435897 0.94444444
 0.95       0.97435897 0.89473684 0.97297297]

mean value: 0.9659590156958578

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         0.95       0.95       0.95       1.
 0.9047619  0.95       0.89473684 0.94736842]

mean value: 0.9546867167919799

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         1.         0.89473684
 1.         1.         0.89473684 1.        ]

mean value: 0.9789473684210527

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         1.         0.97368421 0.97368421 0.97368421 0.94736842
 0.94736842 0.97368421 0.89181287 0.97368421]

mean value: 0.9654970760233919

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         1.         0.95       0.95       0.95       0.89473684
 0.9047619  0.95       0.80952381 0.94736842]

mean value: 0.9356390977443609

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.94

Accuracy on Blind test: 0.97

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.02689314 0.02751875 0.02677655 0.0268662  0.02773142 0.03597403
 0.02919483 0.02741051 0.02737808 0.02891994]

mean value: 0.028466343879699707

key: score_time
value: [0.01260877 0.01430941 0.01301813 0.01564336 0.01542616 0.01571608
 0.01542115 0.0155189  0.01564002 0.01539946]

mean value: 0.014870142936706543

key: test_mcc
value: [0.63245553 0.21821789 0.26315789 0.59222009 0.74620251 0.31622777
 0.26462806 0.42640143 0.19504453 0.83871328]

mean value: 0.4493268986749522

key: train_mcc
value: [0.99413485 0.92077472 0.77216846 0.976741   0.98250594 0.91533482
 0.9707394  0.93725826 0.89204798 0.98826969]

mean value: 0.9349975124716327

key: test_accuracy
value: [0.81578947 0.60526316 0.63157895 0.78947368 0.86842105 0.65789474
 0.63157895 0.71052632 0.59459459 0.91891892]

mean value: 0.7224039829302987

key: train_accuracy
value: [0.99705882 0.95882353 0.87352941 0.98823529 0.99117647 0.95588235
 0.98529412 0.96764706 0.94428152 0.9941349 ]

mean value: 0.9656063481110919

key: test_fscore
value: [0.82051282 0.65116279 0.63157895 0.76470588 0.87804878 0.64864865
 0.61111111 0.68571429 0.66666667 0.91428571]

mean value: 0.7272435647846088

key: train_fscore
value: [0.99705015 0.96045198 0.85521886 0.98809524 0.99109792 0.95384615
 0.9851632  0.96656535 0.94647887 0.99415205]

mean value: 0.9638119769217577

key: test_precision
value: [0.8        0.58333333 0.63157895 0.86666667 0.81818182 0.66666667
 0.64705882 0.75       0.57692308 0.94117647]

mean value: 0.728158580325763

key: train_precision
value: [1.         0.92391304 1.         1.         1.         1.
 0.99401198 1.         0.90810811 0.99415205]

mean value: 0.9820185174417899

key: test_recall
value: [0.84210526 0.73684211 0.63157895 0.68421053 0.94736842 0.63157895
 0.57894737 0.63157895 0.78947368 0.88888889]

mean value: 0.7362573099415204

key: train_recall
value: [0.99411765 1.         0.74705882 0.97647059 0.98235294 0.91176471
 0.97647059 0.93529412 0.98823529 0.99415205]

mean value: 0.9505916752665978

key: test_roc_auc
value: [0.81578947 0.60526316 0.63157895 0.78947368 0.86842105 0.65789474
 0.63157895 0.71052632 0.58918129 0.91812865]

mean value: 0.7217836257309942

key: train_roc_auc
value: [0.99705882 0.95882353 0.87352941 0.98823529 0.99117647 0.95588235
 0.98529412 0.96764706 0.94441004 0.99413485]

mean value: 0.9656191950464397

key: test_jcc
value: [0.69565217 0.48275862 0.46153846 0.61904762 0.7826087  0.48
 0.44       0.52173913 0.5        0.84210526]

mean value: 0.5825449964433631

key: train_jcc
value: [0.99411765 0.92391304 0.74705882 0.97647059 0.98235294 0.91176471
 0.97076023 0.93529412 0.89839572 0.98837209]

mean value: 0.9328499915874191

MCC on Blind test: 0.43

Accuracy on Blind test: 0.71

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.0276432  0.05927157 0.03859544 0.03848791 0.03850269 0.05718088
 0.03406334 0.01551056 0.01546884 0.01610494]

mean value: 0.03408293724060059

key: score_time
value: [0.02301216 0.02319098 0.02459955 0.0241189  0.0216651  0.02239084
 0.0124681  0.01247931 0.0128572  0.01285744]

mean value: 0.018963956832885744

key: test_mcc
value: [0.84327404 0.9486833  0.73786479 0.84327404 0.84327404 0.48454371
 0.89473684 0.79388419 0.56725146 0.78764146]

mean value: 0.7744427877554028

key: train_mcc
value: [0.87064849 0.88241401 0.89417953 0.86484056 0.88825066 0.89417953
 0.87058824 0.88235294 0.89442724 0.88269694]

mean value: 0.8824578138125448

key: test_accuracy
value: [0.92105263 0.97368421 0.86842105 0.92105263 0.92105263 0.73684211
 0.94736842 0.89473684 0.78378378 0.89189189]

mean value: 0.8859886201991465

key: train_accuracy
value: [0.93529412 0.94117647 0.94705882 0.93235294 0.94411765 0.94705882
 0.93529412 0.94117647 0.94721408 0.94134897]

mean value: 0.9412092461618078

key: test_fscore
value: [0.91891892 0.97435897 0.87179487 0.92307692 0.92307692 0.70588235
 0.94736842 0.9        0.78947368 0.88235294]

mean value: 0.8836304010607416

key: train_fscore
value: [0.93567251 0.9408284  0.94674556 0.93294461 0.9439528  0.94736842
 0.93529412 0.94117647 0.94705882 0.94152047]

mean value: 0.9412562188544396

/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_7030.py:156: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_7030.py:159: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)

key: test_precision
value: [0.94444444 0.95       0.85       0.9        0.9        0.8
 0.94736842 0.85714286 0.78947368 0.9375    ]

mean value: 0.8875929406850459

key: train_precision
value: [0.93023256 0.94642857 0.95238095 0.92485549 0.94674556 0.94186047
 0.93529412 0.94117647 0.94705882 0.94152047]

mean value: 0.9407553480125959

key: test_recall
value: [0.89473684 1.         0.89473684 0.94736842 0.94736842 0.63157895
 0.94736842 0.94736842 0.78947368 0.83333333]

mean value: 0.8833333333333333

key: train_recall
value: [0.94117647 0.93529412 0.94117647 0.94117647 0.94117647 0.95294118
 0.93529412 0.94117647 0.94705882 0.94152047]

mean value: 0.9417991056071551

key: test_roc_auc
value: [0.92105263 0.97368421 0.86842105 0.92105263 0.92105263 0.73684211
 0.94736842 0.89473684 0.78362573 0.89035088]

mean value: 0.8858187134502924

key: train_roc_auc
value: [0.93529412 0.94117647 0.94705882 0.93235294 0.94411765 0.94705882
 0.93529412 0.94117647 0.94721362 0.94134847]

mean value: 0.9412091503267974

key: test_jcc
value: [0.85       0.95       0.77272727 0.85714286 0.85714286 0.54545455
 0.9        0.81818182 0.65217391 0.78947368]

mean value: 0.7992296947903355

key: train_jcc
value: [0.87912088 0.88826816 0.8988764  0.87431694 0.89385475 0.9
 0.87845304 0.88888889 0.89944134 0.88950276]

mean value: 0.8890723159309888

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.27244306 0.33298659 0.28044224 0.39420319 0.54344702 0.35148716
 0.35520601 0.30538011 0.34263945 0.27838564]

mean value: 0.3456620454788208

key: score_time
value: [0.02331948 0.01721334 0.02358603 0.02463841 0.02457571 0.02402425
 0.02256465 0.02319074 0.02490997 0.02106881]

mean value: 0.022909140586853026

key: test_mcc
value: [0.84327404 0.9486833  0.78947368 0.79388419 0.84327404 0.48454371
 0.89473684 0.79388419 0.56725146 0.78764146]

mean value: 0.7746646917717809

key: train_mcc
value: [0.87064849 0.88241401 0.95300713 0.8058963  0.88825066 0.89417953
 0.87058824 0.88235294 0.89442724 0.88269694]

mean value: 0.8824461478103343

key: test_accuracy
value: [0.92105263 0.97368421 0.89473684 0.89473684 0.92105263 0.73684211
 0.94736842 0.89473684 0.78378378 0.89189189]

mean value: 0.8859886201991465

key: train_accuracy
value: [0.93529412 0.94117647 0.97647059 0.90294118 0.94411765 0.94705882
 0.93529412 0.94117647 0.94721408 0.94134897]

mean value: 0.9412092461618078

key: test_fscore
value: [0.91891892 0.97435897 0.89473684 0.9        0.92307692 0.70588235
 0.94736842 0.9        0.78947368 0.88235294]

mean value: 0.8836169057840885

key: train_fscore
value: [0.93567251 0.9408284  0.97633136 0.90265487 0.9439528  0.94736842
 0.93529412 0.94117647 0.94705882 0.94152047]

mean value: 0.9411858248203606

key: test_precision
value: [0.94444444 0.95       0.89473684 0.85714286 0.9        0.8
 0.94736842 0.85714286 0.78947368 0.9375    ]

mean value: 0.887780910609858

key: train_precision
value: [0.93023256 0.94642857 0.98214286 0.90532544 0.94674556 0.94186047
 0.93529412 0.94117647 0.94705882 0.94152047]

mean value: 0.9417785337345366

key: test_recall
value: [0.89473684 1.         0.89473684 0.94736842 0.94736842 0.63157895
 0.94736842 0.94736842 0.78947368 0.83333333]

mean value: 0.8833333333333333

key: train_recall
value: [0.94117647 0.93529412 0.97058824 0.9        0.94117647 0.95294118
 0.93529412 0.94117647 0.94705882 0.94152047]

mean value: 0.9406226350189199

key: test_roc_auc
value: [0.92105263 0.97368421 0.89473684 0.89473684 0.92105263 0.73684211
 0.94736842 0.89473684 0.78362573 0.89035088]

mean value: 0.8858187134502924

key: train_roc_auc
value: [0.93529412 0.94117647 0.97647059 0.90294118 0.94411765 0.94705882
 0.93529412 0.94117647 0.94721362 0.94134847]

mean value: 0.9412091503267974

key: test_jcc
value: [0.85       0.95       0.80952381 0.81818182 0.85714286 0.54545455
 0.9        0.81818182 0.65217391 0.78947368]

mean value: 0.7990132445738853

key: train_jcc
value: [0.87912088 0.88826816 0.95375723 0.82258065 0.89385475 0.9
 0.87845304 0.88888889 0.89944134 0.88950276]

mean value: 0.8893867685519612

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9
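
The per-fold scores above (accuracy, MCC, jcc) can all be derived from a single confusion matrix. The following is a minimal sketch, not the script that produced this log. The function name `binary_metrics` is hypothetical, `jcc` is assumed to mean the Jaccard index, and the counts are an inference chosen to be consistent with the first fold's reported test precision, recall and accuracy; they do not appear in the log.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, MCC and Jaccard index from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # MCC denominator is zero when any row or column of the matrix is empty;
    # return 0.0 in that case (an assumed convention for this sketch).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Jaccard index for the positive class: TP / (TP + FP + FN).
    union = tp + fp + fn
    jaccard = tp / union if union else 0.0
    return {"accuracy": accuracy, "mcc": mcc, "jcc": jaccard}


# Hypothetical counts for one 38-sample test fold (inferred, not logged).
fold = binary_metrics(tp=17, fp=1, tn=18, fn=2)
for name, value in fold.items():
    print(f"{name}: {value:.8f}")
```

With these assumed counts the sketch gives an MCC of about 0.8433, an accuracy of about 0.9211 and a Jaccard index of 0.85, in line with the first entries of test_mcc, test_accuracy and test_jcc above.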

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.03672051 0.03351378 0.03409219 0.04177117 0.0275321  0.03296661
 0.07960463 0.03673577 0.03128195 0.07671475]

mean value: 0.04309334754943848

key: score_time
value: [0.01211333 0.01203656 0.01435566 0.01213312 0.01190567 0.01184034
 0.0147748  0.02327275 0.01196027 0.01549888]

mean value: 0.013989138603210449

key: test_mcc
value: [0.94736842 0.83918129 0.63129316 0.78362573 0.78362573 0.73099415
 0.80369958 0.83918129 0.72333935 0.83462233]

mean value: 0.7916931020080674

key: train_mcc
value: [0.85498357 0.87915298 0.89743309 0.86706827 0.88521358 0.87312888
 0.86104418 0.86706827 0.85548378 0.87350983]

mean value: 0.8714086429236412

key: test_accuracy
value: [0.97297297 0.91891892 0.81081081 0.89189189 0.89189189 0.86486486
 0.89189189 0.91891892 0.86111111 0.91666667]

mean value: 0.893993993993994

key: train_accuracy
value: [0.92749245 0.93957704 0.94864048 0.93353474 0.94259819 0.93655589
 0.9305136  0.93353474 0.92771084 0.93674699]

mean value: 0.9356904961234667

key: test_fscore
value: [0.97297297 0.91891892 0.82051282 0.88888889 0.89473684 0.86486486
 0.88235294 0.91891892 0.85714286 0.91891892]

mean value: 0.8938228944420895

key: train_fscore
value: [0.92771084 0.93975904 0.94832827 0.93373494 0.94259819 0.93655589
 0.9305136  0.93333333 0.92814371 0.93655589]

mean value: 0.9357233697617179

key: test_precision
value: [0.94736842 0.89473684 0.76190476 0.88888889 0.89473684 0.88888889
 1.         0.94444444 0.88235294 0.89473684]

mean value: 0.8998058872671876

key: train_precision
value: [0.92771084 0.93975904 0.95705521 0.93373494 0.93975904 0.93373494
 0.92771084 0.93333333 0.92261905 0.93939394]

mean value: 0.9354811173624463

key: test_recall
value: [1.         0.94444444 0.88888889 0.88888889 0.89473684 0.84210526
 0.78947368 0.89473684 0.83333333 0.94444444]

mean value: 0.8921052631578947

key: train_recall
value: [0.92771084 0.93975904 0.93975904 0.93373494 0.94545455 0.93939394
 0.93333333 0.93333333 0.93373494 0.93373494]

mean value: 0.9359948886454911

key: test_roc_auc
value: [0.97368421 0.91959064 0.8128655  0.89181287 0.89181287 0.86549708
 0.89473684 0.91959064 0.86111111 0.91666667]

mean value: 0.8947368421052632

key: train_roc_auc
value: [0.92749179 0.93957649 0.9486674  0.93353414 0.94260679 0.93656444
 0.93052209 0.93353414 0.92771084 0.93674699]

mean value: 0.9356955093099671

key: test_jcc
value: [0.94736842 0.85       0.69565217 0.8        0.80952381 0.76190476
 0.78947368 0.85       0.75       0.85      ]

mean value: 0.8103922850604772

key: train_jcc
value: [0.86516854 0.88636364 0.9017341  0.87570621 0.89142857 0.88068182
 0.8700565  0.875      0.86592179 0.88068182]

mean value: 0.8792742987101834

MCC on Blind test: 0.83

Accuracy on Blind test: 0.91

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.78824234 0.96631408 0.8446548  1.35734797 1.61382103 1.39275098
 1.2790935  1.1367147  1.12887931 0.96179318]

mean value: 1.1469611883163453

key: score_time
value: [0.01487708 0.01531816 0.01532412 0.01569772 0.0157392  0.01540256
 0.01546884 0.01227903 0.02109766 0.01225138]

mean value: 0.015345573425292969

key: test_mcc
value: [0.94736842 0.62280702 0.57184997 0.78362573 0.83871328 0.78362573
 0.94736842 0.89181287 0.83462233 0.83462233]

mean value: 0.8056416089443539

key: train_mcc
value: [0.90339187 1.         0.9939759  0.90332238 0.98203333 0.90339187
 1.         0.98189054 1.         0.99399394]

mean value: 0.9661999838560218

key: test_accuracy
value: [0.97297297 0.81081081 0.78378378 0.89189189 0.91891892 0.89189189
 0.97297297 0.94594595 0.91666667 0.91666667]

mean value: 0.9022522522522523

key: train_accuracy
value: [0.95166163 1.         0.99697885 0.95166163 0.99093656 0.95166163
 1.         0.99093656 1.         0.99698795]

mean value: 0.9830824809813271

key: test_fscore
value: [0.97297297 0.81081081 0.78947368 0.88888889 0.92307692 0.89473684
 0.97297297 0.94736842 0.91428571 0.91891892]

mean value: 0.9033506149295623

key: train_fscore
value: [0.95151515 1.         0.99697885 0.95180723 0.99082569 0.95180723
 1.         0.99088146 1.         0.99697885]

mean value: 0.9830794460313929

key: test_precision
value: [0.94736842 0.78947368 0.75       0.88888889 0.9        0.89473684
 1.         0.94736842 0.94117647 0.89473684]

mean value: 0.895374957000344

key: train_precision
value: [0.95731707 1.         1.         0.95180723 1.         0.94610778
 1.         0.99390244 1.         1.        ]

mean value: 0.9849134525541923

key: test_recall
value: [1.         0.83333333 0.83333333 0.88888889 0.94736842 0.89473684
 0.94736842 0.94736842 0.88888889 0.94444444]

mean value: 0.9125730994152047

key: train_recall
value: [0.94578313 1.         0.9939759  0.95180723 0.98181818 0.95757576
 1.         0.98787879 1.         0.9939759 ]

mean value: 0.9812814895947426

key: test_roc_auc
value: [0.97368421 0.81140351 0.78508772 0.89181287 0.91812865 0.89181287
 0.97368421 0.94590643 0.91666667 0.91666667]

mean value: 0.902485380116959

key: train_roc_auc
value: [0.95167945 1.         0.99698795 0.95166119 0.99090909 0.95167945
 1.         0.99092735 1.         0.99698795]

mean value: 0.9830832420591457

key: test_jcc
value: [0.94736842 0.68181818 0.65217391 0.8        0.85714286 0.80952381
 0.94736842 0.9        0.84210526 0.85      ]

mean value: 0.8287500866791484

key: train_jcc
value: [0.90751445 1.         0.9939759  0.90804598 0.98181818 0.90804598
 1.         0.98192771 1.         0.9939759 ]

mean value: 0.9675304104780511

MCC on Blind test: 0.69

Accuracy on Blind test: 0.84
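
Each metric above is logged as a `key:` line, a `value:` array of ten per-fold scores, and a `mean value:` line. As a minimal sketch (not part of the original script), the per-fold arrays can be parsed back out of the log text and the logged mean re-checked. The helper name `parse_metric_blocks` and the `SAMPLE` string are hypothetical, introduced only for illustration; the sample numbers are copied from the `test_mcc` block of the LogisticRegressionCV run above.

```python
import re
from statistics import mean

# Sample copied from the test_mcc block logged above (LogisticRegressionCV run).
SAMPLE = """\
key: test_mcc
value: [0.94736842 0.62280702 0.57184997 0.78362573 0.83871328 0.78362573
 0.94736842 0.89181287 0.83462233 0.83462233]

mean value: 0.8056416089443539
"""

# One block = "key: <name>", "value: [ ... ]" (may wrap over lines), "mean value: <float>".
BLOCK_RE = re.compile(r"key: (\S+)\nvalue: \[([^\]]*)\]\n+mean value: (\S+)")


def parse_metric_blocks(text):
    """Hypothetical helper: return {key: (fold_values, logged_mean)}."""
    blocks = {}
    for key, values, logged_mean in BLOCK_RE.findall(text):
        # Values are whitespace-separated, NumPy-style (e.g. "1." parses as 1.0).
        folds = [float(v) for v in values.split()]
        blocks[key] = (folds, float(logged_mean))
    return blocks


blocks = parse_metric_blocks(SAMPLE)
folds, logged_mean = blocks["test_mcc"]
recomputed = mean(folds)
print(len(folds), round(recomputed, 6), round(logged_mean, 6))
```

Because the same keys repeat for every model, running this over the whole log would keep only the last model's values per key; splitting the text on the `Model_name:` lines first would be one way around that.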

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01437283 0.01098561 0.0105083  0.01576018 0.01028442 0.01016164
 0.01007247 0.01817203 0.01193213 0.01369476]

mean value: 0.01259443759918213

key: score_time
value: [0.01252317 0.00955033 0.01525402 0.01239777 0.0093441  0.00946498
 0.00936222 0.01498938 0.01097512 0.01218009]

mean value: 0.011604118347167968

key: test_mcc
value: [0.73099415 0.40469382 0.73099415 0.51319869 0.48981224 0.62280702
 0.60308132 0.45906433 0.61977979 0.79772404]

mean value: 0.5972149539148133

key: train_mcc
value: [0.64442374 0.62134114 0.66921665 0.68999143 0.67585241 0.64541184
 0.67034019 0.65820219 0.70948192 0.69643271]

mean value: 0.6680694215597842

key: test_accuracy
value: [0.86486486 0.7027027  0.86486486 0.75675676 0.72972973 0.81081081
 0.78378378 0.72972973 0.80555556 0.88888889]

mean value: 0.7937687687687688

key: train_accuracy
value: [0.81873112 0.80966767 0.83383686 0.8429003  0.83081571 0.82175227
 0.83383686 0.82779456 0.85240964 0.84638554]

mean value: 0.8318130528154916

key: test_fscore
value: [0.86486486 0.68571429 0.86486486 0.74285714 0.6875     0.81081081
 0.75       0.73684211 0.78787879 0.875     ]

mean value: 0.7806332862253915

key: train_fscore
value: [0.80519481 0.80250784 0.82866044 0.8343949  0.81081081 0.81388013
 0.82539683 0.81904762 0.84345048 0.83809524]

mean value: 0.8221439081547757

key: test_precision
value: [0.84210526 0.70588235 0.84210526 0.76470588 0.84615385 0.83333333
 0.92307692 0.73684211 0.86666667 1.        ]

mean value: 0.8360871636103834

key: train_precision
value: [0.87323944 0.83660131 0.85806452 0.88513514 0.91603053 0.84868421
 0.86666667 0.86       0.89795918 0.88590604]

mean value: 0.8728287030559482

key: test_recall
value: [0.88888889 0.66666667 0.88888889 0.72222222 0.57894737 0.78947368
 0.63157895 0.73684211 0.72222222 0.77777778]

mean value: 0.7403508771929824

key: train_recall
value: [0.74698795 0.77108434 0.80120482 0.78915663 0.72727273 0.78181818
 0.78787879 0.78181818 0.79518072 0.79518072]

mean value: 0.7777583059510771

key: test_roc_auc
value: [0.86549708 0.70175439 0.86549708 0.75584795 0.73391813 0.81140351
 0.7880117  0.72953216 0.80555556 0.88888889]

mean value: 0.7945906432748537

key: train_roc_auc
value: [0.81894852 0.80978459 0.83393574 0.84306316 0.83050383 0.82163198
 0.83369843 0.82765608 0.85240964 0.84638554]

mean value: 0.831801752464403

key: test_jcc
value: [0.76190476 0.52173913 0.76190476 0.59090909 0.52380952 0.68181818
 0.6        0.58333333 0.65       0.77777778]

mean value: 0.6453196561892214

key: train_jcc
value: [0.67391304 0.67015707 0.70744681 0.71584699 0.68181818 0.68617021
 0.7027027  0.69354839 0.72928177 0.72131148]

mean value: 0.6982196642336499

MCC on Blind test: 0.67

Accuracy on Blind test: 0.84

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01818132 0.01185274 0.01171231 0.0114913  0.01783705 0.01429081
 0.0138998  0.013304   0.01724982 0.01014686]

mean value: 0.013996601104736328

key: score_time
value: [0.01515007 0.00999379 0.01022625 0.01021886 0.01386929 0.01219106
 0.01203775 0.01533842 0.01011515 0.0094707 ]

mean value: 0.011861133575439452

key: test_mcc
value: [0.74044197 0.62280702 0.4633451  0.57184997 0.56934383 0.62170355
 0.7888597  0.6754386  0.4472136  0.78262379]

mean value: 0.628362712085414

key: train_mcc
value: [0.73459045 0.71601738 0.79022336 0.74626648 0.74713145 0.68655466
 0.70405667 0.72205184 0.77108434 0.76674551]

mean value: 0.7384722135344838

key: test_accuracy
value: [0.86486486 0.81081081 0.72972973 0.78378378 0.78378378 0.81081081
 0.89189189 0.83783784 0.72222222 0.88888889]

mean value: 0.8124624624624625

key: train_accuracy
value: [0.86706949 0.85800604 0.89425982 0.87311178 0.87311178 0.8429003
 0.85196375 0.86102719 0.88554217 0.88253012]

mean value: 0.8689522440214028

key: test_fscore
value: [0.87179487 0.81081081 0.73684211 0.78947368 0.8        0.82051282
 0.88888889 0.84210526 0.70588235 0.88235294]

mean value: 0.8148663738756617

key: train_fscore
value: [0.86982249 0.85885886 0.89795918 0.8742515  0.87573964 0.83850932
 0.85285285 0.86060606 0.88554217 0.88629738]

mean value: 0.8700439444712924

key: test_precision
value: [0.80952381 0.78947368 0.7        0.75       0.76190476 0.8
 0.94117647 0.84210526 0.75       0.9375    ]

mean value: 0.8081683989385228

key: train_precision
value: [0.85465116 0.85628743 0.8700565  0.86904762 0.85549133 0.85987261
 0.8452381  0.86060606 0.88554217 0.85875706]

mean value: 0.8615550031773642

key: test_recall
value: [0.94444444 0.83333333 0.77777778 0.83333333 0.84210526 0.84210526
 0.84210526 0.84210526 0.66666667 0.83333333]

mean value: 0.8257309941520468

key: train_recall
value: [0.88554217 0.86144578 0.92771084 0.87951807 0.8969697  0.81818182
 0.86060606 0.86060606 0.88554217 0.91566265]

mean value: 0.8791785323110625

key: test_roc_auc
value: [0.86695906 0.81140351 0.73099415 0.78508772 0.78216374 0.80994152
 0.89327485 0.8377193  0.72222222 0.88888889]

mean value: 0.8128654970760234

key: train_roc_auc
value: [0.86701351 0.85799562 0.89415845 0.87309237 0.87318364 0.84282585
 0.85198978 0.86102592 0.88554217 0.88253012]

mean value: 0.8689357429718876

key: test_jcc
value: [0.77272727 0.68181818 0.58333333 0.65217391 0.66666667 0.69565217
 0.8        0.72727273 0.54545455 0.78947368]

mean value: 0.6914572498439775

key: train_jcc
value: [0.76963351 0.75263158 0.81481481 0.77659574 0.77894737 0.72192513
 0.7434555  0.75531915 0.79459459 0.79581152]

mean value: 0.77037289076449

MCC on Blind test: 0.73

Accuracy on Blind test: 0.86

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00932622 0.01532483 0.01057291 0.01194143 0.01086783 0.01041532
 0.01080632 0.01105237 0.01040554 0.01048732]

mean value: 0.011120009422302245

key: score_time
value: [0.01911092 0.01833034 0.021945   0.01794815 0.01830196 0.01822591
 0.01857352 0.01737976 0.01754856 0.01728535]

mean value: 0.018464946746826173

key: test_mcc
value: [0.46019501 0.63129316 0.07739329 0.51461988 0.35558302 0.51461988
 0.57184997 0.25301653 0.3354102  0.50709255]

mean value: 0.42210735008522005

key: train_mcc
value: [0.69212796 0.66871448 0.71631061 0.65571257 0.67976195 0.69792238
 0.64957827 0.65571257 0.67513995 0.66308388]

mean value: 0.6754064609832358

key: test_accuracy
value: [0.72972973 0.81081081 0.54054054 0.75675676 0.67567568 0.75675676
 0.78378378 0.62162162 0.66666667 0.75      ]

mean value: 0.7092342342342343

key: train_accuracy
value: [0.84592145 0.83383686 0.85800604 0.82779456 0.83987915 0.8489426
 0.82477341 0.82779456 0.8373494  0.8313253 ]

mean value: 0.8375623339278564

key: test_fscore
value: [0.70588235 0.82051282 0.48484848 0.75675676 0.71428571 0.75675676
 0.77777778 0.58823529 0.64705882 0.76923077]

mean value: 0.7021345550757315

key: train_fscore
value: [0.84866469 0.82972136 0.86053412 0.82674772 0.83890578 0.84756098
 0.82317073 0.82882883 0.84023669 0.83431953]

mean value: 0.8378690419889865

key: test_precision
value: [0.75       0.76190476 0.53333333 0.73684211 0.65217391 0.77777778
 0.82352941 0.66666667 0.6875     0.71428571]

mean value: 0.7104013684039596

key: train_precision
value: [0.83625731 0.85350318 0.84795322 0.83435583 0.84146341 0.85276074
 0.82822086 0.82142857 0.8255814  0.81976744]

mean value: 0.8361291957614069

key: test_recall
value: [0.66666667 0.88888889 0.44444444 0.77777778 0.78947368 0.73684211
 0.73684211 0.52631579 0.61111111 0.83333333]

mean value: 0.7011695906432749

key: train_recall
value: [0.86144578 0.80722892 0.87349398 0.81927711 0.83636364 0.84242424
 0.81818182 0.83636364 0.85542169 0.84939759]

mean value: 0.8399598393574297

key: test_roc_auc
value: [0.72807018 0.8128655  0.5380117  0.75730994 0.67251462 0.75730994
 0.78508772 0.62426901 0.66666667 0.75      ]

mean value: 0.7092105263157895

key: train_roc_auc
value: [0.84587441 0.83391749 0.85795911 0.82782037 0.83986857 0.84892296
 0.82475356 0.82782037 0.8373494  0.8313253 ]

mean value: 0.837561153705732

key: test_jcc
value: [0.54545455 0.69565217 0.32       0.60869565 0.55555556 0.60869565
 0.63636364 0.41666667 0.47826087 0.625     ]

mean value: 0.5490344751866492

key: train_jcc
value: [0.7371134  0.70899471 0.75520833 0.70466321 0.72251309 0.73544974
 0.69948187 0.70769231 0.7244898  0.71573604]

mean value: 0.7211342490784889

MCC on Blind test: 0.5

Accuracy on Blind test: 0.75

Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01921725 0.01870728 0.01728058 0.01826978 0.01832366 0.01756644
 0.01772666 0.02263999 0.01768255 0.01821613]

mean value: 0.018563032150268555

key: score_time
value: [0.01221204 0.01207209 0.01167274 0.01201773 0.01157498 0.01153445
 0.01147413 0.01173925 0.01177478 0.01164174]

mean value: 0.011771392822265626

key: test_mcc
value: [0.94736842 0.89181287 0.63129316 0.73099415 0.78362573 0.73099415
 0.7888597  0.89181287 0.50709255 0.88888889]

mean value: 0.7792742482712791

key: train_mcc
value: [0.81269853 0.80669661 0.84296615 0.81269853 0.80062066 0.81283091
 0.79462558 0.80074488 0.82543601 0.78313253]

mean value: 0.8092450407103096

key: test_accuracy
value: [0.97297297 0.94594595 0.81081081 0.86486486 0.89189189 0.86486486
 0.89189189 0.94594595 0.75       0.94444444]

mean value: 0.8883633633633634

key: train_accuracy
value: [0.90634441 0.90332326 0.92145015 0.90634441 0.90030211 0.90634441
 0.89728097 0.90030211 0.9126506  0.89156627]

mean value: 0.9045908710370182

key: test_fscore
value: [0.97297297 0.94444444 0.82051282 0.86486486 0.89473684 0.86486486
 0.88888889 0.94736842 0.72727273 0.94444444]

mean value: 0.8870371291423923

key: train_fscore
value: [0.90690691 0.90419162 0.92121212 0.90690691 0.90030211 0.90690691
 0.89759036 0.9009009  0.91343284 0.89156627]

mean value: 0.9049916936730755

key: test_precision
value: [0.94736842 0.94444444 0.76190476 0.84210526 0.89473684 0.88888889
 0.94117647 0.94736842 0.8        0.94444444]

mean value: 0.8912437957639195

key: train_precision
value: [0.90419162 0.89880952 0.92682927 0.90419162 0.89759036 0.89880952
 0.89221557 0.89285714 0.90532544 0.89156627]

mean value: 0.901238633145709

key: test_recall
value: [1.         0.94444444 0.88888889 0.88888889 0.89473684 0.84210526
 0.84210526 0.94736842 0.66666667 0.94444444]

mean value: 0.8859649122807017

key: train_recall
value: [0.90963855 0.90963855 0.91566265 0.90963855 0.9030303  0.91515152
 0.9030303  0.90909091 0.92168675 0.89156627]

mean value: 0.9088134355604235

key: test_roc_auc
value: [0.97368421 0.94590643 0.8128655  0.86549708 0.89181287 0.86549708
 0.89327485 0.94590643 0.75       0.94444444]

mean value: 0.8888888888888888

key: train_roc_auc
value: [0.90633443 0.90330413 0.92146769 0.90633443 0.90031033 0.90637094
 0.89729828 0.90032859 0.9126506  0.89156627]

mean value: 0.904596568090544

key: test_jcc
value: [0.94736842 0.89473684 0.69565217 0.76190476 0.80952381 0.76190476
 0.8        0.9        0.57142857 0.89473684]

mean value: 0.8037256183938106

key: train_jcc
value: [0.82967033 0.82513661 0.85393258 0.82967033 0.81868132 0.82967033
 0.81420765 0.81967213 0.84065934 0.80434783]

mean value: 0.8265648452150891

MCC on Blind test: 0.78

Accuracy on Blind test: 0.89

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [1.30559778 1.49760962 1.48370218 1.27121735 1.36745763 1.29357028
 1.46157598 1.41085505 1.2677567  1.47282529]

mean value: 1.3832167863845826

key: score_time
value: [0.01324964 0.01464891 0.01291299 0.01481247 0.01527667 0.01298475
 0.01542163 0.01258349 0.01508832 0.01303482]

mean value: 0.01400136947631836

key: test_mcc
value: [0.89736456 0.73099415 0.56725146 0.83918129 0.83871328 0.7888597
 0.84959079 0.83918129 0.72333935 0.9459053 ]

mean value: 0.8020381173135782

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.94594595 0.86486486 0.78378378 0.91891892 0.91891892 0.89189189
 0.91891892 0.91891892 0.86111111 0.97222222]

mean value: 0.8995495495495496

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94736842 0.86486486 0.77777778 0.91891892 0.92307692 0.88888889
 0.91428571 0.91891892 0.85714286 0.97297297]

mean value: 0.8984216257900468

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.9        0.84210526 0.77777778 0.89473684 0.9        0.94117647
 1.         0.94444444 0.88235294 0.94736842]

mean value: 0.9029962160302718

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.88888889 0.77777778 0.94444444 0.94736842 0.84210526
 0.84210526 0.89473684 0.83333333 1.        ]

mean value: 0.8970760233918128

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.94736842 0.86549708 0.78362573 0.91959064 0.91812865 0.89327485
 0.92105263 0.91959064 0.86111111 0.97222222]

mean value: 0.9001461988304094

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.9        0.76190476 0.63636364 0.85       0.85714286 0.8
 0.84210526 0.85       0.75       0.94736842]

mean value: 0.8194884939621782

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.74

Accuracy on Blind test: 0.87

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.02773237 0.02290416 0.01795602 0.01657343 0.01584625 0.01707602
 0.0161159  0.01845789 0.01577401 0.01874471]

mean value: 0.0187180757522583

key: score_time
value: [0.01235223 0.01075268 0.0102129  0.00925064 0.00990987 0.00948524
 0.00892353 0.00983405 0.00992322 0.00996733]

mean value: 0.010061168670654297

key: test_mcc
value: [0.78362573 0.7888597  0.84834956 0.94736842 0.74044197 0.83918129
 0.94736842 0.89736456 0.72333935 1.        ]

mean value: 0.8515898998369181

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.89189189 0.89189189 0.91891892 0.97297297 0.86486486 0.91891892
 0.97297297 0.94594595 0.86111111 1.        ]

mean value: 0.923948948948949

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.88888889 0.89473684 0.90909091 0.97297297 0.85714286 0.91891892
 0.97297297 0.94444444 0.86486486 1.        ]

mean value: 0.9224033671402092

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.88888889 0.85       1.         0.94736842 0.9375     0.94444444
 1.         1.         0.84210526 1.        ]

mean value: 0.941030701754386

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.88888889 0.94444444 0.83333333 1.         0.78947368 0.89473684
 0.94736842 0.89473684 0.88888889 1.        ]

mean value: 0.908187134502924

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.89181287 0.89327485 0.91666667 0.97368421 0.86695906 0.91959064
 0.97368421 0.94736842 0.86111111 1.        ]

mean value: 0.9244152046783626

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.8        0.80952381 0.83333333 0.94736842 0.75       0.85
 0.94736842 0.89473684 0.76190476 1.        ]

mean value: 0.8594235588972431

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.9

Accuracy on Blind test: 0.95

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.11874175 0.11397886 0.12404823 0.11526704 0.1260879  0.13378692
 0.11424398 0.10551548 0.10430574 0.10480857]

mean value: 0.11607844829559326

key: score_time
value: [0.01930761 0.01774311 0.02675867 0.02725172 0.01928139 0.02496815
 0.01777816 0.01777339 0.01758528 0.01737881]

mean value: 0.02058262825012207

key: test_mcc
value: [0.89736456 0.78362573 0.51461988 0.7888597  0.83871328 0.6754386
 0.7888597  0.75938069 0.61977979 0.83462233]

mean value: 0.7501264258080947

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.94594595 0.89189189 0.75675676 0.89189189 0.91891892 0.83783784
 0.89189189 0.86486486 0.80555556 0.91666667]

mean value: 0.8722222222222222

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94736842 0.88888889 0.75675676 0.89473684 0.92307692 0.84210526
 0.88888889 0.84848485 0.78787879 0.91891892]

mean value: 0.8697104539209802

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.9        0.88888889 0.73684211 0.85       0.9        0.84210526
 0.94117647 1.         0.86666667 0.89473684]

mean value: 0.8820416236670107

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.88888889 0.77777778 0.94444444 0.94736842 0.84210526
 0.84210526 0.73684211 0.72222222 0.94444444]

mean value: 0.8646198830409356

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.94736842 0.89181287 0.75730994 0.89327485 0.91812865 0.8377193
 0.89327485 0.86842105 0.80555556 0.91666667]

mean value: 0.872953216374269

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.9        0.8        0.60869565 0.80952381 0.85714286 0.72727273
 0.8        0.73684211 0.65       0.85      ]

mean value: 0.7739477151376465

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00972319 0.00971651 0.00972152 0.00972605 0.00971293 0.00968099
 0.00972724 0.00964713 0.00986457 0.00967932]

mean value: 0.009719944000244141

key: score_time
value: [0.00870895 0.00870967 0.00869751 0.00882626 0.00884151 0.00898957
 0.0088141  0.00873256 0.00869012 0.00863934]

mean value: 0.008764958381652832

key: test_mcc
value: [0.56725146 0.62280702 0.18768409 0.74044197 0.57184997 0.30384671
 0.83918129 0.56725146 0.52048344 0.63614643]

mean value: 0.555694382971283

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.78378378 0.81081081 0.59459459 0.86486486 0.78378378 0.64864865
 0.91891892 0.78378378 0.75       0.80555556]

mean value: 0.7744744744744745

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.77777778 0.81081081 0.57142857 0.87179487 0.77777778 0.62857143
 0.91891892 0.78947368 0.70967742 0.77419355]

mean value: 0.7630424809032619

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.77777778 0.78947368 0.58823529 0.80952381 0.82352941 0.6875
 0.94444444 0.78947368 0.84615385 0.92307692]

mean value: 0.7979188875280206

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.77777778 0.83333333 0.55555556 0.94444444 0.73684211 0.57894737
 0.89473684 0.78947368 0.61111111 0.66666667]

mean value: 0.7388888888888889

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.78362573 0.81140351 0.59356725 0.86695906 0.78508772 0.6505848
 0.91959064 0.78362573 0.75       0.80555556]

mean value: 0.775

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.63636364 0.68181818 0.4        0.77272727 0.63636364 0.45833333
 0.85       0.65217391 0.55       0.63157895]

mean value: 0.626935892101796

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.56

Accuracy on Blind test: 0.78

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.51501703 1.47666764 1.49276137 1.48433304 1.4902401  1.48782825
 1.50618243 1.49330306 1.57110786 1.59888744]

mean value: 1.5116328239440917

key: score_time
value: [0.09284711 0.09556246 0.09648204 0.09837961 0.09728193 0.09940147
 0.0990901  0.0914495  0.09889078 0.09455132]

mean value: 0.09639363288879395

key: test_mcc
value: [0.94736842 0.89736456 0.89181287 0.94736842 0.89181287 0.89736456
 0.94736842 0.94736842 0.72333935 0.9459053 ]

mean value: 0.9037073192277006

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97297297 0.94594595 0.94594595 0.97297297 0.94594595 0.94594595
 0.97297297 0.97297297 0.86111111 0.97222222]

mean value: 0.950900900900901

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97297297 0.94736842 0.94444444 0.97297297 0.94736842 0.94444444
 0.97297297 0.97297297 0.85714286 0.97297297]

mean value: 0.9505633453001874

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.94736842 0.9        0.94444444 0.94736842 0.94736842 1.
 1.         1.         0.88235294 0.94736842]

mean value: 0.9516271069831441

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         0.94444444 1.         0.94736842 0.89473684
 0.94736842 0.94736842 0.83333333 1.        ]

mean value: 0.9514619883040936

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97368421 0.94736842 0.94590643 0.97368421 0.94590643 0.94736842
 0.97368421 0.97368421 0.86111111 0.97222222]

mean value: 0.9514619883040936

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.94736842 0.9        0.89473684 0.94736842 0.9        0.89473684
 0.94736842 0.94736842 0.75       0.94736842]

mean value: 0.9076315789473683

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.91

Accuracy on Blind test: 0.96

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC0...05', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.89117813 0.91720891 0.93513155 0.87951016 0.92528319 0.92419934
 0.97259903 1.01499391 0.9730444  0.98030806]

mean value: 0.9413456678390503

key: score_time
value: [0.21520376 0.24883294 0.16850185 0.21201348 0.27353764 0.13810372
 0.15924072 0.23260546 0.27458358 0.19869089]

mean value: 0.21213140487670898

key: test_mcc
value: [0.89181287 0.83918129 0.83918129 0.89736456 0.89181287 0.84959079
 0.89736456 1.         0.72333935 0.9459053 ]

mean value: 0.8775552871651642

key: train_mcc
value: [0.96381759 0.95786323 0.94563709 0.96994925 0.96381495 0.95785863
 0.96381495 0.95785863 0.96385542 0.95784871]

mean value: 0.9602318452069324

key: test_accuracy
value: [0.94594595 0.91891892 0.91891892 0.94594595 0.94594595 0.91891892
 0.94594595 1.         0.86111111 0.97222222]

mean value: 0.9373873873873874

key: train_accuracy
value: [0.98187311 0.97885196 0.97280967 0.98489426 0.98187311 0.97885196
 0.98187311 0.97885196 0.98192771 0.97891566]

mean value: 0.9800722527572525

key: test_fscore
value: [0.94444444 0.91891892 0.91891892 0.94736842 0.94736842 0.91428571
 0.94444444 1.         0.85714286 0.97297297]

mean value: 0.9365865113233535

key: train_fscore
value: [0.98181818 0.9787234  0.97280967 0.98480243 0.98170732 0.97859327
 0.98170732 0.97859327 0.98192771 0.97885196]

mean value: 0.9799534538436605

key: test_precision
value: [0.94444444 0.89473684 0.89473684 0.9        0.94736842 1.
 1.         1.         0.88235294 0.94736842]

mean value: 0.9411007911936704

key: train_precision
value: [0.98780488 0.98773006 0.97575758 0.99386503 0.98773006 0.98765432
 0.98773006 0.98765432 0.98192771 0.98181818]

mean value: 0.9859672203167147

key: test_recall
value: [0.94444444 0.94444444 0.94444444 1.         0.94736842 0.84210526
 0.89473684 1.         0.83333333 1.        ]

mean value: 0.9350877192982456

key: train_recall
value: [0.97590361 0.96987952 0.96987952 0.97590361 0.97575758 0.96969697
 0.97575758 0.96969697 0.98192771 0.97590361]

mean value: 0.9740306681270536

key: test_roc_auc
value: [0.94590643 0.91959064 0.91959064 0.94736842 0.94590643 0.92105263
 0.94736842 1.         0.86111111 0.97222222]

mean value: 0.9380116959064327

key: train_roc_auc
value: [0.9818912  0.97887915 0.97281855 0.9849215  0.98185469 0.97882439
 0.98185469 0.97882439 0.98192771 0.97891566]

mean value: 0.9800711938663746

key: test_jcc
value: [0.89473684 0.85       0.85       0.9        0.9        0.84210526
 0.89473684 1.         0.75       0.94736842]

mean value: 0.8828947368421053

key: train_jcc
value: [0.96428571 0.95833333 0.94705882 0.97005988 0.96407186 0.95808383
 0.96407186 0.95808383 0.96449704 0.95857988]

mean value: 0.9607126051710413

MCC on Blind test: 0.86

Accuracy on Blind test: 0.93

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.02643442 0.01237798 0.00983572 0.01009274 0.00978518 0.0098691
 0.00975704 0.00971055 0.01049852 0.00966072]

mean value: 0.011802196502685547

key: score_time
value: [0.01092529 0.00903606 0.00895619 0.00881696 0.00897527 0.008919
 0.00883269 0.00885367 0.008847   0.00878906]

mean value: 0.009095120429992675

key: test_mcc
value: [0.74044197 0.62280702 0.4633451  0.57184997 0.56934383 0.62170355
 0.7888597  0.6754386  0.4472136  0.78262379]

mean value: 0.628362712085414

key: train_mcc
value: [0.73459045 0.71601738 0.79022336 0.74626648 0.74713145 0.68655466
 0.70405667 0.72205184 0.77108434 0.76674551]

mean value: 0.7384722135344838

key: test_accuracy
value: [0.86486486 0.81081081 0.72972973 0.78378378 0.78378378 0.81081081
 0.89189189 0.83783784 0.72222222 0.88888889]

mean value: 0.8124624624624625

key: train_accuracy
value: [0.86706949 0.85800604 0.89425982 0.87311178 0.87311178 0.8429003
 0.85196375 0.86102719 0.88554217 0.88253012]

mean value: 0.8689522440214028

key: test_fscore
value: [0.87179487 0.81081081 0.73684211 0.78947368 0.8        0.82051282
 0.88888889 0.84210526 0.70588235 0.88235294]

mean value: 0.8148663738756617

key: train_fscore
value: [0.86982249 0.85885886 0.89795918 0.8742515  0.87573964 0.83850932
 0.85285285 0.86060606 0.88554217 0.88629738]

mean value: 0.8700439444712924

key: test_precision
value: [0.80952381 0.78947368 0.7        0.75       0.76190476 0.8
 0.94117647 0.84210526 0.75       0.9375    ]

mean value: 0.8081683989385228

key: train_precision
value: [0.85465116 0.85628743 0.8700565  0.86904762 0.85549133 0.85987261
 0.8452381  0.86060606 0.88554217 0.85875706]

mean value: 0.8615550031773642

key: test_recall
value: [0.94444444 0.83333333 0.77777778 0.83333333 0.84210526 0.84210526
 0.84210526 0.84210526 0.66666667 0.83333333]

mean value: 0.8257309941520468

key: train_recall
value: [0.88554217 0.86144578 0.92771084 0.87951807 0.8969697  0.81818182
 0.86060606 0.86060606 0.88554217 0.91566265]

mean value: 0.8791785323110625

key: test_roc_auc
value: [0.86695906 0.81140351 0.73099415 0.78508772 0.78216374 0.80994152
 0.89327485 0.8377193  0.72222222 0.88888889]

mean value: 0.8128654970760234

key: train_roc_auc
value: [0.86701351 0.85799562 0.89415845 0.87309237 0.87318364 0.84282585
 0.85198978 0.86102592 0.88554217 0.88253012]

mean value: 0.8689357429718876

key: test_jcc
value: [0.77272727 0.68181818 0.58333333 0.65217391 0.66666667 0.69565217
 0.8        0.72727273 0.54545455 0.78947368]

mean value: 0.6914572498439775

key: train_jcc
value: [0.76963351 0.75263158 0.81481481 0.77659574 0.77894737 0.72192513
 0.7434555  0.75531915 0.79459459 0.79581152]

mean value: 0.77037289076449

MCC on Blind test: 0.73

Accuracy on Blind test: 0.86

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC0...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.09763837 0.05395174 0.05793595 0.10008216 0.05730486 0.0574739
 0.05634022 0.06836796 0.07902384 0.05754972]

mean value: 0.06856687068939209

key: score_time
value: [0.01159644 0.01121736 0.01094055 0.01107693 0.01040983 0.01041222
 0.01046443 0.01227784 0.01092386 0.0105648 ]

mean value: 0.010988426208496094

key: test_mcc
value: [0.94736842 0.89736456 0.94721815 0.94736842 0.94736842 0.89736456
 0.94736842 1.         0.72333935 1.        ]

mean value: 0.9254760307581459

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97297297 0.94594595 0.97297297 0.97297297 0.97297297 0.94594595
 0.97297297 1.         0.86111111 1.        ]

mean value: 0.9617867867867869

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97297297 0.94736842 0.97142857 0.97297297 0.97297297 0.94444444
 0.97297297 1.         0.86486486 1.        ]

mean value: 0.9619998193682404

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.94736842 0.9        1.         0.94736842 1.         1.
 1.         1.         0.84210526 1.        ]

mean value: 0.9636842105263158

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         0.94444444 1.         0.94736842 0.89473684
 0.94736842 1.         0.88888889 1.        ]

mean value: 0.962280701754386

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97368421 0.94736842 0.97222222 0.97368421 0.97368421 0.94736842
 0.97368421 1.         0.86111111 1.        ]

mean value: 0.962280701754386

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.94736842 0.9        0.94444444 0.94736842 0.94736842 0.89473684
 0.94736842 1.         0.76190476 1.        ]

mean value: 0.9290559732664996

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.92

Accuracy on Blind test: 0.96

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.03597736 0.07507443 0.06862307 0.07008171 0.03527355 0.04518008
 0.06671071 0.0492239  0.0749712  0.06974387]

mean value: 0.05908598899841309

key: score_time
value: [0.02051544 0.02273822 0.02037239 0.01225233 0.01217675 0.02001953
 0.01223707 0.02085972 0.02115345 0.01531434]

mean value: 0.01776392459869385

key: test_mcc
value: [0.7888597  0.78764146 0.51461988 0.56725146 0.78362573 0.62280702
 0.83918129 0.73020842 0.55901699 0.89442719]

mean value: 0.7087639142980873

key: train_mcc
value: [0.9577218  0.95786323 0.9577218  0.93957649 0.95772025 0.93355239
 0.94563511 0.94563511 0.95180723 0.95208368]

mean value: 0.9499317074650987

key: test_accuracy
value: [0.89189189 0.89189189 0.75675676 0.78378378 0.89189189 0.81081081
 0.91891892 0.86486486 0.77777778 0.94444444]

mean value: 0.8533033033033033

key: train_accuracy
value: [0.97885196 0.97885196 0.97885196 0.96978852 0.97885196 0.96676737
 0.97280967 0.97280967 0.97590361 0.97590361]

mean value: 0.974939031048666

key: test_fscore
value: [0.89473684 0.88235294 0.75675676 0.77777778 0.89473684 0.81081081
 0.91891892 0.87179487 0.76470588 0.94736842]

mean value: 0.8519960064851706

key: train_fscore
value: [0.97885196 0.9787234  0.97885196 0.96987952 0.9787234  0.96676737
 0.97264438 0.97264438 0.97590361 0.97560976]

mean value: 0.9748599750031368

key: test_precision
value: [0.85       0.9375     0.73684211 0.77777778 0.89473684 0.83333333
 0.94444444 0.85       0.8125     0.9       ]

mean value: 0.8537134502923976

key: train_precision
value: [0.98181818 0.98773006 0.98181818 0.96987952 0.98170732 0.96385542
 0.97560976 0.97560976 0.97590361 0.98765432]

mean value: 0.9781586129458871

key: test_recall
value: [0.94444444 0.83333333 0.77777778 0.77777778 0.89473684 0.78947368
 0.89473684 0.89473684 0.72222222 1.        ]

mean value: 0.8529239766081871

key: train_recall
value: [0.97590361 0.96987952 0.97590361 0.96987952 0.97575758 0.96969697
 0.96969697 0.96969697 0.97590361 0.96385542]

mean value: 0.9716173786053304

key: test_roc_auc
value: [0.89327485 0.89035088 0.75730994 0.78362573 0.89181287 0.81140351
 0.91959064 0.86403509 0.77777778 0.94444444]

mean value: 0.8533625730994152

key: train_roc_auc
value: [0.9788609  0.97887915 0.9788609  0.96978824 0.97884264 0.9667762
 0.97280029 0.97280029 0.97590361 0.97590361]

mean value: 0.9749415845198979

key: test_jcc
value: [0.80952381 0.78947368 0.60869565 0.63636364 0.80952381 0.68181818
 0.85       0.77272727 0.61904762 0.9       ]

mean value: 0.7477173665388769

key: train_jcc
value: [0.95857988 0.95833333 0.95857988 0.94152047 0.95833333 0.93567251
 0.94674556 0.94674556 0.95294118 0.95238095]

mean value: 0.9509832665548312

MCC on Blind test: 0.73

Accuracy on Blind test: 0.86

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.02032638 0.01103067 0.01079988 0.01054168 0.01051974 0.01051974
 0.0105474  0.01089859 0.0105722  0.01062846]

mean value: 0.011638474464416505

key: score_time
value: [0.01018023 0.00987649 0.00954866 0.00943232 0.00947976 0.00946927
 0.00955534 0.00968814 0.00956678 0.00959611]

mean value: 0.009639310836791991

key: test_mcc
value: [0.73020842 0.29824561 0.57184997 0.73099415 0.62280702 0.40469382
 0.7888597  0.57184997 0.3721042  0.89442719]

mean value: 0.5986040048640627

key: train_mcc
value: [0.65133406 0.56567532 0.71002957 0.70459299 0.69830851 0.60223279
 0.68655466 0.63906236 0.67014765 0.73493976]

mean value: 0.6662877681119577

key: test_accuracy
value: [0.86486486 0.64864865 0.78378378 0.86486486 0.81081081 0.7027027
 0.89189189 0.78378378 0.66666667 0.94444444]

mean value: 0.7962462462462463

key: train_accuracy
value: [0.82477341 0.78247734 0.85498489 0.85196375 0.8489426  0.80060423
 0.8429003  0.81873112 0.83433735 0.86746988]

mean value: 0.8327184872420195

key: test_fscore
value: [0.85714286 0.64864865 0.78947368 0.86486486 0.81081081 0.71794872
 0.88888889 0.77777778 0.57142857 0.94117647]

mean value: 0.7868161292309899

key: train_fscore
value: [0.81875    0.77777778 0.85454545 0.84923077 0.84567901 0.79375
 0.83850932 0.81132075 0.82866044 0.86746988]

mean value: 0.8285693401041992

key: test_precision
value: [0.88235294 0.63157895 0.75       0.84210526 0.83333333 0.7
 0.94117647 0.82352941 0.8        1.        ]

mean value: 0.820407636738906

key: train_precision
value: [0.85064935 0.79746835 0.8597561  0.86792453 0.86163522 0.81935484
 0.85987261 0.84313725 0.85806452 0.86746988]

mean value: 0.848533265179209

key: test_recall
value: [0.83333333 0.66666667 0.83333333 0.88888889 0.78947368 0.73684211
 0.84210526 0.73684211 0.44444444 0.88888889]

mean value: 0.7660818713450293

key: train_recall
value: [0.78915663 0.75903614 0.84939759 0.8313253  0.83030303 0.76969697
 0.81818182 0.78181818 0.80120482 0.86746988]

mean value: 0.8097590361445783

key: test_roc_auc
value: [0.86403509 0.64912281 0.78508772 0.86549708 0.81140351 0.70175439
 0.89327485 0.78508772 0.66666667 0.94444444]

mean value: 0.7966374269005848

key: train_roc_auc
value: [0.82488134 0.78254838 0.85500183 0.85202629 0.84888645 0.80051114
 0.84282585 0.81861993 0.83433735 0.86746988]

mean value: 0.832710843373494

key: test_jcc
value: [0.75       0.48       0.65217391 0.76190476 0.68181818 0.56
 0.8        0.63636364 0.4        0.88888889]

mean value: 0.6611149382018947

key: train_jcc
value: [0.69312169 0.63636364 0.74603175 0.73796791 0.73262032 0.65803109
 0.72192513 0.68253968 0.70744681 0.76595745]

mean value: 0.7082005470442766

MCC on Blind test: 0.77

Accuracy on Blind test: 0.89

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01300788 0.01797628 0.01849413 0.01570034 0.01609373 0.02065372
 0.02061057 0.01798749 0.01872349 0.01886225]

mean value: 0.017810988426208495

key: score_time
value: [0.00886083 0.01124477 0.0111165  0.01169562 0.01177621 0.0118463
 0.01175761 0.0117023  0.01185799 0.01174355]

mean value: 0.01136016845703125

key: test_mcc
value: [0.80369958 0.51793973 0.56725146 0.73020842 0.78362573 0.78362573
 0.84959079 0.83918129 0.79772404 0.73246702]

mean value: 0.7405313784220269

key: train_mcc
value: [0.85241016 0.90101455 0.93492806 0.76178654 0.87390869 0.91019063
 0.93436201 0.89441747 0.82781591 0.90978714]

mean value: 0.8800621170623023

key: test_accuracy
value: [0.89189189 0.75675676 0.78378378 0.86486486 0.89189189 0.89189189
 0.91891892 0.91891892 0.88888889 0.86111111]

mean value: 0.8668918918918919

key: train_accuracy
value: [0.9244713  0.94864048 0.96676737 0.87009063 0.93655589 0.95468278
 0.96676737 0.94561934 0.90662651 0.95481928]

mean value: 0.9375040949295672

key: test_fscore
value: [0.9        0.72727273 0.77777778 0.85714286 0.89473684 0.89473684
 0.91428571 0.91891892 0.875      0.87179487]

mean value: 0.8631666551403394

key: train_fscore
value: [0.92795389 0.94637224 0.96594427 0.85324232 0.93768546 0.95548961
 0.96594427 0.94303797 0.89700997 0.95440729]

mean value: 0.9347087306426057

key: test_precision
value: [0.81818182 0.8        0.77777778 0.88235294 0.89473684 0.89473684
 1.         0.94444444 1.         0.80952381]

mean value: 0.8821754475314847

key: train_precision
value: [0.88950276 0.99337748 0.99363057 0.98425197 0.91860465 0.93604651
 0.98734177 0.98675497 1.         0.96319018]

mean value: 0.9652700873506086

key: test_recall
value: [1.         0.66666667 0.77777778 0.83333333 0.89473684 0.89473684
 0.84210526 0.89473684 0.77777778 0.94444444]

mean value: 0.8526315789473684

key: train_recall
value: [0.96987952 0.90361446 0.93975904 0.75301205 0.95757576 0.97575758
 0.94545455 0.9030303  0.81325301 0.94578313]

mean value: 0.9107119386637459

key: test_roc_auc
value: [0.89473684 0.75438596 0.78362573 0.86403509 0.89181287 0.89181287
 0.92105263 0.91959064 0.88888889 0.86111111]

mean value: 0.8671052631578947

key: train_roc_auc
value: [0.9243337  0.94877693 0.96684922 0.87044542 0.9366192  0.95474626
 0.96670318 0.94549106 0.90662651 0.95481928]

mean value: 0.9375410733844468

key: test_jcc
value: [0.81818182 0.57142857 0.63636364 0.75       0.80952381 0.80952381
 0.84210526 0.85       0.77777778 0.77272727]

mean value: 0.763763195868459

key: train_jcc
value: [0.8655914  0.89820359 0.93413174 0.74404762 0.88268156 0.91477273
 0.93413174 0.89221557 0.81325301 0.9127907 ]

mean value: 0.8791819652868769

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01939464 0.01617098 0.01889229 0.01758027 0.01637602 0.01791692
 0.01738191 0.03996897 0.0171361  0.01589084]

mean value: 0.019670891761779784

key: score_time
value: [0.01182365 0.01187801 0.01180458 0.01176167 0.01184773 0.02823496
 0.01288629 0.01227355 0.01188588 0.01173544]

mean value: 0.013613176345825196

key: test_mcc
value: [0.7163504  0.51478965 0.62280702 0.51121719 0.78362573 0.84959079
 0.59234888 0.78362573 0.78262379 0.78262379]

mean value: 0.693960296854044

key: train_mcc
value: [0.84650478 0.75381029 0.95278173 0.57559806 0.91540708 0.92917693
 0.63638198 0.90359336 0.91893658 0.87618491]

mean value: 0.8308375712756191

key: test_accuracy
value: [0.83783784 0.72972973 0.81081081 0.7027027  0.89189189 0.91891892
 0.75675676 0.89189189 0.88888889 0.88888889]

mean value: 0.8318318318318318

key: train_accuracy
value: [0.918429   0.86404834 0.97583082 0.74924471 0.95770393 0.96374622
 0.78851964 0.95166163 0.95783133 0.93674699]

mean value: 0.9063762603283223

key: test_fscore
value: [0.85714286 0.77272727 0.81081081 0.76595745 0.89473684 0.91428571
 0.68965517 0.89473684 0.88235294 0.89473684]

mean value: 0.8377142741681218

key: train_fscore
value: [0.92436975 0.88       0.97530864 0.8        0.95757576 0.9625
 0.73076923 0.95209581 0.95597484 0.93913043]

mean value: 0.9077724464152594

key: test_precision
value: [0.75       0.65384615 0.78947368 0.62068966 0.89473684 1.
 1.         0.89473684 0.9375     0.85      ]

mean value: 0.839098317743962

key: train_precision
value: [0.86387435 0.78947368 1.         0.66666667 0.95757576 0.99354839
 1.         0.9408284  1.         0.90502793]

mean value: 0.9116995176427221

key: test_recall
value: [1.         0.94444444 0.83333333 1.         0.89473684 0.84210526
 0.52631579 0.89473684 0.83333333 0.94444444]

mean value: 0.8713450292397661

key: train_recall
value: [0.9939759  0.9939759  0.95180723 1.         0.95757576 0.93333333
 0.57575758 0.96363636 0.91566265 0.97590361]

mean value: 0.926162833150785

key: test_roc_auc
value: [0.84210526 0.73538012 0.81140351 0.71052632 0.89181287 0.92105263
 0.76315789 0.89181287 0.88888889 0.88888889]

mean value: 0.8345029239766082

key: train_roc_auc
value: [0.91820007 0.86365462 0.97590361 0.74848485 0.95770354 0.96365462
 0.78787879 0.9516977  0.95783133 0.93674699]

mean value: 0.9061756115370574

key: test_jcc
value: [0.75       0.62962963 0.68181818 0.62068966 0.80952381 0.84210526
 0.52631579 0.80952381 0.78947368 0.80952381]

mean value: 0.7268603632033759

key: train_jcc
value: [0.859375   0.78571429 0.95180723 0.66666667 0.91860465 0.92771084
 0.57575758 0.90857143 0.91566265 0.8852459 ]

mean value: 0.8395116232403658

MCC on Blind test: 0.77

Accuracy on Blind test: 0.88

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.15993023 0.14477468 0.14659452 0.14503288 0.14535022 0.14776993
 0.14721227 0.14613962 0.14871216 0.1482687 ]

mean value: 0.1479785203933716

key: score_time
value: [0.01520491 0.01517415 0.0165267  0.01517439 0.01506782 0.01651335
 0.01520538 0.01570082 0.01625323 0.01571679]

mean value: 0.01565375328063965

key: test_mcc
value: [1.         0.83918129 0.94721815 0.94736842 0.94736842 0.94736842
 0.94736842 0.94736842 0.78262379 1.        ]

mean value: 0.9305865333163313

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.91891892 0.97297297 0.97297297 0.97297297 0.97297297
 0.97297297 0.97297297 0.88888889 1.        ]

mean value: 0.9645645645645646

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.91891892 0.97142857 0.97297297 0.97297297 0.97297297
 0.97297297 0.97297297 0.89473684 1.        ]

mean value: 0.9649949197317619

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.89473684 1.         0.94736842 1.         1.
 1.         1.         0.85       1.        ]

mean value: 0.9692105263157895

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.94444444 0.94444444 1.         0.94736842 0.94736842
 0.94736842 0.94736842 0.94444444 1.        ]

mean value: 0.962280701754386

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.91959064 0.97222222 0.97368421 0.97368421 0.97368421
 0.97368421 0.97368421 0.88888889 1.        ]

mean value: 0.9649122807017544

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.85       0.94444444 0.94736842 0.94736842 0.94736842
 0.94736842 0.94736842 0.80952381 1.        ]

mean value: 0.9340810359231412

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.92

Accuracy on Blind test: 0.96

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.04762053 0.05938816 0.05389047 0.07260513 0.06046724 0.05754685
 0.05538869 0.05734062 0.06089234 0.06812143]

mean value: 0.05932614803314209

key: score_time
value: [0.01756454 0.02058482 0.03347611 0.03177166 0.02301621 0.04879737
 0.0274539  0.0281961  0.03790355 0.03870583]

mean value: 0.030747008323669434

key: test_mcc
value: [0.94736842 0.84959079 1.         0.94736842 0.89181287 0.89736456
 0.94736842 0.89736456 0.83462233 1.        ]

mean value: 0.9212860370101059

key: train_mcc
value: [0.9939759  0.9818912  0.98203528 0.9939759  0.9879153  0.98189054
 0.98189054 0.9879153  0.99399394 0.98194553]

mean value: 0.9867429430855921

key: test_accuracy
value: [0.97297297 0.91891892 1.         0.97297297 0.94594595 0.94594595
 0.97297297 0.94594595 0.91666667 1.        ]

mean value: 0.9592342342342343

key: train_accuracy
value: [0.99697885 0.99093656 0.99093656 0.99697885 0.9939577  0.99093656
 0.99093656 0.9939577  0.99698795 0.99096386]

mean value: 0.9933571142576347

key: test_fscore
value: [0.97297297 0.92307692 1.         0.97297297 0.94736842 0.94444444
 0.97297297 0.94444444 0.91891892 1.        ]

mean value: 0.9597172070856281

key: train_fscore
value: [0.99697885 0.99093656 0.99088146 0.99697885 0.99393939 0.99088146
 0.99088146 0.99393939 0.99697885 0.99093656]

mean value: 0.99333328324522

key: test_precision
value: [0.94736842 0.85714286 1.         0.94736842 0.94736842 1.
 1.         1.         0.89473684 1.        ]

mean value: 0.9593984962406015

key: train_precision
value: [1.         0.99393939 1.         1.         0.99393939 0.99390244
 0.99390244 0.99393939 1.         0.99393939]

mean value: 0.9963562453806356

key: test_recall
value: [1.         1.         1.         1.         0.94736842 0.89473684
 0.94736842 0.89473684 0.94444444 1.        ]

mean value: 0.9628654970760234

key: train_recall
value: [0.9939759  0.98795181 0.98192771 0.9939759  0.99393939 0.98787879
 0.98787879 0.99393939 0.9939759  0.98795181]

mean value: 0.9903395399780942

key: test_roc_auc
value: [0.97368421 0.92105263 1.         0.97368421 0.94590643 0.94736842
 0.97368421 0.94736842 0.91666667 1.        ]

mean value: 0.9599415204678362

key: train_roc_auc
value: [0.99698795 0.9909456  0.99096386 0.99698795 0.99395765 0.99092735
 0.99092735 0.99395765 0.99698795 0.99096386]

mean value: 0.9933607155896312

key: test_jcc
value: [0.94736842 0.85714286 1.         0.94736842 0.9        0.89473684
 0.94736842 0.89473684 0.85       1.        ]

mean value: 0.9238721804511278

key: train_jcc
value: [0.9939759  0.98203593 0.98192771 0.9939759  0.98795181 0.98192771
 0.98192771 0.98795181 0.9939759  0.98203593]

mean value: 0.9867686314118751

MCC on Blind test: 0.9

Accuracy on Blind test: 0.95

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.07490635 0.11946273 0.103163   0.10210061 0.1119535  0.11135602
 0.16045046 0.09157825 0.10720015 0.11908531]

mean value: 0.11012563705444336

key: score_time
value: [0.02171206 0.02200747 0.02642679 0.02162862 0.02213502 0.02532601
 0.02199364 0.02181697 0.02197194 0.02154231]

mean value: 0.02265608310699463

key: test_mcc
value: [0.56725146 0.6754386  0.35104619 0.63129316 0.4633451  0.56725146
 0.62280702 0.60308132 0.5007734  0.78262379]

mean value: 0.5764911497325073

key: train_mcc
value: [0.9939759  0.9939759  0.9939759  0.9939759  0.99397568 1.
 1.         0.99397568 0.99399394 0.99399394]

mean value: 0.9951842862455198

key: test_accuracy
value: [0.78378378 0.83783784 0.67567568 0.81081081 0.72972973 0.78378378
 0.81081081 0.78378378 0.75       0.88888889]

mean value: 0.7855105105105105

key: train_accuracy
value: [0.99697885 0.99697885 0.99697885 0.99697885 0.99697885 1.
 1.         0.99697885 0.99698795 0.99698795]

mean value: 0.9975849015396935

key: test_fscore
value: [0.77777778 0.83333333 0.64705882 0.82051282 0.72222222 0.78947368
 0.81081081 0.75       0.75675676 0.88235294]

mean value: 0.779029917033013

key: train_fscore
value: [0.99697885 0.99697885 0.99697885 0.99697885 0.99696049 1.
 1.         0.99696049 0.99697885 0.99697885]

mean value: 0.9975794084426854

key: test_precision
value: [0.77777778 0.83333333 0.6875     0.76190476 0.76470588 0.78947368
 0.83333333 0.92307692 0.73684211 0.9375    ]

mean value: 0.8045447801252755

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.77777778 0.83333333 0.61111111 0.88888889 0.68421053 0.78947368
 0.78947368 0.63157895 0.77777778 0.83333333]

mean value: 0.7616959064327485

key: train_recall
value: [0.9939759  0.9939759  0.9939759  0.9939759  0.99393939 1.
 1.         0.99393939 0.9939759  0.9939759 ]

mean value: 0.9951734209565535

key: test_roc_auc
value: [0.78362573 0.8377193  0.67397661 0.8128655  0.73099415 0.78362573
 0.81140351 0.7880117  0.75       0.88888889]

mean value: 0.7861111111111111

key: train_roc_auc
value: [0.99698795 0.99698795 0.99698795 0.99698795 0.9969697  1.
 1.         0.9969697  0.99698795 0.99698795]

mean value: 0.9975867104782767

key: test_jcc
value: [0.63636364 0.71428571 0.47826087 0.69565217 0.56521739 0.65217391
 0.68181818 0.6        0.60869565 0.78947368]

mean value: 0.6421941216678059

key: train_jcc
value: [0.9939759  0.9939759  0.9939759  0.9939759  0.99393939 1.
 1.         0.99393939 0.9939759  0.9939759 ]

mean value: 0.9951734209565535

MCC on Blind test: 0.54

Accuracy on Blind test: 0.77

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.56603932 0.55516958 0.56111884 0.5529871  0.55033922 0.55430651
 0.53636289 0.54783416 0.5531528  0.55002046]

mean value: 0.5527330875396729

key: score_time
value: [0.01002455 0.00995398 0.00930357 0.00971341 0.01047468 0.00944734
 0.00963187 0.00966716 0.00973773 0.00942183]

mean value: 0.009737610816955566

key: test_mcc
value: [0.89181287 0.7888597  1.         0.94736842 0.89181287 0.94736842
 0.94736842 1.         0.78262379 1.        ]

mean value: 0.9197214483309992

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.94594595 0.89189189 1.         0.97297297 0.94594595 0.97297297
 0.97297297 1.         0.88888889 1.        ]

mean value: 0.9591591591591592

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94444444 0.89473684 1.         0.97297297 0.94736842 0.97297297
 0.97297297 1.         0.89473684 1.        ]

mean value: 0.9600205468626521

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.94444444 0.85       1.         0.94736842 0.94736842 1.
 1.         1.         0.85       1.        ]

mean value: 0.9539181286549707

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.94444444 0.94444444 1.         1.         0.94736842 0.94736842
 0.94736842 1.         0.94444444 1.        ]

mean value: 0.9675438596491228

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.94590643 0.89327485 1.         0.97368421 0.94590643 0.97368421
 0.97368421 1.         0.88888889 1.        ]

mean value: 0.9595029239766082

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.89473684 0.80952381 1.         0.94736842 0.9        0.94736842
 0.94736842 1.         0.80952381 1.        ]

mean value: 0.9255889724310777

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.94

Accuracy on Blind test: 0.97

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.02545977 0.02662921 0.03351521 0.02726531 0.02694201 0.02765965
 0.02748775 0.02727795 0.02772045 0.02956676]

mean value: 0.027952408790588378

key: score_time
value: [0.01267028 0.01618814 0.01552796 0.01530409 0.01815295 0.01537704
 0.01538992 0.01563191 0.01558256 0.01613903]

mean value: 0.015596389770507812

key: test_mcc
value: [0.51319869 0.40643275 0.02932564 0.1378305  0.4163404  0.1378305
 0.36315314 0.46019501 0.35355339 0.35355339]

mean value: 0.3171413424046587

key: train_mcc
value: [0.91872008 0.9939759  0.98203333 0.83781349 0.96374589 0.80665108
 0.96437604 0.9818912  0.97618706 0.86450473]

mean value: 0.9289898789692037

key: test_accuracy
value: [0.75675676 0.7027027  0.51351351 0.56756757 0.7027027  0.56756757
 0.67567568 0.72972973 0.66666667 0.66666667]

mean value: 0.6549549549549549

key: train_accuracy
value: [0.95770393 0.99697885 0.99093656 0.91238671 0.98187311 0.89425982
 0.98187311 0.99093656 0.98795181 0.92771084]

mean value: 0.9622611291085793

key: test_fscore
value: [0.74285714 0.7027027  0.52631579 0.57894737 0.74418605 0.55555556
 0.64705882 0.75       0.71428571 0.71428571]

mean value: 0.6676194857622606

key: train_fscore
value: [0.95597484 0.99697885 0.99104478 0.90429043 0.98181818 0.88135593
 0.98148148 0.99093656 0.98809524 0.93258427]

mean value: 0.96045605590458

key: test_precision
value: [0.76470588 0.68421053 0.5        0.55       0.66666667 0.58823529
 0.73333333 0.71428571 0.625      0.625     ]

mean value: 0.6451437417072092

key: train_precision
value: [1.         1.         0.98224852 1.         0.98181818 1.
 1.         0.98795181 0.97647059 0.87368421]

mean value: 0.9802173308518767

key: test_recall
value: [0.72222222 0.72222222 0.55555556 0.61111111 0.84210526 0.52631579
 0.57894737 0.78947368 0.83333333 0.83333333]

mean value: 0.7014619883040936

key: train_recall
value: [0.91566265 0.9939759  1.         0.8253012  0.98181818 0.78787879
 0.96363636 0.99393939 1.         1.        ]

mean value: 0.9462212486308872

key: test_roc_auc
value: [0.75584795 0.70321637 0.51461988 0.56871345 0.69883041 0.56871345
 0.67836257 0.72807018 0.66666667 0.66666667]

mean value: 0.6549707602339181

key: train_roc_auc
value: [0.95783133 0.99698795 0.99090909 0.9126506  0.98187295 0.89393939
 0.98181818 0.9909456  0.98795181 0.92771084]

mean value: 0.9622617743702081

key: test_jcc
value: [0.59090909 0.54166667 0.35714286 0.40740741 0.59259259 0.38461538
 0.47826087 0.6        0.55555556 0.55555556]

mean value: 0.5063705980010328

key: train_jcc
value: [0.91566265 0.9939759  0.98224852 0.8253012  0.96428571 0.78787879
 0.96363636 0.98203593 0.97647059 0.87368421]

mean value: 0.9265179872452391

MCC on Blind test: 0.5

Accuracy on Blind test: 0.75

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.02361274 0.05441642 0.03723788 0.04656839 0.03769898 0.03737879
 0.04589653 0.03744817 0.03729868 0.03707409]

mean value: 0.03946306705474854

key: score_time
value: [0.02177238 0.02057743 0.02044559 0.02325034 0.01888037 0.02219844
 0.0233705  0.0234704  0.0239017  0.02041125]

mean value: 0.021827840805053712

key: test_mcc
value: [0.94736842 0.73821295 0.56725146 0.73099415 0.78362573 0.73099415
 0.84959079 0.89181287 0.72333935 0.83462233]

mean value: 0.7797812196768581

key: train_mcc
value: [0.88520939 0.89729828 0.91547702 0.87940108 0.91547085 0.89729828
 0.89729828 0.88521358 0.89182522 0.89156627]

mean value: 0.8956058253044522

key: test_accuracy
value: [0.97297297 0.86486486 0.78378378 0.86486486 0.89189189 0.86486486
 0.91891892 0.94594595 0.86111111 0.91666667]

mean value: 0.8885885885885886

key: train_accuracy
value: [0.94259819 0.94864048 0.95770393 0.93957704 0.95770393 0.94864048
 0.94864048 0.94259819 0.94578313 0.94578313]

mean value: 0.9477668984093474

key: test_fscore
value: [0.97297297 0.84848485 0.77777778 0.86486486 0.89473684 0.86486486
 0.91428571 0.94736842 0.85714286 0.91891892]

mean value: 0.8861418082470714

/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_7030.py:176: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_7030.py:179: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)

key: train_fscore
value: [0.94294294 0.94864048 0.95757576 0.94047619 0.95731707 0.94864048
 0.94864048 0.94259819 0.94642857 0.94578313]

mean value: 0.947904330558655

key: test_precision
value: [0.94736842 0.93333333 0.77777778 0.84210526 0.89473684 0.88888889
 1.         0.94736842 0.88235294 0.89473684]

mean value: 0.9008668730650154

key: train_precision
value: [0.94011976 0.95151515 0.96341463 0.92941176 0.96319018 0.94578313
 0.94578313 0.93975904 0.93529412 0.94578313]

mean value: 0.9460054046277495

key: test_recall
value: [1.         0.77777778 0.77777778 0.88888889 0.89473684 0.84210526
 0.84210526 0.94736842 0.83333333 0.94444444]

mean value: 0.8748538011695907

key: train_recall
value: [0.94578313 0.94578313 0.95180723 0.95180723 0.95151515 0.95151515
 0.95151515 0.94545455 0.95783133 0.94578313]

mean value: 0.9498795180722892

key: test_roc_auc
value: [0.97368421 0.8625731  0.78362573 0.86549708 0.89181287 0.86549708
 0.92105263 0.94590643 0.86111111 0.91666667]

mean value: 0.8887426900584795

key: train_roc_auc
value: [0.94258854 0.94864914 0.9577218  0.93953998 0.95768529 0.94864914
 0.94864914 0.94260679 0.94578313 0.94578313]

mean value: 0.9477656078860899

key: test_jcc
value: [0.94736842 0.73684211 0.63636364 0.76190476 0.80952381 0.76190476
 0.84210526 0.9        0.75       0.85      ]

mean value: 0.7996012759170654

key: train_jcc
value: [0.89204545 0.90229885 0.91860465 0.88764045 0.91812865 0.90229885
 0.90229885 0.89142857 0.89830508 0.89714286]

mean value: 0.9010192275158537

MCC on Blind test: 0.79

Accuracy on Blind test: 0.9

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.25129724 0.32188392 0.33577132 0.31489706 0.26630807 0.47269154
 0.30162644 0.28344464 0.26365685 0.33231711]

mean value: 0.3143894195556641

key: score_time
value: [0.01923299 0.02272201 0.02159452 0.02046084 0.01954365 0.02145052
 0.02375412 0.02187276 0.01636934 0.02253842]

mean value: 0.020953917503356935

key: test_mcc
value: [0.94736842 0.73821295 0.56725146 0.73099415 0.78362573 0.73099415
 0.84959079 0.89181287 0.72333935 0.83462233]

mean value: 0.7797812196768581

key: train_mcc
value: [0.88520939 0.89729828 0.91547702 0.87940108 0.91547085 0.89729828
 0.89729828 0.80061339 0.89182522 0.89156627]

mean value: 0.8871458056914586

key: test_accuracy
value: [0.97297297 0.86486486 0.78378378 0.86486486 0.89189189 0.86486486
 0.91891892 0.94594595 0.86111111 0.91666667]

mean value: 0.8885885885885886

key: train_accuracy
value: [0.94259819 0.94864048 0.95770393 0.93957704 0.95770393 0.94864048
 0.94864048 0.90030211 0.94578313 0.94578313]

mean value: 0.9435372911585921

key: test_fscore
value: [0.97297297 0.84848485 0.77777778 0.86486486 0.89473684 0.86486486
 0.91428571 0.94736842 0.85714286 0.91891892]

mean value: 0.8861418082470714

key: train_fscore
value: [0.94294294 0.94864048 0.95757576 0.94047619 0.95731707 0.94864048
 0.94864048 0.89969605 0.94642857 0.94578313]

mean value: 0.9436141166907591

key: test_precision
value: [0.94736842 0.93333333 0.77777778 0.84210526 0.89473684 0.88888889
 1.         0.94736842 0.88235294 0.89473684]

mean value: 0.9008668730650154

key: train_precision
value: [0.94011976 0.95151515 0.96341463 0.92941176 0.96319018 0.94578313
 0.94578313 0.90243902 0.93529412 0.94578313]

mean value: 0.9422734034523161

key: test_recall
value: [1.         0.77777778 0.77777778 0.88888889 0.89473684 0.84210526
 0.84210526 0.94736842 0.83333333 0.94444444]

mean value: 0.8748538011695907

key: train_recall
value: [0.94578313 0.94578313 0.95180723 0.95180723 0.95151515 0.95151515
 0.95151515 0.8969697  0.95783133 0.94578313]

mean value: 0.9450310332238043

key: test_roc_auc
value: [0.97368421 0.8625731  0.78362573 0.86549708 0.89181287 0.86549708
 0.92105263 0.94590643 0.86111111 0.91666667]

mean value: 0.8887426900584795

key: train_roc_auc
value: [0.94258854 0.94864914 0.9577218  0.93953998 0.95768529 0.94864914
 0.94864914 0.90029208 0.94578313 0.94578313]

mean value: 0.9435341365461848

key: test_jcc
value: [0.94736842 0.73684211 0.63636364 0.76190476 0.80952381 0.76190476
 0.84210526 0.9        0.75       0.85      ]

mean value: 0.7996012759170654

key: train_jcc
value: [0.89204545 0.90229885 0.91860465 0.88764045 0.91812865 0.90229885
 0.90229885 0.81767956 0.89830508 0.89714286]

mean value: 0.8936443261741015

MCC on Blind test: 0.79

Accuracy on Blind test: 0.9

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.03955388 0.03381324 0.03329587 0.05509424 0.05626059 0.05854964
 0.05459762 0.06159639 0.03439593 0.03141403]

mean value: 0.04585714340209961

key: score_time
value: [0.01206017 0.01217222 0.01196647 0.01265693 0.0123384  0.01210618
 0.01433325 0.01214552 0.01429367 0.01195979]

mean value: 0.012603259086608887

key: test_mcc
value: [0.73786479 0.84327404 0.68421053 0.84327404 0.84327404 0.73786479
 0.89973541 0.84327404 0.89736456 0.51319869]

mean value: 0.7843334932834807

key: train_mcc
value: [0.85906136 0.87648575 0.85888297 0.84119102 0.87660709 0.84705882
 0.87058824 0.85888297 0.85929061 0.85930029]

mean value: 0.8607349129613298

key: test_accuracy
value: [0.86842105 0.92105263 0.84210526 0.92105263 0.92105263 0.86842105
 0.94736842 0.92105263 0.94594595 0.75675676]

mean value: 0.8913229018492176

key: train_accuracy
value: [0.92941176 0.93823529 0.92941176 0.92058824 0.93823529 0.92352941
 0.93529412 0.92941176 0.92961877 0.92961877]

mean value: 0.9303355183715715

key: test_fscore
value: [0.87179487 0.92307692 0.84210526 0.91891892 0.91891892 0.87179487
 0.94444444 0.91891892 0.94736842 0.76923077]

mean value: 0.8926572321309163

key: train_fscore
value: [0.93023256 0.93841642 0.92899408 0.92035398 0.93877551 0.92352941
 0.93529412 0.92982456 0.93023256 0.92982456]

mean value: 0.9305477766130446

key: test_precision
value: [0.85       0.9        0.84210526 0.94444444 0.94444444 0.85
 1.         0.94444444 0.9        0.75      ]

mean value: 0.8925438596491228

key: train_precision
value: [0.91954023 0.93567251 0.93452381 0.92307692 0.93063584 0.92352941
 0.93529412 0.9244186  0.92485549 0.9244186 ]

mean value: 0.9275965545299533

key: test_recall
value: [0.89473684 0.94736842 0.84210526 0.89473684 0.89473684 0.89473684
 0.89473684 0.89473684 1.         0.78947368]

mean value: 0.8947368421052632

key: train_recall
value: [0.94117647 0.94117647 0.92352941 0.91764706 0.94705882 0.92352941
 0.93529412 0.93529412 0.93567251 0.93529412]

mean value: 0.9335672514619883

key: test_roc_auc
value: [0.86842105 0.92105263 0.84210526 0.92105263 0.92105263 0.86842105
 0.94736842 0.92105263 0.94736842 0.75584795]

mean value: 0.8913742690058479

key: train_roc_auc
value: [0.92941176 0.93823529 0.92941176 0.92058824 0.93823529 0.92352941
 0.93529412 0.92941176 0.92960096 0.92963536]

mean value: 0.9303353973168215

key: test_jcc
value: [0.77272727 0.85714286 0.72727273 0.85       0.85       0.77272727
 0.89473684 0.85       0.9        0.625     ]

mean value: 0.8099606971975393

key: train_jcc
value: [0.86956522 0.8839779  0.86740331 0.85245902 0.88461538 0.8579235
 0.87845304 0.86885246 0.86956522 0.86885246]

mean value: 0.8701667505235628

MCC on Blind test: 0.83

Accuracy on Blind test: 0.91

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.85755229 0.90421176 1.06443739 0.84898424 1.10972214 1.86754012
 1.32581997 1.09845734 1.02783036 1.25375056]

mean value: 1.1358306169509889

key: score_time
value: [0.01481295 0.01703072 0.01232839 0.02471399 0.01229548 0.01258135
 0.01538014 0.01235461 0.01231289 0.01262498]

mean value: 0.014643549919128418

key: test_mcc
value: [0.78947368 0.79388419 0.68421053 0.89473684 0.73786479 0.73786479
 0.85280287 0.84327404 0.89736456 0.51319869]

mean value: 0.7744674971633342

key: train_mcc
value: [1.         0.90001557 0.88825066 0.87648575 0.82942611 0.98236994
 0.88235294 0.82375747 0.77139024 0.82404541]

mean value: 0.8778094100190499

key: test_accuracy
value: [0.89473684 0.89473684 0.84210526 0.94736842 0.86842105 0.86842105
 0.92105263 0.92105263 0.94594595 0.75675676]

mean value: 0.8860597439544808

key: train_accuracy
value: [1.         0.95       0.94411765 0.93823529 0.91470588 0.99117647
 0.94117647 0.91176471 0.8856305  0.91202346]

mean value: 0.9388830429532516

key: test_fscore
value: [0.89473684 0.9        0.84210526 0.94736842 0.86486486 0.87179487
 0.91428571 0.91891892 0.94736842 0.76923077]

mean value: 0.887067408646356

key: train_fscore
value: [1.         0.94985251 0.9439528  0.9380531  0.91445428 0.99115044
 0.94117647 0.9127907  0.88495575 0.91176471]

mean value: 0.9388150753201054

key: test_precision
value: [0.89473684 0.85714286 0.84210526 0.94736842 0.88888889 0.85
 1.         0.94444444 0.9        0.75      ]

mean value: 0.887468671679198

key: train_precision
value: [1.         0.95266272 0.94674556 0.9408284  0.91715976 0.99408284
 0.94117647 0.90229885 0.89285714 0.91176471]

mean value: 0.9399576459843272

key: test_recall
value: [0.89473684 0.94736842 0.84210526 0.94736842 0.84210526 0.89473684
 0.84210526 0.89473684 1.         0.78947368]

mean value: 0.8894736842105263

key: train_recall
value: [1.         0.94705882 0.94117647 0.93529412 0.91176471 0.98823529
 0.94117647 0.92352941 0.87719298 0.91176471]

mean value: 0.9377192982456141

key: test_roc_auc
value: [0.89473684 0.89473684 0.84210526 0.94736842 0.86842105 0.86842105
 0.92105263 0.92105263 0.94736842 0.75584795]

mean value: 0.8861111111111111

key: train_roc_auc
value: [1.         0.95       0.94411765 0.93823529 0.91470588 0.99117647
 0.94117647 0.91176471 0.88565531 0.9120227 ]

mean value: 0.9388854489164087

key: test_jcc
value: [0.80952381 0.81818182 0.72727273 0.9        0.76190476 0.77272727
 0.84210526 0.85       0.9        0.625     ]

mean value: 0.8006715652768285

key: train_jcc
value: [1.         0.90449438 0.89385475 0.88333333 0.8423913  0.98245614
 0.88888889 0.83957219 0.79365079 0.83783784]

mean value: 0.886647962154875

MCC on Blind test: 0.76

Accuracy on Blind test: 0.88

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01427627 0.01156187 0.01119947 0.01072907 0.01078057 0.01065731
 0.01075244 0.01101494 0.01063251 0.01074815]

mean value: 0.011235260963439941

key: score_time
value: [0.0127666  0.01023865 0.01028371 0.00991011 0.00974965 0.00975084
 0.00979042 0.00979543 0.00977588 0.00977397]

mean value: 0.01018352508544922

key: test_mcc
value: [0.63245553 0.54554473 0.59222009 0.59222009 0.58218174 0.37686733
 0.85280287 0.63245553 0.73020842 0.62280702]

mean value: 0.6159763343462977

key: train_mcc
value: [0.67627507 0.65158377 0.68169173 0.62705429 0.65254612 0.6500365
 0.62983758 0.67087425 0.64290652 0.64159047]

mean value: 0.6524396309639473

key: test_accuracy
value: [0.81578947 0.76315789 0.78947368 0.78947368 0.78947368 0.68421053
 0.92105263 0.81578947 0.86486486 0.81081081]

mean value: 0.8044096728307255

key: train_accuracy
value: [0.83529412 0.82352941 0.83823529 0.81176471 0.82352941 0.81764706
 0.81176471 0.83235294 0.81818182 0.81818182]

mean value: 0.8230481283422459

key: test_fscore
value: [0.81081081 0.72727273 0.76470588 0.76470588 0.77777778 0.64705882
 0.91428571 0.81081081 0.85714286 0.81081081]

mean value: 0.7885382097146804

key: train_fscore
value: [0.82389937 0.8125     0.82758621 0.80124224 0.81132075 0.79605263
 0.79746835 0.82018927 0.80503145 0.80503145]

mean value: 0.8100321722246597

key: test_precision
value: [0.83333333 0.85714286 0.86666667 0.86666667 0.82352941 0.73333333
 1.         0.83333333 0.88235294 0.83333333]

mean value: 0.85296918767507

key: train_precision
value: [0.88513514 0.86666667 0.88590604 0.84868421 0.87162162 0.90298507
 0.8630137  0.88435374 0.8707483  0.86486486]

mean value: 0.874397935315639

key: test_recall
value: [0.78947368 0.63157895 0.68421053 0.68421053 0.73684211 0.57894737
 0.84210526 0.78947368 0.83333333 0.78947368]

mean value: 0.7359649122807017

key: train_recall
value: [0.77058824 0.76470588 0.77647059 0.75882353 0.75882353 0.71176471
 0.74117647 0.76470588 0.74853801 0.75294118]

mean value: 0.7548538011695907

key: test_roc_auc
value: [0.81578947 0.76315789 0.78947368 0.78947368 0.78947368 0.68421053
 0.92105263 0.81578947 0.86403509 0.81140351]

mean value: 0.8043859649122806

key: train_roc_auc
value: [0.83529412 0.82352941 0.83823529 0.81176471 0.82352941 0.81764706
 0.81176471 0.83235294 0.81838665 0.81799106]

mean value: 0.8230495356037152

key: test_jcc
value: [0.68181818 0.57142857 0.61904762 0.61904762 0.63636364 0.47826087
 0.84210526 0.68181818 0.75       0.68181818]

mean value: 0.6561708124065103

key: train_jcc
value: [0.70053476 0.68421053 0.70588235 0.66839378 0.68253968 0.66120219
 0.66315789 0.69518717 0.67368421 0.67368421]

mean value: 0.6808476770895582

MCC on Blind test: 0.68

Accuracy on Blind test: 0.84

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01116753 0.01096296 0.01096344 0.01117587 0.01004553 0.01000977
 0.01028728 0.01055145 0.0107522  0.01096797]

mean value: 0.010688400268554688

key: score_time
value: [0.0098331  0.00980783 0.00969982 0.01029825 0.00925255 0.00957799
 0.00884676 0.008991   0.009202   0.00951934]

mean value: 0.009502863883972168

key: test_mcc
value: [0.52704628 0.47633051 0.63245553 0.63245553 0.73786479 0.52704628
 0.9486833  0.58218174 0.89736456 0.35558302]

mean value: 0.6317011536883512

key: train_mcc
value: [0.74138173 0.74717517 0.73632672 0.72354193 0.71236887 0.72354193
 0.71944168 0.72961376 0.72462581 0.75953765]

mean value: 0.7317555250285934

key: test_accuracy
value: [0.76315789 0.73684211 0.81578947 0.81578947 0.86842105 0.76315789
 0.97368421 0.78947368 0.94594595 0.67567568]

mean value: 0.8147937411095306

key: train_accuracy
value: [0.87058824 0.87352941 0.86764706 0.86176471 0.85588235 0.86176471
 0.85882353 0.86470588 0.86217009 0.8797654 ]

mean value: 0.865664136622391

key: test_fscore
value: [0.75675676 0.75       0.81081081 0.82051282 0.86486486 0.75675676
 0.97297297 0.8        0.94736842 0.71428571]

mean value: 0.8194329118013328

key: train_fscore
value: [0.87209302 0.87463557 0.87106017 0.86217009 0.85878963 0.86217009
 0.86363636 0.86627907 0.86455331 0.87905605]

mean value: 0.8674443359724497

key: test_precision
value: [0.77777778 0.71428571 0.83333333 0.8        0.88888889 0.77777778
 1.         0.76190476 0.9        0.65217391]

mean value: 0.8106142167011732

key: train_precision
value: [0.86206897 0.86705202 0.84916201 0.85964912 0.84180791 0.85964912
 0.83516484 0.85632184 0.85227273 0.8816568 ]

mean value: 0.8564805361282117

key: test_recall
value: [0.73684211 0.78947368 0.78947368 0.84210526 0.84210526 0.73684211
 0.94736842 0.84210526 1.         0.78947368]

mean value: 0.831578947368421

key: train_recall
value: [0.88235294 0.88235294 0.89411765 0.86470588 0.87647059 0.86470588
 0.89411765 0.87647059 0.87719298 0.87647059]

mean value: 0.8788957688338493

key: test_roc_auc
value: [0.76315789 0.73684211 0.81578947 0.81578947 0.86842105 0.76315789
 0.97368421 0.78947368 0.94736842 0.67251462]

mean value: 0.8146198830409357

key: train_roc_auc
value: [0.87058824 0.87352941 0.86764706 0.86176471 0.85588235 0.86176471
 0.85882353 0.86470588 0.8621259  0.87975576]

mean value: 0.8656587547299621

key: test_jcc
value: [0.60869565 0.6        0.68181818 0.69565217 0.76190476 0.60869565
 0.94736842 0.66666667 0.9        0.55555556]

mean value: 0.7026357065258667

key: train_jcc
value: [0.77319588 0.77720207 0.7715736  0.75773196 0.75252525 0.75773196
 0.76       0.76410256 0.76142132 0.78421053]

mean value: 0.7659695133154767

MCC on Blind test: 0.72

Accuracy on Blind test: 0.86

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.01076102 0.01020241 0.01022601 0.01018453 0.01024795 0.01016283
 0.01031971 0.0109694  0.01042318 0.01042962]

mean value: 0.01039266586303711

key: score_time
value: [0.01769781 0.0165956  0.01196361 0.01205826 0.01194811 0.01616716
 0.01586628 0.01836443 0.01750278 0.01775217]

mean value: 0.015591621398925781

key: test_mcc
value: [0.26462806 0.05383819 0.58218174 0.53300179 0.42163702 0.53300179
 0.59222009 0.47368421 0.62280702 0.40469382]

mean value: 0.44816937343834784

key: train_mcc
value: [0.65385813 0.68277833 0.68235294 0.64127633 0.65322377 0.65304287
 0.62968418 0.67100629 0.66019502 0.69541138]

mean value: 0.6622829245583861

key: test_accuracy
value: [0.63157895 0.52631579 0.78947368 0.76315789 0.71052632 0.76315789
 0.78947368 0.73684211 0.81081081 0.7027027 ]

mean value: 0.7224039829302987

key: train_accuracy
value: [0.82647059 0.84117647 0.84117647 0.82058824 0.82647059 0.82647059
 0.81470588 0.83529412 0.82991202 0.84750733]

mean value: 0.830977229601518

key: test_fscore
value: [0.65       0.47058824 0.77777778 0.74285714 0.71794872 0.74285714
 0.76470588 0.73684211 0.81081081 0.71794872]

mean value: 0.7132336533110527

key: train_fscore
value: [0.82175227 0.84393064 0.84117647 0.8189911  0.82898551 0.82492582
 0.8173913  0.83233533 0.83333333 0.84431138]

mean value: 0.8307133137748363

key: test_precision
value: [0.61904762 0.53333333 0.82352941 0.8125     0.7        0.8125
 0.86666667 0.73684211 0.78947368 0.7       ]

mean value: 0.7393892820286009

key: train_precision
value: [0.8447205  0.82954545 0.84117647 0.82634731 0.81714286 0.83233533
 0.80571429 0.84756098 0.81920904 0.8597561 ]

mean value: 0.8323508312334535

key: test_recall
value: [0.68421053 0.42105263 0.73684211 0.68421053 0.73684211 0.68421053
 0.68421053 0.73684211 0.83333333 0.73684211]

mean value: 0.6938596491228071

key: train_recall
value: [0.8        0.85882353 0.84117647 0.81176471 0.84117647 0.81764706
 0.82941176 0.81764706 0.84795322 0.82941176]

mean value: 0.8295012039903681

key: test_roc_auc
value: [0.63157895 0.52631579 0.78947368 0.76315789 0.71052632 0.76315789
 0.78947368 0.73684211 0.81140351 0.70175439]

mean value: 0.7223684210526315

key: train_roc_auc
value: [0.82647059 0.84117647 0.84117647 0.82058824 0.82647059 0.82647059
 0.81470588 0.83529412 0.82985896 0.84745442]

mean value: 0.8309666322669419

key: test_jcc
value: [0.48148148 0.30769231 0.63636364 0.59090909 0.56       0.59090909
 0.61904762 0.58333333 0.68181818 0.56      ]

mean value: 0.5611554741554742

key: train_jcc
value: [0.6974359  0.73       0.72588832 0.69346734 0.70792079 0.7020202
 0.69117647 0.71282051 0.71428571 0.73056995]

mean value: 0.7105585198972811

MCC on Blind test: 0.5

Accuracy on Blind test: 0.75

Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01894116 0.01715732 0.01883602 0.01850367 0.01840329 0.01838541
 0.01891804 0.01576376 0.01587749 0.01550865]

mean value: 0.017629480361938475

key: score_time
value: [0.01178336 0.01053977 0.01157427 0.01156521 0.01155615 0.01190138
 0.01167703 0.01049495 0.01038718 0.01023459]

mean value: 0.011171388626098632

key: test_mcc
value: [0.78947368 0.79388419 0.68421053 0.84327404 0.79388419 0.68421053
 0.9486833  0.84327404 0.89736456 0.40469382]

mean value: 0.768295287708455

key: train_mcc
value: [0.78828985 0.79413139 0.8        0.78828985 0.80600787 0.78828985
 0.79424133 0.81227078 0.78299907 0.82992191]

mean value: 0.798444188462922

key: test_accuracy
value: [0.89473684 0.89473684 0.84210526 0.92105263 0.89473684 0.84210526
 0.97368421 0.92105263 0.94594595 0.7027027 ]

mean value: 0.8832859174964438

key: train_accuracy
value: [0.89411765 0.89705882 0.9        0.89411765 0.90294118 0.89411765
 0.89705882 0.90588235 0.8914956  0.91495601]

mean value: 0.8991745730550285

key: test_fscore
value: [0.89473684 0.9        0.84210526 0.91891892 0.88888889 0.84210526
 0.97297297 0.91891892 0.94736842 0.71794872]

mean value: 0.8843964207122101

key: train_fscore
value: [0.89473684 0.8973607  0.9        0.89473684 0.90379009 0.89349112
 0.89795918 0.90751445 0.89212828 0.91445428]

mean value: 0.8996171791456794

key: test_precision
value: [0.89473684 0.85714286 0.84210526 0.94444444 0.94117647 0.84210526
 1.         0.94444444 0.9        0.7       ]

mean value: 0.8866155585041033

key: train_precision
value: [0.88953488 0.89473684 0.9        0.88953488 0.89595376 0.89880952
 0.89017341 0.89204545 0.88953488 0.91715976]

mean value: 0.8957483402566699

key: test_recall
value: [0.89473684 0.94736842 0.84210526 0.89473684 0.84210526 0.84210526
 0.94736842 0.89473684 1.         0.73684211]

mean value: 0.8842105263157894

key: train_recall
value: [0.9        0.9        0.9        0.9        0.91176471 0.88823529
 0.90588235 0.92352941 0.89473684 0.91176471]

mean value: 0.9035913312693499

key: test_roc_auc
value: [0.89473684 0.89473684 0.84210526 0.92105263 0.89473684 0.84210526
 0.97368421 0.92105263 0.94736842 0.70175439]

mean value: 0.8833333333333333

key: train_roc_auc
value: [0.89411765 0.89705882 0.9        0.89411765 0.90294118 0.89411765
 0.89705882 0.90588235 0.89148607 0.91494668]

mean value: 0.899172686618507

key: test_jcc
value: [0.80952381 0.81818182 0.72727273 0.85       0.8        0.72727273
 0.94736842 0.85       0.9        0.56      ]

mean value: 0.7989619503303714

key: train_jcc
value: [0.80952381 0.81382979 0.81818182 0.80952381 0.82446809 0.80748663
 0.81481481 0.83068783 0.80526316 0.8423913 ]

mean value: 0.8176171048331113

MCC on Blind test: 0.78

Accuracy on Blind test: 0.89

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [1.88404822 2.01980352 1.86825657 1.99386954 1.89242554 2.02484608
 1.87313199 1.85225391 2.03934073 1.26169872]

mean value: 1.870967483520508

key: score_time
value: [0.05082941 0.02140141 0.01251268 0.01486778 0.02660775 0.01820397
 0.01507092 0.01291871 0.01262975 0.01267314]

mean value: 0.019771552085876463

key: test_mcc
value: [0.78947368 0.69989647 0.73786479 0.9486833  0.9486833  0.78947368
 0.89973541 0.79388419 0.89736456 0.56725146]

mean value: 0.8072310845862681

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.89473684 0.84210526 0.86842105 0.97368421 0.97368421 0.89473684
 0.94736842 0.89473684 0.94594595 0.78378378]

mean value: 0.9019203413940255

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.89473684 0.85714286 0.87179487 0.97435897 0.97435897 0.89473684
 0.94444444 0.88888889 0.94736842 0.78947368]

mean value: 0.9037304800462695

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.89473684 0.7826087  0.85       0.95       0.95       0.89473684
 1.         0.94117647 0.9        0.78947368]

mean value: 0.8952732534661462

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.89473684 0.94736842 0.89473684 1.         1.         0.89473684
 0.89473684 0.84210526 1.         0.78947368]

mean value: 0.9157894736842105

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.89473684 0.84210526 0.86842105 0.97368421 0.97368421 0.89473684
 0.94736842 0.89473684 0.94736842 0.78362573]

mean value: 0.902046783625731

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.80952381 0.75       0.77272727 0.95       0.95       0.80952381
 0.89473684 0.8        0.9        0.65217391]

mean value: 0.8288685646923634

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.74

Accuracy on Blind test: 0.87

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.02012491 0.01943636 0.01589656 0.01583838 0.01620412 0.01516843
 0.01706886 0.01599669 0.01592517 0.01600361]

mean value: 0.01676630973815918

key: score_time
value: [0.01256251 0.01115823 0.00905514 0.00882006 0.00896192 0.00889635
 0.00920773 0.00900817 0.0087254  0.00878215]

mean value: 0.009517765045166016

key: test_mcc
value: [0.89973541 0.89973541 0.9486833  0.84327404 0.89473684 0.89473684
 1.         0.84327404 0.94736842 0.83871328]

mean value: 0.9010257594256045

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.94736842 0.94736842 0.97368421 0.92105263 0.94736842 0.94736842
 1.         0.92105263 0.97297297 0.91891892]

mean value: 0.9497155049786629

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94444444 0.95       0.97435897 0.92307692 0.94736842 0.94736842
 1.         0.91891892 0.97297297 0.92307692]

mean value: 0.950158599895442

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.9047619  0.95       0.9        0.94736842 0.94736842
 1.         0.94444444 0.94736842 0.9       ]

mean value: 0.9441311612364244

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.89473684 1.         1.         0.94736842 0.94736842 0.94736842
 1.         0.89473684 1.         0.94736842]

mean value: 0.9578947368421052

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.94736842 0.94736842 0.97368421 0.92105263 0.94736842 0.94736842
 1.         0.92105263 0.97368421 0.91812865]

mean value: 0.9497076023391813

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.89473684 0.9047619  0.95       0.85714286 0.9        0.9
 1.         0.85       0.94736842 0.85714286]

mean value: 0.9061152882205513

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.87

Accuracy on Blind test: 0.93

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.10629559 0.10595846 0.10605121 0.10595107 0.10800147 0.10813618
 0.11086369 0.11145663 0.11341166 0.11815166]

mean value: 0.10942776203155517

key: score_time
value: [0.01755023 0.01762581 0.01783776 0.01773572 0.01776767 0.01922417
 0.01921582 0.01775551 0.01930189 0.0180676 ]

mean value: 0.01820821762084961

key: test_mcc
value: [0.78947368 0.63245553 0.73786479 0.89973541 0.9486833  0.73786479
 0.89973541 0.74620251 0.84959079 0.62807634]

mean value: 0.7869682546638045

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.89473684 0.81578947 0.86842105 0.94736842 0.97368421 0.86842105
 0.94736842 0.86842105 0.91891892 0.81081081]

mean value: 0.8913940256045519

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.89473684 0.81081081 0.87179487 0.95       0.97435897 0.87179487
 0.94444444 0.85714286 0.92307692 0.82926829]

mean value: 0.8927428888211943

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.89473684 0.83333333 0.85       0.9047619  0.95       0.85
 1.         0.9375     0.85714286 0.77272727]

mean value: 0.8850202210070631

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.89473684 0.78947368 0.89473684 1.         1.         0.89473684
 0.89473684 0.78947368 1.         0.89473684]

mean value: 0.9052631578947369

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.89473684 0.81578947 0.86842105 0.94736842 0.97368421 0.86842105
 0.94736842 0.86842105 0.92105263 0.80847953]

mean value: 0.8913742690058479

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.80952381 0.68181818 0.77272727 0.9047619  0.95       0.77272727
 0.89473684 0.75       0.85714286 0.70833333]

mean value: 0.8101771474139895

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.78

Accuracy on Blind test: 0.89

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00998735 0.01076865 0.01041269 0.01028037 0.00994205 0.01035309
 0.01037288 0.01094675 0.01107693 0.01506305]

mean value: 0.010920381546020508

key: score_time
value: [0.00932503 0.00965333 0.00984311 0.00884271 0.00942039 0.00911498
 0.01005459 0.00995064 0.00970674 0.011832  ]

mean value: 0.009774351119995117

key: test_mcc
value: [0.47633051 0.52704628 0.73786479 0.68421053 0.42163702 0.36842105
 0.52704628 0.68421053 0.40469382 0.46019501]

mean value: 0.529165581278633

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.73684211 0.76315789 0.86842105 0.84210526 0.71052632 0.68421053
 0.76315789 0.84210526 0.7027027  0.72972973]

mean value: 0.7642958748221906

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.75       0.75675676 0.87179487 0.84210526 0.7027027  0.68421053
 0.76923077 0.84210526 0.68571429 0.75      ]

mean value: 0.7654620438830966

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.71428571 0.77777778 0.85       0.84210526 0.72222222 0.68421053
 0.75       0.84210526 0.70588235 0.71428571]

mean value: 0.7602874834144184

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.78947368 0.73684211 0.89473684 0.84210526 0.68421053 0.68421053
 0.78947368 0.84210526 0.66666667 0.78947368]

mean value: 0.7719298245614035

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.73684211 0.76315789 0.86842105 0.84210526 0.71052632 0.68421053
 0.76315789 0.84210526 0.70175439 0.72807018]

mean value: 0.7640350877192983

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.6        0.60869565 0.77272727 0.72727273 0.54166667 0.52
 0.625      0.72727273 0.52173913 0.6       ]

mean value: 0.624437417654809

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.52

Accuracy on Blind test: 0.76

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.60144925 1.58419991 1.56108975 1.55858779 1.55152655 1.62324905
 1.61374426 1.59573984 1.60749698 1.50386572]

mean value: 1.5800949096679688

key: score_time
value: [0.09293389 0.09812546 0.09960723 0.09873724 0.09930444 0.09915233
 0.10042262 0.09998393 0.10088515 0.09459758]

mean value: 0.09837498664855956

key: test_mcc
value: [0.89473684 0.79388419 0.9486833  0.9486833  0.89973541 0.89473684
 1.         0.84327404 0.94736842 0.78362573]

mean value: 0.8954728071949787

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.94736842 0.89473684 0.97368421 0.97368421 0.94736842 0.94736842
 1.         0.92105263 0.97297297 0.89189189]

mean value: 0.9470128022759602

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94736842 0.88888889 0.97435897 0.97435897 0.95       0.94736842
 1.         0.91891892 0.97297297 0.89473684]

mean value: 0.9468972413709256

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.94736842 0.94117647 0.95       0.95       0.9047619  0.94736842
 1.         0.94444444 0.94736842 0.89473684]

mean value: 0.9427224925057742

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.94736842 0.84210526 1.         1.         1.         0.94736842
 1.         0.89473684 1.         0.89473684]

mean value: 0.9526315789473684

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.94736842 0.89473684 0.97368421 0.97368421 0.94736842 0.94736842
 1.         0.92105263 0.97368421 0.89181287]

mean value: 0.9470760233918128

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.9        0.8        0.95       0.95       0.9047619  0.9
 1.         0.85       0.94736842 0.80952381]

mean value: 0.9011654135338346

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.92

Accuracy on Blind test: 0.96

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC0...05', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.94246864 0.95814466 0.88224459 0.89182115 0.94442129 1.0197804
 0.91957068 0.97436619 0.95079732 0.92337275]

mean value: 0.940698766708374

key: score_time
value: [0.19855642 0.15321207 0.25342155 0.13341498 0.17170119 0.26337838
 0.17836618 0.23186874 0.27052569 0.24460649]

mean value: 0.20990517139434814

key: test_mcc
value: [0.89473684 0.79388419 0.9486833  0.9486833  0.89473684 0.84327404
 0.9486833  0.84327404 1.         0.73020842]

mean value: 0.8846164268265851

key: train_mcc
value: [0.96470588 0.95884012 0.96477265 0.95884012 0.95300713 0.96470588
 0.95884012 0.95294118 0.95314596 0.97069143]

mean value: 0.9600490476456546

key: test_accuracy
value: [0.94736842 0.89473684 0.97368421 0.97368421 0.94736842 0.92105263
 0.97368421 0.92105263 1.         0.86486486]

mean value: 0.9417496443812233

key: train_accuracy
value: [0.98235294 0.97941176 0.98235294 0.97941176 0.97647059 0.98235294
 0.97941176 0.97647059 0.97653959 0.98533724]

mean value: 0.9800112126962222

key: test_fscore
value: [0.94736842 0.88888889 0.97435897 0.97435897 0.94736842 0.92307692
 0.97297297 0.91891892 1.         0.87179487]

mean value: 0.9419107366475787

key: train_fscore
value: [0.98235294 0.97935103 0.98224852 0.97935103 0.97633136 0.98235294
 0.97935103 0.97647059 0.97647059 0.98533724]

mean value: 0.9799617281227226

key: test_precision
value: [0.94736842 0.94117647 0.95       0.95       0.94736842 0.9
 1.         0.94444444 1.         0.85      ]

mean value: 0.9430357757137943

key: train_precision
value: [0.98235294 0.98224852 0.98809524 0.98224852 0.98214286 0.98235294
 0.98224852 0.97647059 0.98224852 0.98245614]

mean value: 0.9822864789017445

key: test_recall
value: [0.94736842 0.84210526 1.         1.         0.94736842 0.94736842
 0.94736842 0.89473684 1.         0.89473684]

mean value: 0.9421052631578947

key: train_recall
value: [0.98235294 0.97647059 0.97647059 0.97647059 0.97058824 0.98235294
 0.97647059 0.97647059 0.97076023 0.98823529]

mean value: 0.9776642586859305

key: test_roc_auc
value: [0.94736842 0.89473684 0.97368421 0.97368421 0.94736842 0.92105263
 0.97368421 0.92105263 1.         0.86403509]

mean value: 0.9416666666666667

key: train_roc_auc
value: [0.98235294 0.97941176 0.98235294 0.97941176 0.97647059 0.98235294
 0.97941176 0.97647059 0.97655659 0.98534572]

mean value: 0.9800137598899209

key: test_jcc
value: [0.9        0.8        0.95       0.95       0.9        0.85714286
 0.94736842 0.85       1.         0.77272727]

mean value: 0.8927238550922761

key: train_jcc
value: [0.96531792 0.95953757 0.96511628 0.95953757 0.95375723 0.96531792
 0.95953757 0.95402299 0.95402299 0.97109827]

mean value: 0.9607266302324036

MCC on Blind test: 0.85

Accuracy on Blind test: 0.92

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01481676 0.01259422 0.01158786 0.01182032 0.0115788  0.01470709
 0.01181889 0.01124763 0.01149035 0.01684523]

mean value: 0.012850713729858399

key: score_time
value: [0.01577783 0.01042986 0.01050568 0.01003408 0.01580787 0.01024508
 0.01042175 0.01017332 0.01543117 0.01240611]

mean value: 0.01212327480316162

key: test_mcc
value: [0.52704628 0.47633051 0.63245553 0.63245553 0.73786479 0.52704628
 0.9486833  0.58218174 0.89736456 0.35558302]

mean value: 0.6317011536883512

key: train_mcc
value: [0.74138173 0.74717517 0.73632672 0.72354193 0.71236887 0.72354193
 0.71944168 0.72961376 0.72462581 0.75953765]

mean value: 0.7317555250285934

key: test_accuracy
value: [0.76315789 0.73684211 0.81578947 0.81578947 0.86842105 0.76315789
 0.97368421 0.78947368 0.94594595 0.67567568]

mean value: 0.8147937411095306

key: train_accuracy
value: [0.87058824 0.87352941 0.86764706 0.86176471 0.85588235 0.86176471
 0.85882353 0.86470588 0.86217009 0.8797654 ]

mean value: 0.865664136622391

key: test_fscore
value: [0.75675676 0.75       0.81081081 0.82051282 0.86486486 0.75675676
 0.97297297 0.8        0.94736842 0.71428571]

mean value: 0.8194329118013328

key: train_fscore
value: [0.87209302 0.87463557 0.87106017 0.86217009 0.85878963 0.86217009
 0.86363636 0.86627907 0.86455331 0.87905605]

mean value: 0.8674443359724497

key: test_precision
value: [0.77777778 0.71428571 0.83333333 0.8        0.88888889 0.77777778
 1.         0.76190476 0.9        0.65217391]

mean value: 0.8106142167011732

key: train_precision
value: [0.86206897 0.86705202 0.84916201 0.85964912 0.84180791 0.85964912
 0.83516484 0.85632184 0.85227273 0.8816568 ]

mean value: 0.8564805361282117

key: test_recall
value: [0.73684211 0.78947368 0.78947368 0.84210526 0.84210526 0.73684211
 0.94736842 0.84210526 1.         0.78947368]

mean value: 0.831578947368421

key: train_recall
value: [0.88235294 0.88235294 0.89411765 0.86470588 0.87647059 0.86470588
 0.89411765 0.87647059 0.87719298 0.87647059]

mean value: 0.8788957688338493

key: test_roc_auc
value: [0.76315789 0.73684211 0.81578947 0.81578947 0.86842105 0.76315789
 0.97368421 0.78947368 0.94736842 0.67251462]

mean value: 0.8146198830409357

key: train_roc_auc
value: [0.87058824 0.87352941 0.86764706 0.86176471 0.85588235 0.86176471
 0.85882353 0.86470588 0.8621259  0.87975576]

mean value: 0.8656587547299621

key: test_jcc
value: [0.60869565 0.6        0.68181818 0.69565217 0.76190476 0.60869565
 0.94736842 0.66666667 0.9        0.55555556]

mean value: 0.7026357065258667

key: train_jcc
value: [0.77319588 0.77720207 0.7715736  0.75773196 0.75252525 0.75773196
 0.76       0.76410256 0.76142132 0.78421053]

mean value: 0.7659695133154767

MCC on Blind test: 0.72

Accuracy on Blind test: 0.86

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC0...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.50253129 0.13468575 0.05374956 0.13997626 0.35016894 0.12072372
 0.05626488 0.16596508 0.05752921 0.05376387]

mean value: 0.16353585720062255

key: score_time
value: [0.01153541 0.01105118 0.01070404 0.01191998 0.01194215 0.01204967
 0.01075387 0.01131034 0.01075554 0.01144719]

mean value: 0.011346936225891113

key: test_mcc
value: [0.9486833  0.89973541 0.89473684 0.89973541 1.         0.9486833
 0.9486833  0.9486833  1.         0.78362573]

mean value: 0.9272566586986345

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97368421 0.94736842 0.94736842 0.94736842 1.         0.97368421
 0.97368421 0.97368421 1.         0.89189189]

mean value: 0.962873399715505

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97297297 0.95       0.94736842 0.95       1.         0.97297297
 0.97297297 0.97435897 1.         0.89473684]

mean value: 0.9635383156435788

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.9047619  0.94736842 0.9047619  1.         1.
 1.         0.95       1.         0.89473684]

mean value: 0.9601629072681704

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.94736842 1.         0.94736842 1.         1.         0.94736842
 0.94736842 1.         1.         0.89473684]

mean value: 0.968421052631579

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97368421 0.94736842 0.94736842 0.94736842 1.         0.97368421
 0.97368421 0.97368421 1.         0.89181287]

mean value: 0.9628654970760233

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.94736842 0.9047619  0.9        0.9047619  1.         0.94736842
 0.94736842 0.95       1.         0.80952381]

mean value: 0.9311152882205513

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.94

Accuracy on Blind test: 0.97

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.05968142 0.05620432 0.04024172 0.05535245 0.06633306 0.06649137
 0.05653095 0.0731082  0.0600152  0.07051778]

mean value: 0.06044764518737793

key: score_time
value: [0.02383661 0.01224399 0.01207924 0.01229954 0.02160764 0.03937793
 0.02177739 0.01230597 0.02152944 0.01230001]

mean value: 0.018935775756835936

key: test_mcc
value: [0.78947368 0.89473684 0.58218174 0.79388419 0.79388419 0.63245553
 0.89973541 0.73786479 0.78764146 0.62280702]

mean value: 0.753466484462356

key: train_mcc
value: [0.94720632 0.93543979 0.92941176 0.95294118 0.94143711 0.95897286
 0.94707521 0.94117647 0.94723082 0.93550052]

mean value: 0.9436392042730108

key: test_accuracy
value: [0.89473684 0.94736842 0.78947368 0.89473684 0.89473684 0.81578947
 0.94736842 0.86842105 0.89189189 0.81081081]

mean value: 0.8755334281650071

key: train_accuracy
value: [0.97352941 0.96764706 0.96470588 0.97647059 0.97058824 0.97941176
 0.97352941 0.97058824 0.97360704 0.96774194]

mean value: 0.9717819561842332

key: test_fscore
value: [0.89473684 0.94736842 0.77777778 0.9        0.9        0.82051282
 0.94444444 0.87179487 0.88235294 0.81081081]

mean value: 0.8749798929675091

key: train_fscore
value: [0.97329377 0.96735905 0.96470588 0.97647059 0.9702381  0.97922849
 0.97345133 0.97058824 0.97360704 0.96774194]

mean value: 0.9716684407799097

key: test_precision
value: [0.89473684 0.94736842 0.82352941 0.85714286 0.85714286 0.8
 1.         0.85       0.9375     0.83333333]

mean value: 0.8800753722541648

key: train_precision
value: [0.98203593 0.9760479  0.96470588 0.97647059 0.98192771 0.98802395
 0.97633136 0.97058824 0.97647059 0.96491228]

mean value: 0.9757514431040658

key: test_recall
value: [0.89473684 0.94736842 0.73684211 0.94736842 0.94736842 0.84210526
 0.89473684 0.89473684 0.83333333 0.78947368]

mean value: 0.8728070175438596

key: train_recall
value: [0.96470588 0.95882353 0.96470588 0.97647059 0.95882353 0.97058824
 0.97058824 0.97058824 0.97076023 0.97058824]

mean value: 0.9676642586859305

key: test_roc_auc
value: [0.89473684 0.94736842 0.78947368 0.89473684 0.89473684 0.81578947
 0.94736842 0.86842105 0.89035088 0.81140351]

mean value: 0.875438596491228

key: train_roc_auc
value: [0.97352941 0.96764706 0.96470588 0.97647059 0.97058824 0.97941176
 0.97352941 0.97058824 0.97361541 0.96775026]

mean value: 0.9717836257309942

key: test_jcc
value: [0.80952381 0.9        0.63636364 0.81818182 0.81818182 0.69565217
 0.89473684 0.77272727 0.78947368 0.68181818]

mean value: 0.781665923702537

key: train_jcc
value: [0.94797688 0.93678161 0.93181818 0.95402299 0.94219653 0.95930233
 0.94827586 0.94285714 0.94857143 0.9375    ]

mean value: 0.9449302949002888

MCC on Blind test: 0.72

Accuracy on Blind test: 0.86

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.03640914 0.01111817 0.0107789  0.0103848  0.01065993 0.01029205
 0.0105691  0.01045561 0.00946641 0.00948262]

mean value: 0.012961673736572265

key: score_time
value: [0.01239967 0.00975752 0.0088954  0.00995421 0.00950813 0.01114082
 0.00941396 0.00899124 0.00900888 0.00877619]

mean value: 0.009784603118896484

key: test_mcc
value: [0.79388419 0.59222009 0.78947368 0.68421053 0.69989647 0.37047929
 0.89473684 0.52704628 0.89736456 0.29618896]

mean value: 0.6545500886423572

key: train_mcc
value: [0.72354193 0.70607783 0.73540864 0.64777648 0.62994603 0.72414357
 0.67676337 0.67100629 0.66600142 0.70748143]

mean value: 0.6888146983674961

key: test_accuracy
value: [0.89473684 0.78947368 0.89473684 0.84210526 0.84210526 0.68421053
 0.94736842 0.76315789 0.94594595 0.64864865]

mean value: 0.82524893314367

key: train_accuracy
value: [0.86176471 0.85294118 0.86764706 0.82352941 0.81470588 0.86176471
 0.83823529 0.83529412 0.83284457 0.85337243]

mean value: 0.8442099361738831

key: test_fscore
value: [0.88888889 0.76470588 0.89473684 0.84210526 0.82352941 0.66666667
 0.94736842 0.76923077 0.94736842 0.66666667]

mean value: 0.821126723293906

key: train_fscore
value: [0.86135693 0.85119048 0.86646884 0.81927711 0.81081081 0.85885886
 0.8358209  0.83233533 0.83086053 0.84939759]

mean value: 0.8416377378527024

key: test_precision
value: [0.94117647 0.86666667 0.89473684 0.84210526 0.93333333 0.70588235
 0.94736842 0.75       0.9        0.65      ]

mean value: 0.8431269349845201

key: train_precision
value: [0.86390533 0.86144578 0.8742515  0.83950617 0.82822086 0.87730061
 0.84848485 0.84756098 0.84337349 0.87037037]

mean value: 0.8554419939255328

key: test_recall
value: [0.84210526 0.68421053 0.89473684 0.84210526 0.73684211 0.63157895
 0.94736842 0.78947368 1.         0.68421053]

mean value: 0.8052631578947368

key: train_recall
value: [0.85882353 0.84117647 0.85882353 0.8        0.79411765 0.84117647
 0.82352941 0.81764706 0.81871345 0.82941176]

mean value: 0.8283419332645339

key: test_roc_auc
value: [0.89473684 0.78947368 0.89473684 0.84210526 0.84210526 0.68421053
 0.94736842 0.76315789 0.94736842 0.64766082]

mean value: 0.8252923976608187

key: train_roc_auc
value: [0.86176471 0.85294118 0.86764706 0.82352941 0.81470588 0.86176471
 0.83823529 0.83529412 0.83288614 0.85330237]

mean value: 0.8442070863433093

key: test_jcc
value: [0.8        0.61904762 0.80952381 0.72727273 0.7        0.5
 0.9        0.625      0.9        0.5       ]

mean value: 0.7080844155844156

key: train_jcc
value: [0.75647668 0.74093264 0.76439791 0.69387755 0.68181818 0.75263158
 0.71794872 0.71282051 0.7106599  0.7382199 ]

mean value: 0.7269783568504338

MCC on Blind test: 0.77

Accuracy on Blind test: 0.89

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01160216 0.01595926 0.01850629 0.01905012 0.01691532 0.01988149
 0.02177978 0.01703143 0.01949835 0.01658297]

mean value: 0.017680716514587403

key: score_time
value: [0.0089817  0.01127219 0.01129818 0.01176715 0.01182032 0.01188231
 0.01186323 0.01187611 0.01192379 0.01185489]

mean value: 0.011453986167907715

key: test_mcc
value: [0.61017022 0.84327404 0.68803296 0.80757285 0.80757285 0.74620251
 0.9486833  0.54554473 0.89736456 0.57857577]

mean value: 0.7472993786374361

key: train_mcc
value: [0.78047467 0.86724532 0.88861973 0.87273022 0.81456113 0.81962489
 0.84766884 0.72101498 0.90106396 0.85672926]

mean value: 0.8369732992272605

key: test_accuracy
value: [0.78947368 0.92105263 0.84210526 0.89473684 0.89473684 0.86842105
 0.97368421 0.76315789 0.94594595 0.78378378]

mean value: 0.8677098150782361

key: train_accuracy
value: [0.88235294 0.93235294 0.94411765 0.93235294 0.9        0.90294118
 0.92058824 0.84705882 0.95014663 0.92668622]

mean value: 0.9138597550457133

key: test_fscore
value: [0.81818182 0.92307692 0.83333333 0.88235294 0.88235294 0.87804878
 0.97435897 0.79069767 0.94736842 0.80952381]

mean value: 0.8739295616786841

key: train_fscore
value: [0.89304813 0.92966361 0.94328358 0.92744479 0.88961039 0.91105121
 0.92520776 0.86528497 0.94925373 0.92957746]

mean value: 0.9163425642953533

key: test_precision
value: [0.72       0.9        0.88235294 1.         1.         0.81818182
 0.95       0.70833333 0.9        0.73913043]

mean value: 0.8617998527474231

key: train_precision
value: [0.81862745 0.96815287 0.95757576 1.         0.99275362 0.84079602
 0.87434555 0.77314815 0.9695122  0.89189189]

mean value: 0.9086803502787302

key: test_recall
value: [0.94736842 0.94736842 0.78947368 0.78947368 0.78947368 0.94736842
 1.         0.89473684 1.         0.89473684]

mean value: 0.9

key: train_recall
value: [0.98235294 0.89411765 0.92941176 0.86470588 0.80588235 0.99411765
 0.98235294 0.98235294 0.92982456 0.97058824]

mean value: 0.9335706914344686

key: test_roc_auc
value: [0.78947368 0.92105263 0.84210526 0.89473684 0.89473684 0.86842105
 0.97368421 0.76315789 0.94736842 0.78070175]

mean value: 0.8675438596491228

key: train_roc_auc
value: [0.88235294 0.93235294 0.94411765 0.93235294 0.9        0.90294118
 0.92058824 0.84705882 0.9502064  0.92681459]

mean value: 0.9138785689714483

key: test_jcc
value: [0.69230769 0.85714286 0.71428571 0.78947368 0.78947368 0.7826087
 0.95       0.65384615 0.9        0.68      ]

mean value: 0.7809138481655644

key: train_jcc
value: [0.80676329 0.86857143 0.89265537 0.86470588 0.80116959 0.83663366
 0.86082474 0.76255708 0.90340909 0.86842105]

mean value: 0.8465711180624056

MCC on Blind test: 0.75

Accuracy on Blind test: 0.88

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01726389 0.01823545 0.01617908 0.01863098 0.01907182 0.01757312
 0.01636481 0.01706433 0.01851106 0.01773095]

mean value: 0.017662549018859865

key: score_time
value: [0.01184845 0.01182127 0.01185322 0.01189089 0.01183534 0.01188326
 0.01175451 0.01184678 0.01183844 0.01181006]

mean value: 0.011838221549987793

key: test_mcc
value: [0.74620251 0.63245553 0.68803296 0.89973541 0.85280287 0.68803296
 0.76376262 0.79388419 0.7163504  0.57857577]

mean value: 0.7359835206381676

key: train_mcc
value: [0.88333157 0.90352405 0.86508013 0.90014017 0.82470774 0.8452381
 0.72433672 0.83493231 0.77515848 0.87706192]

mean value: 0.8433511179735004

key: test_accuracy
value: [0.86842105 0.81578947 0.84210526 0.94736842 0.92105263 0.84210526
 0.86842105 0.89473684 0.83783784 0.78378378]

mean value: 0.8621621621621621

key: train_accuracy
value: [0.94117647 0.95       0.93235294 0.95       0.90588235 0.91764706
 0.84411765 0.91176471 0.87683284 0.93548387]

mean value: 0.916525789201311

key: test_fscore
value: [0.85714286 0.82051282 0.85       0.95       0.91428571 0.83333333
 0.84848485 0.88888889 0.85714286 0.80952381]

mean value: 0.8629315129315129

key: train_fscore
value: [0.93975904 0.94769231 0.93333333 0.95043732 0.89677419 0.91082803
 0.81533101 0.90384615 0.89005236 0.93888889]

mean value: 0.9126942623189517

key: test_precision
value: [0.9375     0.8        0.80952381 0.9047619  1.         0.88235294
 1.         0.94117647 0.75       0.73913043]

mean value: 0.8764445560833029

key: train_precision
value: [0.96296296 0.99354839 0.92       0.94219653 0.99285714 0.99305556
 1.         0.99295775 0.8056872  0.88947368]

mean value: 0.9492739214745212

key: test_recall
value: [0.78947368 0.84210526 0.89473684 1.         0.84210526 0.78947368
 0.73684211 0.84210526 1.         0.89473684]

mean value: 0.8631578947368421

key: train_recall
value: [0.91764706 0.90588235 0.94705882 0.95882353 0.81764706 0.84117647
 0.68823529 0.82941176 0.99415205 0.99411765]

mean value: 0.8894152046783625

key: test_roc_auc
value: [0.86842105 0.81578947 0.84210526 0.94736842 0.92105263 0.84210526
 0.86842105 0.89473684 0.84210526 0.78070175]

mean value: 0.862280701754386

key: train_roc_auc
value: [0.94117647 0.95       0.93235294 0.95       0.90588235 0.91764706
 0.84411765 0.91176471 0.87648779 0.93565531]

mean value: 0.9165084279325766

key: test_jcc
value: [0.75       0.69565217 0.73913043 0.9047619  0.84210526 0.71428571
 0.73684211 0.8        0.75       0.68      ]

mean value: 0.7612777596164324

key: train_jcc
value: [0.88636364 0.9005848  0.875      0.90555556 0.8128655  0.83625731
 0.68823529 0.8245614  0.80188679 0.88481675]

mean value: 0.8416127038264324

MCC on Blind test: 0.62

Accuracy on Blind test: 0.79

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.16265321 0.14861178 0.14626455 0.14584064 0.14930272 0.16142392
 0.15244198 0.14463997 0.14828467 0.1534369 ]

mean value: 0.15129003524780274

key: score_time
value: [0.0151782  0.01543975 0.01532531 0.01555467 0.01592684 0.02432156
 0.01536274 0.01528263 0.01656127 0.01543808]

mean value: 0.016439104080200197

key: test_mcc
value: [0.9486833  0.89973541 0.89473684 0.89973541 1.         0.9486833
 0.9486833  0.9486833  1.         0.78764146]

mean value: 0.927658231800502

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97368421 0.94736842 0.94736842 0.94736842 1.         0.97368421
 0.97368421 0.97368421 1.         0.89189189]

mean value: 0.962873399715505

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97297297 0.95       0.94736842 0.95       1.         0.97297297
 0.97297297 0.97435897 1.         0.9       ]

mean value: 0.9640646314330525

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.9047619  0.94736842 0.9047619  1.         1.
 1.         0.95       1.         0.85714286]

mean value: 0.9564035087719298

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.94736842 1.         0.94736842 1.         1.         0.94736842
 0.94736842 1.         1.         0.94736842]

mean value: 0.9736842105263157

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97368421 0.94736842 0.94736842 0.94736842 1.         0.97368421
 0.97368421 0.97368421 1.         0.89035088]

mean value: 0.962719298245614

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.94736842 0.9047619  0.9        0.9047619  1.         0.94736842
 0.94736842 0.95       1.         0.81818182]

mean value: 0.9319810890863522

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.92

Accuracy on Blind test: 0.96

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.04295039 0.05016208 0.03978586 0.04688597 0.05769682 0.05953455
 0.07448912 0.0562222  0.04632425 0.03965807]

mean value: 0.05137093067169189

key: score_time
value: [0.02003407 0.02414131 0.01633906 0.02169681 0.02622557 0.02317142
 0.02110815 0.03839302 0.02159262 0.02609897]

mean value: 0.02388010025024414

key: test_mcc
value: [0.9486833  0.78947368 0.9486833  0.89973541 1.         0.89473684
 0.9486833  0.9486833  1.         0.83871328]

mean value: 0.9217392413194645

key: train_mcc
value: [0.98823529 0.98823529 0.99413485 1.         0.98823529 0.98236994
 0.97653817 0.99413485 0.99415185 0.98826969]

mean value: 0.9894305224350558

key: test_accuracy
value: [0.97368421 0.89473684 0.97368421 0.94736842 1.         0.94736842
 0.97368421 0.97368421 1.         0.91891892]

mean value: 0.9603129445234708

key: train_accuracy
value: [0.99411765 0.99411765 0.99705882 1.         0.99411765 0.99117647
 0.98823529 0.99705882 0.99706745 0.9941349 ]

mean value: 0.9947084698982233

key: test_fscore
value: [0.97297297 0.89473684 0.97435897 0.95       1.         0.94736842
 0.97297297 0.97435897 1.         0.92307692]

mean value: 0.9609846080898713

key: train_fscore
value: [0.99411765 0.99411765 0.99706745 1.         0.99411765 0.99120235
 0.98816568 0.99706745 0.99708455 0.99411765]

mean value: 0.9947058060215382

key: test_precision
value: [1.         0.89473684 0.95       0.9047619  1.         0.94736842
 1.         0.95       1.         0.9       ]

mean value: 0.9546867167919799

key: train_precision
value: [0.99411765 0.99411765 0.99415205 1.         0.99411765 0.98830409
 0.99404762 0.99415205 0.99418605 0.99411765]

mean value: 0.9941312440929044

key: test_recall
value: [0.94736842 0.89473684 1.         1.         1.         0.94736842
 0.94736842 1.         1.         0.94736842]

mean value: 0.968421052631579

key: train_recall
value: [0.99411765 0.99411765 1.         1.         0.99411765 0.99411765
 0.98235294 1.         1.         0.99411765]

mean value: 0.9952941176470589

key: test_roc_auc
value: [0.97368421 0.89473684 0.97368421 0.94736842 1.         0.94736842
 0.97368421 0.97368421 1.         0.91812865]

mean value: 0.960233918128655

key: train_roc_auc
value: [0.99411765 0.99411765 0.99705882 1.         0.99411765 0.99117647
 0.98823529 0.99705882 0.99705882 0.99413485]

mean value: 0.9947076023391813

key: test_jcc
value: [0.94736842 0.80952381 0.95       0.9047619  1.         0.9
 0.94736842 0.95       1.         0.85714286]

mean value: 0.9266165413533834

key: train_jcc
value: [0.98830409 0.98830409 0.99415205 1.         0.98830409 0.98255814
 0.97660819 0.99415205 0.99418605 0.98830409]

mean value: 0.9894872841017271

MCC on Blind test: 0.91

Accuracy on Blind test: 0.96

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.14919233 0.09951925 0.13426876 0.0898757  0.09314251 0.08426976
 0.08239388 0.05807185 0.10029602 0.09456182]

mean value: 0.09855918884277344

key: score_time
value: [0.02806473 0.01825833 0.02253628 0.02233458 0.0233705  0.02221894
 0.01384473 0.01433444 0.01381803 0.02184868]

mean value: 0.020062923431396484

key: test_mcc
value: [0.63245553 0.16151457 0.68803296 0.63245553 0.68421053 0.57894737
 0.85280287 0.59222009 0.89181287 0.51319869]

mean value: 0.6227651001498481

key: train_mcc
value: [0.99413485 0.99413485 0.99413485 0.99413485 0.99413485 1.
 0.99413485 1.         0.99415205 0.99415185]

mean value: 0.9953112973615207

key: test_accuracy
value: [0.81578947 0.57894737 0.84210526 0.81578947 0.84210526 0.78947368
 0.92105263 0.78947368 0.94594595 0.75675676]

mean value: 0.8097439544807966

key: train_accuracy
value: [0.99705882 0.99705882 0.99705882 0.99705882 0.99705882 1.
 0.99705882 1.         0.99706745 0.99706745]

mean value: 0.9976487838537175

key: test_fscore
value: [0.82051282 0.52941176 0.83333333 0.82051282 0.84210526 0.78947368
 0.91428571 0.76470588 0.94444444 0.76923077]

mean value: 0.8028016496747147

key: train_fscore
value: [0.99705015 0.99705015 0.99705015 0.99705015 0.99705015 1.
 0.99705015 1.         0.99706745 0.99705015]

mean value: 0.9976418481128729

key: test_precision
value: [0.8        0.6        0.88235294 0.8        0.84210526 0.78947368
 1.         0.86666667 0.94444444 0.75      ]

mean value: 0.8275042999656003

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.84210526 0.47368421 0.78947368 0.84210526 0.84210526 0.78947368
 0.84210526 0.68421053 0.94444444 0.78947368]

mean value: 0.7839181286549708

key: train_recall
value: [0.99411765 0.99411765 0.99411765 0.99411765 0.99411765 1.
 0.99411765 1.         0.99415205 0.99411765]

mean value: 0.9952975576195391

key: test_roc_auc
value: [0.81578947 0.57894737 0.84210526 0.81578947 0.84210526 0.78947368
 0.92105263 0.78947368 0.94590643 0.75584795]

mean value: 0.8096491228070175

key: train_roc_auc
value: [0.99705882 0.99705882 0.99705882 0.99705882 0.99705882 1.
 0.99705882 1.         0.99707602 0.99705882]

mean value: 0.9976487788097695

key: test_jcc
value: [0.69565217 0.36       0.71428571 0.69565217 0.72727273 0.65217391
 0.84210526 0.61904762 0.89473684 0.625     ]

mean value: 0.6825926426738784

key: train_jcc
value: [0.99411765 0.99411765 0.99411765 0.99411765 0.99411765 1.
 0.99411765 1.         0.99415205 0.99411765]

mean value: 0.9952975576195391

MCC on Blind test: 0.54

Accuracy on Blind test: 0.77

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.6186831  0.62528348 0.54585361 0.54748797 0.58207774 0.56963778
 0.5545404  0.55546618 0.54905748 0.55526996]

mean value: 0.5703357696533203

key: score_time
value: [0.01032495 0.00916743 0.00931358 0.01051068 0.00964808 0.00931263
 0.01022315 0.00920653 0.00935125 0.00947666]

mean value: 0.009653496742248534

key: test_mcc
value: [0.9486833  0.89973541 0.89473684 0.89973541 1.         0.89473684
 1.         0.89473684 0.94736842 0.83871328]

mean value: 0.9218446350938172

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97368421 0.94736842 0.94736842 0.94736842 1.         0.94736842
 1.         0.94736842 0.97297297 0.91891892]

mean value: 0.9602418207681366

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97297297 0.95       0.94736842 0.95       1.         0.94736842
 1.         0.94736842 0.97297297 0.92307692]

mean value: 0.9611128132180764

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.9047619  0.94736842 0.9047619  1.         0.94736842
 1.         0.94736842 0.94736842 0.9       ]

mean value: 0.9498997493734336

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.94736842 1.         0.94736842 1.         1.         0.94736842
 1.         0.94736842 1.         0.94736842]

mean value: 0.9736842105263157

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97368421 0.94736842 0.94736842 0.94736842 1.         0.94736842
 1.         0.94736842 0.97368421 0.91812865]

mean value: 0.960233918128655

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.94736842 0.9047619  0.9        0.9047619  1.         0.9
 1.         0.9        0.94736842 0.85714286]

mean value: 0.9261403508771929

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.94

Accuracy on Blind test: 0.97

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.02871037 0.04236031 0.04015183 0.07962561 0.02676129 0.04086113
 0.02677369 0.02712107 0.02694225 0.02759171]

mean value: 0.036689925193786624

key: score_time
value: [0.02021146 0.01885295 0.01714921 0.01255703 0.01522589 0.01539779
 0.01532888 0.01534462 0.01538682 0.01636577]

mean value: 0.01618204116821289

key: test_mcc
value: [0.47633051 0.2773501  0.53300179 0.43643578 0.10660036 0.54554473
 0.73786479 0.53300179 0.63129316 0.25301653]

mean value: 0.453043953428385

key: train_mcc
value: [0.98823529 0.9653073  0.99413485 0.98830369 0.78190435 0.94838881
 0.976741   0.95963741 0.94298132 0.88351945]

mean value: 0.9429153475266533

key: test_accuracy
value: [0.73684211 0.63157895 0.76315789 0.71052632 0.55263158 0.76315789
 0.86842105 0.76315789 0.81081081 0.62162162]

mean value: 0.7221906116642959

key: train_accuracy
value: [0.99411765 0.98235294 0.99705882 0.99411765 0.87941176 0.97352941
 0.98823529 0.97941176 0.97067449 0.93841642]

mean value: 0.9697326203208556

key: test_fscore
value: [0.75       0.5625     0.74285714 0.74418605 0.51428571 0.79069767
 0.87179487 0.74285714 0.82051282 0.58823529]

mean value: 0.7127926707355572

key: train_fscore
value: [0.99411765 0.98203593 0.99706745 0.99408284 0.86287625 0.97280967
 0.98809524 0.97897898 0.96987952 0.93416928]

mean value: 0.9674112800117264

key: test_precision
value: [0.71428571 0.69230769 0.8125     0.66666667 0.5625     0.70833333
 0.85       0.8125     0.76190476 0.66666667]

mean value: 0.7247664835164835

key: train_precision
value: [0.99411765 1.         0.99415205 1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9988269693842449

key: test_recall
value: [0.78947368 0.47368421 0.68421053 0.84210526 0.47368421 0.89473684
 0.89473684 0.68421053 0.88888889 0.52631579]

mean value: 0.7152046783625731

key: train_recall
value: [0.99411765 0.96470588 1.         0.98823529 0.75882353 0.94705882
 0.97647059 0.95882353 0.94152047 0.87647059]

mean value: 0.9406226350189199

key: test_roc_auc
value: [0.73684211 0.63157895 0.76315789 0.71052632 0.55263158 0.76315789
 0.86842105 0.76315789 0.8128655  0.62426901]

mean value: 0.7226608187134502

key: train_roc_auc
value: [0.99411765 0.98235294 0.99705882 0.99411765 0.87941176 0.97352941
 0.98823529 0.97941176 0.97076023 0.93823529]

mean value: 0.9697230822153423

key: test_jcc
value: [0.6        0.39130435 0.59090909 0.59259259 0.34615385 0.65384615
 0.77272727 0.59090909 0.69565217 0.41666667]

mean value: 0.5650761235543844

key: train_jcc
value: [0.98830409 0.96470588 0.99415205 0.98823529 0.75882353 0.94705882
 0.97647059 0.95882353 0.94152047 0.87647059]

mean value: 0.9394564843481252

MCC on Blind test: 0.42

Accuracy on Blind test: 0.71

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.02983665 0.03623152 0.0326581  0.03908563 0.04000688 0.04243374
 0.03620481 0.03728127 0.04367828 0.03259063]

mean value: 0.03700075149536133

key: score_time
value: [0.02391553 0.02225924 0.02418375 0.02630496 0.02330065 0.02267957
 0.0241394  0.02215934 0.02327657 0.02062058]

mean value: 0.023283958435058594

key: test_mcc
value: [0.78947368 0.84327404 0.68421053 0.89473684 0.84327404 0.68803296
 0.89973541 0.84327404 0.94736842 0.51461988]

mean value: 0.7947999856934739

key: train_mcc
value: [0.87648575 0.87648575 0.88235294 0.88241401 0.87648575 0.88825066
 0.88825066 0.88825066 0.87684899 0.87121527]

mean value: 0.8807040447224638

key: test_accuracy
value: [0.89473684 0.92105263 0.84210526 0.94736842 0.92105263 0.84210526
 0.94736842 0.92105263 0.97297297 0.75675676]

mean value: 0.8966571834992887

key: train_accuracy
value: [0.93823529 0.93823529 0.94117647 0.94117647 0.93823529 0.94411765
 0.94411765 0.94411765 0.93841642 0.93548387]

mean value: 0.9403312057961014

key: test_fscore
value: [0.89473684 0.92307692 0.84210526 0.94736842 0.91891892 0.85
 0.94444444 0.91891892 0.97297297 0.75675676]

mean value: 0.8969299461404725

/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_7030.py:196: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_7030.py:199: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)

key: train_fscore
value: [0.93841642 0.93841642 0.94117647 0.9408284  0.93841642 0.9439528
 0.9439528  0.94428152 0.93841642 0.93604651]

mean value: 0.9403904203379017

key: test_precision
value: [0.89473684 0.9        0.84210526 0.94736842 0.94444444 0.80952381
 1.         0.94444444 0.94736842 0.77777778]

mean value: 0.9007769423558897

key: train_precision
value: [0.93567251 0.93567251 0.94117647 0.94642857 0.93567251 0.94674556
 0.94674556 0.94152047 0.94117647 0.92528736]

mean value: 0.9396098004883142

key: test_recall
value: [0.89473684 0.94736842 0.84210526 0.94736842 0.89473684 0.89473684
 0.89473684 0.89473684 1.         0.73684211]

mean value: 0.8947368421052632

key: train_recall
value: [0.94117647 0.94117647 0.94117647 0.93529412 0.94117647 0.94117647
 0.94117647 0.94705882 0.93567251 0.94705882]

mean value: 0.9412143102855177

key: test_roc_auc
value: [0.89473684 0.92105263 0.84210526 0.94736842 0.92105263 0.84210526
 0.94736842 0.92105263 0.97368421 0.75730994]

mean value: 0.8967836257309941

key: train_roc_auc
value: [0.93823529 0.93823529 0.94117647 0.94117647 0.93823529 0.94411765
 0.94411765 0.94411765 0.93842449 0.93551772]

mean value: 0.9403353973168215

key: test_jcc
value: [0.80952381 0.85714286 0.72727273 0.9        0.85       0.73913043
 0.89473684 0.85       0.94736842 0.60869565]

mean value: 0.818387074405381

key: train_jcc
value: [0.8839779  0.8839779  0.88888889 0.88826816 0.8839779  0.89385475
 0.89385475 0.89444444 0.8839779  0.87978142]

mean value: 0.887500400993959

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.25618434 0.2671926  0.32188869 0.26167226 0.25118732 0.25854945
 0.26166058 0.24911594 0.2661407  0.28530979]

mean value: 0.26789016723632814

key: score_time
value: [0.02451444 0.02255034 0.02055788 0.02151513 0.02196789 0.01751781
 0.02237058 0.0236876  0.02393007 0.02099395]

mean value: 0.02196056842803955

key: test_mcc
value: [0.78947368 0.78947368 0.68421053 0.89473684 0.84327404 0.68803296
 0.89973541 0.84327404 0.94736842 0.51461988]

mean value: 0.7894199498433697

key: train_mcc
value: [0.87648575 0.78828985 0.88235294 0.88241401 0.87648575 0.88825066
 0.88825066 0.88825066 0.87684899 0.87121527]

mean value: 0.8718844543680944

key: test_accuracy
value: [0.89473684 0.89473684 0.84210526 0.94736842 0.92105263 0.84210526
 0.94736842 0.92105263 0.97297297 0.75675676]

mean value: 0.8940256045519204

key: train_accuracy
value: [0.93823529 0.89411765 0.94117647 0.94117647 0.93823529 0.94411765
 0.94411765 0.94411765 0.93841642 0.93548387]

mean value: 0.9359194410902191

key: test_fscore
value: [0.89473684 0.89473684 0.84210526 0.94736842 0.91891892 0.85
 0.94444444 0.91891892 0.97297297 0.75675676]

mean value: 0.8940959380433064

key: train_fscore
value: [0.93841642 0.89349112 0.94117647 0.9408284  0.93841642 0.9439528
 0.9439528  0.94428152 0.93841642 0.93604651]

mean value: 0.9358978905351981

key: test_precision
value: [0.89473684 0.89473684 0.84210526 0.94736842 0.94444444 0.80952381
 1.         0.94444444 0.94736842 0.77777778]

mean value: 0.9002506265664161

key: train_precision
value: [0.93567251 0.89880952 0.94117647 0.94642857 0.93567251 0.94674556
 0.94674556 0.94152047 0.94117647 0.92528736]

mean value: 0.9359235014072783

key: test_recall
value: [0.89473684 0.89473684 0.84210526 0.94736842 0.89473684 0.89473684
 0.89473684 0.89473684 1.         0.73684211]

mean value: 0.8894736842105263

key: train_recall
value: [0.94117647 0.88823529 0.94117647 0.93529412 0.94117647 0.94117647
 0.94117647 0.94705882 0.93567251 0.94705882]

mean value: 0.9359201926384588

key: test_roc_auc
value: [0.89473684 0.89473684 0.84210526 0.94736842 0.92105263 0.84210526
 0.94736842 0.92105263 0.97368421 0.75730994]

mean value: 0.8941520467836257

key: train_roc_auc
value: [0.93823529 0.89411765 0.94117647 0.94117647 0.93823529 0.94411765
 0.94411765 0.94411765 0.93842449 0.93551772]

mean value: 0.9359236326109391

key: test_jcc
value: [0.80952381 0.80952381 0.72727273 0.9        0.85       0.73913043
 0.89473684 0.85       0.94736842 0.60869565]

mean value: 0.8136251696434763

key: train_jcc
value: [0.8839779  0.80748663 0.88888889 0.88826816 0.8839779  0.89385475
 0.89385475 0.89444444 0.8839779  0.87978142]

mean value: 0.8798512740403147

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9