LSHTM_analysis/scripts/ml/log_gid_orig.txt

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_orig.py:550: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
1.22.4
1.4.1

aaindex_df contains non-numerical data

Total no. of non-numerial columns: 2

Selecting numerical data only

PASS: successfully selected numerical columns only for aaindex_df

Now checking for NA in the remaining aaindex_cols

Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127

Revised df ncols: 123

Checking NA in revised df...

PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df

PASS: ncols match
Expected ncols: 123
Got: 123

Total no. of columns in clean aa_df: 123

Proceeding to merge, expected nrows in merged_df: 531

PASS: my_features_df and aa_df successfully combined
nrows: 531
ncols: 286
count of NULL values before imputation

or_mychisq          263
log10_or_mychisq    263
dtype: int64
count of NULL values AFTER imputation

mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64

PASS: OR values imputed, data ready for ML

Total no. of features for aaindex: 123

No. of numerical features: 167
No. of categorical features: 7

index: 0
ind: 1

Mask count check: True

index: 1
ind: 2

Mask count check: True
Original Data
 Counter({0: 76, 1: 43}) Data dim: (119, 174)

-------------------------------------------------------------
Successfully split data: ORIGINAL training
actual values: training set
imputed values: blind test set
Train data size: (119, 174)
Test data size: (412, 174)
y_train numbers: Counter({0: 76, 1: 43})
y_train ratio: 1.7674418604651163

y_test_numbers: Counter({0: 409, 1: 3})
y_test ratio: 136.33333333333334
-------------------------------------------------------------
Simple Random OverSampling
 Counter({0: 76, 1: 76})
(152, 174)
Simple Random UnderSampling
 Counter({0: 43, 1: 43})
(86, 174)
Simple Combined Over and UnderSampling
 Counter({0: 76, 1: 76})
(152, 174)
SMOTE_NC OverSampling
 Counter({0: 76, 1: 76})
(152, 174)

#####################################################################

Running ML analysis: ORIGINAL
Gene name: gid
Drug name: streptomycin

Output directory: /home/tanu/git/Data/streptomycin/output/ml/tts_orig/

Sanity checks:
Total input features: 174

Training data size: (119, 174)
Test data size: (412, 174)

Target feature numbers (training data): Counter({0: 76, 1: 43})
Target features ratio (training data: 1.7674418604651163

Target feature numbers (test data): Counter({0: 409, 1: 3})
Target features ratio (test data): 136.33333333333334

#####################################################################


================================================================

Strucutral features (n): 35
These are:
Common stablity features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================

AAindex features (n): 123
These are:
 ['ALTS910101', 'AZAE970101', 'AZAE970102', 'BASU010101', 'BENS940101', 'BENS940102', 'BENS940103', 'BENS940104', 'BETM990101', 'BLAJ010101', 'BONM030101', 'BONM030102', 'BONM030103', 'BONM030104', 'BONM030105', 'BONM030106', 'BRYS930101', 'CROG050101', 'CSEM940101', 'DAYM780301', 'DAYM780302', 'DOSZ010101', 'DOSZ010102', 'DOSZ010103', 'DOSZ010104', 'FEND850101', 'FITW660101', 'GEOD900101', 'GIAG010101', 'GONG920101', 'GRAR740104', 'HENS920101', 'HENS920102', 'HENS920103', 'HENS920104', 'JOHM930101', 'JOND920103', 'JOND940101', 'KANM000101', 'KAPO950101', 'KESO980101', 'KESO980102', 'KOLA920101', 'KOLA930101', 'KOSJ950100_RSA_SST', 'KOSJ950100_SST', 'KOSJ950110_RSA', 'KOSJ950115', 'LEVJ860101', 'LINK010101', 'LIWA970101', 'LUTR910101', 'LUTR910102', 'LUTR910103', 'LUTR910104', 'LUTR910105', 'LUTR910106', 'LUTR910107', 'LUTR910108', 'LUTR910109', 'MCLA710101', 'MCLA720101', 'MEHP950102', 'MICC010101', 'MIRL960101', 'MIYS850102', 'MIYS850103', 'MIYS930101', 'MIYS960101', 'MIYS960102', 'MIYS960103', 'MIYS990106', 'MIYS990107', 'MIYT790101', 'MOHR870101', 'MOOG990101', 'MUET010101', 'MUET020101', 'MUET020102', 'NAOD960101', 'NGPC000101', 'NIEK910101', 'NIEK910102', 'OGAK980101', 'OVEJ920100_RSA', 'OVEJ920101', 'OVEJ920102', 'OVEJ920103', 'PRLA000101', 'PRLA000102', 'QUIB020101', 'QU_C930101', 'QU_C930102', 'QU_C930103', 'RIER950101', 'RISJ880101', 'RUSR970101', 'RUSR970102', 'RUSR970103', 'SIMK990101', 'SIMK990102', 'SIMK990103', 'SIMK990104', 'SIMK990105', 'SKOJ000101', 'SKOJ000102', 'SKOJ970101', 'TANS760101', 'TANS760102', 'THOP960101', 'TOBD000101', 'TOBD000102', 'TUDE900101', 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106']
================================================================

Evolutionary features (n): 3
These are:
 ['consurf_score', 'snap2_score', 'provean_score']
================================================================

Genomic features (n): 6
These are:
 ['maf', 'logorI']
 ['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================

Categorical features (n): 7
These are:
 ['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================


Pass: No. of features match

#####################################################################


Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.03334117 0.02475882 0.02577257 0.02525377 0.02861619 0.02571344
 0.02372122 0.02629876 0.03418589 0.02536702]

mean value: 0.027302885055541994

key: score_time
value: [0.01207638 0.01153398 0.01153016 0.0115757  0.01153898 0.011554
 0.0115602  0.01156116 0.01151729 0.01161742]

mean value: 0.011606526374816895

key: test_mcc
value: [0.63245553 0.25       0.         0.63245553 0.47809144 0.25
 0.07559289 0.68313005 0.65714286 0.21428571]

mean value: 0.38731540248954993

key: train_mcc
value: [0.83974018 0.90115642 0.90115642 0.85945065 0.87925006 0.85945065
 0.81756015 0.85757194 0.91818969 0.87982296]

mean value: 0.8713349127582828

key: test_accuracy
value: [0.83333333 0.66666667 0.5        0.83333333 0.75       0.66666667
 0.58333333 0.83333333 0.83333333 0.63636364]

mean value: 0.7136363636363636

key: train_accuracy
value: [0.92523364 0.95327103 0.95327103 0.93457944 0.94392523 0.93457944
 0.91588785 0.93457944 0.96261682 0.94444444]

mean value: 0.940238836967809

key: test_fscore
value: [0.66666667 0.5        0.4        0.66666667 0.66666667 0.5
 0.28571429 0.75       0.8        0.5       ]

mean value: 0.5735714285714285

key: train_fscore
value: [0.88888889 0.93150685 0.93150685 0.90410959 0.91891892 0.90410959
 0.86956522 0.90140845 0.94594595 0.91891892]

mean value: 0.911487921748053

key: test_precision
value: [1.         0.5        0.33333333 1.         0.6        0.5
 0.5        1.         0.8        0.5       ]

mean value: 0.6733333333333333

key: train_precision
value: [0.96969697 1.         1.         0.97058824 0.97142857 0.97058824
 0.96774194 0.96969697 0.97222222 0.97142857]

mean value: 0.976339171054541

key: test_recall
value: [0.5  0.5  0.5  0.5  0.75 0.5  0.2  0.6  0.8  0.5 ]

mean value: 0.535

key: train_recall
value: [0.82051282 0.87179487 0.87179487 0.84615385 0.87179487 0.84615385
 0.78947368 0.84210526 0.92105263 0.87179487]

mean value: 0.8552631578947368

key: test_roc_auc
value: [0.75       0.625      0.5        0.75       0.75       0.625
 0.52857143 0.8        0.82857143 0.60714286]

mean value: 0.6764285714285714

key: train_roc_auc
value: [0.90290347 0.93589744 0.93589744 0.91572398 0.92854449 0.91572398
 0.88749047 0.91380625 0.95327994 0.92865106]

mean value: 0.9217918517521425

key: test_jcc
value: [0.5        0.33333333 0.25       0.5        0.5        0.33333333
 0.16666667 0.6        0.66666667 0.33333333]

mean value: 0.41833333333333333

key: train_jcc
value: [0.8        0.87179487 0.87179487 0.825      0.85       0.825
 0.76923077 0.82051282 0.8974359  0.85      ]

mean value: 0.838076923076923

MCC on Blind test: 0.14

Accuracy on Blind test: 0.73

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.71644402 0.81135726 0.6770184  0.69487214 0.77071428 0.63311291
 0.59070015 0.60373497 0.72378373 0.66717815]

mean value: 0.6888916015625

key: score_time
value: [0.01187897 0.01522493 0.01512241 0.01507711 0.0152173  0.01510835
 0.01505136 0.01501942 0.01196957 0.01193142]

mean value: 0.014160084724426269

key: test_mcc
value: [0.63245553 0.40824829 0.11952286 0.63245553 0.11952286 0.70710678
 0.50709255 0.29277002 0.50709255 0.21428571]

mean value: 0.41405526994291353

key: train_mcc
value: [0.73774797 1.         1.         1.         1.         1.
 1.         1.         1.         0.72342894]

mean value: 0.9461176913949825

key: test_accuracy
value: [0.83333333 0.75       0.58333333 0.83333333 0.58333333 0.83333333
 0.75       0.66666667 0.75       0.63636364]

mean value: 0.7219696969696969

key: train_accuracy
value: [0.87850467 1.         1.         1.         1.         1.
 1.         1.         1.         0.87037037]

mean value: 0.9748875043267566

key: test_fscore
value: [0.66666667 0.57142857 0.44444444 0.66666667 0.44444444 0.8
 0.72727273 0.5        0.72727273 0.5       ]

mean value: 0.6048196248196248

key: train_fscore
value: [0.8115942  1.         1.         1.         1.         1.
 1.         1.         1.         0.78787879]

mean value: 0.9599472990777339

key: test_precision
value: [1.         0.66666667 0.4        1.         0.4        0.66666667
 0.66666667 0.66666667 0.66666667 0.5       ]

mean value: 0.6633333333333333

key: train_precision
value: [0.93333333 1.         1.         1.         1.         1.
 1.         1.         1.         0.96296296]

mean value: 0.9896296296296296

key: test_recall
value: [0.5 0.5 0.5 0.5 0.5 1.  0.8 0.4 0.8 0.5]

mean value: 0.6

key: train_recall
value: [0.71794872 1.         1.         1.         1.         1.
 1.         1.         1.         0.66666667]

mean value: 0.9384615384615385

key: test_roc_auc
value: [0.75       0.6875     0.5625     0.75       0.5625     0.875
 0.75714286 0.62857143 0.75714286 0.60714286]

mean value: 0.69375

key: train_roc_auc
value: [0.84426848 1.         1.         1.         1.         1.
 1.         1.         1.         0.82608696]

mean value: 0.9670355433143157

key: test_jcc
value: [0.5        0.4        0.28571429 0.5        0.28571429 0.66666667
 0.57142857 0.33333333 0.57142857 0.33333333]

mean value: 0.44476190476190475

key: train_jcc
value: [0.68292683 1.         1.         1.         1.         1.
 1.         1.         1.         0.65      ]

mean value: 0.9332926829268293

MCC on Blind test: 0.12

Accuracy on Blind test: 0.67

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01205301 0.01168513 0.00886297 0.00880456 0.00827646 0.00836802
 0.00829816 0.00849175 0.00859928 0.00846052]

mean value: 0.009189987182617187

key: score_time
value: [0.01187491 0.00991917 0.00889611 0.00851369 0.00854278 0.00852203
 0.00846195 0.00852776 0.00884151 0.00848675]

mean value: 0.009058666229248048

key: test_mcc
value: [ 0.31622777  0.          0.5         0.125       0.          0.
 -0.37142857  0.23904572  0.23904572 -0.38575837]

mean value: 0.06621322634167942

key: train_mcc
value: [0.26480296 0.37487168 0.27967347 0.22683426 0.33144796 0.39638569
 0.40772448 0.3171935  0.30927906 0.36454849]

mean value: 0.3272761552505757

key: test_accuracy
value: [0.5        0.5        0.66666667 0.5        0.41666667 0.41666667
 0.33333333 0.58333333 0.58333333 0.27272727]

mean value: 0.4772727272727273

key: train_accuracy
value: [0.57943925 0.62616822 0.57943925 0.54205607 0.61682243 0.6728972
 0.68224299 0.64485981 0.58878505 0.62962963]

mean value: 0.6162339910003461

key: test_fscore
value: [0.57142857 0.4        0.66666667 0.5        0.46153846 0.46153846
 0.2        0.61538462 0.61538462 0.33333333]

mean value: 0.4825274725274725

key: train_fscore
value: [0.58715596 0.63636364 0.59459459 0.57391304 0.61682243 0.64646465
 0.64583333 0.59574468 0.6        0.62962963]

mean value: 0.6126521957924459

key: test_precision
value: [0.4        0.33333333 0.5        0.375      0.33333333 0.33333333
 0.2        0.5        0.5        0.25      ]

mean value: 0.3725

key: train_precision
value: [0.45714286 0.49295775 0.45833333 0.43421053 0.48529412 0.53333333
 0.53448276 0.5        0.45833333 0.49275362]

mean value: 0.4846841629393674

key: test_recall
value: [1.   0.5  1.   0.75 0.75 0.75 0.2  0.8  0.8  0.5 ]

mean value: 0.705

key: train_recall
value: [0.82051282 0.8974359  0.84615385 0.84615385 0.84615385 0.82051282
 0.81578947 0.73684211 0.86842105 0.87179487]

mean value: 0.8369770580296896

key: test_roc_auc
value: [0.625      0.5        0.75       0.5625     0.5        0.5
 0.31428571 0.61428571 0.61428571 0.32142857]

mean value: 0.5301785714285714

key: train_roc_auc
value: [0.63084465 0.68401207 0.63631222 0.60690045 0.66572398 0.70437406
 0.71224256 0.6655225  0.65160183 0.68227425]

mean value: 0.6639808563805737

key: test_jcc
value: [0.4        0.25       0.5        0.33333333 0.3        0.3
 0.11111111 0.44444444 0.44444444 0.2       ]

mean value: 0.3283333333333333

key: train_jcc
value: [0.41558442 0.46666667 0.42307692 0.40243902 0.44594595 0.47761194
 0.47692308 0.42424242 0.42857143 0.45945946]

mean value: 0.4420521305159092

MCC on Blind test: 0.0

Accuracy on Blind test: 0.35

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00877714 0.00849271 0.00963545 0.00846076 0.00849152 0.00848961
 0.00851655 0.00852251 0.00855732 0.0085299 ]

mean value: 0.008647346496582031

key: score_time
value: [0.00853372 0.00901699 0.00869703 0.00842142 0.00851774 0.00851393
 0.00854301 0.00848365 0.00848579 0.0085268 ]

mean value: 0.00857400894165039

key: test_mcc
value: [-0.31622777  0.         -0.31622777  0.25        0.40824829  0.625
  0.07559289  0.07559289  0.11952286  0.13363062]

mean value: 0.10551320295235296

key: train_mcc
value: [0.54302345 0.56482566 0.5756791  0.52383566 0.54368212 0.52105076
 0.51620555 0.51327923 0.51494818 0.54491455]

mean value: 0.5361444261644434

key: test_accuracy
value: [0.5        0.58333333 0.5        0.66666667 0.75       0.83333333
 0.58333333 0.58333333 0.58333333 0.63636364]

mean value: 0.621969696969697

key: train_accuracy
value: [0.79439252 0.80373832 0.80373832 0.78504673 0.79439252 0.78504673
 0.78504673 0.78504673 0.78504673 0.7962963 ]

mean value: 0.79177916233991

key: test_fscore
value: [0.         0.28571429 0.         0.5        0.57142857 0.75
 0.28571429 0.28571429 0.44444444 0.33333333]

mean value: 0.34563492063492063

key: train_fscore
value: [0.67647059 0.69565217 0.6557377  0.63492063 0.68571429 0.65671642
 0.66666667 0.63492063 0.62295082 0.67647059]

mean value: 0.6606220515106465

key: test_precision
value: [0.         0.33333333 0.         0.5        0.66666667 0.75
 0.5        0.5        0.5        0.5       ]

mean value: 0.425

key: train_precision
value: [0.79310345 0.8        0.90909091 0.83333333 0.77419355 0.78571429
 0.74193548 0.8        0.82608696 0.79310345]

mean value: 0.8056561413470056

key: test_recall
value: [0.   0.25 0.   0.5  0.5  0.75 0.2  0.2  0.4  0.25]

mean value: 0.305

key: train_recall
value: [0.58974359 0.61538462 0.51282051 0.51282051 0.61538462 0.56410256
 0.60526316 0.52631579 0.5        0.58974359]

mean value: 0.5631578947368421

key: test_roc_auc
value: [0.375      0.5        0.375      0.625      0.6875     0.8125
 0.52857143 0.52857143 0.55714286 0.55357143]

mean value: 0.5542857142857143

key: train_roc_auc
value: [0.75075415 0.76357466 0.74170437 0.72699849 0.75622172 0.73793363
 0.74466056 0.72692601 0.72101449 0.75139353]

mean value: 0.7421181630546956

key: test_jcc
value: [0.         0.16666667 0.         0.33333333 0.4        0.6
 0.16666667 0.16666667 0.28571429 0.2       ]

mean value: 0.2319047619047619

key: train_jcc
value: [0.51111111 0.53333333 0.48780488 0.46511628 0.52173913 0.48888889
 0.5        0.46511628 0.45238095 0.51111111]

mean value: 0.4936601963448495

MCC on Blind test: 0.16

Accuracy on Blind test: 0.78

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.01097941 0.01172185 0.00993443 0.0093863  0.00939178 0.00932884
 0.00943375 0.00923657 0.00932479 0.00890183]

mean value: 0.009763956069946289

key: score_time
value: [0.0509901  0.0338583  0.01047421 0.00983119 0.01023149 0.01005054
 0.01004434 0.01029134 0.01021481 0.01026177]

mean value: 0.016624808311462402

key: test_mcc
value: [ 0.63245553  0.42640143  0.25       -0.31622777  0.81649658  0.15811388
 -0.09759001  0.11952286  0.68313005 -0.23904572]

mean value: 0.24332568454998843

key: train_mcc
value: [0.45416735 0.49964579 0.50186627 0.4993366  0.40732456 0.59640798
 0.5405763  0.49050631 0.42002098 0.54491455]

mean value: 0.4954766696721971

key: test_accuracy
value: [0.83333333 0.75       0.66666667 0.5        0.91666667 0.66666667
 0.5        0.58333333 0.83333333 0.54545455]

mean value: 0.6795454545454546

key: train_accuracy
value: [0.75700935 0.77570093 0.77570093 0.77570093 0.73831776 0.81308411
 0.79439252 0.77570093 0.74766355 0.7962963 ]

mean value: 0.7749567324333679

key: test_fscore
value: [0.66666667 0.4        0.5        0.         0.85714286 0.33333333
 0.25       0.44444444 0.75       0.        ]

mean value: 0.4201587301587301

key: train_fscore
value: [0.60606061 0.625      0.61290323 0.64705882 0.5483871  0.67741935
 0.63333333 0.61290323 0.54237288 0.67647059]

mean value: 0.6181909135740384

key: test_precision
value: [1.         1.         0.5        0.         1.         0.5
 0.33333333 0.5        1.         0.        ]

mean value: 0.5833333333333334

key: train_precision
value: [0.74074074 0.8        0.82608696 0.75862069 0.73913043 0.91304348
 0.86363636 0.79166667 0.76190476 0.79310345]

mean value: 0.7987933540444785

key: test_recall
value: [0.5  0.25 0.5  0.   0.75 0.25 0.2  0.4  0.6  0.  ]

mean value: 0.345

key: train_recall
value: [0.51282051 0.51282051 0.48717949 0.56410256 0.43589744 0.53846154
 0.5        0.5        0.42105263 0.58974359]

mean value: 0.5062078272604589

key: test_roc_auc
value: [0.75       0.625      0.625      0.375      0.875      0.5625
 0.45714286 0.55714286 0.8        0.42857143]

mean value: 0.6055357142857143

key: train_roc_auc
value: [0.70493967 0.71964555 0.71417798 0.73058069 0.67383107 0.75452489
 0.72826087 0.71376812 0.67429443 0.75139353]

mean value: 0.7165416800411416

key: test_jcc
value: [0.5        0.25       0.33333333 0.         0.75       0.2
 0.14285714 0.28571429 0.6        0.        ]

mean value: 0.3061904761904762

key: train_jcc
value: [0.43478261 0.45454545 0.44186047 0.47826087 0.37777778 0.51219512
 0.46341463 0.44186047 0.37209302 0.51111111]

mean value: 0.4487901531281146

MCC on Blind test: 0.02

Accuracy on Blind test: 0.75

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01083565 0.01061916 0.01054645 0.01052928 0.01054525 0.01045609
 0.00951052 0.00935054 0.01064563 0.01066136]

mean value: 0.010369992256164551

key: score_time
value: [0.00955319 0.00956869 0.00950885 0.01027656 0.00945163 0.00895619
 0.0088501  0.00963807 0.00961924 0.00976944]

mean value: 0.009519195556640625

key: test_mcc
value: [-0.21320072  0.42640143  0.40824829  0.          0.42640143  0.
  0.35675303  0.52915026  0.47809144  0.41833001]

mean value: 0.28301751927507457

key: train_mcc
value: [0.61356082 0.59383173 0.61356082 0.59383173 0.65250309 0.6717753
 0.70511024 0.6658757  0.62613533 0.6341119 ]

mean value: 0.637029666985776

key: test_accuracy
value: [0.58333333 0.75       0.75       0.66666667 0.75       0.66666667
 0.66666667 0.75       0.75       0.72727273]

mean value: 0.706060606060606

key: train_accuracy
value: [0.81308411 0.80373832 0.81308411 0.80373832 0.8317757  0.8411215
 0.85981308 0.8411215  0.82242991 0.82407407]

mean value: 0.8253980616130149

key: test_fscore
value: [0.         0.4        0.57142857 0.         0.4        0.
 0.33333333 0.57142857 0.66666667 0.4       ]

mean value: 0.3342857142857143

key: train_fscore
value: [0.65517241 0.63157895 0.65517241 0.63157895 0.7        0.72131148
 0.75409836 0.71186441 0.66666667 0.6779661 ]

mean value: 0.6805409733529866

key: test_precision
value: [0.         1.         0.66666667 0.         1.         0.
 1.         1.         0.75       1.        ]

mean value: 0.6416666666666666

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.   0.25 0.5  0.   0.25 0.   0.2  0.4  0.6  0.25]

mean value: 0.245

key: train_recall
value: [0.48717949 0.46153846 0.48717949 0.46153846 0.53846154 0.56410256
 0.60526316 0.55263158 0.5        0.51282051]

mean value: 0.5170715249662619

key: test_roc_auc
value: [0.4375     0.625      0.6875     0.5        0.625      0.5
 0.6        0.7        0.72857143 0.625     ]

mean value: 0.6028571428571429

key: train_roc_auc
value: [0.74358974 0.73076923 0.74358974 0.73076923 0.76923077 0.78205128
 0.80263158 0.77631579 0.75       0.75641026]

mean value: 0.7585357624831309

key: test_jcc
value: [0.   0.25 0.4  0.   0.25 0.   0.2  0.4  0.5  0.25]

mean value: 0.225

key: train_jcc
value: [0.48717949 0.46153846 0.48717949 0.46153846 0.53846154 0.56410256
 0.60526316 0.55263158 0.5        0.51282051]

mean value: 0.5170715249662619

MCC on Blind test: -0.03

Accuracy on Blind test: 0.9

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.56232119 0.63862157 0.60934663 0.55800104 0.53653932 0.71582627
 0.5305922  0.58970189 0.68618298 0.7097888 ]

mean value: 0.6136921882629395

key: score_time
value: [0.01231527 0.01209044 0.01212478 0.01505733 0.01235318 0.01213241
 0.01209545 0.0121336  0.01381922 0.01211047]

mean value: 0.012623214721679687

key: test_mcc
value: [0.625      0.25       0.25       0.625      0.25       0.25
 0.07559289 0.65714286 0.65714286 0.21428571]

mean value: 0.3854164323173274

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.83333333 0.66666667 0.66666667 0.83333333 0.66666667 0.66666667
 0.58333333 0.83333333 0.83333333 0.63636364]

mean value: 0.7219696969696969

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.75       0.5        0.5        0.75       0.5        0.5
 0.28571429 0.8        0.8        0.5       ]

mean value: 0.5885714285714286

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.75 0.5  0.5  0.75 0.5  0.5  0.5  0.8  0.8  0.5 ]

mean value: 0.61

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75 0.5  0.5  0.75 0.5  0.5  0.2  0.8  0.8  0.5 ]

mean value: 0.58

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.8125     0.625      0.625      0.8125     0.625      0.625
 0.52857143 0.82857143 0.82857143 0.60714286]

mean value: 0.6917857142857143

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.6        0.33333333 0.33333333 0.6        0.33333333 0.33333333
 0.16666667 0.66666667 0.66666667 0.33333333]

mean value: 0.43666666666666665

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.07

Accuracy on Blind test: 0.7

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.0142343  0.01151633 0.01047158 0.01146269 0.01170778 0.01135254
 0.01145124 0.01137567 0.01155138 0.01109648]

mean value: 0.011621999740600585

key: score_time
value: [0.01149178 0.00891137 0.00910735 0.00927162 0.00931621 0.00920415
 0.00913906 0.00935984 0.00922346 0.00918746]

mean value: 0.009421229362487793

key: test_mcc
value: [0.83666003 0.625      0.625      0.625      0.83666003 1.
 0.47809144 1.         0.65714286 0.60714286]

mean value: 0.7290697211087622

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91666667 0.83333333 0.83333333 0.83333333 0.91666667 1.
 0.75       1.         0.83333333 0.81818182]

mean value: 0.8734848484848485

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.88888889 0.75       0.75       0.75       0.88888889 1.
 0.66666667 1.         0.8        0.75      ]

mean value: 0.8244444444444444

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.8  0.75 0.75 0.75 0.8  1.   0.75 1.   0.8  0.75]

mean value: 0.8150000000000001

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.   0.75 0.75 0.75 1.   1.   0.6  1.   0.8  0.75]

mean value: 0.84

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.8125     0.8125     0.8125     0.9375     1.
 0.72857143 1.         0.82857143 0.80357143]

mean value: 0.8673214285714286

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.8        0.6        0.6        0.6        0.8        1.
 0.5        1.         0.66666667 0.6       ]

mean value: 0.7166666666666667

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.11

Accuracy on Blind test: 0.83

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.08845139 0.08627129 0.08441663 0.08516455 0.08460331 0.08442616
 0.08551168 0.08432364 0.084692   0.08504295]

mean value: 0.08529036045074463

key: score_time
value: [0.01813054 0.01712751 0.01685095 0.01689005 0.01693153 0.01691318
 0.0168407  0.01707983 0.01705503 0.01776624]

mean value: 0.01715855598449707

key: test_mcc
value: [0.81649658 0.25       0.25       0.63245553 0.15811388 0.40824829
 0.29277002 0.68313005 0.83666003 0.21428571]

mean value: 0.4542160100202007

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91666667 0.66666667 0.66666667 0.83333333 0.66666667 0.75
 0.66666667 0.83333333 0.91666667 0.63636364]

mean value: 0.7553030303030303

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.85714286 0.5        0.5        0.66666667 0.33333333 0.57142857
 0.5        0.75       0.88888889 0.5       ]

mean value: 0.6067460317460317

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.5        0.5        1.         0.5        0.66666667
 0.66666667 1.         1.         0.5       ]

mean value: 0.7333333333333333

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75 0.5  0.5  0.5  0.25 0.5  0.4  0.6  0.8  0.5 ]

mean value: 0.53

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.875      0.625      0.625      0.75       0.5625     0.6875
 0.62857143 0.8        0.9        0.60714286]

mean value: 0.7060714285714286

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.75       0.33333333 0.33333333 0.5        0.2        0.4
 0.33333333 0.6        0.8        0.33333333]

mean value: 0.4583333333333333

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.02

Accuracy on Blind test: 0.77

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00952244 0.00952935 0.00941849 0.00951624 0.00955868 0.00941634
 0.00942731 0.0094018  0.00958323 0.00910163]

mean value: 0.009447550773620606

key: score_time
value: [0.00921059 0.00916076 0.00918984 0.00924277 0.00925565 0.00907969
 0.00916648 0.00929952 0.00872755 0.0085113 ]

mean value: 0.009084415435791016

key: test_mcc
value: [ 0.125       0.42640143  0.40824829  0.40824829  0.625       0.23904572
  0.31428571 -0.02857143  0.50709255  0.38575837]

mean value: 0.3410508948962451

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.5        0.75       0.75       0.75       0.83333333 0.58333333
 0.66666667 0.5        0.75       0.72727273]

mean value: 0.681060606060606

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.5        0.4        0.57142857 0.57142857 0.75       0.54545455
 0.6        0.4        0.72727273 0.57142857]

mean value: 0.5637012987012987

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.375      1.         0.66666667 0.66666667 0.75       0.42857143
 0.6        0.4        0.66666667 0.66666667]

mean value: 0.6220238095238095

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75 0.25 0.5  0.5  0.75 0.75 0.6  0.4  0.8  0.5 ]

mean value: 0.58

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.5625     0.625      0.6875     0.6875     0.8125     0.625
 0.65714286 0.48571429 0.75714286 0.67857143]

mean value: 0.6578571428571429

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.33333333 0.25       0.4        0.4        0.6        0.375
 0.42857143 0.25       0.57142857 0.4       ]

mean value: 0.4008333333333333

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.0

Accuracy on Blind test: 0.68

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.11022449 1.10686707 1.07394433 1.17411065 1.12241173 1.1280334
 1.07594752 1.093925   1.09230089 1.07993698]

mean value: 1.105770206451416

key: score_time
value: [0.09314108 0.09392309 0.08695936 0.09498835 0.09460545 0.08653736
 0.0941546  0.08683944 0.09424591 0.08606005]

mean value: 0.09114546775817871

key: test_mcc
value: [0.81649658 0.40824829 0.625      0.63245553 0.40824829 0.63245553
 0.29277002 0.68313005 0.83666003 0.13363062]

mean value: 0.5469094946361625

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91666667 0.75       0.83333333 0.83333333 0.75       0.83333333
 0.66666667 0.83333333 0.91666667 0.63636364]

mean value: 0.796969696969697

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.85714286 0.57142857 0.75       0.66666667 0.57142857 0.66666667
 0.5        0.75       0.88888889 0.33333333]

mean value: 0.6555555555555556

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.66666667 0.75       1.         0.66666667 1.
 0.66666667 1.         1.         0.5       ]

mean value: 0.825

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75 0.5  0.75 0.5  0.5  0.5  0.4  0.6  0.8  0.25]

mean value: 0.555

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.875      0.6875     0.8125     0.75       0.6875     0.75
 0.62857143 0.8        0.9        0.55357143]

mean value: 0.7444642857142857

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.75       0.4        0.6        0.5        0.4        0.5
 0.33333333 0.6        0.8        0.2       ]

mean value: 0.5083333333333333

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.11

Accuracy on Blind test: 0.83

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000...05', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(

key: fit_time
value: [1.72028232 0.85728979 0.8835597  0.87653446 0.84350824 0.84826064
 0.85741544 0.89759564 0.87344646 0.88773561]

mean value: 0.9545628309249878

key: score_time
value: [0.16366887 0.22488356 0.20972967 0.29697323 0.24213696 0.18953753
 0.17323303 0.19940066 0.21370244 0.14987516]

mean value: 0.2063141107559204

key: test_mcc
value: [0.81649658 0.40824829 0.81649658 0.63245553 0.15811388 0.63245553
 0.52915026 0.52915026 0.83666003 0.41833001]

mean value: 0.5777556963622036

key: train_mcc
value: [0.90115642 0.92064018 0.94025192 0.94025192 0.88178613 0.88178613
 0.89985774 0.88019137 0.91962501 0.92101104]

mean value: 0.9086557859411416

key: test_accuracy
value: [0.91666667 0.75       0.91666667 0.83333333 0.66666667 0.83333333
 0.75       0.75       0.91666667 0.72727273]

mean value: 0.806060606060606

key: train_accuracy
value: [0.95327103 0.96261682 0.97196262 0.97196262 0.94392523 0.94392523
 0.95327103 0.94392523 0.96261682 0.96296296]

mean value: 0.9570439598476982

key: test_fscore
value: [0.85714286 0.57142857 0.85714286 0.66666667 0.33333333 0.66666667
 0.57142857 0.57142857 0.88888889 0.4       ]

mean value: 0.6384126984126984

key: train_fscore
value: [0.93150685 0.94594595 0.96       0.96       0.91666667 0.91666667
 0.92957746 0.91428571 0.94444444 0.94594595]

mean value: 0.9365039698059185

key: test_precision
value: [1.         0.66666667 1.         1.         0.5        1.
 1.         1.         1.         1.        ]

mean value: 0.9166666666666666

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75 0.5  0.75 0.5  0.25 0.5  0.4  0.4  0.8  0.25]

mean value: 0.51

key: train_recall
value: [0.87179487 0.8974359  0.92307692 0.92307692 0.84615385 0.84615385
 0.86842105 0.84210526 0.89473684 0.8974359 ]

mean value: 0.8810391363022942

key: test_roc_auc
value: [0.875  0.6875 0.875  0.75   0.5625 0.75   0.7    0.7    0.9    0.625 ]

mean value: 0.7424999999999999

key: train_roc_auc
value: [0.93589744 0.94871795 0.96153846 0.96153846 0.92307692 0.92307692
 0.93421053 0.92105263 0.94736842 0.94871795]

mean value: 0.9405195681511471

key: test_jcc
value: [0.75 0.4  0.75 0.5  0.2  0.5  0.4  0.4  0.8  0.25]

mean value: 0.495

key: train_jcc
value: [0.87179487 0.8974359  0.92307692 0.92307692 0.84615385 0.84615385
 0.86842105 0.84210526 0.89473684 0.8974359 ]

mean value: 0.8810391363022942

MCC on Blind test: 0.13

Accuracy on Blind test: 0.86

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.02321768 0.00922227 0.00949192 0.00946665 0.00959659 0.00948167
 0.00957298 0.00974035 0.00960398 0.00952101]

mean value: 0.01089150905609131

key: score_time
value: [0.01008844 0.00922775 0.00936937 0.00923586 0.00924206 0.00932956
 0.00904942 0.00939536 0.00939751 0.00934649]

mean value: 0.009368181228637695

key: test_mcc
value: [-0.31622777  0.         -0.31622777  0.25        0.40824829  0.625
  0.07559289  0.07559289  0.11952286  0.13363062]

mean value: 0.10551320295235296

key: train_mcc
value: [0.54302345 0.56482566 0.5756791  0.52383566 0.54368212 0.52105076
 0.51620555 0.51327923 0.51494818 0.54491455]

mean value: 0.5361444261644434

key: test_accuracy
value: [0.5        0.58333333 0.5        0.66666667 0.75       0.83333333
 0.58333333 0.58333333 0.58333333 0.63636364]

mean value: 0.621969696969697

key: train_accuracy
value: [0.79439252 0.80373832 0.80373832 0.78504673 0.79439252 0.78504673
 0.78504673 0.78504673 0.78504673 0.7962963 ]

mean value: 0.79177916233991

key: test_fscore
value: [0.         0.28571429 0.         0.5        0.57142857 0.75
 0.28571429 0.28571429 0.44444444 0.33333333]

mean value: 0.34563492063492063

key: train_fscore
value: [0.67647059 0.69565217 0.6557377  0.63492063 0.68571429 0.65671642
 0.66666667 0.63492063 0.62295082 0.67647059]

mean value: 0.6606220515106465

key: test_precision
value: [0.         0.33333333 0.         0.5        0.66666667 0.75
 0.5        0.5        0.5        0.5       ]

mean value: 0.425

key: train_precision
value: [0.79310345 0.8        0.90909091 0.83333333 0.77419355 0.78571429
 0.74193548 0.8        0.82608696 0.79310345]

mean value: 0.8056561413470056

key: test_recall
value: [0.   0.25 0.   0.5  0.5  0.75 0.2  0.2  0.4  0.25]

mean value: 0.305

key: train_recall
value: [0.58974359 0.61538462 0.51282051 0.51282051 0.61538462 0.56410256
 0.60526316 0.52631579 0.5        0.58974359]

mean value: 0.5631578947368421

key: test_roc_auc
value: [0.375      0.5        0.375      0.625      0.6875     0.8125
 0.52857143 0.52857143 0.55714286 0.55357143]

mean value: 0.5542857142857143

key: train_roc_auc
value: [0.75075415 0.76357466 0.74170437 0.72699849 0.75622172 0.73793363
 0.74466056 0.72692601 0.72101449 0.75139353]

mean value: 0.7421181630546956

key: test_jcc
value: [0.         0.16666667 0.         0.33333333 0.4        0.6
 0.16666667 0.16666667 0.28571429 0.2       ]

mean value: 0.2319047619047619

key: train_jcc
value: [0.51111111 0.53333333 0.48780488 0.46511628 0.52173913 0.48888889
 0.5        0.46511628 0.45238095 0.51111111]

mean value: 0.4936601963448495

MCC on Blind test: 0.16

Accuracy on Blind test: 0.78

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.15230823 0.0347209  0.03545165 0.0362525  0.03555393 0.0401206
 0.03871918 0.04143095 0.0373764  0.04362845]

mean value: 0.04955627918243408

key: score_time
value: [0.01079464 0.01076293 0.01020479 0.01042175 0.0106473  0.01025009
 0.01163721 0.01053095 0.01053333 0.01037621]

mean value: 0.010615921020507813

key: test_mcc
value: [1.         0.625      1.         1.         0.83666003 1.
 0.65714286 1.         0.65714286 0.81009259]

mean value: 0.8586038328120772

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.83333333 1.         1.         0.91666667 1.
 0.83333333 1.         0.83333333 0.90909091]

mean value: 0.9325757575757576

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.75       1.         1.         0.88888889 1.
 0.8        1.         0.8        0.85714286]

mean value: 0.9096031746031746

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.   0.75 1.   1.   0.8  1.   0.8  1.   0.8  1.  ]

mean value: 0.915

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.   0.75 1.   1.   1.   1.   0.8  1.   0.8  0.75]

mean value: 0.91

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.8125     1.         1.         0.9375     1.
 0.82857143 1.         0.82857143 0.875     ]

mean value: 0.9282142857142858

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.6        1.         1.         0.8        1.
 0.66666667 1.         0.66666667 0.75      ]

mean value: 0.8483333333333334

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.12

Accuracy on Blind test: 0.85

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.03027558 0.01925349 0.04498887 0.04488349 0.0192492  0.02261639
 0.04425192 0.04414797 0.04453683 0.04522443]

mean value: 0.035942816734313966

key: score_time
value: [0.01195264 0.01174021 0.02140355 0.01278901 0.01185536 0.0207603
 0.02176905 0.02263951 0.02292538 0.02251673]

mean value: 0.018035173416137695

key: test_mcc
value: [ 0.125       0.25        0.11952286 -0.23904572  0.5         0.
  0.29277002  0.02857143 -0.29277002 -0.06900656]

mean value: 0.07150420117037538

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.5        0.66666667 0.58333333 0.41666667 0.66666667 0.5
 0.66666667 0.5        0.33333333 0.45454545]

mean value: 0.5287878787878788

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.5        0.5        0.44444444 0.22222222 0.66666667 0.4
 0.5        0.5        0.42857143 0.4       ]

mean value: 0.4561904761904762

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.375      0.5        0.4        0.2        0.5        0.33333333
 0.66666667 0.42857143 0.33333333 0.33333333]

mean value: 0.4070238095238095

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75 0.5  0.5  0.25 1.   0.5  0.4  0.6  0.6  0.5 ]

mean value: 0.56

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.5625     0.625      0.5625     0.375      0.75       0.5
 0.62857143 0.51428571 0.37142857 0.46428571]

mean value: 0.5353571428571429

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.33333333 0.33333333 0.28571429 0.125      0.5        0.25
 0.33333333 0.33333333 0.27272727 0.25      ]

mean value: 0.30167748917748916

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.04

Accuracy on Blind test: 0.57

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.02219272 0.00896144 0.00897241 0.00866389 0.00925398 0.00860715
 0.00947046 0.00968862 0.00901008 0.00871587]

mean value: 0.010353660583496094

key: score_time
value: [0.02152371 0.00910091 0.00912261 0.00854945 0.00911403 0.00849843
 0.00861621 0.00916672 0.00870585 0.00874329]

mean value: 0.010114121437072753

key: test_mcc
value: [ 0.625       0.40824829  0.11952286  0.81649658  0.          0.
 -0.09759001  0.23904572  0.31428571  0.21428571]

mean value: 0.26392948754684825

key: train_mcc
value: [0.30586639 0.30789368 0.42925551 0.36941929 0.3617205  0.43262802
 0.43009525 0.34175868 0.37917626 0.37501394]

mean value: 0.37328275032791647

key: test_accuracy
value: [0.83333333 0.75       0.58333333 0.91666667 0.58333333 0.41666667
 0.5        0.58333333 0.66666667 0.63636364]

mean value: 0.646969696969697

key: train_accuracy
value: [0.6728972  0.6635514  0.73831776 0.70093458 0.70093458 0.72897196
 0.71962617 0.68224299 0.71028037 0.69444444]

mean value: 0.7012201453790239

key: test_fscore
value: [0.75       0.57142857 0.44444444 0.85714286 0.28571429 0.46153846
 0.25       0.61538462 0.6        0.5       ]

mean value: 0.5335653235653236

key: train_fscore
value: [0.56790123 0.58139535 0.63157895 0.6097561  0.6        0.65060241
 0.65116279 0.5952381  0.60759494 0.62068966]

mean value: 0.6115919515790106

key: test_precision
value: [0.75       0.66666667 0.4        1.         0.33333333 0.33333333
 0.33333333 0.5        0.6        0.5       ]

mean value: 0.5416666666666666

key: train_precision
value: [0.54761905 0.53191489 0.64864865 0.58139535 0.58536585 0.61363636
 0.58333333 0.54347826 0.58536585 0.5625    ]

mean value: 0.5783257603878262

key: test_recall
value: [0.75 0.5  0.5  0.75 0.25 0.75 0.2  0.8  0.6  0.5 ]

mean value: 0.56

key: train_recall
value: [0.58974359 0.64102564 0.61538462 0.64102564 0.61538462 0.69230769
 0.73684211 0.65789474 0.63157895 0.69230769]

mean value: 0.6513495276653172

key: test_roc_auc
value: [0.8125     0.6875     0.5625     0.875      0.5        0.5
 0.45714286 0.61428571 0.65714286 0.60714286]

mean value: 0.6273214285714286

key: train_roc_auc
value: [0.65516591 0.65874811 0.71210407 0.68815988 0.68269231 0.72115385
 0.72349352 0.67677346 0.69260107 0.69397993]

mean value: 0.6904872105504761

key: test_jcc
value: [0.6        0.4        0.28571429 0.75       0.16666667 0.3
 0.14285714 0.44444444 0.42857143 0.33333333]

mean value: 0.38515873015873014

key: train_jcc
value: [0.39655172 0.40983607 0.46153846 0.43859649 0.42857143 0.48214286
 0.48275862 0.42372881 0.43636364 0.45      ]

mean value: 0.4410088098805133

MCC on Blind test: 0.05

Accuracy on Blind test: 0.62

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01069689 0.01261544 0.01330781 0.01351762 0.01318288 0.0139327
 0.01301622 0.01384878 0.01437545 0.01323342]

mean value: 0.013172721862792969

key: score_time
value: [0.00870061 0.01082563 0.01075101 0.0115633  0.01132607 0.01154351
 0.01151013 0.01174736 0.011554   0.01127577]

mean value: 0.011079740524291993

key: test_mcc
value: [0.81649658 0.40824829 0.11952286 0.81649658 0.47809144 0.5976143
 0.09759001 0.50709255 0.65714286 0.41833001]

mean value: 0.4916625492195567

key: train_mcc
value: [0.8843113  0.82420912 0.9600061  0.90115642 0.94025192 0.77885663
 0.70820392 0.88799637 0.97968078 0.8996469 ]

mean value: 0.8764319472697766

key: test_accuracy
value: [0.91666667 0.75       0.58333333 0.91666667 0.75       0.75
 0.5        0.75       0.83333333 0.72727273]

mean value: 0.7477272727272727

key: train_accuracy
value: [0.94392523 0.91588785 0.98130841 0.95327103 0.97196262 0.87850467
 0.8317757  0.94392523 0.99065421 0.9537037 ]

mean value: 0.9364918656974732

key: test_fscore
value: [0.85714286 0.57142857 0.44444444 0.85714286 0.66666667 0.72727273
 0.57142857 0.72727273 0.8        0.4       ]

mean value: 0.6622799422799422

key: train_fscore
value: [0.92682927 0.86956522 0.97368421 0.93150685 0.96       0.85714286
 0.80851064 0.92682927 0.98666667 0.93333333]

mean value: 0.9174068309258784

key: test_precision
value: [1.         0.66666667 0.4        1.         0.6        0.57142857
 0.44444444 0.66666667 0.8        1.        ]

mean value: 0.714920634920635

key: train_precision
value: [0.88372093 1.         1.         1.         1.         0.75
 0.67857143 0.86363636 1.         0.97222222]

mean value: 0.9148150944662573

key: test_recall
value: [0.75 0.5  0.5  0.75 0.75 1.   0.8  0.8  0.8  0.25]

mean value: 0.6900000000000001

key: train_recall
value: [0.97435897 0.76923077 0.94871795 0.87179487 0.92307692 1.
 1.         1.         0.97368421 0.8974359 ]

mean value: 0.93582995951417

key: test_roc_auc
value: [0.875      0.6875     0.5625     0.875      0.75       0.8125
 0.54285714 0.75714286 0.82857143 0.625     ]

mean value: 0.7316071428571429

key: train_roc_auc
value: [0.95041478 0.88461538 0.97435897 0.93589744 0.96153846 0.90441176
 0.86956522 0.95652174 0.98684211 0.94147157]

mean value: 0.9365637436104525

key: test_jcc
value: [0.75       0.4        0.28571429 0.75       0.5        0.57142857
 0.4        0.57142857 0.66666667 0.25      ]

mean value: 0.5145238095238095

key: train_jcc
value: [0.86363636 0.76923077 0.94871795 0.87179487 0.92307692 0.75
 0.67857143 0.86363636 0.97368421 0.875     ]

mean value: 0.8517348879190985

MCC on Blind test: 0.14

Accuracy on Blind test: 0.75

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01269102 0.01227665 0.01251817 0.01275468 0.01322985 0.01274872
 0.01255131 0.01216674 0.01289606 0.01244688]

mean value: 0.012628006935119628

key: score_time
value: [0.01155043 0.01127887 0.01133299 0.01129389 0.01123714 0.01129723
 0.01133728 0.01136446 0.01130033 0.01130509]

mean value: 0.0113297700881958

key: test_mcc
value: [ 0.81649658  0.25        0.25        0.42640143  0.625      -0.21320072
  0.07559289  0.68313005  0.65714286  0.44854261]

mean value: 0.40191057136645425

key: train_mcc
value: [0.87895928 0.73075324 0.97991726 0.59383173 0.98013354 0.26021572
 0.91962501 0.73787244 0.88799637 0.88955289]

mean value: 0.7858857488632192

key: test_accuracy
value: [0.91666667 0.66666667 0.66666667 0.75       0.83333333 0.58333333
 0.58333333 0.83333333 0.83333333 0.72727273]

mean value: 0.7393939393939394

key: train_accuracy
value: [0.94392523 0.85981308 0.99065421 0.80373832 0.99065421 0.6728972
 0.96261682 0.87850467 0.94392523 0.94444444]

mean value: 0.8991173416407061

key: test_fscore
value: [0.85714286 0.5        0.5        0.4        0.75       0.
 0.28571429 0.75       0.8        0.66666667]

mean value: 0.550952380952381

key: train_fscore
value: [0.92307692 0.83146067 0.98701299 0.63157895 0.98734177 0.18604651
 0.94444444 0.8        0.92682927 0.92857143]

mean value: 0.8146362956703995

key: test_precision
value: [1.   0.5  0.5  1.   0.75 0.   0.5  1.   0.8  0.6 ]

mean value: 0.665

key: train_precision
value: [0.92307692 0.74       1.         1.         0.975      1.
 1.         0.96296296 0.86363636 0.86666667]

mean value: 0.9331342916342916

key: test_recall
value: [0.75 0.5  0.5  0.25 0.75 0.   0.2  0.6  0.8  0.75]

mean value: 0.51

key: train_recall
value: [0.92307692 0.94871795 0.97435897 0.46153846 1.         0.1025641
 0.89473684 0.68421053 1.         1.        ]

mean value: 0.7989203778677463

key: test_roc_auc
value: [0.875      0.625      0.625      0.625      0.8125     0.4375
 0.52857143 0.8        0.82857143 0.73214286]

mean value: 0.6889285714285714

key: train_roc_auc
value: [0.93947964 0.87877074 0.98717949 0.73076923 0.99264706 0.55128205
 0.94736842 0.83485889 0.95652174 0.95652174]

mean value: 0.8775398990788007

key: test_jcc
value: [0.75       0.33333333 0.33333333 0.25       0.6        0.
 0.16666667 0.6        0.66666667 0.5       ]

mean value: 0.42

key: train_jcc
value: [0.85714286 0.71153846 0.97435897 0.46153846 0.975      0.1025641
 0.89473684 0.66666667 0.86363636 0.86666667]

mean value: 0.7373849396217818

MCC on Blind test: 0.11

Accuracy on Blind test: 0.82

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.10345721 0.08764267 0.08817434 0.08749247 0.08806634 0.08874726
 0.08956051 0.08980393 0.08894587 0.0904038 ]

mean value: 0.0902294397354126

key: score_time
value: [0.01449013 0.01465034 0.01477146 0.01476502 0.01470923 0.01490712
 0.01549172 0.01508617 0.01505709 0.01503992]

mean value: 0.014896821975708009

key: test_mcc
value: [1.         0.625      0.625      0.63245553 0.83666003 0.83666003
 0.50709255 1.         0.83666003 0.81009259]

mean value: 0.7709620751773996

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.83333333 0.83333333 0.83333333 0.91666667 0.91666667
 0.75       1.         0.91666667 0.90909091]

mean value: 0.8909090909090909

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.75       0.75       0.66666667 0.88888889 0.88888889
 0.72727273 1.         0.88888889 0.85714286]

mean value: 0.8417748917748917

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.75       0.75       1.         0.8        0.8
 0.66666667 1.         1.         1.        ]

mean value: 0.8766666666666667

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.   0.75 0.75 0.5  1.   1.   0.8  1.   0.8  0.75]

mean value: 0.835

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.8125     0.8125     0.75       0.9375     0.9375
 0.75714286 1.         0.9        0.875     ]

mean value: 0.8782142857142857

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.6        0.6        0.5        0.8        0.8
 0.57142857 1.         0.8        0.75      ]

mean value: 0.7421428571428571

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.09

Accuracy on Blind test: 0.77

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.03509355 0.03801394 0.04573321 0.04925704 0.04040885 0.02893972
 0.03013682 0.03751421 0.04837871 0.04894233]

mean value: 0.04024183750152588

key: score_time
value: [0.02462077 0.03305888 0.04407382 0.03202057 0.02337003 0.02233243
 0.02474475 0.03469205 0.02849483 0.033566  ]

mean value: 0.030097413063049316

key: test_mcc
value: [1.         0.625      0.81649658 1.         0.83666003 1.
 0.83666003 1.         0.65714286 0.81009259]

mean value: 0.8582052078439717

key: train_mcc
value: [1.         0.97991726 1.         0.95965309 0.97991726 1.
 0.97991726 1.         1.         0.95986622]

mean value: 0.9859271082573318

key: test_accuracy
value: [1.         0.83333333 0.91666667 1.         0.91666667 1.
 0.91666667 1.         0.83333333 0.90909091]

mean value: 0.9325757575757576

key: train_accuracy
value: [1.         0.99065421 1.         0.98130841 0.99065421 1.
 0.99065421 1.         1.         0.98148148]

mean value: 0.9934752509518865

key: test_fscore
value: [1.         0.75       0.85714286 1.         0.88888889 1.
 0.88888889 1.         0.8        0.85714286]

mean value: 0.9042063492063492

key: train_fscore
value: [1.         0.98701299 1.         0.97435897 0.98701299 1.
 0.98701299 1.         1.         0.97435897]

mean value: 0.9909756909756909

key: test_precision
value: [1.   0.75 1.   1.   0.8  1.   1.   1.   0.8  1.  ]

mean value: 0.935

key: train_precision
value: [1.         1.         1.         0.97435897 1.         1.
 0.97435897 1.         1.         0.97435897]

mean value: 0.9923076923076923

key: test_recall
value: [1.   0.75 0.75 1.   1.   1.   0.8  1.   0.8  0.75]

mean value: 0.885

key: train_recall
value: [1.         0.97435897 1.         0.97435897 0.97435897 1.
 1.         1.         1.         0.97435897]

mean value: 0.9897435897435898

key: test_roc_auc
value: [1.         0.8125     0.875      1.         0.9375     1.
 0.9        1.         0.82857143 0.875     ]

mean value: 0.9228571428571428

key: train_roc_auc
value: [1.         0.98717949 1.         0.97982655 0.98717949 1.
 0.99275362 1.         1.         0.97993311]

mean value: 0.992687225391829

key: test_jcc
value: [1.         0.6        0.75       1.         0.8        1.
 0.8        1.         0.66666667 0.75      ]

mean value: 0.8366666666666667

key: train_jcc
value: [1.         0.97435897 1.         0.95       0.97435897 1.
 0.97435897 1.         1.         0.95      ]

mean value: 0.9823076923076923

MCC on Blind test: 0.13

Accuracy on Blind test: 0.86

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.02343273 0.01722765 0.01697659 0.02735806 0.04024768 0.04029274
 0.02456927 0.03989363 0.03921294 0.04148316]

mean value: 0.031069445610046386

key: score_time
value: [0.01215196 0.01174808 0.01172161 0.02232957 0.01338077 0.0117867
 0.01184607 0.02344608 0.0225389  0.02162814]

mean value: 0.01625778675079346

key: test_mcc
value: [ 0.          0.          0.25        0.15811388  0.15811388  0.15811388
  0.11952286  0.68313005  0.68313005 -0.03857584]

mean value: 0.21715487745961196

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.5        0.58333333 0.66666667 0.66666667 0.66666667 0.66666667
 0.58333333 0.83333333 0.83333333 0.54545455]

mean value: 0.6545454545454545

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.4        0.28571429 0.5        0.33333333 0.33333333 0.33333333
 0.44444444 0.75       0.75       0.28571429]

mean value: 0.4415873015873016

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.33333333 0.33333333 0.5        0.5        0.5        0.5
 0.5        1.         1.         0.33333333]

mean value: 0.55

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.5  0.25 0.5  0.25 0.25 0.25 0.4  0.6  0.6  0.25]

mean value: 0.385

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.5        0.5        0.625      0.5625     0.5625     0.5625
 0.55714286 0.8        0.8        0.48214286]

mean value: 0.5951785714285714

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.25       0.16666667 0.33333333 0.2        0.2        0.2
 0.28571429 0.6        0.6        0.16666667]

mean value: 0.30023809523809525

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: -0.05

Accuracy on Blind test: 0.75

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.22485304 0.20807576 0.21185255 0.21056032 0.20868492 0.21106625
 0.22972059 0.21120667 0.23677921 0.1712718 ]

mean value: 0.21240711212158203

key: score_time
value: [0.00908399 0.00889945 0.00889659 0.00886393 0.00904346 0.00892591
 0.00900364 0.0089159  0.00945544 0.0089376 ]

mean value: 0.009002590179443359

key: test_mcc
value: [1.         0.625      0.625      0.81649658 0.83666003 1.
 0.65714286 1.         0.65714286 0.60714286]

mean value: 0.7824585178890373

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.83333333 0.83333333 0.91666667 0.91666667 1.
 0.83333333 1.         0.83333333 0.81818182]

mean value: 0.8984848484848484

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.75       0.75       0.85714286 0.88888889 1.
 0.8        1.         0.8        0.75      ]

mean value: 0.8596031746031746

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.   0.75 0.75 1.   0.8  1.   0.8  1.   0.8  0.75]

mean value: 0.865

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.   0.75 0.75 0.75 1.   1.   0.8  1.   0.8  0.75]

mean value: 0.86

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.8125     0.8125     0.875      0.9375     1.
 0.82857143 1.         0.82857143 0.80357143]

mean value: 0.8898214285714285

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.6        0.6        0.75       0.8        1.
 0.66666667 1.         0.66666667 0.6       ]

mean value: 0.7683333333333333

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.11

Accuracy on Blind test: 0.83

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.0173173  0.01490331 0.01552176 0.01563334 0.01692104 0.0218811
 0.01528478 0.02272177 0.01535749 0.01550364]

mean value: 0.017104554176330566

key: score_time
value: [0.01197457 0.01160645 0.01172018 0.01262617 0.01268053 0.01280403
 0.0124104  0.01255608 0.01246977 0.0125246 ]

mean value: 0.012337279319763184

key: test_mcc
value: [ 0.42640143 -0.125       0.15811388 -0.23904572 -0.23904572  0.15811388
 -0.02857143 -0.09759001  0.68313005 -0.17857143]

mean value: 0.051793494162056405

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.75       0.5        0.66666667 0.41666667 0.41666667 0.66666667
 0.5        0.5        0.83333333 0.45454545]

mean value: 0.5704545454545454

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.4        0.25       0.33333333 0.22222222 0.22222222 0.33333333
 0.4        0.25       0.75       0.25      ]

mean value: 0.3411111111111111

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.25       0.5        0.2        0.2        0.5
 0.4        0.33333333 1.         0.25      ]

mean value: 0.4633333333333333

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.25 0.25 0.25 0.25 0.25 0.25 0.4  0.2  0.6  0.25]

mean value: 0.295

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.625      0.4375     0.5625     0.375      0.375      0.5625
 0.48571429 0.45714286 0.8        0.41071429]

mean value: 0.5091071428571429

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.25       0.14285714 0.2        0.125      0.125      0.2
 0.25       0.14285714 0.6        0.14285714]

mean value: 0.21785714285714286

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: -0.05

Accuracy on Blind test: 0.77

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.02931094 0.03452849 0.03164291 0.01259708 0.01256371 0.0125587
 0.01257324 0.0173099  0.03186011 0.01252818]

mean value: 0.02074732780456543

key: score_time
value: [0.02166176 0.02129889 0.01981616 0.01143384 0.01167202 0.0114224
 0.01144147 0.02194834 0.01146293 0.01149607]

mean value: 0.015365386009216308

key: test_mcc
value: [1.         0.40824829 0.11952286 0.81649658 0.35355339 0.625
 0.29277002 0.65714286 0.65714286 0.38575837]

mean value: 0.5315635233993806

key: train_mcc
value: [0.95965309 0.97991726 0.9600061  0.95965309 0.95965309 0.95965309
 0.95919146 0.95919146 0.93862091 0.93965322]

mean value: 0.9575192777485965

key: test_accuracy
value: [1.         0.75       0.58333333 0.91666667 0.66666667 0.83333333
 0.66666667 0.83333333 0.83333333 0.72727273]

mean value: 0.781060606060606

key: train_accuracy
value: [0.98130841 0.99065421 0.98130841 0.98130841 0.98130841 0.98130841
 0.98130841 0.98130841 0.97196262 0.97222222]

mean value: 0.9803997923156802

key: test_fscore
value: [1.         0.57142857 0.44444444 0.85714286 0.6        0.75
 0.5        0.8        0.8        0.57142857]

mean value: 0.6894444444444445

key: train_fscore
value: [0.97435897 0.98701299 0.97368421 0.97435897 0.97435897 0.97435897
 0.97368421 0.97368421 0.96       0.96103896]

mean value: 0.9726540477066793

key: test_precision
value: [1.         0.66666667 0.4        1.         0.5        0.75
 0.66666667 0.8        0.8        0.66666667]

mean value: 0.725

key: train_precision
value: [0.97435897 1.         1.         0.97435897 0.97435897 0.97435897
 0.97368421 0.97368421 0.97297297 0.97368421]

mean value: 0.9791461501987818

key: test_recall
value: [1.   0.5  0.5  0.75 0.75 0.75 0.4  0.8  0.8  0.5 ]

mean value: 0.675

key: train_recall
value: [0.97435897 0.97435897 0.94871795 0.97435897 0.97435897 0.97435897
 0.97368421 0.97368421 0.94736842 0.94871795]

mean value: 0.9663967611336032

key: test_roc_auc
value: /home/tanu/git/LSHTM_analysis/scripts/ml/./gid_orig.py:114: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_orig.py:117: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
[1.         0.6875     0.5625     0.875      0.6875     0.8125
 0.62857143 0.82857143 0.82857143 0.67857143]

mean value: 0.7589285714285714

key: train_roc_auc
value: [0.97982655 0.98717949 0.97435897 0.97982655 0.97982655 0.97982655
 0.97959573 0.97959573 0.96643783 0.9671126 ]

mean value: 0.9773586533715757

key: test_jcc
value: [1.         0.4        0.28571429 0.75       0.42857143 0.6
 0.33333333 0.66666667 0.66666667 0.4       ]

mean value: 0.5530952380952381

key: train_jcc
value: [0.95       0.97435897 0.94871795 0.95       0.95       0.95
 0.94871795 0.94871795 0.92307692 0.925     ]

mean value: 0.9468589743589744

MCC on Blind test: 0.13

Accuracy on Blind test: 0.71

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.11067224 0.12599802 0.18228793 0.18764257 0.19317532 0.18818402
 0.26005721 0.21987581 0.09147549 0.16718626]

mean value: 0.17265548706054687

key: score_time
value: [0.01199269 0.02052832 0.01966453 0.02230549 0.02030396 0.02112412
 0.02278662 0.01183152 0.01173759 0.01176786]

mean value: 0.01740427017211914

key: test_mcc
value: [0.63245553 0.40824829 0.11952286 0.81649658 0.35355339 0.35355339
 0.29277002 0.65714286 0.65714286 0.38575837]

mean value: 0.46766441566207556

key: train_mcc
value: [0.75771445 0.97991726 0.9600061  0.95965309 0.95965309 1.
 0.95919146 0.95919146 0.93862091 0.93965322]

mean value: 0.9413601047309211

key: test_accuracy
value: [0.83333333 0.75       0.58333333 0.91666667 0.66666667 0.66666667
 0.66666667 0.83333333 0.83333333 0.72727273]

mean value: 0.7477272727272727

key: train_accuracy
value: [0.88785047 0.99065421 0.98130841 0.98130841 0.98130841 1.
 0.98130841 0.98130841 0.97196262 0.97222222]

mean value: 0.9729231568016614

key: test_fscore
value: [0.66666667 0.57142857 0.44444444 0.85714286 0.6        0.6
 0.5        0.8        0.8        0.57142857]

mean value: 0.6411111111111112

key: train_fscore
value: [0.82857143 0.98701299 0.97368421 0.97435897 0.97435897 1.
 0.97368421 0.97368421 0.96       0.96103896]

mean value: 0.9606393956920273

key: test_precision
value: [1.         0.66666667 0.4        1.         0.5        0.5
 0.66666667 0.8        0.8        0.66666667]

mean value: 0.7

key: train_precision
value: [0.93548387 1.         1.         0.97435897 0.97435897 1.
 0.97368421 0.97368421 0.97297297 0.97368421]

mean value: 0.9778227424237611

key: test_recall
value: [0.5  0.5  0.5  0.75 0.75 0.75 0.4  0.8  0.8  0.5 ]

mean value: 0.625

key: train_recall
value: [0.74358974 0.97435897 0.94871795 0.97435897 0.97435897 1.
 0.97368421 0.97368421 0.94736842 0.94871795]

mean value: 0.9458839406207827

key: test_roc_auc
value: [0.75       0.6875     0.5625     0.875      0.6875     0.6875
 0.62857143 0.82857143 0.82857143 0.67857143]

mean value: 0.7214285714285714

key: train_roc_auc
value: [0.85708899 0.98717949 0.97435897 0.97982655 0.97982655 1.
 0.97959573 0.97959573 0.96643783 0.9671126 ]

mean value: 0.9671022431151655

key: test_jcc
value: [0.5        0.4        0.28571429 0.75       0.42857143 0.42857143
 0.33333333 0.66666667 0.66666667 0.4       ]

mean value: 0.4859523809523809

key: train_jcc
value: [0.70731707 0.97435897 0.94871795 0.95       0.95       1.
 0.94871795 0.94871795 0.92307692 0.925     ]

mean value: 0.9275906816760475

MCC on Blind test: 0.13

Accuracy on Blind test: 0.71

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.02570605 0.02531195 0.02237129 0.02603054 0.03007078 0.02120304
 0.02243948 0.02645206 0.02500463 0.02917624]

mean value: 0.025376605987548827

key: score_time
value: [0.01148796 0.01151919 0.01135206 0.01143742 0.01142955 0.01138592
 0.01134014 0.01146054 0.01140571 0.01153827]

mean value: 0.011435675621032714

key: test_mcc
value: [0.67419986 0.37796447 0.37796447 0.73214286 0.76376262 0.6000992
 0.60714286 1.         0.73214286 0.6000992 ]

mean value: 0.64655183920342

key: train_mcc
value: [0.91176471 0.86774089 0.94323594 0.88320546 0.91281179 0.91240409
 0.88320546 0.91240409 0.91240409 0.89791134]

mean value: 0.9037087856483232

key: test_accuracy
value: [0.8125     0.6875     0.66666667 0.86666667 0.86666667 0.8
 0.8        1.         0.86666667 0.8       ]

mean value: 0.8166666666666667

key: train_accuracy
value: [0.95588235 0.93382353 0.97080292 0.94160584 0.95620438 0.95620438
 0.94160584 0.95620438 0.95620438 0.94890511]

mean value: 0.9517443108630314

key: test_fscore
value: [0.76923077 0.66666667 0.70588235 0.85714286 0.875      0.76923077
 0.8        1.         0.875      0.82352941]

mean value: 0.8141682826976945

key: train_fscore
value: [0.95588235 0.93333333 0.97014925 0.94202899 0.95588235 0.95652174
 0.94117647 0.95588235 0.95588235 0.94890511]

mean value: 0.951564430354435

key: test_precision
value: [1.         0.71428571 0.6        0.85714286 0.77777778 0.83333333
 0.85714286 1.         0.875      0.77777778]

mean value: 0.8292460317460317

key: train_precision
value: [0.95588235 0.94029851 1.         0.94202899 0.97014925 0.95652174
 0.94117647 0.95588235 0.95588235 0.94202899]

mean value: 0.9559851000750722

key: test_recall
value: [0.625      0.625      0.85714286 0.85714286 1.         0.71428571
 0.75       1.         0.875      0.875     ]

mean value: 0.8178571428571428

key: train_recall
value: [0.95588235 0.92647059 0.94202899 0.94202899 0.94202899 0.95652174
 0.94117647 0.95588235 0.95588235 0.95588235]

mean value: 0.9473785166240409

key: test_roc_auc
value: [0.8125     0.6875     0.67857143 0.86607143 0.875      0.79464286
 0.80357143 1.         0.86607143 0.79464286]

mean value: 0.8178571428571428

key: train_roc_auc
value: [0.95588235 0.93382353 0.97101449 0.94160273 0.95630861 0.95620205
 0.94160273 0.95620205 0.95620205 0.94895567]

mean value: 0.9517796248934357

key: test_jcc
value: [0.625      0.5        0.54545455 0.75       0.77777778 0.625
 0.66666667 1.         0.77777778 0.7       ]

mean value: 0.6967676767676767

key: train_jcc
value: [0.91549296 0.875      0.94202899 0.89041096 0.91549296 0.91666667
 0.88888889 0.91549296 0.91549296 0.90277778]

mean value: 0.9077745108730605

MCC on Blind test: 0.11

Accuracy on Blind test: 0.64

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.59272099 0.73302746 0.57024789 0.63847446 0.81539202 0.66339469
 0.66922975 1.00259376 0.63085222 0.65632081]

mean value: 0.6972254037857055

key: score_time
value: [0.01941419 0.03988981 0.01182389 0.01283121 0.01295757 0.01300907
 0.02492142 0.0128777  0.01295567 0.01319575]

mean value: 0.01738762855529785

key: test_mcc
value: [0.77459667 0.75       0.60714286 1.         0.66143783 0.73214286
 0.64465837 0.87287156 0.32732684 0.875     ]

mean value: 0.7245176978811607

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.875      0.875      0.8        1.         0.8        0.86666667
 0.8        0.93333333 0.66666667 0.93333333]

mean value: 0.855

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.85714286 0.875      0.8        1.         0.82352941 0.85714286
 0.84210526 0.94117647 0.70588235 0.93333333]

mean value: 0.863531254607106

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.875      0.75       1.         0.7        0.85714286
 0.72727273 0.88888889 0.66666667 1.        ]

mean value: 0.846497113997114

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75       0.875      0.85714286 1.         1.         0.85714286
 1.         1.         0.75       0.875     ]

mean value: 0.8964285714285715

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.875      0.875      0.80357143 1.         0.8125     0.86607143
 0.78571429 0.92857143 0.66071429 0.9375    ]

mean value: 0.8544642857142857

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.75       0.77777778 0.66666667 1.         0.7        0.75
 0.72727273 0.88888889 0.54545455 0.875     ]

mean value: 0.7681060606060606

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.12

Accuracy on Blind test: 0.68

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01221609 0.01339197 0.00892854 0.00857139 0.00862551 0.00868034
 0.00853682 0.00880527 0.00853062 0.00853968]

mean value: 0.009482622146606445

key: score_time
value: [0.01381898 0.00916982 0.00887036 0.00863338 0.00858545 0.00860238
 0.0085628  0.00857377 0.00856447 0.00868702]

mean value: 0.009206843376159669

key: test_mcc
value: [ 0.          0.25819889  0.46428571  0.37796447  0.13363062 -0.04029115
  0.18898224 -0.07142857  0.47245559  0.04029115]

mean value: 0.1824088954335891

key: train_mcc
value: [0.42192651 0.42192651 0.39817312 0.37138669 0.42963201 0.44352148
 0.42304382 0.38248522 0.37510519 0.40676841]

mean value: 0.40739689589467604

key: test_accuracy
value: [0.5        0.625      0.73333333 0.66666667 0.53333333 0.46666667
 0.6        0.46666667 0.73333333 0.53333333]

mean value: 0.5858333333333333

key: train_accuracy
value: [0.69852941 0.69852941 0.68613139 0.67153285 0.70072993 0.71532847
 0.70072993 0.68613139 0.67153285 0.68613139]

mean value: 0.6915306998711893

key: test_fscore
value: [0.6        0.66666667 0.71428571 0.70588235 0.63157895 0.55555556
 0.66666667 0.5        0.77777778 0.63157895]

mean value: 0.6449992628630399

key: train_fscore
value: [0.74213836 0.74213836 0.73619632 0.72727273 0.74846626 0.7483871
 0.7388535  0.71523179 0.72392638 0.73619632]

mean value: 0.7358807120944473

key: test_precision
value: [0.5        0.6        0.71428571 0.6        0.5        0.45454545
 0.6        0.5        0.7        0.54545455]

mean value: 0.5714285714285714

key: train_precision
value: [0.64835165 0.64835165 0.63829787 0.625      0.64893617 0.6744186
 0.65168539 0.65060241 0.62105263 0.63157895]

mean value: 0.6438275325752001

key: test_recall
value: [0.75       0.75       0.71428571 0.85714286 0.85714286 0.71428571
 0.75       0.5        0.875      0.75      ]

mean value: 0.7517857142857143

key: train_recall
value: [0.86764706 0.86764706 0.86956522 0.86956522 0.88405797 0.84057971
 0.85294118 0.79411765 0.86764706 0.88235294]

mean value: 0.85961210571185

key: test_roc_auc
value: [0.5        0.625      0.73214286 0.67857143 0.55357143 0.48214286
 0.58928571 0.46428571 0.72321429 0.51785714]

mean value: 0.5866071428571429

key: train_roc_auc
value: [0.69852941 0.69852941 0.68478261 0.67007673 0.69938193 0.7144075
 0.70183291 0.6869139  0.67295396 0.68755328]

mean value: 0.6914961636828644

key: test_jcc
value: [0.42857143 0.5        0.55555556 0.54545455 0.46153846 0.38461538
 0.5        0.33333333 0.63636364 0.46153846]

mean value: 0.4806970806970807

key: train_jcc
value: [0.59       0.59       0.58252427 0.57142857 0.59803922 0.59793814
 0.58585859 0.55670103 0.56730769 0.58252427]

mean value: 0.5822321784228176

MCC on Blind test: 0.01

Accuracy on Blind test: 0.38

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00890756 0.00875044 0.00880671 0.00884891 0.00875044 0.008816
 0.0088191  0.0089426  0.00888658 0.00880456]

mean value: 0.00883328914642334

key: score_time
value: [0.00866461 0.00874972 0.00858355 0.00862575 0.00877166 0.0087378
 0.00872993 0.00858498 0.00864148 0.00872946]

mean value: 0.008681893348693848

key: test_mcc
value: [0.         0.25       0.19642857 0.56407607 0.49099025 0.26189246
 0.19642857 0.47245559 0.46428571 0.64465837]

mean value: 0.35412156107816933

key: train_mcc
value: [0.5144434  0.54470478 0.54363493 0.50392153 0.52716245 0.54721488
 0.48424445 0.51877014 0.50571263 0.5019846 ]

mean value: 0.5191793798033469

key: test_accuracy
value: [0.5        0.625      0.6        0.73333333 0.73333333 0.6
 0.6        0.73333333 0.73333333 0.8       ]

mean value: 0.6658333333333333

key: train_accuracy
value: [0.75       0.76470588 0.76642336 0.74452555 0.75912409 0.76642336
 0.72992701 0.75182482 0.74452555 0.74452555]

mean value: 0.7522005152425933

key: test_fscore
value: [0.6        0.625      0.57142857 0.77777778 0.75       0.66666667
 0.625      0.77777778 0.75       0.84210526]

mean value: 0.6985756056808688

key: train_fscore
value: [0.77631579 0.78947368 0.78947368 0.77419355 0.78145695 0.79220779
 0.76433121 0.77631579 0.77124183 0.76821192]

mean value: 0.7783222202391937

key: test_precision
value: [0.5        0.625      0.57142857 0.63636364 0.66666667 0.54545455
 0.625      0.7        0.75       0.72727273]

mean value: 0.6347186147186147

key: train_precision
value: [0.70238095 0.71428571 0.72289157 0.69767442 0.7195122  0.71764706
 0.6741573  0.70238095 0.69411765 0.69879518]

mean value: 0.7043842989015312

key: test_recall
value: [0.75       0.625      0.57142857 1.         0.85714286 0.85714286
 0.625      0.875      0.75       1.        ]

mean value: 0.7910714285714285

key: train_recall
value: [0.86764706 0.88235294 0.86956522 0.86956522 0.85507246 0.88405797
 0.88235294 0.86764706 0.86764706 0.85294118]

mean value: 0.8698849104859335

key: test_roc_auc
value: [0.5        0.625      0.59821429 0.75       0.74107143 0.61607143
 0.59821429 0.72321429 0.73214286 0.78571429]

mean value: 0.6669642857142857

key: train_roc_auc
value: [0.75       0.76470588 0.76566496 0.74360614 0.75841858 0.7655584
 0.73103154 0.75266411 0.74541773 0.74531117]

mean value: 0.7522378516624041

key: test_jcc
value: [0.42857143 0.45454545 0.4        0.63636364 0.6        0.5
 0.45454545 0.63636364 0.6        0.72727273]

mean value: 0.5437662337662338

key: train_jcc
value: [0.6344086  0.65217391 0.65217391 0.63157895 0.64130435 0.65591398
 0.6185567  0.6344086  0.62765957 0.62365591]

mean value: 0.6371834493554671

MCC on Blind test: 0.09

Accuracy on Blind test: 0.54

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00872469 0.00941682 0.0093987  0.0094955  0.00943923 0.00875735
 0.00831437 0.00911641 0.0095017  0.00948215]

mean value: 0.009164690971374512

key: score_time
value: [0.00979877 0.01038027 0.01041031 0.0104022  0.01029658 0.01435733
 0.0094676  0.01413035 0.010638   0.01043248]

mean value: 0.011031389236450195

key: test_mcc
value: [0.37796447 0.37796447 0.26189246 0.66143783 0.32732684 0.13363062
 0.32732684 0.47245559 0.46428571 0.64465837]

mean value: 0.4048943205524593

key: train_mcc
value: [0.56273143 0.60616144 0.62969553 0.61333523 0.55409877 0.65057816
 0.66024878 0.61423342 0.58468637 0.64091263]

mean value: 0.6116681772279047

key: test_accuracy
value: [0.625      0.6875     0.6        0.8        0.66666667 0.53333333
 0.66666667 0.73333333 0.73333333 0.8       ]

mean value: 0.6845833333333333

key: train_accuracy
value: [0.77941176 0.80147059 0.81021898 0.80291971 0.77372263 0.82481752
 0.82481752 0.80291971 0.78832117 0.81751825]

mean value: 0.8026137827393731

key: test_fscore
value: [0.72727273 0.66666667 0.66666667 0.82352941 0.61538462 0.63157895
 0.70588235 0.77777778 0.75       0.84210526]

mean value: 0.7206864429000651

key: train_fscore
value: [0.79166667 0.81118881 0.82666667 0.81879195 0.79194631 0.83098592
 0.83783784 0.81632653 0.80272109 0.82758621]

mean value: 0.8155717978830668

key: test_precision
value: [0.57142857 0.71428571 0.54545455 0.7        0.66666667 0.5
 0.66666667 0.7        0.75       0.72727273]

mean value: 0.6541774891774892

key: train_precision
value: [0.75       0.77333333 0.7654321  0.7625     0.7375     0.80821918
 0.775      0.75949367 0.74683544 0.77922078]

mean value: 0.7657534503325787

key: test_recall
value: [1.         0.625      0.85714286 1.         0.57142857 0.85714286
 0.75       0.875      0.75       1.        ]

mean value: 0.8285714285714285

key: train_recall
value: [0.83823529 0.85294118 0.89855072 0.88405797 0.85507246 0.85507246
 0.91176471 0.88235294 0.86764706 0.88235294]

mean value: 0.8728047740835465

key: test_roc_auc
value: [0.625      0.6875     0.61607143 0.8125     0.66071429 0.55357143
 0.66071429 0.72321429 0.73214286 0.78571429]

mean value: 0.6857142857142857

key: train_roc_auc
value: [0.77941176 0.80147059 0.80956948 0.8023231  0.77312447 0.82459506
 0.82544757 0.80349531 0.78889599 0.81798806]

mean value: 0.8026321398124467

key: test_jcc
value: [0.57142857 0.5        0.5        0.7        0.44444444 0.46153846
 0.54545455 0.63636364 0.6        0.72727273]

mean value: 0.5686502386502387

key: train_jcc
value: [0.65517241 0.68235294 0.70454545 0.69318182 0.65555556 0.71084337
 0.72093023 0.68965517 0.67045455 0.70588235]

mean value: 0.6888573860114033

MCC on Blind test: 0.04

Accuracy on Blind test: 0.56

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01179934 0.01012301 0.01003814 0.01017427 0.00991893 0.00987267
 0.00998878 0.01008558 0.01007128 0.01131511]

mean value: 0.010338711738586425

key: score_time
value: [0.00999904 0.00925827 0.00890064 0.00903559 0.00892496 0.00906014
 0.00908375 0.0089345  0.00948644 0.00994992]

mean value: 0.009263324737548827

key: test_mcc
value: [0.40451992 0.5        0.19642857 0.60714286 0.875      0.46428571
 0.76376262 0.73214286 0.49099025 0.6000992 ]

mean value: 0.563437198448388

key: train_mcc
value: [0.84051051 0.86774089 0.81031543 0.82480818 0.82480818 0.76668815
 0.83951407 0.81027501 0.83947987 0.88320546]

mean value: 0.8307345767586535

key: test_accuracy
value: [0.6875     0.75       0.6        0.8        0.93333333 0.73333333
 0.86666667 0.86666667 0.73333333 0.8       ]

mean value: 0.7770833333333333

key: train_accuracy
value: [0.91911765 0.93382353 0.90510949 0.91240876 0.91240876 0.88321168
 0.91970803 0.90510949 0.91970803 0.94160584]

mean value: 0.9152211249463289

key: test_fscore
value: [0.61538462 0.75       0.57142857 0.8        0.93333333 0.71428571
 0.85714286 0.875      0.71428571 0.82352941]

mean value: 0.7654390217625512

key: train_fscore
value: [0.91603053 0.93430657 0.90510949 0.91304348 0.91304348 0.88571429
 0.91970803 0.9037037  0.91851852 0.94117647]

mean value: 0.9150354556988868

key: test_precision
value: [0.8        0.75       0.57142857 0.75       0.875      0.71428571
 1.         0.875      0.83333333 0.77777778]

mean value: 0.7946825396825397

key: train_precision
value: [0.95238095 0.92753623 0.91176471 0.91304348 0.91304348 0.87323944
 0.91304348 0.91044776 0.92537313 0.94117647]

mean value: 0.9181049127660313

key: test_recall
value: [0.5        0.75       0.57142857 0.85714286 1.         0.71428571
 0.75       0.875      0.625      0.875     ]

mean value: 0.7517857142857143

key: train_recall
value: [0.88235294 0.94117647 0.89855072 0.91304348 0.91304348 0.89855072
 0.92647059 0.89705882 0.91176471 0.94117647]

mean value: 0.9123188405797101

key: test_roc_auc
value: [0.6875     0.75       0.59821429 0.80357143 0.9375     0.73214286
 0.875      0.86607143 0.74107143 0.79464286]

mean value: 0.7785714285714286

key: train_roc_auc
value: [0.91911765 0.93382353 0.90515772 0.91240409 0.91240409 0.88309889
 0.91975703 0.90505115 0.91965047 0.94160273]

mean value: 0.9152067348678602

key: test_jcc
value: [0.44444444 0.6        0.4        0.66666667 0.875      0.55555556
 0.75       0.77777778 0.55555556 0.7       ]

mean value: 0.6325

key: train_jcc
value: [0.84507042 0.87671233 0.82666667 0.84       0.84       0.79487179
 0.85135135 0.82432432 0.84931507 0.88888889]

mean value: 0.8437200845898511

MCC on Blind test: -0.0

Accuracy on Blind test: 0.65

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.56858373 0.64376926 0.4885788  0.55860162 0.62052274 0.54586768
 0.57091928 0.53351116 0.68408871 0.53369379]

mean value: 0.5748136758804321

key: score_time
value: [0.01217818 0.01202369 0.01197481 0.01199365 0.01205015 0.012012
 0.0120542  0.01204777 0.01204062 0.01203394]

mean value: 0.01204090118408203

key: test_mcc
value: [0.51639778 0.62994079 0.19642857 1.         0.66143783 0.73214286
 0.6000992  0.75592895 0.73214286 0.6000992 ]

mean value: 0.642461802363988

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.75       0.8125     0.6        1.         0.8        0.86666667
 0.8        0.86666667 0.86666667 0.8       ]

mean value: 0.81625

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.71428571 0.8        0.57142857 1.         0.82352941 0.85714286
 0.82352941 0.88888889 0.875      0.82352941]

mean value: 0.8177334267040149

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.83333333 0.85714286 0.57142857 1.         0.7        0.85714286
 0.77777778 0.8        0.875      0.77777778]

mean value: 0.8049603174603175

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.625      0.75       0.57142857 1.         1.         0.85714286
 0.875      1.         0.875      0.875     ]

mean value: 0.8428571428571429

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.75       0.8125     0.59821429 1.         0.8125     0.86607143
 0.79464286 0.85714286 0.86607143 0.79464286]

mean value: 0.8151785714285714

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.55555556 0.66666667 0.4        1.         0.7        0.75
 0.7        0.8        0.77777778 0.7       ]

mean value: 0.705

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.06

Accuracy on Blind test: 0.66

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01505685 0.01453471 0.01128817 0.01089859 0.01046872 0.01097035
 0.01134324 0.0110693  0.0112195  0.01129436]

mean value: 0.011814379692077636

key: score_time
value: [0.01152134 0.00902057 0.00866437 0.00855374 0.00851226 0.00853801
 0.00877357 0.00879598 0.00872588 0.00893998]

mean value: 0.009004569053649903

key: test_mcc
value: [0.8819171  0.75       0.875      0.875      0.73214286 0.87287156
 1.         1.         0.73214286 0.87287156]

mean value: 0.859194593986185

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     0.875      0.93333333 0.93333333 0.86666667 0.93333333
 1.         1.         0.86666667 0.93333333]

mean value: 0.9279166666666667

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94117647 0.875      0.93333333 0.93333333 0.85714286 0.92307692
 1.         1.         0.875      0.94117647]

mean value: 0.9279239388062918

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.88888889 0.875      0.875      0.875      0.85714286 1.
 1.         1.         0.875      0.88888889]

mean value: 0.9134920634920635

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.875      1.         1.         0.85714286 0.85714286
 1.         1.         0.875      1.        ]

mean value: 0.9464285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.875      0.9375     0.9375     0.86607143 0.92857143
 1.         1.         0.86607143 0.92857143]

mean value: 0.9276785714285715

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.88888889 0.77777778 0.875      0.875      0.75       0.85714286
 1.         1.         0.77777778 0.88888889]

mean value: 0.8690476190476191

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.12

Accuracy on Blind test: 0.84

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.08812714 0.08732772 0.08859634 0.08848572 0.08767033 0.08787918
 0.08768225 0.08852649 0.08794308 0.08818316]

mean value: 0.08804214000701904

key: score_time
value: [0.01718473 0.01735854 0.01743412 0.01702428 0.01767778 0.01741862
 0.01739454 0.01736569 0.01758718 0.01760912]

mean value: 0.01740546226501465

key: test_mcc
value: [0.77459667 0.37796447 0.49099025 0.73214286 0.875      0.73214286
 0.73214286 0.875      0.66143783 0.32732684]

mean value: 0.6578744629830401

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.875      0.6875     0.73333333 0.86666667 0.93333333 0.86666667
 0.86666667 0.93333333 0.8        0.66666667]

mean value: 0.8229166666666666

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.85714286 0.66666667 0.75       0.85714286 0.93333333 0.85714286
 0.875      0.93333333 0.76923077 0.70588235]

mean value: 0.820487502693385

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.71428571 0.66666667 0.85714286 0.875      0.85714286
 0.875      1.         1.         0.66666667]

mean value: 0.8511904761904762

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75       0.625      0.85714286 0.85714286 1.         0.85714286
 0.875      0.875      0.625      0.75      ]

mean value: 0.8071428571428572

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.875      0.6875     0.74107143 0.86607143 0.9375     0.86607143
 0.86607143 0.9375     0.8125     0.66071429]

mean value: 0.825

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.75       0.5        0.6        0.75       0.875      0.75
 0.77777778 0.875      0.625      0.54545455]

mean value: 0.7048232323232323

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.08

Accuracy on Blind test: 0.74

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00902843 0.00919366 0.00891805 0.00889707 0.00913405 0.0090549
 0.00896406 0.00876498 0.00908923 0.00890517]

mean value: 0.008994960784912109

key: score_time
value: [0.0090282  0.00873184 0.00872302 0.00881553 0.00873852 0.00867462
 0.00864935 0.00869679 0.00871325 0.00864553]

mean value: 0.008741664886474609

key: test_mcc
value: [0.51639778 0.37796447 0.49099025 0.09449112 0.60714286 0.19642857
 0.33928571 0.32732684 0.6000992  0.46428571]

mean value: 0.40144125144326637

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.75       0.6875     0.73333333 0.53333333 0.8        0.6
 0.66666667 0.66666667 0.8        0.73333333]

mean value: 0.6970833333333333

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.77777778 0.66666667 0.75       0.58823529 0.8        0.57142857
 0.66666667 0.70588235 0.82352941 0.75      ]

mean value: 0.7100186741363211

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.7        0.71428571 0.66666667 0.5        0.75       0.57142857
 0.71428571 0.66666667 0.77777778 0.75      ]

mean value: 0.6811111111111111

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.875      0.625      0.85714286 0.71428571 0.85714286 0.57142857
 0.625      0.75       0.875      0.75      ]

mean value: 0.75

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.75       0.6875     0.74107143 0.54464286 0.80357143 0.59821429
 0.66964286 0.66071429 0.79464286 0.73214286]

mean value: 0.6982142857142857

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.63636364 0.5        0.6        0.41666667 0.66666667 0.4
 0.5        0.54545455 0.7        0.6       ]

mean value: 0.5565151515151515

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.0

Accuracy on Blind test: 0.67

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.16274762 1.21323156 1.15071106 1.15295625 1.14983153 1.14728713
 1.15696621 1.16246438 1.17218447 1.15218377]

mean value: 1.1620563983917236

key: score_time
value: [0.08934927 0.08911204 0.08818769 0.0877502  0.08845139 0.0878191
 0.08764696 0.09043884 0.09330297 0.0872798 ]

mean value: 0.08893382549285889

key: test_mcc
value: [0.8819171  0.62994079 0.60714286 1.         0.875      0.75592895
 0.73214286 0.875      0.76376262 0.46428571]

mean value: 0.7585120882452765

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     0.8125     0.8        1.         0.93333333 0.86666667
 0.86666667 0.93333333 0.86666667 0.73333333]

mean value: 0.875

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.93333333 0.8        0.8        1.         0.93333333 0.83333333
 0.875      0.93333333 0.85714286 0.75      ]

mean value: 0.871547619047619

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.85714286 0.75       1.         0.875      1.
 0.875      1.         1.         0.75      ]

mean value: 0.9107142857142857

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.875      0.75       0.85714286 1.         1.         0.71428571
 0.875      0.875      0.75       0.75      ]

mean value: 0.8446428571428571

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.8125     0.80357143 1.         0.9375     0.85714286
 0.86607143 0.9375     0.875      0.73214286]

mean value: 0.8758928571428571

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
[0.875      0.66666667 0.66666667 1.         0.875      0.71428571
 0.77777778 0.875      0.75       0.6       ]

mean value: 0.7800396825396825

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.1

Accuracy on Blind test: 0.8

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000...05', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.85972571 0.85355878 0.91043258 0.87334061 0.88241315 0.88078475
 0.8900435  0.86408782 0.90165615 0.90448761]

mean value: 0.8820530652999878

key: score_time
value: [0.1766603  0.22412515 0.13609123 0.21713638 0.24007654 0.21767449
 0.19926095 0.18716145 0.25138688 0.22168207]

mean value: 0.20712554454803467

key: test_mcc
value: [0.8819171  0.62994079 0.49099025 1.         0.875      0.75592895
 0.60714286 0.875      0.76376262 0.76376262]

mean value: 0.764344517988115

key: train_mcc
value: [0.97100831 0.97100831 0.98550725 0.97122151 0.97122151 0.97122151
 0.97120941 0.95710706 0.98550418 0.95629932]

mean value: 0.9711308369300188

key: test_accuracy
value: [0.9375     0.8125     0.73333333 1.         0.93333333 0.86666667
 0.8        0.93333333 0.86666667 0.86666667]

mean value: 0.875

key: train_accuracy
value: [0.98529412 0.98529412 0.99270073 0.98540146 0.98540146 0.98540146
 0.98540146 0.97810219 0.99270073 0.97810219]

mean value: 0.9853799914126234

key: test_fscore
value: [0.93333333 0.8        0.75       1.         0.93333333 0.83333333
 0.8        0.93333333 0.85714286 0.85714286]

mean value: 0.8697619047619047

key: train_fscore
value: [0.98507463 0.98507463 0.99270073 0.98529412 0.98529412 0.98529412
 0.98507463 0.97744361 0.99259259 0.97777778]

mean value: 0.9851620942858126

key: test_precision
value: [1.         0.85714286 0.66666667 1.         0.875      1.
 0.85714286 1.         1.         1.        ]

mean value: 0.9255952380952381

key: train_precision
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.98507463]

mean value: 0.9985074626865672

key: test_recall
value: [0.875      0.75       0.85714286 1.         1.         0.71428571
 0.75       0.875      0.75       0.75      ]

mean value: 0.8321428571428572

key: train_recall
value: [0.97058824 0.97058824 0.98550725 0.97101449 0.97101449 0.97101449
 0.97058824 0.95588235 0.98529412 0.97058824]

mean value: 0.9722080136402387

key: test_roc_auc
value: [0.9375     0.8125     0.74107143 1.         0.9375     0.85714286
 0.80357143 0.9375     0.875      0.875     ]

mean value: 0.8776785714285714

key: train_roc_auc
value: [0.98529412 0.98529412 0.99275362 0.98550725 0.98550725 0.98550725
 0.98529412 0.97794118 0.99264706 0.97804774]

mean value: 0.98537936913896

key: test_jcc
value: [0.875      0.66666667 0.6        1.         0.875      0.71428571
 0.66666667 0.875      0.75       0.75      ]

mean value: 0.7772619047619047

key: train_jcc
value: [0.97058824 0.97058824 0.98550725 0.97101449 0.97101449 0.97101449
 0.97058824 0.95588235 0.98529412 0.95652174]

mean value: 0.9708013640238704

MCC on Blind test: 0.1

Accuracy on Blind test: 0.81

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.02130222 0.00860238 0.00862336 0.00866222 0.00853252 0.00857735
 0.00868011 0.00864792 0.00861239 0.00864911]

mean value: 0.009888958930969239

key: score_time
value: [0.01332974 0.00847292 0.00869036 0.00839376 0.00838685 0.00843501
 0.00849009 0.00850749 0.00847769 0.00854087]

mean value: 0.008972477912902833

key: test_mcc
value: [0.         0.25       0.19642857 0.56407607 0.49099025 0.26189246
 0.19642857 0.47245559 0.46428571 0.64465837]

mean value: 0.35412156107816933

key: train_mcc
value: [0.5144434  0.54470478 0.54363493 0.50392153 0.52716245 0.54721488
 0.48424445 0.51877014 0.50571263 0.5019846 ]

mean value: 0.5191793798033469

key: test_accuracy
value: [0.5        0.625      0.6        0.73333333 0.73333333 0.6
 0.6        0.73333333 0.73333333 0.8       ]

mean value: 0.6658333333333333

key: train_accuracy
value: [0.75       0.76470588 0.76642336 0.74452555 0.75912409 0.76642336
 0.72992701 0.75182482 0.74452555 0.74452555]

mean value: 0.7522005152425933

key: test_fscore
value: [0.6        0.625      0.57142857 0.77777778 0.75       0.66666667
 0.625      0.77777778 0.75       0.84210526]

mean value: 0.6985756056808688

key: train_fscore
value: [0.77631579 0.78947368 0.78947368 0.77419355 0.78145695 0.79220779
 0.76433121 0.77631579 0.77124183 0.76821192]

mean value: 0.7783222202391937

key: test_precision
value: [0.5        0.625      0.57142857 0.63636364 0.66666667 0.54545455
 0.625      0.7        0.75       0.72727273]

mean value: 0.6347186147186147

key: train_precision
value: [0.70238095 0.71428571 0.72289157 0.69767442 0.7195122  0.71764706
 0.6741573  0.70238095 0.69411765 0.69879518]

mean value: 0.7043842989015312

key: test_recall
value: [0.75       0.625      0.57142857 1.         0.85714286 0.85714286
 0.625      0.875      0.75       1.        ]

mean value: 0.7910714285714285

key: train_recall
value: [0.86764706 0.88235294 0.86956522 0.86956522 0.85507246 0.88405797
 0.88235294 0.86764706 0.86764706 0.85294118]

mean value: 0.8698849104859335

key: test_roc_auc
value: [0.5        0.625      0.59821429 0.75       0.74107143 0.61607143
 0.59821429 0.72321429 0.73214286 0.78571429]

mean value: 0.6669642857142857

key: train_roc_auc
value: [0.75       0.76470588 0.76566496 0.74360614 0.75841858 0.7655584
 0.73103154 0.75266411 0.74541773 0.74531117]

mean value: 0.7522378516624041

key: test_jcc
value: [0.42857143 0.45454545 0.4        0.63636364 0.6        0.5
 0.45454545 0.63636364 0.6        0.72727273]

mean value: 0.5437662337662338

key: train_jcc
value: [0.6344086  0.65217391 0.65217391 0.63157895 0.64130435 0.65591398
 0.6185567  0.6344086  0.62765957 0.62365591]

mean value: 0.6371834493554671

MCC on Blind test: 0.09

Accuracy on Blind test: 0.54

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.04765677 0.15343904 0.03782272 0.03994322 0.04357862 0.04628658
 0.04569268 0.04593277 0.04655242 0.04481316]

mean value: 0.055171799659729

key: score_time
value: [0.01268363 0.01065946 0.01116037 0.01030374 0.01018286 0.01021385
 0.01012373 0.01024175 0.01021934 0.0104804 ]

mean value: 0.010626912117004395

key: test_mcc
value: [1.         0.75       1.         1.         0.73214286 1.
 0.87287156 1.         0.73214286 0.76376262]

mean value: 0.8850919891055657

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.875      1.         1.         0.86666667 1.
 0.93333333 1.         0.86666667 0.86666667]

mean value: 0.9408333333333333

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.875      1.         1.         0.85714286 1.
 0.94117647 1.         0.875      0.85714286]

mean value: 0.940546218487395

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.875      1.         1.         0.85714286 1.
 0.88888889 1.         0.875      1.        ]

mean value: 0.9496031746031746

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.875      1.         1.         0.85714286 1.
 1.         1.         0.875      0.75      ]

mean value: 0.9357142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.875      1.         1.         0.86607143 1.
 0.92857143 1.         0.86607143 0.875     ]

mean value: 0.9410714285714286

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.77777778 1.         1.         0.75       1.
 0.88888889 1.         0.77777778 0.75      ]

mean value: 0.8944444444444445

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.12

Accuracy on Blind test: 0.85

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.02869916 0.05144167 0.04765964 0.04727054 0.04757094 0.04816079
 0.04739809 0.04720759 0.04738903 0.04750156]

mean value: 0.0460299015045166

key: score_time
value: [0.02280951 0.02008939 0.02200198 0.01993203 0.02164054 0.02243447
 0.02050519 0.02272391 0.02087831 0.02276683]

mean value: 0.021578216552734376

key: test_mcc
value: [0.5        0.62994079 0.37796447 0.36689969 0.13363062 0.37796447
 0.33928571 0.34247476 0.32732684 0.6000992 ]

mean value: 0.3995586555675519

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.75       0.8125     0.66666667 0.6        0.53333333 0.66666667
 0.66666667 0.66666667 0.66666667 0.8       ]

mean value: 0.6829166666666666

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.75       0.82352941 0.70588235 0.7        0.63157895 0.70588235
 0.66666667 0.73684211 0.70588235 0.82352941]

mean value: 0.7249793601651187

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.75       0.77777778 0.6        0.53846154 0.5        0.6
 0.71428571 0.63636364 0.66666667 0.77777778]

mean value: 0.6561333111333111

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75       0.875      0.85714286 1.         0.85714286 0.85714286
 0.625      0.875      0.75       0.875     ]

mean value: 0.8321428571428571

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.75       0.8125     0.67857143 0.625      0.55357143 0.67857143
 0.66964286 0.65178571 0.66071429 0.79464286]

mean value: 0.6875

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.6        0.7        0.54545455 0.53846154 0.46153846 0.54545455
 0.5        0.58333333 0.54545455 0.7       ]

mean value: 0.5719696969696969

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.04

Accuracy on Blind test: 0.59

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.0237956  0.00884223 0.00858641 0.0084517  0.00839496 0.00843644
 0.00836205 0.00869083 0.00836253 0.00843644]

mean value: 0.01003592014312744

key: score_time
value: [0.00963569 0.00868869 0.00838709 0.00837803 0.00836611 0.00832963
 0.00844836 0.0084126  0.00840187 0.00849652]

mean value: 0.008554458618164062

key: test_mcc
value: [ 0.25        0.37796447  0.49099025  0.60714286  0.76376262 -0.04029115
  0.32732684 -0.07142857  0.07142857  0.47245559]

mean value: 0.3249351477423294

key: train_mcc
value: [0.36066785 0.35682062 0.41648963 0.30654574 0.33012155 0.37638937
 0.34831003 0.39368469 0.40740412 0.36064115]

mean value: 0.36570747455665553

key: test_accuracy
value: [0.625      0.6875     0.73333333 0.8        0.86666667 0.46666667
 0.66666667 0.46666667 0.53333333 0.73333333]

mean value: 0.6579166666666667

key: train_accuracy
value: [0.67647059 0.67647059 0.7080292  0.64963504 0.66423358 0.68613139
 0.67153285 0.69343066 0.70072993 0.67883212]

mean value: 0.6805495920996136

key: test_fscore
value: [0.625      0.70588235 0.75       0.8        0.875      0.55555556
 0.70588235 0.5        0.53333333 0.77777778]

mean value: 0.682843137254902

key: train_fscore
value: [0.70666667 0.69863014 0.71830986 0.68831169 0.68493151 0.7114094
 0.69387755 0.71621622 0.72108844 0.69444444]

mean value: 0.7033885900997274

key: test_precision
value: [0.625      0.66666667 0.66666667 0.75       0.77777778 0.45454545
 0.66666667 0.5        0.57142857 0.7       ]

mean value: 0.6378751803751803

key: train_precision
value: [0.64634146 0.65384615 0.69863014 0.62352941 0.64935065 0.6625
 0.64556962 0.6625     0.67088608 0.65789474]

mean value: 0.6571048248407082

key: test_recall
value: [0.625      0.75       0.85714286 0.85714286 1.         0.71428571
 0.75       0.5        0.5        0.875     ]

mean value: 0.7428571428571429

key: train_recall
value: [0.77941176 0.75       0.73913043 0.76811594 0.72463768 0.76811594
 0.75       0.77941176 0.77941176 0.73529412]

mean value: 0.7573529411764706

key: test_roc_auc
value: [0.625      0.6875     0.74107143 0.80357143 0.875      0.48214286
 0.66071429 0.46428571 0.53571429 0.72321429]

mean value: 0.6598214285714286

key: train_roc_auc
value: [0.67647059 0.67647059 0.70780051 0.64876385 0.66378943 0.68552856
 0.67210145 0.69405371 0.70130009 0.67924126]

mean value: 0.6805520034100596

key: test_jcc
value: [0.45454545 0.54545455 0.6        0.66666667 0.77777778 0.38461538
 0.54545455 0.33333333 0.36363636 0.63636364]

mean value: 0.5307847707847708

key: train_jcc
value: [0.54639175 0.53684211 0.56043956 0.52475248 0.52083333 0.55208333
 0.53125    0.55789474 0.56382979 0.53191489]

mean value: 0.5426231977887398

MCC on Blind test: 0.08

Accuracy on Blind test: 0.49

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01028538 0.01399684 0.01461887 0.01369882 0.01696754 0.01587057
 0.01506424 0.0152986  0.01567125 0.01478696]

mean value: 0.014625906944274902

key: score_time
value: [0.00841475 0.01135182 0.01127791 0.01128936 0.01133323 0.01156378
 0.01150846 0.01161075 0.01152873 0.01164389]

mean value: 0.011152267456054688

key: test_mcc
value: [0.51639778 0.62994079 0.60714286 1.         0.6000992  0.73214286
 0.41931393 0.75592895 0.73214286 0.73214286]

mean value: 0.6725252075270663

key: train_mcc
value: [0.92737353 0.97058824 0.97122151 0.92710997 0.92951942 0.97080136
 0.68283343 0.90259957 0.97080136 0.95630861]

mean value: 0.920915701445411

key: test_accuracy
value: [0.75       0.8125     0.8        1.         0.8        0.86666667
 0.66666667 0.86666667 0.86666667 0.86666667]

mean value: 0.8295833333333333

key: train_accuracy
value: [0.96323529 0.98529412 0.98540146 0.96350365 0.96350365 0.98540146
 0.81751825 0.94890511 0.98540146 0.97810219]

mean value: 0.9576266638042078

key: test_fscore
value: [0.71428571 0.82352941 0.8        1.         0.76923077 0.85714286
 0.76190476 0.88888889 0.875      0.875     ]

mean value: 0.8364982403217698

key: train_fscore
value: [0.96402878 0.98529412 0.98529412 0.96350365 0.96240602 0.98550725
 0.8447205  0.95104895 0.98529412 0.97810219]

mean value: 0.9605199678693419

key: test_precision
value: [0.83333333 0.77777778 0.75       1.         0.83333333 0.85714286
 0.61538462 0.8        0.875      0.875     ]

mean value: 0.8216971916971917

key: train_precision
value: [0.94366197 0.98529412 1.         0.97058824 1.         0.98550725
 0.7311828  0.90666667 0.98529412 0.97101449]

mean value: 0.9479209643915247

key: test_recall
value: [0.625      0.875      0.85714286 1.         0.71428571 0.85714286
 1.         1.         0.875      0.875     ]

mean value: 0.8678571428571429

key: train_recall
value: [0.98529412 0.98529412 0.97101449 0.95652174 0.92753623 0.98550725
 1.         1.         0.98529412 0.98529412]

mean value: 0.9781756180733163

key: test_roc_auc
value: [0.75       0.8125     0.80357143 1.         0.79464286 0.86607143
 0.64285714 0.85714286 0.86607143 0.86607143]

mean value: 0.8258928571428571

key: train_roc_auc
value: [0.96323529 0.98529412 0.98550725 0.96355499 0.96376812 0.98540068
 0.81884058 0.94927536 0.98540068 0.97815431]

mean value: 0.957843137254902

key: test_jcc
value: [0.55555556 0.7        0.66666667 1.         0.625      0.75
 0.61538462 0.8        0.77777778 0.77777778]

mean value: 0.7268162393162393

key: train_jcc
value: [0.93055556 0.97101449 0.97101449 0.92957746 0.92753623 0.97142857
 0.7311828  0.90666667 0.97101449 0.95714286]

mean value: 0.9267133621426236

MCC on Blind test: 0.16

Accuracy on Blind test: 0.79

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01456738 0.01306653 0.01286817 0.01258302 0.01323724 0.01244307
 0.01299739 0.0136261  0.01343799 0.01282883]

mean value: 0.013165569305419922

key: score_time
value: [0.01136112 0.01127458 0.01129842 0.01130366 0.01122713 0.01127219
 0.01125145 0.01158643 0.01138997 0.01129389]

mean value: 0.011325883865356445

key: test_mcc
value: [0.62994079 0.51639778 0.26189246 0.46770717 0.73214286 0.20044593
 0.60714286 1.         0.60714286 0.76376262]

mean value: 0.5786575323186889

key: train_mcc
value: [0.88852332 0.82402205 0.94318882 0.41647809 0.78854812 0.37364893
 0.92787101 0.86311873 0.94318882 0.86311873]

mean value: 0.7831706632681725

key: test_accuracy
value: [0.8125     0.75       0.6        0.66666667 0.86666667 0.6
 0.8        1.         0.8        0.86666667]

mean value: 0.77625

key: train_accuracy
value: [0.94117647 0.90441176 0.97080292 0.64963504 0.88321168 0.62043796
 0.96350365 0.9270073  0.97080292 0.9270073 ]

mean value: 0.8757996994418206

key: test_fscore
value: [0.8        0.77777778 0.66666667 0.73684211 0.85714286 0.4
 0.8        1.         0.8        0.85714286]

mean value: 0.7695572263993316

key: train_fscore
value: [0.94444444 0.91275168 0.97183099 0.74193548 0.86885246 0.39534884
 0.96240602 0.92063492 0.96969697 0.92063492]

mean value: 0.8608536714313355

key: test_precision
value: [0.85714286 0.7        0.54545455 0.58333333 0.85714286 0.66666667
 0.85714286 1.         0.85714286 1.        ]

mean value: 0.7924025974025973

key: train_precision
value: [0.89473684 0.83950617 0.94520548 0.58974359 1.         1.
 0.98461538 1.         1.         1.        ]

mean value: 0.9253807468755798

key: test_recall
value: [0.75       0.875      0.85714286 1.         0.85714286 0.28571429
 0.75       1.         0.75       0.75      ]

mean value: 0.7875

key: train_recall
value: [1.         1.         1.         1.         0.76811594 0.24637681
 0.94117647 0.85294118 0.94117647 0.85294118]

mean value: 0.8602728047740835

key: test_roc_auc
value: [0.8125     0.75       0.61607143 0.6875     0.86607143 0.58035714
 0.80357143 1.         0.80357143 0.875     ]

mean value: 0.7794642857142857

key: train_roc_auc
value: [0.94117647 0.90441176 0.97058824 0.64705882 0.88405797 0.62318841
 0.96334186 0.92647059 0.97058824 0.92647059]

mean value: 0.8757352941176471

key: test_jcc
value: [0.66666667 0.63636364 0.5        0.58333333 0.75       0.25
 0.66666667 1.         0.66666667 0.75      ]

mean value: 0.646969696969697

key: train_jcc
value: [0.89473684 0.83950617 0.94520548 0.58974359 0.76811594 0.24637681
 0.92753623 0.85294118 0.94117647 0.85294118]

mean value: 0.7858279893177071

MCC on Blind test: 0.1

Accuracy on Blind test: 0.6

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.11904359 0.10382509 0.10486364 0.10416913 0.10492015 0.10402918
 0.10552669 0.10339594 0.10672402 0.10487676]

mean value: 0.10613741874694824

key: score_time
value: [0.01462102 0.01453567 0.01501799 0.01468349 0.01466584 0.01468468
 0.01471519 0.01495481 0.01457691 0.01460719]

mean value: 0.014706277847290039

key: test_mcc
value: [1.         0.75       0.76376262 1.         0.73214286 0.87287156
 0.75592895 1.         0.73214286 0.60714286]

mean value: 0.8213991694216969

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.875      0.86666667 1.         0.86666667 0.93333333
 0.86666667 1.         0.86666667 0.8       ]

mean value: 0.9075

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.875      0.875      1.         0.85714286 0.92307692
 0.88888889 1.         0.875      0.8       ]

mean value: 0.9094108669108669

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.875      0.77777778 1.         0.85714286 1.
 0.8        1.         0.875      0.85714286]

mean value: 0.9042063492063492

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.875      1.         1.         0.85714286 0.85714286
 1.         1.         0.875      0.75      ]

mean value: 0.9214285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.875      0.875      1.         0.86607143 0.92857143
 0.85714286 1.         0.86607143 0.80357143]

mean value: 0.9071428571428571

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.77777778 0.77777778 1.         0.75       0.85714286
 0.8        1.         0.77777778 0.66666667]

mean value: 0.8407142857142857

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.08

Accuracy on Blind test: 0.74

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.03904724 0.03519297 0.03736401 0.05492711 0.04780269 0.04455471
 0.0496397  0.02868819 0.02949524 0.03737569]

mean value: 0.040408754348754884

key: score_time
value: [0.02148342 0.03485632 0.02557969 0.03648186 0.01914907 0.03731179
 0.02533007 0.02252865 0.01633787 0.03448296]

mean value: 0.027354168891906738

key: test_mcc
value: [1.         0.75       1.         1.         0.73214286 0.73214286
 0.87287156 1.         0.73214286 0.76376262]

mean value: 0.8583062748198514

key: train_mcc
value: [0.98540068 1.         0.98550725 1.         0.98550725 1.
 1.         0.98550725 1.         1.        ]

mean value: 0.9941922415380265

key: test_accuracy
value: [1.         0.875      1.         1.         0.86666667 0.86666667
 0.93333333 1.         0.86666667 0.86666667]

mean value: 0.9275

key: train_accuracy
value: [0.99264706 1.         0.99270073 1.         0.99270073 1.
 1.         0.99270073 1.         1.        ]

mean value: 0.9970749248604551

key: test_fscore
value: [1.         0.875      1.         1.         0.85714286 0.85714286
 0.94117647 1.         0.875      0.85714286]

mean value: 0.9262605042016807

key: train_fscore
value: [0.99259259 1.         0.99270073 1.         0.99270073 1.
 1.         0.99270073 1.         1.        ]

mean value: 0.9970694782373615

key: test_precision
value: [1.         0.875      1.         1.         0.85714286 0.85714286
 0.88888889 1.         0.875      1.        ]

mean value: 0.9353174603174603

key: train_precision
value: [1.         1.         1.         1.         1.         1.
 1.         0.98550725 1.         1.        ]

mean value: 0.9985507246376811

key: test_recall
value: [1.         0.875      1.         1.         0.85714286 0.85714286
 1.         1.         0.875      0.75      ]

mean value: 0.9214285714285714

key: train_recall
value: [0.98529412 1.         0.98550725 1.         0.98550725 1.
 1.         1.         1.         1.        ]

mean value: 0.9956308610400683

key: test_roc_auc
value: [1.         0.875      1.         1.         0.86607143 0.86607143
 0.92857143 1.         0.86607143 0.875     ]

mean value: 0.9276785714285715

key: train_roc_auc
value: [0.99264706 1.         0.99275362 1.         0.99275362 1.
 1.         0.99275362 1.         1.        ]

mean value: 0.9970907928388747

key: test_jcc
value: [1.         0.77777778 1.         1.         0.75       0.75
 0.88888889 1.         0.77777778 0.75      ]

mean value: 0.8694444444444445

key: train_jcc
value: [0.98529412 1.         0.98550725 1.         0.98550725 1.
 1.         0.98550725 1.         1.        ]

mean value: 0.9941815856777494

MCC on Blind test: 0.12

Accuracy on Blind test: 0.85

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.10467362 0.06222582 0.04387116 0.04423785 0.0438931  0.05345726
 0.04401588 0.04991961 0.06123614 0.01985145]

mean value: 0.052738189697265625

key: score_time
value: [0.0277462  0.02309895 0.01929665 0.02137995 0.02270675 0.01724195
 0.02074289 0.02189159 0.02425289 0.01265621]

mean value: 0.02110140323638916

key: test_mcc
value: [0.57735027 0.37796447 0.26189246 0.49099025 0.66143783 0.46770717
 0.75592895 0.46428571 0.49099025 0.34247476]

mean value: 0.48910221326969133

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.75       0.6875     0.6        0.73333333 0.8        0.66666667
 0.86666667 0.73333333 0.73333333 0.66666667]

mean value: 0.72375

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.8        0.66666667 0.66666667 0.75       0.82352941 0.73684211
 0.88888889 0.75       0.71428571 0.73684211]

mean value: 0.7533721558798958

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.66666667 0.71428571 0.54545455 0.66666667 0.7        0.58333333
 0.8        0.75       0.83333333 0.63636364]

mean value: 0.6896103896103896

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.625      0.85714286 0.85714286 1.         1.
 1.         0.75       0.625      0.875     ]

mean value: 0.8589285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.75       0.6875     0.61607143 0.74107143 0.8125     0.6875
 0.85714286 0.73214286 0.74107143 0.65178571]

mean value: 0.7276785714285714

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.66666667 0.5        0.5        0.6        0.7        0.58333333
 0.8        0.6        0.55555556 0.58333333]

mean value: 0.6088888888888889

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: -0.08

Accuracy on Blind test: 0.56

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.27606034 0.25747943 0.26339436 0.26111817 0.21840525 0.25673342
 0.26478767 0.25883365 0.26374054 0.26239896]

mean value: 0.2582951784133911

key: score_time
value: [0.00913334 0.00893497 0.00893378 0.00878906 0.00885391 0.00896621
 0.00904179 0.00909328 0.00900531 0.00891733]

mean value: 0.008966898918151856

key: test_mcc
value: [1.         0.75       0.875      1.         0.73214286 0.87287156
 1.         1.         0.73214286 0.875     ]

mean value: 0.8837157275229683

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.875      0.93333333 1.         0.86666667 0.93333333
 1.         1.         0.86666667 0.93333333]

mean value: 0.9408333333333333

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.875      0.93333333 1.         0.85714286 0.92307692
 1.         1.         0.875      0.93333333]

mean value: 0.9396886446886447

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.875      0.875      1.         0.85714286 1.
 1.         1.         0.875      1.        ]

mean value: 0.9482142857142857

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.875      1.         1.         0.85714286 0.85714286
 1.         1.         0.875      0.875     ]

mean value: 0.9339285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.875      0.9375     1.         0.86607143 0.92857143
 1.         1.         0.86607143 0.9375    ]

mean value: 0.9410714285714286

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.77777778 0.875      1.         0.75       0.85714286
 1.         1.         0.77777778 0.875     ]

mean value: 0.8912698412698412

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.11

Accuracy on Blind test: 0.83

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.01719666 0.01709819 0.01698685 0.01688099 0.01700592 0.01693296
 0.01785445 0.02687979 0.01735425 0.02863908]

mean value: 0.01928291320800781

key: score_time
value: [0.01173925 0.01162791 0.01165867 0.01255465 0.01246572 0.01261783
 0.01174974 0.01269937 0.01248431 0.01166534]

mean value: 0.012126278877258301

key: test_mcc
value: [0.37796447 0.62994079 0.87287156 0.87287156 0.73214286 0.87287156
 0.41931393 0.49099025 0.75592895 0.56407607]

mean value: 0.6588972009888676

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.625      0.8125     0.93333333 0.93333333 0.86666667 0.93333333
 0.66666667 0.73333333 0.86666667 0.73333333]

mean value: 0.8104166666666667

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.72727273 0.82352941 0.92307692 0.92307692 0.85714286 0.92307692
 0.76190476 0.71428571 0.88888889 0.66666667]

mean value: 0.8208921797157092

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.57142857 0.77777778 1.         1.         0.85714286 1.
 0.61538462 0.83333333 0.8        1.        ]

mean value: 0.8455067155067155

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.875      0.85714286 0.85714286 0.85714286 0.85714286
 1.         0.625      1.         0.5       ]

mean value: 0.8428571428571429

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.625      0.8125     0.92857143 0.92857143 0.86607143 0.92857143
 0.64285714 0.74107143 0.85714286 0.75      ]

mean value: 0.8080357142857143

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.57142857 0.7        0.85714286 0.85714286 0.75       0.85714286
 0.61538462 0.55555556 0.8        0.5       ]

mean value: 0.7063797313797313

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.0

Accuracy on Blind test: 0.99

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.03938007 0.02838469 0.02835846 0.0434401  0.02969289 0.02813339
 0.03353381 0.0308671  0.03408647 0.03346825]

mean value: 0.03293452262878418

key: score_time
value: [0.02064252 0.01147461 0.02169156 0.01526952 0.02015853 0.01412463
 0.02033472 0.02013707 0.02364635 0.02059531]

mean value: 0.018807482719421387

key: test_mcc
value: [0.77459667 0.75       0.60714286 1.         0.66143783 0.73214286
 0.75592895 1.         0.75592895 0.73214286]

mean value: 0.7769320960473112

key: train_mcc
value: [0.95598573 0.97100831 0.97122151 0.95629932 0.97080136 0.95629932
 0.97080136 0.95630861 0.98550418 0.95630861]

mean value: 0.9650538317814117

key: test_accuracy
value: [0.875      0.875      0.8        1.         0.8        0.86666667
 0.86666667 1.         0.86666667 0.86666667]

mean value: 0.8816666666666667

key: train_accuracy
value: [0.97794118 0.98529412 0.98540146 0.97810219 0.98540146 0.97810219
 0.98540146 0.97810219 0.99270073 0.97810219]

mean value: 0.9824549162730786

key: test_fscore
value: [0.85714286 0.875      0.8        1.         0.82352941 0.85714286
 0.88888889 1.         0.88888889 0.875     ]

mean value: 0.8865592903828198

key: train_fscore
value: [0.97810219 0.98550725 0.98529412 0.97841727 0.98550725 0.97841727
 0.98529412 0.97810219 0.99259259 0.97810219]

mean value: 0.98253364223575

key: test_precision
value: [1.         0.875      0.75       1.         0.7        0.85714286
 0.8        1.         0.8        0.875     ]

mean value: 0.8657142857142857

key: train_precision
value: [0.97101449 0.97142857 1.         0.97142857 0.98550725 0.97142857
 0.98529412 0.97101449 1.         0.97101449]

mean value: 0.9798130556570455

key: test_recall
value: [0.75       0.875      0.85714286 1.         1.         0.85714286
 1.         1.         1.         0.875     ]

mean value: 0.9214285714285714

key: train_recall
value: /home/tanu/git/LSHTM_analysis/scripts/ml/./gid_orig.py:135: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_orig.py:138: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
[0.98529412 1.         0.97101449 0.98550725 0.98550725 0.98550725
 0.98529412 0.98529412 0.98529412 0.98529412]

mean value: 0.9854006820119352

key: test_roc_auc
value: [0.875      0.875      0.80357143 1.         0.8125     0.86607143
 0.85714286 1.         0.85714286 0.86607143]

mean value: 0.88125

key: train_roc_auc
value: [0.97794118 0.98529412 0.98550725 0.97804774 0.98540068 0.97804774
 0.98540068 0.97815431 0.99264706 0.97815431]

mean value: 0.982459505541347

key: test_jcc
value: [0.75       0.77777778 0.66666667 1.         0.7        0.75
 0.8        1.         0.8        0.77777778]

mean value: 0.8022222222222222

key: train_jcc
value: [0.95714286 0.97142857 0.97101449 0.95774648 0.97142857 0.95774648
 0.97101449 0.95714286 0.98529412 0.95714286]

mean value: 0.9657101775186498

MCC on Blind test: 0.12

Accuracy on Blind test: 0.68

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.25549579 0.22080564 0.20524096 0.19396758 0.19368577 0.19323683
 0.19286656 0.19027901 0.19815254 0.30556083]

mean value: 0.21492915153503417

key: score_time
value: [0.01167083 0.02065277 0.01978779 0.01162076 0.0197885  0.02050185
 0.02145648 0.02330804 0.02225208 0.02265215]

mean value: 0.019369125366210938

key: test_mcc
value: [0.77459667 0.62994079 0.60714286 1.         0.66143783 0.76376262
 0.75592895 0.875      0.75592895 0.73214286]

mean value: 0.755588150750494

key: train_mcc
value: [0.95598573 1.         0.97122151 0.95629932 0.97080136 1.
 1.         0.98550418 0.98550418 0.95630861]

mean value: 0.9781624891968844

key: test_accuracy
value: [0.875      0.8125     0.8        1.         0.8        0.86666667
 0.86666667 0.93333333 0.86666667 0.86666667]

mean value: 0.86875

key: train_accuracy
value: [0.97794118 1.         0.98540146 0.97810219 0.98540146 1.
 1.         0.99270073 0.99270073 0.97810219]

mean value: 0.9890349935594676

key: test_fscore
value: [0.85714286 0.82352941 0.8        1.         0.82352941 0.875
 0.88888889 0.93333333 0.88888889 0.875     ]

mean value: 0.8765312791783381

key: train_fscore
value: [0.97810219 1.         0.98529412 0.97841727 0.98550725 1.
 1.         0.99259259 0.99259259 0.97810219]

mean value: 0.989060819495815

key: test_precision
value: [1.         0.77777778 0.75       1.         0.7        0.77777778
 0.8        1.         0.8        0.875     ]

mean value: 0.8480555555555556

key: train_precision
value: [0.97101449 1.         1.         0.97142857 0.98550725 1.
 1.         1.         1.         0.97101449]

mean value: 0.989896480331263

key: test_recall
value: [0.75       0.875      0.85714286 1.         1.         1.
 1.         0.875      1.         0.875     ]

mean value: 0.9232142857142858

key: train_recall
value: [0.98529412 1.         0.97101449 0.98550725 0.98550725 1.
 1.         0.98529412 0.98529412 0.98529412]

mean value: 0.9883205456095482

key: test_roc_auc
value: [0.875      0.8125     0.80357143 1.         0.8125     0.875
 0.85714286 0.9375     0.85714286 0.86607143]

mean value: 0.8696428571428572

key: train_roc_auc
value: [0.97794118 1.         0.98550725 0.97804774 0.98540068 1.
 1.         0.99264706 0.99264706 0.97815431]

mean value: 0.98903452685422

key: test_jcc
value: [0.75       0.7        0.66666667 1.         0.7        0.77777778
 0.8        0.875      0.8        0.77777778]

mean value: 0.7847222222222222

key: train_jcc
value: [0.95714286 1.         0.97101449 0.95774648 0.97142857 1.
 1.         0.98529412 0.98529412 0.95714286]

mean value: 0.9785063492635266

MCC on Blind test: 0.12

Accuracy on Blind test: 0.68

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.02792597 0.02731872 0.0251379  0.0208137  0.02745724 0.02563
 0.02640152 0.02301621 0.02505469 0.02661943]

mean value: 0.025537538528442382

key: score_time
value: [0.01137447 0.01140642 0.01132774 0.01132226 0.011338   0.01140189
 0.01133633 0.01129055 0.01135898 0.0113225 ]

mean value: 0.01134791374206543

key: test_mcc
value: [0.77459667 0.5        0.37796447 1.         0.76376262 0.73214286
 0.46428571 0.73214286 0.49099025 0.47245559]

mean value: 0.630834103094063

key: train_mcc
value: [0.89715584 0.94117647 0.94323594 0.91281179 0.92791659 0.8978896
 0.89863497 0.94199209 0.91277477 0.8978896 ]

mean value: 0.9171477642860466

key: test_accuracy
value: [0.875      0.75       0.66666667 1.         0.86666667 0.86666667
 0.73333333 0.86666667 0.73333333 0.73333333]

mean value: 0.8091666666666667

key: train_accuracy
value: [0.94852941 0.97058824 0.97080292 0.95620438 0.96350365 0.94890511
 0.94890511 0.97080292 0.95620438 0.94890511]

mean value: 0.958335122370116

key: test_fscore
value: [0.85714286 0.75       0.70588235 1.         0.875      0.85714286
 0.75       0.875      0.71428571 0.77777778]

mean value: 0.8162231559290383

key: train_fscore
value: [0.94814815 0.97058824 0.97014925 0.95588235 0.96296296 0.94964029
 0.94736842 0.97014925 0.95522388 0.94814815]

mean value: 0.9578260944376671

key: test_precision
value: [1.         0.75       0.6        1.         0.77777778 0.85714286
 0.75       0.875      0.83333333 0.7       ]

mean value: 0.8143253968253968

key: train_precision
value: [0.95522388 0.97058824 1.         0.97014925 0.98484848 0.94285714
 0.96923077 0.98484848 0.96969697 0.95522388]

mean value: 0.9702667101701342

key: test_recall
value: [0.75       0.75       0.85714286 1.         1.         0.85714286
 0.75       0.875      0.625      0.875     ]

mean value: 0.8339285714285715

key: train_recall
value: [0.94117647 0.97058824 0.94202899 0.94202899 0.94202899 0.95652174
 0.92647059 0.95588235 0.94117647 0.94117647]

mean value: 0.9459079283887468

key: test_roc_auc
value: [0.875      0.75       0.67857143 1.         0.875      0.86607143
 0.73214286 0.86607143 0.74107143 0.72321429]

mean value: 0.8107142857142857

key: train_roc_auc
value: [0.94852941 0.97058824 0.97101449 0.95630861 0.96366155 0.9488491
 0.94874254 0.9706948  0.95609548 0.9488491 ]

mean value: 0.9583333333333334

key: test_jcc
value: [0.75       0.6        0.54545455 1.         0.77777778 0.75
 0.6        0.77777778 0.55555556 0.63636364]

mean value: 0.6992929292929293

key: train_jcc
value: [0.90140845 0.94285714 0.94202899 0.91549296 0.92857143 0.90410959
 0.9        0.94202899 0.91428571 0.90140845]

mean value: 0.9192191704924804

MCC on Blind test: 0.12

Accuracy on Blind test: 0.66

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.69153619 0.80936241 0.62225533 0.65131831 0.74939919 0.82583261
 0.61750174 0.64893556 0.70422363 0.64521885]

mean value: 0.6965583801269531

key: score_time
value: [0.012501   0.01190329 0.01287556 0.01288652 0.01281619 0.01286817
 0.01280427 0.01320028 0.01286268 0.01277542]

mean value: 0.012749338150024414

key: test_mcc
value: [0.8819171  0.75       0.76376262 1.         0.66143783 0.73214286
 0.64465837 0.87287156 0.46428571 0.87287156]

mean value: 0.7643947611817132

key: train_mcc
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.94160273]

mean value: 0.994160272804774

key: test_accuracy
value: [0.9375     0.875      0.86666667 1.         0.8        0.86666667
 0.8        0.93333333 0.73333333 0.93333333]

mean value: 0.8745833333333334

key: train_accuracy
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.97080292]

mean value: 0.997080291970803

key: test_fscore
value: [0.93333333 0.875      0.875      1.         0.82352941 0.85714286
 0.84210526 0.94117647 0.75       0.94117647]

mean value: 0.8838463806575262

key: train_fscore
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.97058824]

mean value: 0.9970588235294118

key: test_precision
value: [1.         0.875      0.77777778 1.         0.7        0.85714286
 0.72727273 0.88888889 0.75       0.88888889]

mean value: 0.846497113997114

key: train_precision
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.97058824]

mean value: 0.9970588235294118

key: test_recall
value: [0.875      0.875      1.         1.         1.         0.85714286
 1.         1.         0.75       1.        ]

mean value: 0.9357142857142857

key: train_recall
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.97058824]

mean value: 0.9970588235294118

key: test_roc_auc
value: [0.9375     0.875      0.875      1.         0.8125     0.86607143
 0.78571429 0.92857143 0.73214286 0.92857143]

mean value: 0.8741071428571429

key: train_roc_auc
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.97080136]

mean value: 0.997080136402387

key: test_jcc
value: [0.875      0.77777778 0.77777778 1.         0.7        0.75
 0.72727273 0.88888889 0.6        0.88888889]

mean value: 0.7985606060606061

key: train_jcc
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.94285714]

mean value: 0.9942857142857143

MCC on Blind test: 0.12

Accuracy on Blind test: 0.67

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01223969 0.00993228 0.00880289 0.00858593 0.00832534 0.00831509
 0.00848222 0.00867891 0.00828099 0.00835156]

mean value: 0.008999490737915039

key: score_time
value: [0.02127433 0.00887847 0.00874305 0.00918937 0.00837994 0.00838256
 0.00842857 0.00844622 0.00842428 0.00837636]

mean value: 0.009852313995361328

key: test_mcc
value: [ 0.37796447  0.40451992  0.21821789  0.37796447  0.46770717  0.36689969
  0.18898224  0.05455447  0.32732684 -0.25      ]

mean value: 0.25341371643494065

key: train_mcc
value: [0.413068   0.44611344 0.36921463 0.36921463 0.34968716 0.51090959
 0.54031142 0.48879481 0.41938946 0.43843163]

mean value: 0.4345134780961774

key: test_accuracy
value: [0.625      0.6875     0.6        0.66666667 0.66666667 0.6
 0.6        0.53333333 0.66666667 0.46666667]

mean value: 0.61125

key: train_accuracy
value: [0.68382353 0.69852941 0.66423358 0.66423358 0.65693431 0.75182482
 0.75912409 0.73722628 0.68613139 0.70072993]

mean value: 0.7002790897380851

key: test_fscore
value: [0.72727273 0.73684211 0.625      0.70588235 0.73684211 0.7
 0.66666667 0.58823529 0.70588235 0.63636364]

mean value: 0.6828987240829346

key: train_fscore
value: [0.74251497 0.75449102 0.72941176 0.72941176 0.72189349 0.77333333
 0.78709677 0.76315789 0.74251497 0.74846626]

mean value: 0.7492292238552293

key: test_precision
value: [0.57142857 0.63636364 0.55555556 0.6        0.58333333 0.53846154
 0.6        0.55555556 0.66666667 0.5       ]

mean value: 0.5807364857364857

key: train_precision
value: [0.62626263 0.63636364 0.61386139 0.61386139 0.61       0.71604938
 0.70114943 0.69047619 0.62626263 0.64210526]

mean value: 0.6476391922803607

key: test_recall
value: [1.         0.875      0.71428571 0.85714286 1.         1.
 0.75       0.625      0.75       0.875     ]

mean value: 0.8446428571428571

key: train_recall
value: [0.91176471 0.92647059 0.89855072 0.89855072 0.88405797 0.84057971
 0.89705882 0.85294118 0.91176471 0.89705882]

mean value: 0.8918797953964195

key: test_roc_auc
value: [0.625      0.6875     0.60714286 0.67857143 0.6875     0.625
 0.58928571 0.52678571 0.66071429 0.4375    ]

mean value: 0.6125

key: train_roc_auc
value: [0.68382353 0.69852941 0.66251066 0.66251066 0.65526428 0.75117221
 0.76012361 0.73806479 0.68776641 0.7021526 ]

mean value: 0.7001918158567775

key: test_jcc
value: [0.57142857 0.58333333 0.45454545 0.54545455 0.58333333 0.53846154
 0.5        0.41666667 0.54545455 0.46666667]

mean value: 0.5205344655344655

key: train_jcc
value: [0.59047619 0.60576923 0.57407407 0.57407407 0.56481481 0.63043478
 0.64893617 0.61702128 0.59047619 0.59803922]

mean value: 0.5994116019788055

MCC on Blind test: -0.0

Accuracy on Blind test: 0.32

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.0087018  0.00853872 0.00848269 0.00848675 0.00855303 0.00848556
 0.0085392  0.00849295 0.00860119 0.00851774]

mean value: 0.008539962768554687

key: score_time
value: [0.00843382 0.00837398 0.00834632 0.00840616 0.00839901 0.00846195
 0.00838351 0.00841093 0.0083952  0.00841308]

mean value: 0.008402395248413085

key: test_mcc
value: [0.51639778 0.12598816 0.09449112 0.26189246 0.19642857 0.09449112
 0.37796447 0.21821789 0.21821789 0.37796447]

mean value: 0.24820539348959375

key: train_mcc
value: [0.53033009 0.50349655 0.50959996 0.60584099 0.53517487 0.50469525
 0.59240339 0.45151662 0.50525024 0.53294957]

mean value: 0.5271257516406771

key: test_accuracy
value: [0.75       0.5625     0.53333333 0.6        0.6        0.53333333
 0.66666667 0.6        0.6        0.66666667]

mean value: 0.61125

key: train_accuracy
value: [0.76470588 0.75       0.75182482 0.80291971 0.76642336 0.75182482
 0.79562044 0.72262774 0.75182482 0.76642336]

mean value: 0.7624194933447832

key: test_fscore
value: [0.77777778 0.53333333 0.58823529 0.66666667 0.57142857 0.58823529
 0.61538462 0.57142857 0.57142857 0.61538462]

mean value: 0.6099303311068016

key: train_fscore
value: [0.75757576 0.734375   0.734375   0.8057554  0.75757576 0.74626866
 0.8        0.69354839 0.73846154 0.76119403]

mean value: 0.7529129522960445

key: test_precision
value: [0.7        0.57142857 0.5        0.54545455 0.57142857 0.5
 0.8        0.66666667 0.66666667 0.8       ]

mean value: 0.6321645021645022

key: train_precision
value: [0.78125    0.78333333 0.79661017 0.8        0.79365079 0.76923077
 0.77777778 0.76785714 0.77419355 0.77272727]

mean value: 0.7816630807455712

key: test_recall
value: [0.875      0.5        0.71428571 0.85714286 0.57142857 0.71428571
 0.5        0.5        0.5        0.5       ]

mean value: 0.6232142857142857

key: train_recall
value: [0.73529412 0.69117647 0.68115942 0.8115942  0.72463768 0.72463768
 0.82352941 0.63235294 0.70588235 0.75      ]

mean value: 0.7280264279624894

key: test_roc_auc
value: [0.75       0.5625     0.54464286 0.61607143 0.59821429 0.54464286
 0.67857143 0.60714286 0.60714286 0.67857143]

mean value: 0.61875

key: train_roc_auc
value: [0.76470588 0.75       0.75234442 0.80285592 0.76673061 0.75202472
 0.79582268 0.72197357 0.7514919  0.76630435]

mean value: 0.7624254049445865

key: test_jcc
value: [0.63636364 0.36363636 0.41666667 0.5        0.4        0.41666667
 0.44444444 0.4        0.4        0.44444444]

mean value: 0.44222222222222224

key: train_jcc
value: [0.6097561  0.58024691 0.58024691 0.6746988  0.6097561  0.5952381
 0.66666667 0.5308642  0.58536585 0.61445783]

mean value: 0.6047297461882631

MCC on Blind test: 0.05

Accuracy on Blind test: 0.61

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00832582 0.00817609 0.00818443 0.00810075 0.00809479 0.00813198
 0.00812602 0.00824833 0.00817919 0.00815916]

mean value: 0.00817265510559082

key: score_time
value: [0.00930643 0.00926352 0.00937724 0.00929761 0.00933504 0.00922656
 0.00925708 0.00928569 0.00933361 0.00929666]

mean value: 0.009297943115234375

key: test_mcc
value: [ 0.62994079  0.62994079  0.09449112  0.60714286  0.73214286  0.26189246
 -0.07142857  0.53452248  0.46770717  0.05455447]

mean value: 0.3940906430845712

key: train_mcc
value: [0.54464795 0.57408838 0.62041773 0.57703846 0.49006025 0.5339313
 0.62305217 0.59138421 0.62163943 0.6062745 ]

mean value: 0.5782534378327537

key: test_accuracy
value: [0.8125     0.8125     0.53333333 0.8        0.86666667 0.6
 0.46666667 0.73333333 0.66666667 0.53333333]

mean value: 0.6825

key: train_accuracy
value: [0.77205882 0.78676471 0.81021898 0.78832117 0.74452555 0.76642336
 0.81021898 0.79562044 0.81021898 0.80291971]

mean value: 0.7887290682696436

key: test_fscore
value: [0.8        0.8        0.58823529 0.8        0.85714286 0.66666667
 0.5        0.8        0.54545455 0.58823529]

mean value: 0.6945734657499363

key: train_fscore
value: [0.77697842 0.79136691 0.8115942  0.79432624 0.75524476 0.76119403
 0.81690141 0.79104478 0.81428571 0.79699248]

mean value: 0.790992893292864

key: test_precision
value: [0.85714286 0.85714286 0.5        0.75       0.85714286 0.54545455
 0.5        0.66666667 1.         0.55555556]

mean value: 0.7089105339105339

key: train_precision
value: [0.76056338 0.77464789 0.8115942  0.77777778 0.72972973 0.78461538
 0.78378378 0.8030303  0.79166667 0.81538462]

mean value: 0.7832793731492446

key: test_recall
value: [0.75       0.75       0.71428571 0.85714286 0.85714286 0.85714286
 0.5        1.         0.375      0.625     ]

mean value: 0.7285714285714285

key: train_recall
value: [0.79411765 0.80882353 0.8115942  0.8115942  0.7826087  0.73913043
 0.85294118 0.77941176 0.83823529 0.77941176]

mean value: 0.7997868712702473

key: test_roc_auc
value: [0.8125     0.8125     0.54464286 0.80357143 0.86607143 0.61607143
 0.46428571 0.71428571 0.6875     0.52678571]

mean value: 0.6848214285714286

key: train_roc_auc
value: [0.77205882 0.78676471 0.81020887 0.78815004 0.74424552 0.76662404
 0.81052856 0.79550298 0.81042199 0.80274936]

mean value: 0.7887254901960784

key: test_jcc
value: [0.66666667 0.66666667 0.41666667 0.66666667 0.75       0.5
 0.33333333 0.66666667 0.375      0.41666667]

mean value: 0.5458333333333333

key: train_jcc
value: [0.63529412 0.6547619  0.68292683 0.65882353 0.60674157 0.61445783
 0.69047619 0.65432099 0.68674699 0.6625    ]

mean value: 0.6547049951530348

MCC on Blind test: -0.02

Accuracy on Blind test: 0.58

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01064086 0.00967836 0.00960207 0.0098033  0.00957322 0.0095706
 0.00952244 0.00956464 0.00957322 0.00955892]

mean value: 0.009708762168884277

key: score_time
value: [0.00922561 0.00877929 0.00873637 0.00890636 0.00864887 0.00866437
 0.00868011 0.00870419 0.00868416 0.00871134]

mean value: 0.008774065971374511

key: test_mcc
value: [0.62994079 0.37796447 0.49099025 0.73214286 0.73214286 0.46428571
 0.60714286 0.49099025 0.37796447 0.32732684]

mean value: 0.5230891361497406

key: train_mcc
value: [0.84051051 0.85628096 0.85440207 0.83947987 0.79880676 0.85440207
 0.8251228  0.79855228 0.79688349 0.81433714]

mean value: 0.8278777956305471

key: test_accuracy
value: [0.8125     0.6875     0.73333333 0.86666667 0.86666667 0.73333333
 0.8        0.73333333 0.66666667 0.66666667]

mean value: 0.7566666666666666

key: train_accuracy
value: [0.91911765 0.92647059 0.9270073  0.91970803 0.89781022 0.9270073
 0.91240876 0.89781022 0.89781022 0.90510949]

mean value: 0.9130259768140833

key: test_fscore
value: [0.8        0.66666667 0.75       0.85714286 0.85714286 0.71428571
 0.8        0.71428571 0.61538462 0.70588235]

mean value: 0.7480790777849601

key: train_fscore
value: [0.91603053 0.92307692 0.92647059 0.92086331 0.89393939 0.92647059
 0.91044776 0.89230769 0.89393939 0.89922481]

mean value: 0.9102770990833234

key: test_precision
value: [0.85714286 0.71428571 0.66666667 0.85714286 0.85714286 0.71428571
 0.85714286 0.83333333 0.8        0.66666667]

mean value: 0.7823809523809524

key: train_precision
value: [0.95238095 0.96774194 0.94029851 0.91428571 0.93650794 0.94029851
 0.92424242 0.93548387 0.921875   0.95081967]

mean value: 0.9383934520925161

key: test_recall
value: [0.75       0.625      0.85714286 0.85714286 0.85714286 0.71428571
 0.75       0.625      0.5        0.75      ]

mean value: 0.7285714285714285

key: train_recall
value: [0.88235294 0.88235294 0.91304348 0.92753623 0.85507246 0.91304348
 0.89705882 0.85294118 0.86764706 0.85294118]

mean value: 0.8843989769820971

key: test_roc_auc
value: [0.8125     0.6875     0.74107143 0.86607143 0.86607143 0.73214286
 0.80357143 0.74107143 0.67857143 0.66071429]

mean value: 0.7589285714285714

key: train_roc_auc
value: [0.91911765 0.92647059 0.92710997 0.91965047 0.89812447 0.92710997
 0.91229753 0.89748508 0.89759165 0.90473146]

mean value: 0.9129688832054561

key: test_jcc
value: [0.66666667 0.5        0.6        0.75       0.75       0.55555556
 0.66666667 0.55555556 0.44444444 0.54545455]

mean value: 0.6034343434343434

key: train_jcc
value: [0.84507042 0.85714286 0.8630137  0.85333333 0.80821918 0.8630137
 0.83561644 0.80555556 0.80821918 0.81690141]

mean value: 0.8356085768798484

MCC on Blind test: 0.06

Accuracy on Blind test: 0.69

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.51071286 0.62532473 0.49337792 0.53811383 0.55977559 0.71114159
 0.52939415 0.50543237 0.55521274 0.65180755]

mean value: 0.5680293321609498

key: score_time
value: [0.01190805 0.01196837 0.01200247 0.01195788 0.01197672 0.01212215
 0.01198769 0.0119679  0.01194954 0.0119822 ]

mean value: 0.011982297897338868

key: test_mcc
value: [0.75       0.75       0.66143783 1.         0.76376262 0.73214286
 0.33928571 0.75592895 0.49099025 0.75592895]

mean value: 0.6999477160088584

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.875      0.875      0.8        1.         0.86666667 0.86666667
 0.66666667 0.86666667 0.73333333 0.86666667]

mean value: 0.8416666666666667

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.875      0.875      0.82352941 1.         0.875      0.85714286
 0.66666667 0.88888889 0.71428571 0.88888889]

mean value: 0.8464402427637722

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.875      0.875      0.7        1.         0.77777778 0.85714286
 0.71428571 0.8        0.83333333 0.8       ]

mean value: 0.8232539682539682

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.875      0.875      1.         1.         1.         0.85714286
 0.625      1.         0.625      1.        ]

mean value: 0.8857142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.875      0.875      0.8125     1.         0.875      0.86607143
 0.66964286 0.85714286 0.74107143 0.85714286]

mean value: 0.8428571428571429

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.77777778 0.77777778 0.7        1.         0.77777778 0.75
 0.5        0.8        0.55555556 0.8       ]

mean value: 0.7438888888888889

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.07

Accuracy on Blind test: 0.7

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01587987 0.01330566 0.01045179 0.01051211 0.01048994 0.01024008
 0.01018476 0.00993347 0.01029968 0.01046205]

mean value: 0.011175942420959473

key: score_time
value: [0.01157689 0.00889277 0.00871801 0.0086174  0.0084703  0.0083189
 0.00842404 0.00833297 0.00833917 0.00839949]

mean value: 0.00880899429321289

key: test_mcc
value: [0.8819171  0.8819171  1.         0.875      0.875      0.87287156
 0.87287156 0.87287156 0.60714286 0.87287156]

mean value: 0.8612463308295129

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     0.9375     1.         0.93333333 0.93333333 0.93333333
 0.93333333 0.93333333 0.8        0.93333333]

mean value: 0.9275

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94117647 0.94117647 1.         0.93333333 0.93333333 0.92307692
 0.94117647 0.94117647 0.8        0.94117647]

mean value: 0.9295625942684766

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.88888889 0.88888889 1.         0.875      0.875      1.
 0.88888889 0.88888889 0.85714286 0.88888889]

mean value: 0.9051587301587302

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         1.         0.85714286
 1.         1.         0.75       1.        ]

mean value: 0.9607142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.9375     1.         0.9375     0.9375     0.92857143
 0.92857143 0.92857143 0.80357143 0.92857143]

mean value: 0.9267857142857143

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.88888889 0.88888889 1.         0.875      0.875      0.85714286
 0.88888889 0.88888889 0.66666667 0.88888889]

mean value: 0.8718253968253968

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.11

Accuracy on Blind test: 0.83

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.0862155  0.08660769 0.08695054 0.08672595 0.08652806 0.08790731
 0.08697605 0.08698249 0.0864861  0.08691168]

mean value: 0.08682913780212402

key: score_time
value: [0.01669502 0.01704407 0.01698542 0.01700997 0.0178175  0.01703525
 0.01697755 0.01743746 0.01710749 0.01699448]

mean value: 0.017110419273376466

key: test_mcc
value: [0.8819171  0.62994079 0.66143783 0.87287156 0.73214286 0.60714286
 0.66143783 1.         0.66143783 0.6000992 ]

mean value: 0.7308427848714014

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     0.8125     0.8        0.93333333 0.86666667 0.8
 0.8        1.         0.8        0.8       ]

mean value: 0.855

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.93333333 0.82352941 0.82352941 0.92307692 0.85714286 0.8
 0.76923077 1.         0.76923077 0.82352941]

mean value: 0.852260288730877

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.77777778 0.7        1.         0.85714286 0.75
 1.         1.         1.         0.77777778]

mean value: 0.8862698412698413

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.875      0.875      1.         0.85714286 0.85714286 0.85714286
 0.625      1.         0.625      0.875     ]

mean value: 0.8446428571428571

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.8125     0.8125     0.92857143 0.86607143 0.80357143
 0.8125     1.         0.8125     0.79464286]

mean value: 0.8580357142857142

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.875      0.7        0.7        0.85714286 0.75       0.66666667
 0.625      1.         0.625      0.7       ]

mean value: 0.7498809523809523

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: -0.04

Accuracy on Blind test: 0.78

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00886059 0.00908279 0.00891972 0.00858617 0.00867081 0.00878835
 0.00867939 0.00867939 0.00872135 0.00870275]

mean value: 0.00876913070678711

key: score_time
value: [0.00852084 0.00866485 0.00882626 0.00853658 0.00854754 0.0085063
 0.00849438 0.00842929 0.00840282 0.0084486 ]

mean value: 0.008537745475769043

key: test_mcc
value: [0.51639778 0.8819171  0.76376262 0.73214286 0.66143783 0.60714286
 0.73214286 0.87287156 0.33928571 0.64465837]

mean value: 0.67517595446532

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.75       0.9375     0.86666667 0.86666667 0.8        0.8
 0.86666667 0.93333333 0.66666667 0.8       ]

mean value: 0.82875

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.77777778 0.93333333 0.875      0.85714286 0.82352941 0.8
 0.875      0.94117647 0.66666667 0.84210526]

mean value: 0.8391731780431471

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.7        1.         0.77777778 0.85714286 0.7        0.75
 0.875      0.88888889 0.71428571 0.72727273]

mean value: 0.7990367965367965

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.875      0.875      1.         0.85714286 1.         0.85714286
 0.875      1.         0.625      1.        ]

mean value: 0.8964285714285715

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.75       0.9375     0.875      0.86607143 0.8125     0.80357143
 0.86607143 0.92857143 0.66964286 0.78571429]

mean value: 0.8294642857142858

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.63636364 0.875      0.77777778 0.75       0.7        0.66666667
 0.77777778 0.88888889 0.5        0.72727273]

mean value: 0.7299747474747474

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.05

Accuracy on Blind test: 0.63

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.13003492 1.12355256 1.12175441 1.11954212 1.13803053 1.15066171
 1.15327144 1.13439465 1.11341023 1.12649703]

mean value: 1.1311149597167969

key: score_time
value: [0.08830166 0.08753419 0.08891082 0.09308314 0.09482765 0.09497786
 0.09345198 0.08721018 0.08751631 0.09360695]

mean value: 0.09094207286834717

key: test_mcc
value: [0.8819171  0.62994079 0.76376262 0.87287156 1.         0.73214286
 0.60714286 1.         0.66143783 0.64465837]

mean value: 0.7793873982079018

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     0.8125     0.86666667 0.93333333 1.         0.86666667
 0.8        1.         0.8        0.8       ]

mean value: 0.8816666666666667

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.93333333 0.82352941 0.875      0.92307692 1.         0.85714286
 0.8        1.         0.76923077 0.84210526]

mean value: 0.8823418557706484

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.77777778 0.77777778 1.         1.         0.85714286
 0.85714286 1.         1.         0.72727273]

mean value: 0.8997113997113997

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.875      0.875      1.         0.85714286 1.         0.85714286
 0.75       1.         0.625      1.        ]

mean value: 0.8839285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.8125     0.875      0.92857143 1.         0.86607143
 0.80357143 1.         0.8125     0.78571429]

mean value: 0.8821428571428571

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.875      0.7        0.77777778 0.85714286 1.         0.75
 0.66666667 1.         0.625      0.72727273]

mean value: 0.7978860028860029

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.1

Accuracy on Blind test: 0.8

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000...05', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.90586853 0.93603158 0.88809967 0.95108223 0.85255527 0.91164207
 0.91695261 0.85206008 0.92855096 0.90283537]

mean value: 0.9045678377151489

key: score_time
value: [0.24762607 0.22040629 0.204952   0.16551399 0.1194768  0.21311998
 0.22083116 0.18471122 0.22333884 0.13864398]

mean value: 0.19386203289031984

key: test_mcc
value: [0.75       0.75       0.76376262 0.87287156 0.875      0.6000992
 0.73214286 0.75592895 0.66143783 0.6000992 ]

mean value: 0.736134220399536

key: train_mcc
value: [0.98540068 1.         0.98550725 0.98550725 0.98550725 0.97122151
 0.97120941 0.98550418 0.98550418 0.97120941]

mean value: 0.9826571107637088

key: test_accuracy
value: [0.875      0.875      0.86666667 0.93333333 0.93333333 0.8
 0.86666667 0.86666667 0.8        0.8       ]

mean value: 0.8616666666666667

key: train_accuracy
value: [0.99264706 1.         0.99270073 0.99270073 0.99270073 0.98540146
 0.98540146 0.99270073 0.99270073 0.98540146]

mean value: 0.991235508802061

key: test_fscore
value: [0.875      0.875      0.875      0.92307692 0.93333333 0.76923077
 0.875      0.88888889 0.76923077 0.82352941]

mean value: 0.860729009552539

key: train_fscore
value: [0.99259259 1.         0.99270073 0.99270073 0.99270073 0.98529412
 0.98507463 0.99259259 0.99259259 0.98507463]

mean value: 0.9911323338937202

key: test_precision
value: [0.875      0.875      0.77777778 1.         0.875      0.83333333
 0.875      0.8        1.         0.77777778]

mean value: 0.8688888888888889

key: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.875      0.875      1.         0.85714286 1.         0.71428571
 0.875      1.         0.625      0.875     ]

mean value: 0.8696428571428572

key: train_recall
value: [0.98529412 1.         0.98550725 0.98550725 0.98550725 0.97101449
 0.97058824 0.98529412 0.98529412 0.97058824]

mean value: 0.982459505541347

key: test_roc_auc
value: [0.875      0.875      0.875      0.92857143 0.9375     0.79464286
 0.86607143 0.85714286 0.8125     0.79464286]

mean value: 0.8616071428571429

key: train_roc_auc
value: [0.99264706 1.         0.99275362 0.99275362 0.99275362 0.98550725
 0.98529412 0.99264706 0.99264706 0.98529412]

mean value: 0.9912297527706735

key: test_jcc
value: [0.77777778 0.77777778 0.77777778 0.85714286 0.875      0.625
 0.77777778 0.8        0.625      0.7       ]

mean value: 0.7593253968253968

key: train_jcc
value: [0.98529412 1.         0.98550725 0.98550725 0.98550725 0.97101449
 0.97058824 0.98529412 0.98529412 0.97058824]

mean value: 0.982459505541347

MCC on Blind test: 0.1

Accuracy on Blind test: 0.79

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01912498 0.00874758 0.00960207 0.00982785 0.00947642 0.00888705
 0.00887609 0.00895357 0.00890136 0.00877714]

mean value: 0.010117411613464355

key: score_time
value: [0.00888324 0.00855899 0.00940561 0.00900817 0.00922036 0.00869274
 0.00869942 0.00855994 0.00850224 0.00864863]

mean value: 0.008817934989929199

key: test_mcc
value: [0.51639778 0.12598816 0.09449112 0.26189246 0.19642857 0.09449112
 0.37796447 0.21821789 0.21821789 0.37796447]

mean value: 0.24820539348959375

key: train_mcc
value: [0.53033009 0.50349655 0.50959996 0.60584099 0.53517487 0.50469525
 0.59240339 0.45151662 0.50525024 0.53294957]

mean value: 0.5271257516406771

key: test_accuracy
value: [0.75       0.5625     0.53333333 0.6        0.6        0.53333333
 0.66666667 0.6        0.6        0.66666667]

mean value: 0.61125

key: train_accuracy
value: [0.76470588 0.75       0.75182482 0.80291971 0.76642336 0.75182482
 0.79562044 0.72262774 0.75182482 0.76642336]

mean value: 0.7624194933447832

key: test_fscore
value: [0.77777778 0.53333333 0.58823529 0.66666667 0.57142857 0.58823529
 0.61538462 0.57142857 0.57142857 0.61538462]

mean value: 0.6099303311068016

key: train_fscore
value: [0.75757576 0.734375   0.734375   0.8057554  0.75757576 0.74626866
 0.8        0.69354839 0.73846154 0.76119403]

mean value: 0.7529129522960445

key: test_precision
value: [0.7        0.57142857 0.5        0.54545455 0.57142857 0.5
 0.8        0.66666667 0.66666667 0.8       ]

mean value: 0.6321645021645022

key: train_precision
value: [0.78125    0.78333333 0.79661017 0.8        0.79365079 0.76923077
 0.77777778 0.76785714 0.77419355 0.77272727]

mean value: 0.7816630807455712

key: test_recall
value: [0.875      0.5        0.71428571 0.85714286 0.57142857 0.71428571
 0.5        0.5        0.5        0.5       ]

mean value: 0.6232142857142857

key: train_recall
value: [0.73529412 0.69117647 0.68115942 0.8115942  0.72463768 0.72463768
 0.82352941 0.63235294 0.70588235 0.75      ]

mean value: 0.7280264279624894

key: test_roc_auc
value: [0.75       0.5625     0.54464286 0.61607143 0.59821429 0.54464286
 0.67857143 0.60714286 0.60714286 0.67857143]

mean value: 0.61875

key: train_roc_auc
value: [0.76470588 0.75       0.75234442 0.80285592 0.76673061 0.75202472
 0.79582268 0.72197357 0.7514919  0.76630435]

mean value: 0.7624254049445865

key: test_jcc
value: [0.63636364 0.36363636 0.41666667 0.5        0.4        0.41666667
 0.44444444 0.4        0.4        0.44444444]

mean value: 0.44222222222222224

key: train_jcc
value: [0.6097561  0.58024691 0.58024691 0.6746988  0.6097561  0.5952381
 0.66666667 0.5308642  0.58536585 0.61445783]

mean value: 0.6047297461882631

MCC on Blind test: 0.05

Accuracy on Blind test: 0.61

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.04883242 0.04633856 0.04238772 0.05236554 0.04883766 0.04279828
 0.04210663 0.04309344 0.04330492 0.22795987]

mean value: 0.06380250453948974

key: score_time
value: [0.01022696 0.01112413 0.01113605 0.01113963 0.01104569 0.0110836
 0.01104999 0.01103091 0.01105928 0.01130748]

mean value: 0.011020374298095704

key: test_mcc
value: [1.         0.75       1.         1.         0.875      1.
 0.87287156 1.         0.87287156 1.        ]

mean value: 0.9370743121887939

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.875      1.         1.         0.93333333 1.
 0.93333333 1.         0.93333333 1.        ]

mean value: 0.9675

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.875      1.         1.         0.93333333 1.
 0.94117647 1.         0.94117647 1.        ]

mean value: 0.9690686274509804

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.875      1.         1.         0.875      1.
 0.88888889 1.         0.88888889 1.        ]

mean value: 0.9527777777777777

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.    0.875 1.    1.    1.    1.    1.    1.    1.    1.   ]

mean value: 0.9875

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.875      1.         1.         0.9375     1.
 0.92857143 1.         0.92857143 1.        ]

mean value: 0.9669642857142857

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.77777778 1.         1.         0.875      1.
 0.88888889 1.         0.88888889 1.        ]

mean value: 0.9430555555555555

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.12

Accuracy on Blind test: 0.84

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.0287745  0.04505372 0.0480001  0.02139306 0.0232172  0.04378772
 0.02212381 0.02828956 0.0516181  0.04729009]

mean value: 0.03595478534698486

key: score_time
value: [0.02199507 0.02018237 0.02165437 0.01165485 0.02183247 0.01162839
 0.01166296 0.02029037 0.02277994 0.02115297]

mean value: 0.01848337650299072

key: test_mcc
value: [0.37796447 0.62994079 0.56407607 0.56407607 0.56407607 0.60714286
 0.47245559 0.64465837 0.33928571 0.33928571]

mean value: 0.5102961734007362

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.6875     0.8125     0.73333333 0.73333333 0.73333333 0.8
 0.73333333 0.8        0.66666667 0.66666667]

mean value: 0.7366666666666667

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.70588235 0.82352941 0.77777778 0.77777778 0.77777778 0.8
 0.77777778 0.84210526 0.66666667 0.66666667]

mean value: 0.7615961472308221

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.66666667 0.77777778 0.63636364 0.63636364 0.63636364 0.75
 0.7        0.72727273 0.71428571 0.71428571]

mean value: 0.695937950937951

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75       0.875      1.         1.         1.         0.85714286
 0.875      1.         0.625      0.625     ]

mean value: 0.8607142857142858

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.6875     0.8125     0.75       0.75       0.75       0.80357143
 0.72321429 0.78571429 0.66964286 0.66964286]

mean value: 0.7401785714285715

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.54545455 0.7        0.63636364 0.63636364 0.63636364 0.66666667
 0.63636364 0.72727273 0.5        0.5       ]

mean value: 0.6184848484848484

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.04

Accuracy on Blind test: 0.56

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.02826953 0.00896072 0.00866175 0.00845218 0.00843525 0.00844526
 0.00838947 0.00843453 0.00848961 0.00843048]

mean value: 0.010496878623962402

key: score_time
value: [0.00984025 0.00874448 0.00840664 0.00832248 0.00841665 0.00833011
 0.00835133 0.00834513 0.00835109 0.00833511]

mean value: 0.008544325828552246

key: test_mcc
value: [ 0.51639778  0.51639778  0.37796447  0.20044593  0.32732684 -0.04029115
  0.21821789  0.05455447  0.07142857  0.46428571]

mean value: 0.27067282990941854

key: train_mcc
value: [0.33856494 0.41194292 0.4690744  0.28501323 0.34357013 0.32871457
 0.35764563 0.32871457 0.34299724 0.34359809]

mean value: 0.3549835701506521

key: test_accuracy
value: [0.75       0.75       0.66666667 0.6        0.66666667 0.46666667
 0.6        0.53333333 0.53333333 0.73333333]

mean value: 0.63

key: train_accuracy
value: [0.66911765 0.70588235 0.72992701 0.64233577 0.67153285 0.66423358
 0.67883212 0.66423358 0.67153285 0.67153285]

mean value: 0.6769160583941606

key: test_fscore
value: [0.77777778 0.71428571 0.70588235 0.4        0.61538462 0.55555556
 0.57142857 0.58823529 0.53333333 0.75      ]

mean value: 0.6211883214824391

key: train_fscore
value: [0.66165414 0.70149254 0.704      0.65734266 0.66666667 0.66176471
 0.67164179 0.66666667 0.66666667 0.65648855]

mean value: 0.6714384376539886

key: test_precision
value: [0.7        0.83333333 0.6        0.66666667 0.66666667 0.45454545
 0.66666667 0.55555556 0.57142857 0.75      ]

mean value: 0.6464862914862914

key: train_precision
value: [0.67692308 0.71212121 0.78571429 0.63513514 0.68181818 0.67164179
 0.68181818 0.65714286 0.67164179 0.68253968]

mean value: 0.6856496195302165

key: test_recall
value: [0.875      0.625      0.85714286 0.28571429 0.57142857 0.71428571
 0.5        0.625      0.5        0.75      ]

mean value: 0.6303571428571428

key: train_recall
value: [0.64705882 0.69117647 0.63768116 0.68115942 0.65217391 0.65217391
 0.66176471 0.67647059 0.66176471 0.63235294]

mean value: 0.6593776641091219

key: test_roc_auc
value: [0.75       0.75       0.67857143 0.58035714 0.66071429 0.48214286
 0.60714286 0.52678571 0.53571429 0.73214286]

mean value: 0.6303571428571428

key: train_roc_auc
value: [0.66911765 0.70588235 0.73060529 0.6420503  0.67167519 0.66432225
 0.67870844 0.66432225 0.67146206 0.67124893]

mean value: 0.6769394714407502

key: test_jcc
value: [0.63636364 0.55555556 0.54545455 0.25       0.44444444 0.38461538
 0.4        0.41666667 0.36363636 0.6       ]

mean value: 0.4596736596736597

key: train_jcc
value: [0.49438202 0.54022989 0.54320988 0.48958333 0.5        0.49450549
 0.50561798 0.5        0.5        0.48863636]

mean value: 0.5056164953075872

MCC on Blind test: 0.1

Accuracy on Blind test: 0.58

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01021981 0.01472306 0.01346636 0.01583171 0.01679564 0.01497579
 0.01578879 0.01588607 0.01462793 0.01430941]

mean value: 0.014662456512451173

key: score_time
value: [0.00839567 0.01128626 0.01132059 0.01161289 0.01131344 0.01138806
 0.01125145 0.01127243 0.01126599 0.01127338]

mean value: 0.01103801727294922

key: test_mcc
value: [0.77459667 0.75       0.6000992  1.         0.76376262 0.73214286
 0.33928571 0.87287156 0.66143783 0.73214286]

mean value: 0.7226339300497981

key: train_mcc
value: [0.94158382 0.98540068 0.91597649 0.95713391 0.97122151 0.95629932
 0.94199209 0.97080136 0.95710706 0.92787101]

mean value: 0.9525387247176611

key: test_accuracy
value: [0.875      0.875      0.8        1.         0.86666667 0.86666667
 0.66666667 0.93333333 0.8        0.86666667]

mean value: 0.855

key: train_accuracy
value: [0.97058824 0.99264706 0.95620438 0.97810219 0.98540146 0.97810219
 0.97080292 0.98540146 0.97810219 0.96350365]

mean value: 0.9758855732073852

key: test_fscore
value: [0.85714286 0.875      0.76923077 1.         0.875      0.85714286
 0.66666667 0.94117647 0.76923077 0.875     ]

mean value: 0.8485590390002155

key: train_fscore
value: [0.97014925 0.99270073 0.95454545 0.97777778 0.98529412 0.97841727
 0.97014925 0.98529412 0.97744361 0.96240602]

mean value: 0.9754177595254244

key: test_precision
value: [1.         0.875      0.83333333 1.         0.77777778 0.85714286
 0.71428571 0.88888889 1.         0.875     ]

mean value: 0.8821428571428571

key: train_precision
value: [0.98484848 0.98550725 1.         1.         1.         0.97142857
 0.98484848 0.98529412 1.         0.98461538]

mean value: 0.9896542289764796

key: test_recall
value: [0.75       0.875      0.71428571 1.         1.         0.85714286
 0.625      1.         0.625      0.875     ]

mean value: 0.8321428571428572

key: train_recall
value: [0.95588235 1.         0.91304348 0.95652174 0.97101449 0.98550725
 0.95588235 0.98529412 0.95588235 0.94117647]

mean value: 0.9620204603580563

key: test_roc_auc
value: [0.875      0.875      0.79464286 1.         0.875      0.86607143
 0.66964286 0.92857143 0.8125     0.86607143]

mean value: 0.85625

key: train_roc_auc
value: [0.97058824 0.99264706 0.95652174 0.97826087 0.98550725 0.97804774
 0.9706948  0.98540068 0.97794118 0.96334186]

mean value: 0.9758951406649616

key: test_jcc
value: [0.75       0.77777778 0.625      1.         0.77777778 0.75
 0.5        0.88888889 0.625      0.77777778]

mean value: 0.7472222222222222

key: train_jcc
value: [0.94202899 0.98550725 0.91304348 0.95652174 0.97101449 0.95774648
 0.94202899 0.97101449 0.95588235 0.92753623]

mean value: 0.9522324483988329

MCC on Blind test: 0.08

Accuracy on Blind test: 0.46

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01297069 0.01287246 0.0131824  0.01306939 0.01283884 0.0133245
 0.01278806 0.01465464 0.01478004 0.035779  ]

mean value: 0.015626001358032226

key: score_time
value: [0.01125312 0.01122284 0.01120329 0.01118088 0.01124144 0.01121807
 0.01123738 0.01154613 0.01181602 0.0119319 ]

mean value: 0.011385107040405273

key: test_mcc
value: [0.57735027 0.62994079 0.875      0.875      0.66143783 0.6000992
 0.49099025 0.28571429 0.46428571 0.73214286]

mean value: 0.6191961193627304

key: train_mcc
value: [0.61134064 0.91533482 1.         0.87086187 0.94199209 0.62625207
 0.84660737 0.4690195  0.73427935 0.92709446]

mean value: 0.7942782177369494

key: test_accuracy
value: [0.75       0.8125     0.93333333 0.93333333 0.8        0.8
 0.73333333 0.6        0.73333333 0.86666667]

mean value: 0.79625

key: train_accuracy
value: [0.77205882 0.95588235 1.         0.93430657 0.97080292 0.7810219
 0.91970803 0.67883212 0.8540146  0.96350365]

mean value: 0.8830130957492486

key: test_fscore
value: [0.8        0.82352941 0.93333333 0.93333333 0.82352941 0.76923077
 0.71428571 0.72727273 0.75       0.875     ]

mean value: 0.814951470098529

key: train_fscore
value: [0.81437126 0.95774648 1.         0.93706294 0.97142857 0.72222222
 0.91338583 0.75555556 0.87012987 0.96296296]

mean value: 0.8904865682492042

key: test_precision
value: [0.66666667 0.77777778 0.875      0.875      0.7        0.83333333
 0.83333333 0.57142857 0.75       0.875     ]

mean value: 0.7757539682539683

key: train_precision
value: [0.68686869 0.91891892 1.         0.90540541 0.95774648 1.
 0.98305085 0.60714286 0.77906977 0.97014925]

mean value: 0.8808352215839939

key: test_recall
value: [1.         0.875      1.         1.         1.         0.71428571
 0.625      1.         0.75       0.875     ]

mean value: 0.8839285714285714

key: train_recall
value: [1.         1.         1.         0.97101449 0.98550725 0.56521739
 0.85294118 1.         0.98529412 0.95588235]

mean value: 0.9315856777493606

key: test_roc_auc
value: [0.75       0.8125     0.9375     0.9375     0.8125     0.79464286
 0.74107143 0.57142857 0.73214286 0.86607143]

mean value: 0.7955357142857142

key: train_roc_auc
value: [0.77205882 0.95588235 1.         0.93403666 0.9706948  0.7826087
 0.91922421 0.68115942 0.8549659  0.96344842]

mean value: 0.8834079283887468

key: test_jcc
value: [0.66666667 0.7        0.875      0.875      0.7        0.625
 0.55555556 0.57142857 0.6        0.77777778]

mean value: 0.6946428571428571

key: train_jcc
value: [0.68686869 0.91891892 1.         0.88157895 0.94444444 0.56521739
 0.84057971 0.60714286 0.77011494 0.92857143]

mean value: 0.8143437327292767

MCC on Blind test: 0.08

Accuracy on Blind test: 0.48

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.10859799 0.09386134 0.09250259 0.09275675 0.09327698 0.09390163
 0.0930047  0.09283376 0.09291363 0.09362578]

mean value: 0.0947275161743164

key: score_time
value: [0.01455379 0.01461649 0.01445794 0.01458097 0.01457262 0.01485562
 0.01465678 0.01462412 0.01461625 0.01483679]

mean value: 0.014637136459350586

key: test_mcc
value: [0.8819171  0.75       0.875      1.         0.875      1.
 0.75592895 1.         0.60714286 0.87287156]

mean value: 0.8617860467793478

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     0.875      0.93333333 1.         0.93333333 1.
 0.86666667 1.         0.8        0.93333333]

mean value: 0.9279166666666667

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.93333333 0.875      0.93333333 1.         0.93333333 1.
 0.88888889 1.         0.8        0.94117647]

mean value: 0.9305065359477124

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.875      0.875      1.         0.875      1.
 0.8        1.         0.85714286 0.88888889]

mean value: 0.9171031746031746

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.875 0.875 1.    1.    1.    1.    1.    1.    0.75  1.   ]

mean value: 0.95

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.875      0.9375     1.         0.9375     1.
 0.85714286 1.         0.80357143 0.92857143]

mean value: 0.9276785714285715

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.875      0.77777778 0.875      1.         0.875      1.
 0.8        1.         0.66666667 0.88888889]

mean value: 0.8758333333333334

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.09

Accuracy on Blind test: 0.79

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.04233217 0.0453558  0.03615093 0.0350399  0.05168247 0.04953241
 0.03539181 0.02963448 0.02912092 0.03293037]

mean value: 0.038717126846313475

key: score_time
value: [0.02476406 0.01846552 0.02050114 0.03115773 0.03751206 0.03031969
 0.01657867 0.01927161 0.01637602 0.02038503]

mean value: 0.023533153533935546

key: test_mcc
value: [1.         0.75       0.875      1.         0.73214286 0.73214286
 0.87287156 1.         0.60714286 1.        ]

mean value: 0.856930013237254

key: train_mcc
value: [1.         1.         0.98550725 1.         0.98550725 1.
 1.         1.         1.         1.        ]

mean value: 0.9971014492753624

key: test_accuracy
value: [1.         0.875      0.93333333 1.         0.86666667 0.86666667
 0.93333333 1.         0.8        1.        ]

mean value: 0.9275

key: train_accuracy
value: [1.         1.         0.99270073 1.         0.99270073 1.
 1.         1.         1.         1.        ]

mean value: 0.9985401459854014

key: test_fscore
value: [1.         0.875      0.93333333 1.         0.85714286 0.85714286
 0.94117647 1.         0.8        1.        ]

mean value: 0.9263795518207283

key: train_fscore
value: [1.         1.         0.99270073 1.         0.99270073 1.
 1.         1.         1.         1.        ]

mean value: 0.9985401459854015

key: test_precision
value: [1.         0.875      0.875      1.         0.85714286 0.85714286
 0.88888889 1.         0.85714286 1.        ]

mean value: 0.921031746031746

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.875      1.         1.         0.85714286 0.85714286
 1.         1.         0.75       1.        ]

mean value: 0.9339285714285714

key: train_recall
value: [1.         1.         0.98550725 1.         0.98550725 1.
 1.         1.         1.         1.        ]

mean value: 0.9971014492753624

key: test_roc_auc
value: [1.         0.875      0.9375     1.         0.86607143 0.86607143
 0.92857143 1.         0.80357143 1.        ]

mean value: 0.9276785714285715

key: train_roc_auc
value: [1.         1.         0.99275362 1.         0.99275362 1.
 1.         1.         1.         1.        ]

mean value: 0.9985507246376811

key: test_jcc
value: [1.         0.77777778 0.875      1.         0.75       0.75
 0.88888889 1.         0.66666667 1.        ]

mean value: 0.8708333333333333

key: train_jcc
value: [1.         1.         0.98550725 1.         0.98550725 1.
 1.         1.         1.         1.        ]

mean value: 0.9971014492753624

MCC on Blind test: 0.12

Accuracy on Blind test: 0.84

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.04632425 0.04589558 0.05224323 0.04858661 0.0493474  0.04912758
 0.04896307 0.06054115 0.0467627  0.04628658]

mean value: 0.04940781593322754

key: score_time
value: [0.0182476  0.022089   0.02181435 0.02127481 0.02358222 0.02248502
 0.02054977 0.02320457 0.01664424 0.02343416]

mean value: 0.021332573890686036

key: test_mcc
value: [0.40451992 0.62994079 0.56407607 0.875      0.76376262 0.49099025
 0.33928571 0.87287156 0.66143783 0.6000992 ]

mean value: 0.620198395064619

key: train_mcc
value: [0.98540068 0.98540068 0.98550418 0.98550418 0.98550418 0.98550418
 0.98550725 0.98550725 0.98550725 1.        ]

mean value: 0.9869339808783808

key: test_accuracy
value: [0.6875     0.8125     0.73333333 0.93333333 0.86666667 0.73333333
 0.66666667 0.93333333 0.8        0.8       ]

mean value: 0.7966666666666666

key: train_accuracy
value: [0.99264706 0.99264706 0.99270073 0.99270073 0.99270073 0.99270073
 0.99270073 0.99270073 0.99270073 1.        ]

mean value: 0.993419922713611

key: test_fscore
value: [0.73684211 0.82352941 0.77777778 0.93333333 0.875      0.75
 0.66666667 0.94117647 0.76923077 0.82352941]

mean value: 0.8097085946389352

key: train_fscore
value: [0.99270073 0.99270073 0.99280576 0.99280576 0.99280576 0.99280576
 0.99270073 0.99270073 0.99270073 1.        ]

mean value: 0.9934726671217771

key: test_precision
value: [0.63636364 0.77777778 0.63636364 0.875      0.77777778 0.66666667
 0.71428571 0.88888889 1.         0.77777778]

mean value: 0.7750901875901876

key: train_precision
value: [0.98550725 0.98550725 0.98571429 0.98571429 0.98571429 0.98571429
 0.98550725 0.98550725 0.98550725 1.        ]

mean value: 0.9870393374741201

key: test_recall
value: [0.875      0.875      1.         1.         1.         0.85714286
 0.625      1.         0.625      0.875     ]

mean value: 0.8732142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.6875     0.8125     0.75       0.9375     0.875      0.74107143
 0.66964286 0.92857143 0.8125     0.79464286]

mean value: 0.8008928571428572

key: train_roc_auc
value: [0.99264706 0.99264706 0.99264706 0.99264706 0.99264706 0.99264706
 0.99275362 0.99275362 0.99275362 1.        ]

mean value: 0.9934143222506394

key: test_jcc
value: [0.58333333 0.7        0.63636364 0.875      0.77777778 0.6
 0.5        0.88888889 0.625      0.7       ]

mean value: 0.6886363636363636

key: train_jcc
value: [0.98550725 0.98550725 0.98571429 0.98571429 0.98571429 0.98571429
 0.98550725 0.98550725 0.98550725 1.        ]

mean value: 0.9870393374741201

MCC on Blind test: -0.06

Accuracy on Blind test: 0.67

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.24086428 0.1915431  0.19633222 0.23168492 0.23006678 0.22822237
 0.22251344 0.19692373 0.23212194 0.23451424]

mean value: 0.2204787015914917

key: score_time
value: [0.00912738 0.00884342 0.00901175 0.00892282 0.00904131 0.00898123
 0.00881338 0.00891829 0.00899601 0.00898814]

mean value: 0.00896437168121338

key: test_mcc
value: [1.         0.8819171  1.         1.         0.875      0.87287156
 0.87287156 0.87287156 0.60714286 0.87287156]

mean value: 0.8855546204606932

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.9375     1.         1.         0.93333333 0.93333333
 0.93333333 0.93333333 0.8        0.93333333]

mean value: 0.9404166666666667

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.94117647 1.         1.         0.93333333 0.92307692
 0.94117647 0.94117647 0.8        0.94117647]

mean value: 0.9421116138763197

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.88888889 1.         1.         0.875      1.
 0.88888889 0.88888889 0.85714286 0.88888889]

mean value: 0.9287698412698413

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         1.         0.85714286
 1.         1.         0.75       1.        ]

mean value: 0.9607142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.9375     1.         1.         0.9375     0.92857143
 0.92857143 0.92857143 0.80357143 0.92857143]

mean value: 0.9392857142857143

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.88888889 1.         1.         0.875      0.85714286
 0.88888889 0.88888889 0.66666667 0.88888889]

mean value: 0.8954365079365079

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.11

Accuracy on Blind test: 0.83

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.01531243 0.01658034 0.01707101 0.01658964 0.01652956 0.01666713
 0.0167017  0.0350194  0.01742578 0.03410149]

mean value: 0.020199847221374512

key: score_time
value: [0.01163173 0.01171184 0.0116539  0.01162744 0.01165056 0.01259828
 0.01246953 0.01177812 0.01256156 0.01198077]

mean value: 0.011966371536254882

key: test_mcc
value: [0.77459667 0.8819171  0.75592895 0.75592895 0.75592895 0.75592895
 0.46770717 1.         0.56407607 0.66143783]

mean value: 0.7373450632934155

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.875      0.9375     0.86666667 0.86666667 0.86666667 0.86666667
 0.66666667 1.         0.73333333 0.8       ]

mean value: 0.8479166666666667

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.85714286 0.93333333 0.83333333 0.83333333 0.83333333 0.83333333
 0.54545455 1.         0.66666667 0.76923077]

mean value: 0.8105161505161504

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75       0.875      0.71428571 0.71428571 0.71428571 0.71428571
 0.375      1.         0.5        0.625     ]

mean value: 0.6982142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.875      0.9375     0.85714286 0.85714286 0.85714286 0.85714286
 0.6875     1.         0.75       0.8125    ]

mean value: 0.8491071428571428

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.75       0.875      0.71428571 0.71428571 0.71428571 0.71428571
 0.375      1.         0.5        0.625     ]

mean value: 0.6982142857142857

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.0

Accuracy on Blind test: 0.99

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.03337741 0.03868771 0.03221297 0.03223085 0.03231907 0.03239298
 0.032233   0.03233957 0.03223705 0.03226876]

mean value: 0.033029937744140626

key: score_time
value: [0.02033663 0.0221262  0.01984525 0.02155614 0.01149225 0.02050543
 0.02263975 0.01156116 0.02123237 0.02290106]

mean value: 0.01941962242126465

key: test_mcc
value: [0.8819171  0.75       0.66143783 1.         0.66143783 0.73214286
 0.87287156 0.87287156 0.60714286 0.87287156]

mean value: 0.7912693156338115

key: train_mcc
value: [0.97058824 0.98540068 0.98550725 0.97080136 0.97080136 0.97080136
 0.97080136 0.97080136 0.98550418 0.97080136]

mean value: 0.975180852135241

key: test_accuracy
value: [0.9375     0.875      0.8        1.         0.8        0.86666667
 0.93333333 0.93333333 0.8        0.93333333]

mean value: 0.8879166666666667

key: train_accuracy
value: [0.98529412 0.99264706 0.99270073 0.98540146 0.98540146 0.98540146
 0.98540146 0.98540146 0.99270073 0.98540146]

mean value: 0.9875751395448691

key: test_fscore
value: [0.93333333 0.875      0.82352941 1.         0.82352941 0.85714286
 0.94117647 0.94117647 0.8        0.94117647]

mean value: 0.8936064425770308

key: train_fscore
value: [0.98529412 0.99270073 0.99270073 0.98550725 0.98550725 0.98550725
 0.98529412 0.98529412 0.99259259 0.98529412]

mean value: 0.9875692262165278

key: test_precision
value: [1.         0.875      0.7        1.         0.7        0.85714286
 0.88888889 0.88888889 0.85714286 0.88888889]

mean value: 0.8655952380952381

key: train_precision
value: [0.98529412 0.98550725 1.         0.98550725 0.98550725 0.98550725
 0.98529412 0.98529412 1.         0.98529412]

mean value: 0.9883205456095482

key: test_recall
value: [0.875      0.875      1.         1.         1.         0.85714286
 1.         1.         0.75       1.        ]

mean value: 0.9357142857142857

key: train_recall
value: /home/tanu/git/LSHTM_analysis/scripts/ml/./gid_orig.py:155: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_orig.py:158: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
[0.98529412 1.         0.98550725 0.98550725 0.98550725 0.98550725
 0.98529412 0.98529412 0.98529412 0.98529412]

mean value: 0.9868499573742541

key: test_roc_auc
value: [0.9375     0.875      0.8125     1.         0.8125     0.86607143
 0.92857143 0.92857143 0.80357143 0.92857143]

mean value: 0.8892857142857143

key: train_roc_auc
value: [0.98529412 0.99264706 0.99275362 0.98540068 0.98540068 0.98540068
 0.98540068 0.98540068 0.99264706 0.98540068]

mean value: 0.9875745950554136

key: test_jcc
value: [0.875      0.77777778 0.7        1.         0.7        0.75
 0.88888889 0.88888889 0.66666667 0.88888889]

mean value: 0.8136111111111111

key: train_jcc
value: [0.97101449 0.98550725 0.98550725 0.97142857 0.97142857 0.97142857
 0.97101449 0.97101449 0.98529412 0.97101449]

mean value: 0.975465229570089

MCC on Blind test: 0.12

Accuracy on Blind test: 0.67

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.21865153 0.19576311 0.21859837 0.21810484 0.19659543 0.20908713
 0.1991601  0.19221115 0.19844651 0.20431876]

mean value: 0.20509369373321534

key: score_time
value: [0.02091098 0.02276993 0.0219698  0.0205822  0.02166986 0.01975226
 0.02068043 0.02233529 0.02191591 0.02306247]

mean value: 0.02156491279602051

key: test_mcc
value: [0.8819171  0.75       0.66143783 1.         0.66143783 0.49099025
 0.6000992  0.87287156 0.60714286 0.87287156]

mean value: 0.7398768189431251

key: train_mcc
value: [0.97058824 0.98540068 0.98550725 0.98550418 0.97080136 1.
 0.98550725 0.97080136 0.98550418 0.97080136]

mean value: 0.9810415854946039

key: test_accuracy
value: [0.9375     0.875      0.8        1.         0.8        0.73333333
 0.8        0.93333333 0.8        0.93333333]

mean value: 0.8612500000000001

key: train_accuracy
value: [0.98529412 0.99264706 0.99270073 0.99270073 0.98540146 1.
 0.99270073 0.98540146 0.99270073 0.98540146]

mean value: 0.9904948475740661

key: test_fscore
value: [0.93333333 0.875      0.82352941 1.         0.82352941 0.75
 0.82352941 0.94117647 0.8        0.94117647]

mean value: 0.8711274509803921

key: train_fscore
value: [0.98529412 0.99270073 0.99270073 0.99280576 0.98550725 1.
 0.99270073 0.98529412 0.99259259 0.98529412]

mean value: 0.9904890137087287

key: test_precision
value: [1.         0.875      0.7        1.         0.7        0.66666667
 0.77777778 0.88888889 0.85714286 0.88888889]

mean value: 0.8354365079365079

key: train_precision
value: [0.98529412 0.98550725 1.         0.98571429 0.98550725 1.
 0.98550725 0.98529412 1.         0.98529412]

mean value: 0.9898118377785897

key: test_recall
value: [0.875      0.875      1.         1.         1.         0.85714286
 0.875      1.         0.75       1.        ]

mean value: 0.9232142857142858

key: train_recall
value: [0.98529412 1.         0.98550725 1.         0.98550725 1.
 1.         0.98529412 0.98529412 0.98529412]

mean value: 0.9912190963341859

key: test_roc_auc
value: [0.9375     0.875      0.8125     1.         0.8125     0.74107143
 0.79464286 0.92857143 0.80357143 0.92857143]

mean value: 0.8633928571428572

key: train_roc_auc
value: [0.98529412 0.99264706 0.99275362 0.99264706 0.98540068 1.
 0.99275362 0.98540068 0.99264706 0.98540068]

mean value: 0.9904944586530264

key: test_jcc
value: [0.875      0.77777778 0.7        1.         0.7        0.6
 0.7        0.88888889 0.66666667 0.88888889]

mean value: 0.7797222222222222

key: train_jcc
value: [0.97101449 0.98550725 0.98550725 0.98571429 0.97142857 1.
 0.98550725 0.97101449 0.98529412 0.97101449]

mean value: 0.9812002192181221

MCC on Blind test: 0.11

Accuracy on Blind test: 0.62

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.02503753 0.02437925 0.0221231  0.02506065 0.02024388 0.02260089
 0.02252674 0.0209341  0.02013969 0.02230787]

mean value: 0.022535371780395507

key: score_time
value: [0.01144624 0.01141882 0.01129723 0.0114677  0.01127219 0.01128459
 0.01127362 0.01134109 0.01124883 0.01125479]

mean value: 0.011330509185791015

key: test_mcc
value: [0.31622777 0.15811388 0.8        0.5976143  0.63245553 0.8
 0.25819889 0.5        0.         0.77459667]

mean value: 0.4837207044714774

key: train_mcc
value: [1.         0.92234997 0.92240216 0.94804318 0.8972297  0.87263594
 0.94871795 0.97467943 0.92338052 0.89861829]

mean value: 0.930805714729196

key: test_accuracy
value: [0.66666667 0.55555556 0.88888889 0.77777778 0.77777778 0.88888889
 0.625      0.75       0.5        0.875     ]

mean value: 0.7305555555555555

key: train_accuracy
value: [1.         0.96103896 0.96103896 0.97402597 0.94805195 0.93506494
 0.97435897 0.98717949 0.96153846 0.94871795]

mean value: 0.9651015651015651

key: test_fscore
value: [0.57142857 0.6        0.88888889 0.83333333 0.75       0.88888889
 0.66666667 0.75       0.6        0.85714286]

mean value: 0.7406349206349206

key: train_fscore
value: [1.         0.96202532 0.96103896 0.97368421 0.94594595 0.93150685
 0.97435897 0.98701299 0.96103896 0.94736842]

mean value: 0.9643980626745541

key: test_precision
value: [0.66666667 0.5        0.8        0.71428571 1.         1.
 0.6        0.75       0.5        1.        ]

mean value: 0.753095238095238

key: train_precision
value: [1.         0.95       0.97368421 0.97368421 0.97222222 0.97142857
 0.97435897 1.         0.97368421 0.97297297]

mean value: 0.9762035372561688

key: test_recall
value: [0.5  0.75 1.   1.   0.6  0.8  0.75 0.75 0.75 0.75]

mean value: 0.765

key: train_recall
value: [1.         0.97435897 0.94871795 0.97368421 0.92105263 0.89473684
 0.97435897 0.97435897 0.94871795 0.92307692]

mean value: 0.953306342780027

key: test_roc_auc
value: [0.65  0.575 0.9   0.75  0.8   0.9   0.625 0.75  0.5   0.875]

mean value: 0.7325

key: train_roc_auc
value: [1.         0.9608637  0.96120108 0.97402159 0.9477058  0.93454791
 0.97435897 0.98717949 0.96153846 0.94871795]

mean value: 0.9650134952766531

key: test_jcc
value: [0.4        0.42857143 0.8        0.71428571 0.6        0.8
 0.5        0.6        0.42857143 0.75      ]

mean value: 0.6021428571428571

key: train_jcc
value: [1.         0.92682927 0.925      0.94871795 0.8974359  0.87179487
 0.95       0.97435897 0.925      0.9       ]

mean value: 0.9319136960600375

MCC on Blind test: 0.09

Accuracy on Blind test: 0.55

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.67788839 0.4790628  0.51931477 0.50791335 0.656353   0.51453042
 0.48985314 0.49819374 0.5512023  0.60133958]

mean value: 0.5495651483535766

key: score_time
value: [0.01164913 0.01161647 0.01164889 0.01170087 0.01168561 0.02243257
 0.01179647 0.01173663 0.02173638 0.0123179 ]

mean value: 0.01383209228515625

key: test_mcc
value: [0.35       0.47809144 0.8        0.5976143  0.63245553 0.1
 0.57735027 0.5        0.5        0.77459667]

mean value: 0.5310108218865739

key: train_mcc
value: [1.         0.61039852 1.         1.         0.97434188 0.48234809
 1.         1.         1.         1.        ]

mean value: 0.9067088488928768

key: test_accuracy
value: [0.66666667 0.66666667 0.88888889 0.77777778 0.77777778 0.55555556
 0.75       0.75       0.75       0.875     ]

mean value: 0.7458333333333333

key: train_accuracy
value: [1.         0.80519481 1.         1.         0.98701299 0.74025974
 1.         1.         1.         1.        ]

mean value: 0.9532467532467532

key: test_fscore
value: [0.66666667 0.72727273 0.88888889 0.83333333 0.75       0.6
 0.8        0.75       0.75       0.85714286]

mean value: 0.7623304473304473

key: train_fscore
value: [1.         0.81012658 1.         1.         0.98666667 0.72222222
 1.         1.         1.         1.        ]

mean value: 0.951901547116737

key: test_precision
value: [0.6        0.57142857 0.8        0.71428571 1.         0.6
 0.66666667 0.75       0.75       1.        ]

mean value: 0.7452380952380953

key: train_precision
value: [1.         0.8        1.         1.         1.         0.76470588
 1.         1.         1.         1.        ]

mean value: 0.9564705882352941

key: test_recall
value: [0.75 1.   1.   1.   0.6  0.6  1.   0.75 0.75 0.75]

mean value: 0.82

key: train_recall
value: [1.         0.82051282 1.         1.         0.97368421 0.68421053
 1.         1.         1.         1.        ]

mean value: 0.9478407557354925

key: test_roc_auc
value: [0.675 0.7   0.9   0.75  0.8   0.55  0.75  0.75  0.75  0.875]

mean value: 0.75

key: train_roc_auc
value: [1.         0.80499325 1.         1.         0.98684211 0.73954116
 1.         1.         1.         1.        ]

mean value: 0.9531376518218624

key: test_jcc
value: [0.5        0.57142857 0.8        0.71428571 0.6        0.42857143
 0.66666667 0.6        0.6        0.75      ]

mean value: 0.623095238095238

key: train_jcc
value: [1.         0.68085106 1.         1.         0.97368421 0.56521739
 1.         1.         1.         1.        ]

mean value: 0.9219752665660451

MCC on Blind test: 0.1

Accuracy on Blind test: 0.58

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01265955 0.01204038 0.0094521  0.00962925 0.00864935 0.00926232
 0.00898719 0.00849867 0.00921798 0.00849581]

mean value: 0.009689259529113769

key: score_time
value: [0.01206136 0.00932217 0.00953627 0.00923824 0.00897503 0.00906634
 0.00922036 0.00909305 0.00911117 0.00872612]

mean value: 0.009435009956359864

key: test_mcc
value: [-0.15811388  0.31622777  0.63245553  0.39528471  0.1         0.1
  0.37796447  0.25819889  0.25819889  0.57735027]

mean value: 0.28575666442563175

key: train_mcc
value: [0.66849369 0.50674764 0.53342348 0.53279352 0.53591229 0.5064147
 0.7073494  0.56428809 0.68516016 0.4897623 ]

mean value: 0.5730345270522292

key: test_accuracy
value: [0.44444444 0.55555556 0.77777778 0.66666667 0.55555556 0.55555556
 0.625      0.625      0.625      0.75      ]

mean value: 0.6180555555555556

key: train_accuracy
value: [0.83116883 0.75324675 0.76623377 0.76623377 0.76623377 0.75324675
 0.84615385 0.78205128 0.83333333 0.74358974]

mean value: 0.7841491841491842

key: test_fscore
value: [0.28571429 0.66666667 0.8        0.76923077 0.6        0.6
 0.72727273 0.66666667 0.57142857 0.66666667]

mean value: 0.6353646353646354

key: train_fscore
value: [0.82191781 0.75324675 0.76315789 0.75675676 0.775      0.74666667
 0.82857143 0.77922078 0.8115942  0.75609756]

mean value: 0.7792229851292566

key: test_precision
value: [0.33333333 0.5        0.66666667 0.625      0.6        0.6
 0.57142857 0.6        0.66666667 1.        ]

mean value: 0.6163095238095238

key: train_precision
value: [0.88235294 0.76315789 0.78378378 0.77777778 0.73809524 0.75675676
 0.93548387 0.78947368 0.93333333 0.72093023]

mean value: 0.808114551339661

key: test_recall
value: [0.25 1.   1.   1.   0.6  0.6  1.   0.75 0.5  0.5 ]

mean value: 0.72

key: train_recall
value: [0.76923077 0.74358974 0.74358974 0.73684211 0.81578947 0.73684211
 0.74358974 0.76923077 0.71794872 0.79487179]

mean value: 0.7571524966261809

key: test_roc_auc
value: [0.425 0.6   0.8   0.625 0.55  0.55  0.625 0.625 0.625 0.75 ]

mean value: 0.6175

key: train_roc_auc
value: [0.83198381 0.75337382 0.76653171 0.76585695 0.7668691  0.75303644
 0.84615385 0.78205128 0.83333333 0.74358974]

mean value: 0.7842780026990553

key: test_jcc
value: [0.16666667 0.5        0.66666667 0.625      0.42857143 0.42857143
 0.57142857 0.5        0.4        0.5       ]

mean value: 0.47869047619047617

key: train_jcc
value: [0.69767442 0.60416667 0.61702128 0.60869565 0.63265306 0.59574468
 0.70731707 0.63829787 0.68292683 0.60784314]

mean value: 0.6392340668150881

MCC on Blind test: 0.02

Accuracy on Blind test: 0.48

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00923014 0.0090003  0.00944948 0.00926304 0.0088408  0.00969529
 0.00979328 0.00931954 0.00897288 0.00965595]

mean value: 0.009322071075439453

key: score_time
value: [0.00919342 0.0087328  0.00874853 0.00913787 0.00951767 0.00950384
 0.00897765 0.00919819 0.00955391 0.00890183]

mean value: 0.009146571159362793

key: test_mcc
value: [ 0.5976143   0.55        0.8         0.15811388  0.15811388  0.05976143
 -0.57735027  0.25819889 -0.25819889  0.37796447]

mean value: 0.2124217704970356

key: train_mcc
value: [0.74021592 0.71670195 0.68898046 0.68898046 0.71987403 0.64957894
 0.72392277 0.66864785 0.72980045 0.76948376]

mean value: 0.7096186585687321

key: test_accuracy
value: [0.77777778 0.77777778 0.88888889 0.55555556 0.55555556 0.55555556
 0.25       0.625      0.375      0.625     ]

mean value: 0.5986111111111111

key: train_accuracy
value: [0.87012987 0.85714286 0.84415584 0.84415584 0.85714286 0.81818182
 0.85897436 0.83333333 0.85897436 0.88461538]

mean value: 0.8526806526806526

key: test_fscore
value: [0.66666667 0.75       0.88888889 0.5        0.5        0.66666667
 0.         0.57142857 0.44444444 0.4       ]

mean value: 0.5388095238095238

key: train_fscore
value: [0.87179487 0.85333333 0.85       0.83783784 0.84507042 0.79411765
 0.84931507 0.82666667 0.84507042 0.88607595]

mean value: 0.8459282219622195

key: test_precision
value: [1.         0.75       0.8        0.66666667 0.66666667 0.57142857
 0.         0.66666667 0.4        1.        ]

mean value: 0.6521428571428571

key: train_precision
value: [0.87179487 0.88888889 0.82926829 0.86111111 0.90909091 0.9
 0.91176471 0.86111111 0.9375     0.875     ]

mean value: 0.8845529890562172

key: test_recall
value: [0.5  0.75 1.   0.4  0.4  0.8  0.   0.5  0.5  0.25]

mean value: 0.51

key: train_recall
value: [0.87179487 0.82051282 0.87179487 0.81578947 0.78947368 0.71052632
 0.79487179 0.79487179 0.76923077 0.8974359 ]

mean value: 0.8136302294197031

key: test_roc_auc
value: [0.75  0.775 0.9   0.575 0.575 0.525 0.25  0.625 0.375 0.625]

mean value: 0.5975

key: train_roc_auc
value: [0.87010796 0.85762483 0.84379217 0.84379217 0.8562753  0.81680162
 0.85897436 0.83333333 0.85897436 0.88461538]

mean value: 0.8524291497975709

key: test_jcc
value: [0.5        0.6        0.8        0.33333333 0.33333333 0.5
 0.         0.4        0.28571429 0.25      ]

mean value: 0.4002380952380952

key: train_jcc
value: [0.77272727 0.74418605 0.73913043 0.72093023 0.73170732 0.65853659
 0.73809524 0.70454545 0.73170732 0.79545455]

mean value: 0.7337020444187082

MCC on Blind test: 0.04

Accuracy on Blind test: 0.56

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00935268 0.0094099  0.00923061 0.00814486 0.00917649 0.00927663
 0.00916672 0.00931597 0.00929832 0.00910544]

mean value: 0.0091477632522583

key: score_time
value: [0.0149107  0.0150919  0.01361489 0.0100975  0.01014137 0.01008272
 0.01007795 0.01012373 0.01002026 0.01466894]

mean value: 0.011882996559143067

key: test_mcc
value: [ 0.          0.55        0.8         0.55        0.31622777  0.31622777
 -0.37796447  0.25819889  0.57735027  0.25819889]

mean value: 0.32482391077083966

key: train_mcc
value: [0.45639039 0.48741471 0.53591229 0.53238866 0.63928106 0.48977837
 0.64442408 0.46291005 0.48782136 0.36004115]

mean value: 0.5096362111380015

key: test_accuracy
value: [0.55555556 0.77777778 0.88888889 0.77777778 0.55555556 0.66666667
 0.375      0.625      0.75       0.625     ]

mean value: 0.6597222222222222

key: train_accuracy
value: [0.72727273 0.74025974 0.76623377 0.76623377 0.81818182 0.74025974
 0.82051282 0.73076923 0.74358974 0.67948718]

mean value: 0.7532800532800533

key: test_fscore
value: [0.         0.75       0.88888889 0.8        0.33333333 0.72727273
 0.54545455 0.66666667 0.66666667 0.57142857]

mean value: 0.5949711399711399

key: train_fscore
value: [0.72       0.72222222 0.75675676 0.76315789 0.80555556 0.70588235
 0.81081081 0.72       0.73684211 0.66666667]

mean value: 0.7407894364953188

key: test_precision
value: [0.         0.75       0.8        0.8        1.         0.66666667
 0.42857143 0.6        1.         0.66666667]

mean value: 0.6711904761904762

key: train_precision
value: [0.75       0.78787879 0.8        0.76315789 0.85294118 0.8
 0.85714286 0.75       0.75675676 0.69444444]

mean value: 0.7812321917430276

key: test_recall
value: [0.   0.75 1.   0.8  0.2  0.8  0.75 0.75 0.5  0.5 ]

mean value: 0.605

key: train_recall
value: [0.69230769 0.66666667 0.71794872 0.76315789 0.76315789 0.63157895
 0.76923077 0.69230769 0.71794872 0.64102564]

mean value: 0.7055330634278003

key: test_roc_auc
value: [0.5   0.775 0.9   0.775 0.6   0.65  0.375 0.625 0.75  0.625]

mean value: 0.6575

key: train_roc_auc
value: [0.72773279 0.74122807 0.7668691  0.76619433 0.81747638 0.7388664
 0.82051282 0.73076923 0.74358974 0.67948718]

mean value: 0.7532726045883941

key: test_jcc
value: [0.         0.6        0.8        0.66666667 0.2        0.57142857
 0.375      0.5        0.5        0.4       ]

mean value: 0.46130952380952384

key: train_jcc
value: [0.5625     0.56521739 0.60869565 0.61702128 0.6744186  0.54545455
 0.68181818 0.5625     0.58333333 0.5       ]

mean value: 0.5900958985331228

MCC on Blind test: 0.04

Accuracy on Blind test: 0.57

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.0090909  0.00947165 0.0086453  0.01064086 0.00960255 0.00955296
 0.00961971 0.00936699 0.00874853 0.00870728]

mean value: 0.009344673156738282

key: score_time
value: [0.01019144 0.00849819 0.00846887 0.00918436 0.00922036 0.00914359
 0.00917006 0.00848866 0.00857306 0.00854564]

mean value: 0.008948421478271485

key: test_mcc
value: [0.39528471 0.47809144 0.8        0.31622777 0.47809144 0.31622777
 0.         0.5        0.         0.57735027]

mean value: 0.3861273396211864

key: train_mcc
value: [0.84537494 0.79217274 0.81836616 0.87035806 0.82046748 0.71613058
 0.84615385 0.8229512  0.84615385 0.8720816 ]

mean value: 0.8250210452339253

key: test_accuracy
value: [0.66666667 0.66666667 0.88888889 0.66666667 0.66666667 0.66666667
 0.5        0.75       0.5        0.75      ]

mean value: 0.6722222222222222

key: train_accuracy
value: [0.92207792 0.8961039  0.90909091 0.93506494 0.90909091 0.85714286
 0.92307692 0.91025641 0.92307692 0.93589744]

mean value: 0.9120879120879121

key: test_fscore
value: [0.4        0.72727273 0.88888889 0.72727273 0.57142857 0.72727273
 0.5        0.75       0.5        0.66666667]

mean value: 0.6458802308802308

key: train_fscore
value: [0.92105263 0.8974359  0.91139241 0.93333333 0.90410959 0.84931507
 0.92307692 0.90666667 0.92307692 0.93670886]

mean value: 0.9106168298525722

key: test_precision
value: [1.         0.57142857 0.8        0.66666667 1.         0.66666667
 0.5        0.75       0.5        1.        ]

mean value: 0.7454761904761905

key: train_precision
value: [0.94594595 0.8974359  0.9        0.94594595 0.94285714 0.88571429
 0.92307692 0.94444444 0.92307692 0.925     ]

mean value: 0.9233497508497509

key: test_recall
value: [0.25 1.   1.   0.8  0.4  0.8  0.5  0.75 0.5  0.5 ]

mean value: 0.65

key: train_recall
value: [0.8974359  0.8974359  0.92307692 0.92105263 0.86842105 0.81578947
 0.92307692 0.87179487 0.92307692 0.94871795]

mean value: 0.8989878542510121

key: test_roc_auc
value: [0.625 0.7   0.9   0.65  0.7   0.65  0.5   0.75  0.5   0.75 ]

mean value: 0.6725

key: train_roc_auc
value: [0.92240216 0.89608637 0.90890688 0.93488529 0.9085695  0.85661269
 0.92307692 0.91025641 0.92307692 0.93589744]

mean value: 0.9119770580296896

key: test_jcc
value: [0.25       0.57142857 0.8        0.57142857 0.4        0.57142857
 0.33333333 0.6        0.33333333 0.5       ]

mean value: 0.4930952380952381

key: train_jcc
value: [0.85365854 0.81395349 0.8372093  0.875      0.825      0.73809524
 0.85714286 0.82926829 0.85714286 0.88095238]

mean value: 0.83674229532993

MCC on Blind test: 0.09

Accuracy on Blind test: 0.54

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.32213855 0.34692574 0.33995461 0.34015656 0.43762326 0.43245387
 0.34698081 0.32564402 0.34216499 0.35122037]

mean value: 0.35852627754211425

key: score_time
value: [0.01188755 0.01189065 0.0119307  0.0119338  0.01207781 0.0119822
 0.01191235 0.01189542 0.01186323 0.01194644]

mean value: 0.011932015419006348

key: test_mcc
value: [0.1        0.47809144 0.8        0.5976143  0.55       0.5976143
 0.         0.5        0.25819889 0.77459667]

mean value: 0.46561156120567954

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.55555556 0.66666667 0.88888889 0.77777778 0.77777778 0.77777778
 0.5        0.75       0.625      0.875     ]

mean value: 0.7194444444444444

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.5        0.72727273 0.88888889 0.83333333 0.8        0.83333333
 0.6        0.75       0.57142857 0.85714286]

mean value: 0.7361399711399712

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.5        0.57142857 0.8        0.71428571 0.8        0.71428571
 0.5        0.75       0.66666667 1.        ]

mean value: 0.7016666666666667

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.5  1.   1.   1.   0.8  1.   0.75 0.75 0.5  0.75]

mean value: 0.805

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.55  0.7   0.9   0.75  0.775 0.75  0.5   0.75  0.625 0.875]

mean value: 0.7175

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.33333333 0.57142857 0.8        0.71428571 0.66666667 0.71428571
 0.42857143 0.6        0.4        0.75      ]

mean value: 0.5978571428571429

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.04

Accuracy on Blind test: 0.55

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01298523 0.01243281 0.00981116 0.00963116 0.00943565 0.009233
 0.00946522 0.00921392 0.0091486  0.00971317]

mean value: 0.010106992721557618

key: score_time
value: [0.01131511 0.00947094 0.0087316  0.00852537 0.00846863 0.00840878
 0.00836754 0.00833654 0.00854063 0.00851274]

mean value: 0.008867788314819335

key: test_mcc
value: [0.55       0.8        0.55       1.         0.8        1.
 0.57735027 0.77459667 0.57735027 1.        ]

mean value: 0.7629297207620735

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.77777778 0.88888889 0.77777778 1.         0.88888889 1.
 0.75       0.875      0.75       1.        ]

mean value: 0.8708333333333333

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.75       0.88888889 0.75       1.         0.88888889 1.
 0.8        0.85714286 0.66666667 1.        ]

mean value: 0.8601587301587301

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.75       0.8        0.75       1.         1.         1.
 0.66666667 1.         1.         1.        ]

mean value: 0.8966666666666667

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75 1.   0.75 1.   0.8  1.   1.   0.75 0.5  1.  ]

mean value: 0.855

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.775 0.9   0.775 1.    0.9   1.    0.75  0.875 0.75  1.   ]

mean value: 0.8725

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.6        0.8        0.6        1.         0.8        1.
 0.66666667 0.75       0.5        1.        ]

mean value: 0.7716666666666667

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.11

Accuracy on Blind test: 0.83

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.08166194 0.08092213 0.08124018 0.0808332  0.08057642 0.08089733
 0.08081889 0.08135033 0.08086824 0.08122468]

mean value: 0.08103933334350585

key: score_time
value: [0.01668835 0.0169456  0.01684809 0.01695967 0.01681328 0.01679158
 0.01694441 0.01681828 0.0169208  0.01684427]

mean value: 0.016857433319091796

key: test_mcc
value: [0.55       1.         0.8        0.31622777 0.35       0.79056942
 0.         0.5        0.25819889 0.77459667]

mean value: 0.5339592740047577

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.77777778 1.         0.88888889 0.66666667 0.66666667 0.88888889
 0.5        0.75       0.625      0.875     ]

mean value: 0.7638888888888888

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.75       1.         0.88888889 0.72727273 0.66666667 0.90909091
 0.6        0.75       0.57142857 0.85714286]

mean value: 0.772049062049062

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.75       1.         0.8        0.66666667 0.75       0.83333333
 0.5        0.75       0.66666667 1.        ]

mean value: 0.7716666666666667

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75 1.   1.   0.8  0.6  1.   0.75 0.75 0.5  0.75]

mean value: 0.79

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.775 1.    0.9   0.65  0.675 0.875 0.5   0.75  0.625 0.875]

mean value: 0.7625000000000001

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.6        1.         0.8        0.57142857 0.5        0.83333333
 0.42857143 0.6        0.4        0.75      ]

mean value: 0.6483333333333333

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: -0.01

Accuracy on Blind test: 0.6

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00858474 0.00827479 0.00837827 0.00822592 0.00829554 0.0083642
 0.00828242 0.00841022 0.00844193 0.00838256]

mean value: 0.008364057540893555

key: score_time
value: [0.00838494 0.00850153 0.00845408 0.00845599 0.00838614 0.00832748
 0.00842834 0.00836754 0.00846052 0.00834703]

mean value: 0.008411359786987305

key: test_mcc
value: [ 0.1        -0.05976143  0.55        0.5976143   0.8         0.5976143
 -0.25819889  0.          0.          0.        ]

mean value: 0.2327268289120513

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.55555556 0.44444444 0.77777778 0.77777778 0.88888889 0.77777778
 0.375      0.5        0.5        0.5       ]

mean value: 0.6097222222222223

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.5        0.54545455 0.75       0.83333333 0.88888889 0.83333333
 0.44444444 0.5        0.5        0.33333333]

mean value: 0.6128787878787879

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.5        0.42857143 0.75       0.71428571 1.         0.71428571
 0.4        0.5        0.5        0.5       ]

mean value: 0.6007142857142858

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.5  0.75 0.75 1.   0.8  1.   0.5  0.5  0.5  0.25]

mean value: 0.655

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.55  0.475 0.775 0.75  0.9   0.75  0.375 0.5   0.5   0.5  ]

mean value: 0.6075

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.33333333 0.375      0.6        0.71428571 0.8        0.71428571
 0.28571429 0.33333333 0.33333333 0.2       ]

mean value: 0.4689285714285714

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: -0.03

Accuracy on Blind test: 0.5

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.01921344 1.02071166 1.02129936 1.0181725  1.01626253 1.0281322
 1.03024292 1.02138448 1.02007842 1.02826262]

mean value: 1.022376012802124

key: score_time
value: [0.08772469 0.08730388 0.08674073 0.0869844  0.08715105 0.08721757
 0.08726931 0.0868032  0.14008069 0.09193897]

mean value: 0.09292144775390625

key: test_mcc
value: [0.55       0.8        0.55       0.55       0.15811388 1.
 0.25819889 0.77459667 0.57735027 0.77459667]

mean value: 0.5992856380428173

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.77777778 0.88888889 0.77777778 0.77777778 0.55555556 1.
 0.625      0.875      0.75       0.875     ]

mean value: 0.7902777777777777

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.75       0.88888889 0.75       0.8        0.5        1.
 0.66666667 0.85714286 0.66666667 0.85714286]

mean value: 0.7736507936507937

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.75       0.8        0.75       0.8        0.66666667 1.
 0.6        1.         1.         1.        ]

mean value: 0.8366666666666667

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75 1.   0.75 0.8  0.4  1.   0.75 0.75 0.5  0.75]

mean value: 0.745

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.775 0.9   0.775 0.775 0.575 1.    0.625 0.875 0.75  0.875]

mean value: 0.7925

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.6        0.8        0.6        0.66666667 0.33333333 1.
 0.5        0.75       0.5        0.75      ]

mean value: 0.65

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.05

Accuracy on Blind test: 0.62

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000...05', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.8344028  0.83407116 0.86222911 0.86600399 0.8268683  0.86827183
 0.97598505 0.82194805 0.89098978 0.82547593]

mean value: 0.8606245994567872

key: score_time
value: [0.21745658 0.13854885 0.21888685 0.18107748 0.22129059 0.20861673
 0.14700174 0.21403289 0.17643619 0.23291492]

mean value: 0.19562628269195556

key: test_mcc
value: [0.55       0.63245553 0.8        0.55       0.15811388 0.5976143
 0.         0.5        0.25819889 0.77459667]

mean value: 0.4820979278697936

key: train_mcc
value: [0.94935876 0.97435897 0.94935876 0.94929201 0.97434188 0.97434188
 0.94996791 0.94996791 0.97467943 0.97467943]

mean value: 0.9620346953307984

key: test_accuracy
value: [0.77777778 0.77777778 0.88888889 0.77777778 0.55555556 0.77777778
 0.5        0.75       0.625      0.875     ]

mean value: 0.7305555555555555

key: train_accuracy
value: [0.97402597 0.98701299 0.97402597 0.97402597 0.98701299 0.98701299
 0.97435897 0.97435897 0.98717949 0.98717949]

mean value: 0.9806193806193806

key: test_fscore
value: [0.75       0.8        0.88888889 0.8        0.5        0.83333333
 0.6        0.75       0.57142857 0.85714286]

mean value: 0.7350793650793651

key: train_fscore
value: [0.97368421 0.98701299 0.97368421 0.97297297 0.98666667 0.98666667
 0.97368421 0.97368421 0.98701299 0.98701299]

mean value: 0.9802082109450531

key: test_precision
value: [0.75       0.66666667 0.8        0.8        0.66666667 0.71428571
 0.5        0.75       0.66666667 1.        ]

mean value: 0.7314285714285714

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75 1.   1.   0.8  0.4  1.   0.75 0.75 0.5  0.75]

mean value: 0.77

key: train_recall
value: [0.94871795 0.97435897 0.94871795 0.94736842 0.97368421 0.97368421
 0.94871795 0.94871795 0.97435897 0.97435897]

mean value: 0.9612685560053981

key: test_roc_auc
value: [0.775 0.8   0.9   0.775 0.575 0.75  0.5   0.75  0.625 0.875]

mean value: 0.7325

key: train_roc_auc
value: [0.97435897 0.98717949 0.97435897 0.97368421 0.98684211 0.98684211
 0.97435897 0.97435897 0.98717949 0.98717949]

mean value: 0.9806342780026991

key: test_jcc
value: [0.6        0.66666667 0.8        0.66666667 0.33333333 0.71428571
 0.42857143 0.6        0.4        0.75      ]

mean value: 0.5959523809523809

key: train_jcc
value: [0.94871795 0.97435897 0.94871795 0.94736842 0.97368421 0.97368421
 0.94871795 0.94871795 0.97435897 0.97435897]

mean value: 0.9612685560053981

MCC on Blind test: 0.05

Accuracy on Blind test: 0.61

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.02049661 0.00852871 0.0086658  0.00836873 0.00848603 0.00852489
 0.00861835 0.00847673 0.00877643 0.00880885]

mean value: 0.009775114059448243

key: score_time
value: [0.01595473 0.00848293 0.00887704 0.00862145 0.00873375 0.00857711
 0.00855875 0.00848603 0.00864792 0.00869274]

mean value: 0.009363245964050294

key: test_mcc
value: [ 0.5976143   0.55        0.8         0.15811388  0.15811388  0.05976143
 -0.57735027  0.25819889 -0.25819889  0.37796447]

mean value: 0.2124217704970356

key: train_mcc
value: [0.74021592 0.71670195 0.68898046 0.68898046 0.71987403 0.64957894
 0.72392277 0.66864785 0.72980045 0.76948376]

mean value: 0.7096186585687321

key: test_accuracy
value: [0.77777778 0.77777778 0.88888889 0.55555556 0.55555556 0.55555556
 0.25       0.625      0.375      0.625     ]

mean value: 0.5986111111111111

key: train_accuracy
value: [0.87012987 0.85714286 0.84415584 0.84415584 0.85714286 0.81818182
 0.85897436 0.83333333 0.85897436 0.88461538]

mean value: 0.8526806526806526

key: test_fscore
value: [0.66666667 0.75       0.88888889 0.5        0.5        0.66666667
 0.         0.57142857 0.44444444 0.4       ]

mean value: 0.5388095238095238

key: train_fscore
value: [0.87179487 0.85333333 0.85       0.83783784 0.84507042 0.79411765
 0.84931507 0.82666667 0.84507042 0.88607595]

mean value: 0.8459282219622195

key: test_precision
value: [1.         0.75       0.8        0.66666667 0.66666667 0.57142857
 0.         0.66666667 0.4        1.        ]

mean value: 0.6521428571428571

key: train_precision
value: [0.87179487 0.88888889 0.82926829 0.86111111 0.90909091 0.9
 0.91176471 0.86111111 0.9375     0.875     ]

mean value: 0.8845529890562172

key: test_recall
value: [0.5  0.75 1.   0.4  0.4  0.8  0.   0.5  0.5  0.25]

mean value: 0.51

key: train_recall
value: [0.87179487 0.82051282 0.87179487 0.81578947 0.78947368 0.71052632
 0.79487179 0.79487179 0.76923077 0.8974359 ]

mean value: 0.8136302294197031

key: test_roc_auc
value: [0.75  0.775 0.9   0.575 0.575 0.525 0.25  0.625 0.375 0.625]

mean value: 0.5975

key: train_roc_auc
value: [0.87010796 0.85762483 0.84379217 0.84379217 0.8562753  0.81680162
 0.85897436 0.83333333 0.85897436 0.88461538]

mean value: 0.8524291497975709

key: test_jcc
value: [0.5        0.6        0.8        0.33333333 0.33333333 0.5
 0.         0.4        0.28571429 0.25      ]

mean value: 0.4002380952380952

key: train_jcc
value: [0.77272727 0.74418605 0.73913043 0.72093023 0.73170732 0.65853659
 0.73809524 0.70454545 0.73170732 0.79545455]

mean value: 0.7337020444187082

MCC on Blind test: 0.04

Accuracy on Blind test: 0.56

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.0401094  0.03476858 0.03738213 0.0384171  0.03525496 0.03481197
 0.03531337 0.03826714 0.0395844  0.03738236]

mean value: 0.03712913990020752

key: score_time
value: [0.01210713 0.01098847 0.01114774 0.0102675  0.01021433 0.0108428
 0.01119161 0.01130676 0.01111627 0.01118207]

mean value: 0.01103646755218506

key: test_mcc
value: [0.8        0.8        1.         1.         0.8        1.
 1.         1.         0.57735027 1.        ]

mean value: 0.8977350269189626

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.88888889 0.88888889 1.         1.         0.88888889 1.
 1.         1.         0.75       1.        ]

mean value: 0.9416666666666667

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.88888889 0.88888889 1.         1.         0.88888889 1.
 1.         1.         0.66666667 1.        ]

mean value: 0.9333333333333333

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.8 0.8 1.  1.  1.  1.  1.  1.  1.  1. ]

mean value: 0.96

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.  1.  1.  1.  0.8 1.  1.  1.  0.5 1. ]

mean value: 0.93

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9  0.9  1.   1.   0.9  1.   1.   1.   0.75 1.  ]

mean value: 0.9450000000000001

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.8 0.8 1.  1.  0.8 1.  1.  1.  0.5 1. ]

mean value: 0.89

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.07

Accuracy on Blind test: 0.71

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.02817845 0.04007769 0.06127048 0.04485011 0.04049444 0.0356071
 0.04591703 0.04527164 0.04855704 0.03276825]

mean value: 0.04229922294616699

key: score_time
value: [0.02117968 0.02192545 0.01181006 0.01930475 0.02256536 0.02384424
 0.0226078  0.02120304 0.03123069 0.02361965]

mean value: 0.021929073333740234

key: test_mcc
value: [0.15811388 0.15811388 0.31622777 0.5976143  0.05976143 0.8
 0.57735027 0.5        0.5        1.        ]

mean value: 0.46671815363572183

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.55555556 0.55555556 0.66666667 0.77777778 0.55555556 0.88888889
 0.75       0.75       0.75       1.        ]

mean value: 0.725

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.6        0.6        0.57142857 0.83333333 0.66666667 0.88888889
 0.8        0.75       0.75       1.        ]

mean value: 0.746031746031746

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.5        0.5        0.66666667 0.71428571 0.57142857 1.
 0.66666667 0.75       0.75       1.        ]

mean value: 0.7119047619047619

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75 0.75 0.5  1.   0.8  0.8  1.   0.75 0.75 1.  ]

mean value: 0.81

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.575 0.575 0.65  0.75  0.525 0.9   0.75  0.75  0.75  1.   ]

mean value: 0.7225

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.42857143 0.42857143 0.4        0.71428571 0.5        0.8
 0.66666667 0.6        0.6        1.        ]

mean value: 0.6138095238095238

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: -0.03

Accuracy on Blind test: 0.5

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.02220154 0.00995994 0.00930357 0.00952268 0.0093174  0.0093205
 0.00861454 0.00954103 0.00942183 0.00962305]

mean value: 0.01068260669708252

key: score_time
value: [0.00999022 0.00950193 0.00910759 0.008986   0.00915051 0.00919533
 0.00878143 0.00928998 0.00907397 0.00929141]

mean value: 0.009236836433410644

key: test_mcc
value: [0.39528471 0.47809144 1.         0.39528471 0.15811388 0.1
 0.         0.25819889 0.25819889 0.        ]

mean value: 0.3043172521278593

key: train_mcc
value: [0.53238866 0.42847503 0.40311986 0.37940693 0.53342348 0.40243088
 0.48717949 0.46291005 0.53846154 0.46291005]

mean value: 0.46307059797406186

key: test_accuracy
value: [0.66666667 0.66666667 1.         0.66666667 0.55555556 0.55555556
 0.5        0.625      0.625      0.5       ]

mean value: 0.6361111111111111

key: train_accuracy
value: [0.76623377 0.71428571 0.7012987  0.68831169 0.76623377 0.7012987
 0.74358974 0.73076923 0.76923077 0.73076923]

mean value: 0.7312021312021312

key: test_fscore
value: [0.4        0.72727273 1.         0.76923077 0.5        0.6
 0.6        0.57142857 0.57142857 0.33333333]

mean value: 0.6072693972693973

key: train_fscore
value: [0.76923077 0.71794872 0.71604938 0.7        0.76923077 0.69333333
 0.74358974 0.72       0.76923077 0.74074074]

mean value: 0.7339354226020893

key: test_precision
value: [1.         0.57142857 1.         0.625      0.66666667 0.6
 0.5        0.66666667 0.66666667 0.5       ]

mean value: 0.6796428571428571

key: train_precision
value: [0.76923077 0.71794872 0.69047619 0.66666667 0.75       0.7027027
 0.74358974 0.75       0.76923077 0.71428571]

mean value: 0.7274131274131275

key: test_recall
value: [0.25 1.   1.   1.   0.4  0.6  0.75 0.5  0.5  0.25]

mean value: 0.625

key: train_recall
value: [0.76923077 0.71794872 0.74358974 0.73684211 0.78947368 0.68421053
 0.74358974 0.69230769 0.76923077 0.76923077]

mean value: 0.7415654520917679

key: test_roc_auc
value: [0.625 0.7   1.    0.625 0.575 0.55  0.5   0.625 0.625 0.5  ]

mean value: 0.6325

key: train_roc_auc
value: [0.76619433 0.71423752 0.70074224 0.68893387 0.76653171 0.70107962
 0.74358974 0.73076923 0.76923077 0.73076923]

mean value: 0.7312078272604589

key: test_jcc
value: [0.25       0.57142857 1.         0.625      0.33333333 0.42857143
 0.42857143 0.4        0.4        0.2       ]

mean value: 0.4636904761904762

key: train_jcc
value: [0.625      0.56       0.55769231 0.53846154 0.625      0.53061224
 0.59183673 0.5625     0.625      0.58823529]

mean value: 0.580433811986333

MCC on Blind test: 0.09

Accuracy on Blind test: 0.51

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01010633 0.01337552 0.01346779 0.01255846 0.01274538 0.01246047
 0.01407218 0.01253891 0.01436639 0.01313567]

mean value: 0.012882709503173828

key: score_time
value: [0.00920749 0.01127768 0.01129055 0.01131392 0.01129127 0.0113771
 0.01156116 0.01149035 0.0113523  0.01129508]

mean value: 0.011145687103271485

key: test_mcc
value: [0.31622777 0.63245553 0.8        0.39528471 0.63245553 0.79056942
 0.57735027 0.25819889 0.5        0.77459667]

mean value: 0.5677138780825602

key: train_mcc
value: [1.         1.         0.80937951 0.6642433  0.97435897 0.74617462
 1.         0.9258201  1.         0.97467943]

mean value: 0.9094655927553618

key: test_accuracy
value: [0.66666667 0.77777778 0.88888889 0.66666667 0.77777778 0.88888889
 0.75       0.625      0.75       0.875     ]

mean value: 0.7666666666666666

key: train_accuracy
value: [1.         1.         0.8961039  0.80519481 0.98701299 0.85714286
 1.         0.96153846 1.         0.98717949]

mean value: 0.9494172494172495

key: test_fscore
value: [0.57142857 0.8        0.88888889 0.76923077 0.75       0.90909091
 0.8        0.66666667 0.75       0.85714286]

mean value: 0.7762448662448662

key: train_fscore
value: [1.         1.         0.90697674 0.83516484 0.98701299 0.87356322
 1.         0.96296296 1.         0.98701299]

mean value: 0.9552693734730623

key: test_precision
value: [0.66666667 0.66666667 0.8        0.625      1.         0.83333333
 0.66666667 0.6        0.75       1.        ]

mean value: 0.7608333333333334

key: train_precision
value: [1.         1.         0.82978723 0.71698113 0.97435897 0.7755102
 1.         0.92857143 1.         1.        ]

mean value: 0.922520897313006

key: test_recall
value: [0.5  1.   1.   1.   0.6  1.   1.   0.75 0.75 0.75]

mean value: 0.835

key: train_recall
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.97435897]

mean value: 0.9974358974358974

key: test_roc_auc
value: [0.65  0.8   0.9   0.625 0.8   0.875 0.75  0.625 0.75  0.875]

mean value: 0.765

key: train_roc_auc
value: [1.         1.         0.89473684 0.80769231 0.98717949 0.85897436
 1.         0.96153846 1.         0.98717949]

mean value: 0.9497300944669366

key: test_jcc
value: [0.4        0.66666667 0.8        0.625      0.6        0.83333333
 0.66666667 0.5        0.6        0.75      ]

mean value: 0.6441666666666667

key: train_jcc
value: [1.         1.         0.82978723 0.71698113 0.97435897 0.7755102
 1.         0.92857143 1.         0.97435897]

mean value: 0.9199567947489035

MCC on Blind test: 0.07

Accuracy on Blind test: 0.44

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01249099 0.0119276  0.01191974 0.01216722 0.01207805 0.01217175
 0.01221275 0.01212144 0.0122242  0.01232362]

mean value: 0.012163734436035157

key: score_time
value: [0.0113256  0.01128769 0.01134348 0.01128006 0.01128006 0.01129794
 0.01123834 0.01133847 0.01128244 0.0112834 ]

mean value: 0.011295747756958009

key: test_mcc
value: [0.31622777 0.63245553 0.8        0.39528471 0.63245553 0.5976143
 0.57735027 0.5        0.5        0.77459667]

mean value: 0.5725984780703544

key: train_mcc
value: [1.         0.94804318 0.74445317 0.68442809 0.82485566 0.6642433
 1.         1.         1.         1.        ]

mean value: 0.8866023410978029

key: test_accuracy
value: [0.66666667 0.77777778 0.88888889 0.66666667 0.77777778 0.77777778
 0.75       0.75       0.75       0.875     ]

mean value: 0.7680555555555555

key: train_accuracy
value: [1.         0.97402597 0.85714286 0.81818182 0.90909091 0.80519481
 1.         1.         1.         1.        ]

mean value: 0.9363636363636364

key: test_fscore
value: [0.57142857 0.8        0.88888889 0.76923077 0.75       0.83333333
 0.8        0.75       0.75       0.85714286]

mean value: 0.777002442002442

key: train_fscore
value: [1.         0.97435897 0.87640449 0.84444444 0.90140845 0.83516484
 1.         1.         1.         1.        ]

mean value: 0.9431781199054502

key: test_precision
value: [0.66666667 0.66666667 0.8        0.625      1.         0.71428571
 0.66666667 0.75       0.75       1.        ]

mean value: 0.7639285714285714

key: train_precision
value: [1.         0.97435897 0.78       0.73076923 0.96969697 0.71698113
 1.         1.         1.         1.        ]

mean value: 0.9171806306900646

key: test_recall
value: [0.5  1.   1.   1.   0.6  1.   1.   0.75 0.75 0.75]

mean value: 0.835

key: train_recall
value: [1.         0.97435897 1.         1.         0.84210526 1.
 1.         1.         1.         1.        ]

mean value: 0.9816464237516869

key: test_roc_auc
value: [0.65  0.8   0.9   0.625 0.8   0.75  0.75  0.75  0.75  0.875]

mean value: 0.765

key: train_roc_auc
value: [1.         0.97402159 0.85526316 0.82051282 0.90823212 0.80769231
 1.         1.         1.         1.        ]

mean value: 0.9365721997300944

key: test_jcc
value: [0.4        0.66666667 0.8        0.625      0.6        0.71428571
 0.66666667 0.6        0.6        0.75      ]

mean value: 0.6422619047619047

key: train_jcc
value: [1.         0.95       0.78       0.73076923 0.82051282 0.71698113
 1.         1.         1.         1.        ]

mean value: 0.8998263183357523

MCC on Blind test: 0.1

Accuracy on Blind test: 0.58

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.09537435 0.08456874 0.08424854 0.08335257 0.08354545 0.08079076
 0.08060503 0.08644676 0.08553648 0.08298945]

mean value: 0.08474581241607666

key: score_time
value: [0.01557899 0.01554394 0.01590896 0.01558757 0.01592422 0.01518011
 0.01472735 0.01587462 0.01568294 0.01505804]

mean value: 0.015506672859191894

key: test_mcc
value: [0.55       0.8        0.8        1.         1.         1.
 1.         1.         0.57735027 1.        ]

mean value: 0.8727350269189627

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.77777778 0.88888889 0.88888889 1.         1.         1.
 1.         1.         0.75       1.        ]

mean value: 0.9305555555555556

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.75       0.88888889 0.88888889 1.         1.         1.
 1.         1.         0.66666667 1.        ]

mean value: 0.9194444444444445

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.75 0.8  0.8  1.   1.   1.   1.   1.   1.   1.  ]

mean value: 0.935

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75 1.   1.   1.   1.   1.   1.   1.   0.5  1.  ]

mean value: 0.925

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.775 0.9   0.9   1.    1.    1.    1.    1.    0.75  1.   ]

mean value: 0.9325

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.6 0.8 0.8 1.  1.  1.  1.  1.  0.5 1. ]

mean value: 0.87

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.04

Accuracy on Blind test: 0.55

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.0307467  0.02915907 0.04344821 0.03888488 0.04137826 0.04140258
 0.03178215 0.05092216 0.04483032 0.04534054]

mean value: 0.03978948593139649

key: score_time
value: [0.02144599 0.02702022 0.02193284 0.03621364 0.02020788 0.02563
 0.02068305 0.0353334  0.03633976 0.02274179]

mean value: 0.02675485610961914

key: test_mcc
value: [0.55       0.8        0.55       1.         0.8        1.
 1.         1.         0.57735027 1.        ]

mean value: 0.8277350269189626

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.77777778 0.88888889 0.77777778 1.         0.88888889 1.
 1.         1.         0.75       1.        ]

mean value: 0.9083333333333333

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.75       0.88888889 0.75       1.         0.88888889 1.
 1.         1.         0.66666667 1.        ]

mean value: 0.8944444444444445

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.75 0.8  0.75 1.   1.   1.   1.   1.   1.   1.  ]

mean value: 0.93

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75 1.   0.75 1.   0.8  1.   1.   1.   0.5  1.  ]

mean value: 0.88

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.775 0.9   0.775 1.    0.9   1.    1.    1.    0.75  1.   ]

mean value: 0.91

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.6 0.8 0.6 1.  0.8 1.  1.  1.  0.5 1. ]

mean value: 0.83

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.13

Accuracy on Blind test: 0.86

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.01333833 0.01516891 0.0148015  0.01531124 0.01535654 0.01530838
 0.01538754 0.01552606 0.01552677 0.01573038]

mean value: 0.015145564079284668

key: score_time
value: [0.01117516 0.01111007 0.01163745 0.01162291 0.01162505 0.01161265
 0.01158023 0.01157904 0.01167965 0.01166892]

mean value: 0.011529111862182617

key: test_mcc
value: [ 0.05976143  0.35        0.8         0.31622777  0.47809144  0.5976143
 -0.25819889  0.25819889  0.25819889  0.25819889]

mean value: 0.3118092724378834

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.55555556 0.66666667 0.88888889 0.66666667 0.66666667 0.77777778
 0.375      0.625      0.625      0.625     ]

mean value: 0.6472222222222223

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.33333333 0.66666667 0.88888889 0.72727273 0.57142857 0.83333333
 0.44444444 0.66666667 0.57142857 0.57142857]

mean value: 0.6274891774891774

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.5        0.6        0.8        0.66666667 1.         0.71428571
 0.4        0.6        0.66666667 0.66666667]

mean value: 0.6614285714285715

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.25 0.75 1.   0.8  0.4  1.   0.5  0.75 0.5  0.5 ]

mean value: 0.645

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.525 0.675 0.9   0.65  0.7   0.75  0.375 0.625 0.625 0.625]

mean value: 0.645

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.2        0.5        0.8        0.57142857 0.4        0.71428571
 0.28571429 0.5        0.4        0.4       ]

mean value: 0.47714285714285715

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: -0.08

Accuracy on Blind test: 0.56

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.15129781 0.11171222 0.17469287 0.15850496 0.1549716  0.13646603
 0.15618372 0.1372323  0.12951589 0.15986323]

mean value: 0.14704406261444092

key: score_time
value: [0.00914025 0.00896955 0.00894713 0.00897336 0.00900054 0.00885558
 0.00901031 0.00896049 0.00905704 0.00884628]

mean value: 0.008976054191589356

key: test_mcc
value: [0.55       0.8        1.         1.         0.8        1.
 0.57735027 0.77459667 0.57735027 1.        ]

mean value: 0.8079297207620735

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.77777778 0.88888889 1.         1.         0.88888889 1.
 0.75       0.875      0.75       1.        ]

mean value: 0.8930555555555555

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.75       0.88888889 1.         1.         0.88888889 1.
 0.8        0.85714286 0.66666667 1.        ]

mean value: 0.8851587301587301

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.75       0.8        1.         1.         1.         1.
 0.66666667 1.         1.         1.        ]

mean value: 0.9216666666666666

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75 1.   1.   1.   0.8  1.   1.   0.75 0.5  1.  ]

mean value: 0.88

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.775 0.9   1.    1.    0.9   1.    0.75  0.875 0.75  1.   ]

mean value: 0.895

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.6        0.8        1.         1.         0.8        1.
 0.66666667 0.75       0.5        1.        ]

mean value: 0.8116666666666666

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.12

Accuracy on Blind test: 0.84

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.01548123 0.01329041 0.0134387  0.01483011 0.01330495 0.0194087
 0.01376367 0.02508187 0.01355028 0.01389837]

mean value: 0.015604829788208008

key: score_time
value: [0.01172805 0.01162148 0.01177287 0.01175046 0.01169348 0.01178837
 0.01441002 0.01195979 0.01457405 0.01157618]

mean value: 0.012287473678588868

key: test_mcc
value: [-0.1        -0.1         0.05976143 -0.05976143  0.1         0.15811388
  0.77459667  0.5        -0.25819889  0.        ]

mean value: 0.10745116625027412

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.44444444 0.44444444 0.55555556 0.44444444 0.55555556 0.55555556
 0.875      0.75       0.375      0.5       ]

mean value: 0.55

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.44444444 0.44444444 0.33333333 0.28571429 0.6        0.5
 0.85714286 0.75       0.28571429 0.5       ]

mean value: 0.5000793650793651

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.4        0.4        0.5        0.5        0.6        0.66666667
 1.         0.75       0.33333333 0.5       ]

mean value: 0.565

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.5  0.5  0.25 0.2  0.6  0.4  0.75 0.75 0.25 0.5 ]

mean value: 0.47

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.45  0.45  0.525 0.475 0.55  0.575 0.875 0.75  0.375 0.5  ]

mean value: 0.5525

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.28571429 0.28571429 0.2        0.16666667 0.42857143 0.33333333
 0.75       0.6        0.16666667 0.33333333]

mean value: 0.355

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: -0.03

Accuracy on Blind test: 0.51

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.02107096 0.01260209 0.03267598 0.05434585 0.02943563 0.03695607
 0.03151107 0.03148913 0.03227115 0.03163171]

mean value: 0.031398963928222653

key: score_time
value: [0.01179409 0.01174188 0.02450228 0.02844524 0.02454138 0.02180123
 0.02072978 0.02000141 0.02119827 0.0115056 ]

mean value: 0.01962611675262451

key: test_mcc
value: [0.35       0.47809144 0.8        0.79056942 0.55       1.
 0.57735027 0.5        0.5        0.77459667]

mean value: 0.6320607797206962

key: train_mcc
value: [1.         0.97435897 0.97435897 0.97434188 0.97434188 0.97434188
 0.97467943 0.97467943 1.         0.97467943]

mean value: 0.9795781901615055

key: test_accuracy
value: [0.66666667 0.66666667 0.88888889 0.88888889 0.77777778 1.
 0.75       0.75       0.75       0.875     ]

mean value: 0.8013888888888889

key: train_accuracy
value: [1.         0.98701299 0.98701299 0.98701299 0.98701299 0.98701299
 0.98717949 0.98717949 1.         0.98717949]

mean value: 0.9896603396603396

key: test_fscore
value: [0.66666667 0.72727273 0.88888889 0.90909091 0.8        1.
 0.8        0.75       0.75       0.85714286]

mean value: 0.8149062049062049

key: train_fscore
value: [1.         0.98701299 0.98701299 0.98666667 0.98666667 0.98666667
 0.98701299 0.98701299 1.         0.98701299]

mean value: 0.9895064935064934

key: test_precision
value: [0.6        0.57142857 0.8        0.83333333 0.8        1.
 0.66666667 0.75       0.75       1.        ]

mean value: 0.7771428571428571

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75 1.   1.   1.   0.8  1.   1.   0.75 0.75 0.75]

mean value: 0.88

key: train_recall
value: [1.         0.97435897 0.97435897 0.97368421 0.97368421 0.97368421
 0.97435897 0.97435897 1.         0.97435897]

mean value: 0.9792847503373819

key: test_roc_auc
value: [0.675 0.7   0.9   0.875 0.775 1.    0.75  0.75  0.75  0.875]

mean value: 0.805

key: train_roc_auc
value: /home/tanu/git/LSHTM_analysis/scripts/ml/./gid_orig.py:175: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_orig.py:178: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
[1.         0.98717949 0.98717949 0.98684211 0.98684211 0.98684211
 0.98717949 0.98717949 1.         0.98717949]

mean value: 0.989642375168691

key: test_jcc
value: [0.5        0.57142857 0.8        0.83333333 0.66666667 1.
 0.66666667 0.6        0.6        0.75      ]

mean value: 0.6988095238095238

key: train_jcc
value: [1.         0.97435897 0.97435897 0.97368421 0.97368421 0.97368421
 0.97435897 0.97435897 1.         0.97435897]

mean value: 0.9792847503373819

MCC on Blind test: 0.1

Accuracy on Blind test: 0.58

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.08567047 0.17216563 0.17648339 0.17911506 0.1741271  0.1765976
 0.17673564 0.22768736 0.23544955 0.17722917]

mean value: 0.17812609672546387

key: score_time
value: [0.0117352  0.01921391 0.02300906 0.02202415 0.02308655 0.02050304
 0.02041197 0.02305675 0.0117054  0.02008557]

mean value: 0.019483160972595216

key: test_mcc
value: [0.35       0.47809144 0.8        0.79056942 0.55       1.
 0.25819889 0.5        0.         0.77459667]

mean value: 0.5501456417764496

key: train_mcc
value: [1.         0.97435897 0.97435897 0.97434188 0.97434188 0.97434188
 0.8720816  0.97467943 0.79591906 0.97467943]

mean value: 0.948910312364934

key: test_accuracy
value: [0.66666667 0.66666667 0.88888889 0.88888889 0.77777778 1.
 0.625      0.75       0.5        0.875     ]

mean value: 0.7638888888888888

key: train_accuracy
value: [1.         0.98701299 0.98701299 0.98701299 0.98701299 0.98701299
 0.93589744 0.98717949 0.8974359  0.98717949]

mean value: 0.9742757242757243

key: test_fscore
value: [0.66666667 0.72727273 0.88888889 0.90909091 0.8        1.
 0.66666667 0.75       0.6        0.85714286]

mean value: 0.7865728715728716

key: train_fscore
value: [1.         0.98701299 0.98701299 0.98666667 0.98666667 0.98666667
 0.93506494 0.98701299 0.89473684 0.98701299]

mean value: 0.9737853725222145

key: test_precision
value: [0.6        0.57142857 0.8        0.83333333 0.8        1.
 0.6        0.75       0.5        1.        ]

mean value: 0.7454761904761905

key: train_precision
value: [1.         1.         1.         1.         1.         1.
 0.94736842 1.         0.91891892 1.        ]

mean value: 0.986628733997155

key: test_recall
value: [0.75 1.   1.   1.   0.8  1.   0.75 0.75 0.75 0.75]

mean value: 0.855

key: train_recall
value: [1.         0.97435897 0.97435897 0.97368421 0.97368421 0.97368421
 0.92307692 0.97435897 0.87179487 0.97435897]

mean value: 0.9613360323886639

key: test_roc_auc
value: [0.675 0.7   0.9   0.875 0.775 1.    0.625 0.75  0.5   0.875]

mean value: 0.7675

key: train_roc_auc
value: [1.         0.98717949 0.98717949 0.98684211 0.98684211 0.98684211
 0.93589744 0.98717949 0.8974359  0.98717949]

mean value: 0.9742577597840756

key: test_jcc
value: [0.5        0.57142857 0.8        0.83333333 0.66666667 1.
 0.5        0.6        0.42857143 0.75      ]

mean value: 0.665

key: train_jcc
value: [1.         0.97435897 0.97435897 0.97368421 0.97368421 0.97368421
 0.87804878 0.97435897 0.80952381 0.97435897]

mean value: 0.9506061119026459

MCC on Blind test: 0.1

Accuracy on Blind test: 0.58

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.02788711 0.02603817 0.02707958 0.02680755 0.02836752 0.02531695
 0.0276475  0.02511096 0.02504849 0.02622795]

mean value: 0.02655317783355713

key: score_time
value: [0.01307678 0.01143813 0.01144481 0.0114646  0.01140261 0.01142645
 0.01154256 0.01154447 0.01145744 0.01143503]

mean value: 0.011623287200927734

key: test_mcc
value: [0.62994079 0.5        0.73214286 0.76376262 0.76376262 0.6000992
 0.66143783 0.75592895 0.33928571 0.73214286]

mean value: 0.6478503420505668

key: train_mcc
value: [0.92737353 0.94117647 0.89869927 0.90025835 0.92791659 0.92791659
 0.88466669 0.89863497 0.89863497 0.91392776]

mean value: 0.9119205184892752

key: test_accuracy
value: [0.8125     0.75       0.86666667 0.86666667 0.86666667 0.8
 0.8        0.86666667 0.66666667 0.86666667]

mean value: 0.81625

key: train_accuracy
value: [0.96323529 0.97058824 0.94890511 0.94890511 0.96350365 0.96350365
 0.94160584 0.94890511 0.94890511 0.95620438]

mean value: 0.9554261485616145

key: test_fscore
value: [0.8        0.75       0.85714286 0.875      0.875      0.76923077
 0.76923077 0.88888889 0.66666667 0.875     ]

mean value: 0.8126159951159951

key: train_fscore
value: [0.96240602 0.97058824 0.94814815 0.94736842 0.96296296 0.96296296
 0.93939394 0.94736842 0.94736842 0.95454545]

mean value: 0.9543112981503075

key: test_precision
value: [0.85714286 0.75       0.85714286 0.77777778 0.77777778 0.83333333
 1.         0.8        0.71428571 0.875     ]

mean value: 0.8242460317460317

key: train_precision
value: [0.98461538 0.97058824 0.96969697 0.984375   0.98484848 0.98484848
 0.96875    0.96923077 0.96923077 0.984375  ]

mean value: 0.977055909776498

key: test_recall
value: [0.75       0.75       0.85714286 1.         1.         0.71428571
 0.625      1.         0.625      0.875     ]

mean value: 0.8196428571428571

key: train_recall
value: [0.94117647 0.97058824 0.92753623 0.91304348 0.94202899 0.94202899
 0.91176471 0.92647059 0.92647059 0.92647059]

mean value: 0.9327578857630009

key: test_roc_auc
value: [0.8125     0.75       0.86607143 0.875      0.875      0.79464286
 0.8125     0.85714286 0.66964286 0.86607143]

mean value: 0.8178571428571428

key: train_roc_auc
value: [0.96323529 0.97058824 0.94906223 0.9491688  0.96366155 0.96366155
 0.9413896  0.94874254 0.94874254 0.95598892]

mean value: 0.9554241261722081

key: test_jcc
value: [0.66666667 0.6        0.75       0.77777778 0.77777778 0.625
 0.625      0.8        0.5        0.77777778]

mean value: 0.69

key: train_jcc
value: [0.92753623 0.94285714 0.90140845 0.9        0.92857143 0.92857143
 0.88571429 0.9        0.9        0.91304348]

mean value: 0.9127702446563438

MCC on Blind test: 0.12

Accuracy on Blind test: 0.66

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.72274685 0.82610178 0.69772458 0.78831792 0.76682043 0.69155788
 0.73042655 0.76333189 0.84150457 0.74985647]

mean value: 0.7578388929367066

key: score_time
value: [0.0148797  0.01483393 0.0148766  0.01480103 0.01492453 0.01183009
 0.02841091 0.01590896 0.01522398 0.01588035]

mean value: 0.016157007217407225

key: test_mcc
value: [0.62994079 0.75       0.73214286 0.875      0.66143783 0.73214286
 0.87287156 1.         0.49099025 0.87287156]

mean value: 0.7617397705319495

key: train_mcc
value: [1.         1.         1.         0.98550418 1.         1.
 1.         0.98550725 1.         1.        ]

mean value: 0.9971011425665239

key: test_accuracy
value: [0.8125     0.875      0.86666667 0.93333333 0.8        0.86666667
 0.93333333 1.         0.73333333 0.93333333]

mean value: 0.8754166666666667

key: train_accuracy
value: [1.         1.         1.         0.99270073 1.         1.
 1.         0.99270073 1.         1.        ]

mean value: 0.9985401459854014

key: test_fscore
value: [0.82352941 0.875      0.85714286 0.93333333 0.82352941 0.85714286
 0.94117647 1.         0.71428571 0.94117647]

mean value: 0.8766316526610645

key: train_fscore
value: [1.         1.         1.         0.99280576 1.         1.
 1.         0.99270073 1.         1.        ]

mean value: 0.998550648532269

key: test_precision
value: [0.77777778 0.875      0.85714286 0.875      0.7        0.85714286
 0.88888889 1.         0.83333333 0.88888889]

mean value: 0.8553174603174603

key: train_precision
value: [1.         1.         1.         0.98571429 1.         1.
 1.         0.98550725 1.         1.        ]

mean value: 0.9971221532091097

key: test_recall
value: [0.875      0.875      0.85714286 1.         1.         0.85714286
 1.         1.         0.625      1.        ]

mean value: 0.9089285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.8125     0.875      0.86607143 0.9375     0.8125     0.86607143
 0.92857143 1.         0.74107143 0.92857143]

mean value: 0.8767857142857143

key: train_roc_auc
value: [1.         1.         1.         0.99264706 1.         1.
 1.         0.99275362 1.         1.        ]

mean value: 0.9985400682011936

key: test_jcc
value: [0.7        0.77777778 0.75       0.875      0.7        0.75
 0.88888889 1.         0.55555556 0.88888889]

mean value: 0.7886111111111112

key: train_jcc
value: [1.         1.         1.         0.98571429 1.         1.
 1.         0.98550725 1.         1.        ]

mean value: 0.9971221532091097

MCC on Blind test: 0.12

Accuracy on Blind test: 0.67

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01252723 0.01011229 0.00880003 0.00886822 0.00845242 0.00850248
 0.00847101 0.00859404 0.00865054 0.00848961]

mean value: 0.009146785736083985

key: score_time
value: [0.0118022  0.00891519 0.00875592 0.00878024 0.00850177 0.00916266
 0.0085156  0.00849533 0.0085237  0.00861311]

mean value: 0.009006571769714356

key: test_mcc
value: [ 0.48038446 -0.25819889 -0.20044593  0.07142857  0.66143783  0.26189246
  0.20044593  0.04029115 -0.07142857  0.47245559]

mean value: 0.16582626022052996

key: train_mcc
value: [0.4031367  0.44611344 0.40900022 0.49281415 0.38922674 0.4221034
 0.39979063 0.36184606 0.44729758 0.41272853]

mean value: 0.4184057451895615

key: test_accuracy
value: [0.6875     0.4375     0.4        0.53333333 0.8        0.6
 0.6        0.53333333 0.46666667 0.73333333]

mean value: 0.5791666666666666

key: train_accuracy
value: [0.66911765 0.69852941 0.68613139 0.74452555 0.67153285 0.69343066
 0.67883212 0.66423358 0.72262774 0.68613139]

mean value: 0.6915092314297981

key: test_fscore
value: [0.76190476 0.60869565 0.52631579 0.53333333 0.82352941 0.66666667
 0.7        0.63157895 0.5        0.77777778]

mean value: 0.6529802340463263

key: train_fscore
value: [0.73988439 0.75449102 0.74251497 0.76190476 0.73684211 0.74698795
 0.73493976 0.7195122  0.73239437 0.73939394]

mean value: 0.7408865459811903

key: test_precision
value: [0.61538462 0.46666667 0.41666667 0.5        0.7        0.54545455
 0.58333333 0.54545455 0.5        0.7       ]

mean value: 0.5572960372960373

key: train_precision
value: [0.60952381 0.63636364 0.63265306 0.71794872 0.61764706 0.63917526
 0.62244898 0.61458333 0.7027027  0.62886598]

mean value: 0.6421912536625458

key: test_recall
value: [1.         0.875      0.71428571 0.57142857 1.         0.85714286
 0.875      0.75       0.5        0.875     ]

mean value: 0.8017857142857143

key: train_recall
value: [0.94117647 0.92647059 0.89855072 0.8115942  0.91304348 0.89855072
 0.89705882 0.86764706 0.76470588 0.89705882]

mean value: 0.8815856777493606

key: test_roc_auc
value: [0.6875     0.4375     0.41964286 0.53571429 0.8125     0.61607143
 0.58035714 0.51785714 0.46428571 0.72321429]

mean value: 0.5794642857142857

key: train_roc_auc
value: [0.66911765 0.69852941 0.68456948 0.7440324  0.66975703 0.69192242
 0.68041347 0.66570759 0.72293265 0.68765985]

mean value: 0.6914641943734016

key: test_jcc
value: [0.61538462 0.4375     0.35714286 0.36363636 0.7        0.5
 0.53846154 0.46153846 0.33333333 0.63636364]

mean value: 0.4943360805860806

key: train_jcc
value: [0.58715596 0.60576923 0.59047619 0.61538462 0.58333333 0.59615385
 0.58095238 0.56190476 0.57777778 0.58653846]

mean value: 0.588544656159335

MCC on Blind test: -0.0

Accuracy on Blind test: 0.32

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00886059 0.00875497 0.0086782  0.00869155 0.00862312 0.00867844
 0.00865197 0.00868607 0.00863123 0.00867724]

mean value: 0.008693337440490723

key: score_time
value: [0.00846362 0.00860286 0.00848532 0.00850081 0.0084486  0.00854588
 0.00849843 0.00848842 0.00855756 0.00858855]

mean value: 0.008518004417419433

key: test_mcc
value: [ 0.13483997  0.25        0.33928571  0.37796447  0.33928571  0.6000992
 -0.19642857  0.21821789  0.21821789 -0.07142857]

mean value: 0.22100537098371253

key: train_mcc
value: [0.60352881 0.61791438 0.48913043 0.46134236 0.54743126 0.53185944
 0.54864511 0.62041773 0.56235346 0.50525024]

mean value: 0.5487873223363263

key: test_accuracy
value: [0.5625     0.625      0.66666667 0.66666667 0.66666667 0.8
 0.4        0.6        0.6        0.46666667]

mean value: 0.6054166666666667

key: train_accuracy
value: [0.80147059 0.80882353 0.74452555 0.72992701 0.77372263 0.75912409
 0.77372263 0.81021898 0.7810219  0.75182482]

mean value: 0.7734381708887935

key: test_fscore
value: [0.63157895 0.625      0.66666667 0.70588235 0.66666667 0.76923077
 0.4        0.57142857 0.57142857 0.5       ]

mean value: 0.6107882545730843

key: train_fscore
value: [0.79699248 0.80597015 0.74452555 0.72180451 0.77697842 0.73170732
 0.76335878 0.80882353 0.7826087  0.73846154]

mean value: 0.7671230965670979

key: test_precision
value: [0.54545455 0.625      0.625      0.6        0.625      0.83333333
 0.42857143 0.66666667 0.66666667 0.5       ]

mean value: 0.611569264069264

key: train_precision
value: [0.81538462 0.81818182 0.75       0.75       0.77142857 0.83333333
 0.79365079 0.80882353 0.77142857 0.77419355]

mean value: 0.7886424781206565

key: test_recall
value: [0.75       0.625      0.71428571 0.85714286 0.71428571 0.71428571
 0.375      0.5        0.5        0.5       ]

mean value: 0.625

key: train_recall
value: [0.77941176 0.79411765 0.73913043 0.69565217 0.7826087  0.65217391
 0.73529412 0.80882353 0.79411765 0.70588235]

mean value: 0.7487212276214834

key: test_roc_auc
value: [0.5625     0.625      0.66964286 0.67857143 0.66964286 0.79464286
 0.40178571 0.60714286 0.60714286 0.46428571]

mean value: 0.6080357142857142

key: train_roc_auc
value: [0.80147059 0.80882353 0.74456522 0.73017903 0.77365729 0.75991049
 0.77344416 0.81020887 0.78111679 0.7514919 ]

mean value: 0.7734867860187553

key: test_jcc
value: [0.46153846 0.45454545 0.5        0.54545455 0.5        0.625
 0.25       0.4        0.4        0.33333333]

mean value: 0.4469871794871795

key: train_jcc
value: [0.6625     0.675      0.59302326 0.56470588 0.63529412 0.57692308
 0.61728395 0.67901235 0.64285714 0.58536585]

mean value: 0.6231965625549006

MCC on Blind test: 0.05

Accuracy on Blind test: 0.61

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00869894 0.00854826 0.00917315 0.00936198 0.00942206 0.00934601
 0.0093236  0.00952411 0.00928068 0.00939059]

mean value: 0.009206938743591308

key: score_time
value: [0.00957131 0.01183295 0.01011848 0.01026583 0.01015377 0.01021051
 0.01025248 0.0101738  0.01047158 0.0102458 ]

mean value: 0.010329651832580566

key: test_mcc
value: [ 0.37796447  0.13483997  0.6000992   0.6000992   0.60714286  0.33928571
 -0.21821789  0.87287156 -0.05455447  0.19642857]

mean value: 0.34559591828059555

key: train_mcc
value: [0.57359144 0.64705882 0.51856637 0.53294957 0.50427283 0.5182264
 0.678815   0.57663612 0.62163943 0.53314859]

mean value: 0.5704904582687371

key: test_accuracy
value: [0.6875     0.5625     0.8        0.8        0.8        0.66666667
 0.4        0.93333333 0.46666667 0.6       ]

mean value: 0.6716666666666666

key: train_accuracy
value: [0.78676471 0.82352941 0.75912409 0.76642336 0.75182482 0.75912409
 0.83941606 0.78832117 0.81021898 0.76642336]

mean value: 0.7851170030055817

key: test_fscore
value: [0.70588235 0.63157895 0.76923077 0.76923077 0.8        0.66666667
 0.47058824 0.94117647 0.42857143 0.625     ]

mean value: 0.6807925639891584

key: train_fscore
value: [0.78518519 0.82352941 0.76595745 0.77142857 0.76056338 0.76258993
 0.83823529 0.78518519 0.81428571 0.76811594]

mean value: 0.787507605914375

key: test_precision
value: [0.66666667 0.54545455 0.83333333 0.83333333 0.75       0.625
 0.44444444 0.88888889 0.5        0.625     ]

mean value: 0.6712121212121211

key: train_precision
value: [0.79104478 0.82352941 0.75       0.76056338 0.73972603 0.75714286
 0.83823529 0.79104478 0.79166667 0.75714286]

mean value: 0.780009604675249

key: test_recall
value: [0.75       0.75       0.71428571 0.71428571 0.85714286 0.71428571
 0.5        1.         0.375      0.625     ]

mean value: 0.7

key: train_recall
value: [0.77941176 0.82352941 0.7826087  0.7826087  0.7826087  0.76811594
 0.83823529 0.77941176 0.83823529 0.77941176]

mean value: 0.7954177323103154

key: test_roc_auc
value: [0.6875     0.5625     0.79464286 0.79464286 0.80357143 0.66964286
 0.39285714 0.92857143 0.47321429 0.59821429]

mean value: 0.6705357142857142

key: train_roc_auc
value: [0.78676471 0.82352941 0.75895141 0.76630435 0.75159847 0.75905797
 0.8394075  0.78825661 0.81042199 0.76651748]

mean value: 0.785080988917306

key: test_jcc
value: [0.54545455 0.46153846 0.625      0.625      0.66666667 0.5
 0.30769231 0.88888889 0.27272727 0.45454545]

mean value: 0.5347513597513598

key: train_jcc
value: [0.64634146 0.7        0.62068966 0.62790698 0.61363636 0.61627907
 0.72151899 0.64634146 0.68674699 0.62352941]

mean value: 0.6502990379207959

MCC on Blind test: -0.02

Accuracy on Blind test: 0.58

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01127934 0.01113343 0.01129603 0.01077962 0.01127839 0.01124501
 0.01120043 0.01065063 0.01107168 0.01019716]

mean value: 0.011013174057006836

key: score_time
value: [0.00994158 0.00969481 0.00966859 0.00969338 0.0096314  0.01019287
 0.00968385 0.00993729 0.0096488  0.00895643]

mean value: 0.009704899787902833

key: test_mcc
value: [0.62994079 0.25819889 0.73214286 0.60714286 0.60714286 0.6000992
 0.56407607 0.60714286 0.21821789 0.46428571]

mean value: 0.5288389984155754

key: train_mcc
value: [0.86774089 0.85294118 0.83951407 0.81247516 0.84173622 0.88320546
 0.78496269 0.79855228 0.8251228  0.78764184]

mean value: 0.8293892570231083

key: test_accuracy
value: [0.8125     0.625      0.86666667 0.8        0.8        0.8
 0.73333333 0.8        0.6        0.73333333]

mean value: 0.7570833333333333

key: train_accuracy
value: [0.93382353 0.92647059 0.91970803 0.90510949 0.91970803 0.94160584
 0.89051095 0.89781022 0.91240876 0.89051095]

mean value: 0.9137666380420781

key: test_fscore
value: [0.8        0.66666667 0.85714286 0.8        0.8        0.76923077
 0.66666667 0.8        0.57142857 0.75      ]

mean value: 0.7481135531135531

key: train_fscore
value: [0.93430657 0.92647059 0.91970803 0.90225564 0.91729323 0.94202899
 0.88372093 0.89230769 0.91044776 0.88188976]

mean value: 0.9110429191976945

key: test_precision
value: [0.85714286 0.6        0.85714286 0.75       0.75       0.83333333
 1.         0.85714286 0.66666667 0.75      ]

mean value: 0.7921428571428571

key: train_precision
value: [0.92753623 0.92647059 0.92647059 0.9375     0.953125   0.94202899
 0.93442623 0.93548387 0.92424242 0.94915254]

mean value: 0.9356436460953137

key: test_recall
value: [0.75       0.75       0.85714286 0.85714286 0.85714286 0.71428571
 0.5        0.75       0.5        0.75      ]

mean value: 0.7285714285714285

key: train_recall
value: [0.94117647 0.92647059 0.91304348 0.86956522 0.88405797 0.94202899
 0.83823529 0.85294118 0.89705882 0.82352941]

mean value: 0.8888107416879796

key: test_roc_auc
value: [0.8125     0.625      0.86607143 0.80357143 0.80357143 0.79464286
 0.75       0.80357143 0.60714286 0.73214286]

mean value: 0.7598214285714285

key: train_roc_auc
value: [0.93382353 0.92647059 0.91975703 0.90537084 0.91997016 0.94160273
 0.89013214 0.89748508 0.91229753 0.89002558]

mean value: 0.9136935208866155

key: test_jcc
value: [0.66666667 0.5        0.75       0.66666667 0.66666667 0.625
 0.5        0.66666667 0.4        0.6       ]

mean value: 0.6041666666666666

key: train_jcc
value: [0.87671233 0.8630137  0.85135135 0.82191781 0.84722222 0.89041096
 0.79166667 0.80555556 0.83561644 0.78873239]

mean value: 0.8372199423038705

MCC on Blind test: 0.06

Accuracy on Blind test: 0.69

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.52222729 0.468153   0.55469823 0.67369676 0.54588056 0.5346756
 0.54372573 0.65563345 0.51112771 0.51687884]

mean value: 0.5526697158813476

key: score_time
value: [0.01204991 0.01208353 0.01204729 0.01215768 0.01216936 0.01213455
 0.01216435 0.01220655 0.01210856 0.01213312]

mean value: 0.012125492095947266

key: test_mcc
value: [0.62994079 0.62994079 0.875      0.66143783 0.60714286 0.73214286
 0.66143783 0.87287156 0.33928571 0.73214286]

mean value: 0.6741343078887975

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.8125     0.8125     0.93333333 0.8        0.8        0.86666667
 0.8        0.93333333 0.66666667 0.86666667]

mean value: 0.8291666666666667

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.82352941 0.82352941 0.93333333 0.82352941 0.8        0.85714286
 0.76923077 0.94117647 0.66666667 0.875     ]

mean value: 0.8313138332255979

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.77777778 0.77777778 0.875      0.7        0.75       0.85714286
 1.         0.88888889 0.71428571 0.875     ]

mean value: 0.8215873015873015

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.875      0.875      1.         1.         0.85714286 0.85714286
 0.625      1.         0.625      0.875     ]

mean value: 0.8589285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.8125     0.8125     0.9375     0.8125     0.80357143 0.86607143
 0.8125     0.92857143 0.66964286 0.86607143]

mean value: 0.8321428571428572

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.7        0.7        0.875      0.7        0.66666667 0.75
 0.625      0.88888889 0.5        0.77777778]

mean value: 0.7183333333333333

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.07

Accuracy on Blind test: 0.7

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01453376 0.01425242 0.01198053 0.01168633 0.01525187 0.01180673
 0.01120734 0.0110929  0.01055717 0.01045299]

mean value: 0.012282204627990723

key: score_time
value: [0.01155376 0.00964212 0.00991273 0.00939679 0.01865244 0.00950599
 0.00897455 0.00899673 0.00857711 0.00861549]

mean value: 0.010382771492004395

key: test_mcc
value: [0.8819171  0.75       0.875      1.         0.76376262 0.73214286
 0.87287156 1.         0.60714286 1.        ]

mean value: 0.8482836994743854

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     0.875      0.93333333 1.         0.86666667 0.86666667
 0.93333333 1.         0.8        1.        ]

mean value: 0.92125

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94117647 0.875      0.93333333 1.         0.875      0.85714286
 0.94117647 1.         0.8        1.        ]

mean value: 0.9222829131652661

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.88888889 0.875      0.875      1.         0.77777778 0.85714286
 0.88888889 1.         0.85714286 1.        ]

mean value: 0.901984126984127

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.875      1.         1.         1.         0.85714286
 1.         1.         0.75       1.        ]

mean value: 0.9482142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.875      0.9375     1.         0.875      0.86607143
 0.92857143 1.         0.80357143 1.        ]

mean value: 0.9223214285714286

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.88888889 0.77777778 0.875      1.         0.77777778 0.75
 0.88888889 1.         0.66666667 1.        ]

mean value: 0.8625

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.11

Accuracy on Blind test: 0.83

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.08960724 0.08749914 0.08709145 0.0881474  0.08605218 0.08645868
 0.09108973 0.08840013 0.08683872 0.0871675 ]

mean value: 0.08783521652221679

key: score_time
value: [0.01729894 0.01750088 0.01744103 0.01703835 0.01712751 0.01697516
 0.01730871 0.01716542 0.01720405 0.01721263]

mean value: 0.01722726821899414

key: test_mcc
value: [0.62994079 0.75       1.         0.73214286 0.875      0.60714286
 0.66143783 1.         0.49099025 0.87287156]

mean value: 0.7619526144375526

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.8125     0.875      1.         0.86666667 0.93333333 0.8
 0.8        1.         0.73333333 0.93333333]

mean value: 0.8754166666666667

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.82352941 0.875      1.         0.85714286 0.93333333 0.8
 0.76923077 1.         0.71428571 0.94117647]

mean value: 0.8713698556345615

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.77777778 0.875      1.         0.85714286 0.875      0.75
 1.         1.         0.83333333 0.88888889]

mean value: 0.8857142857142857

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.875      0.875      1.         0.85714286 1.         0.85714286
 0.625      1.         0.625      1.        ]

mean value: 0.8714285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.8125     0.875      1.         0.86607143 0.9375     0.80357143
 0.8125     1.         0.74107143 0.92857143]

mean value: 0.8776785714285714

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.7        0.77777778 1.         0.75       0.875      0.66666667
 0.625      1.         0.55555556 0.88888889]

mean value: 0.7838888888888889

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: -0.04

Accuracy on Blind test: 0.78

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00924444 0.00962591 0.01027131 0.00915217 0.00907969 0.0097084
 0.00920677 0.00894308 0.00980639 0.0091753 ]

mean value: 0.009421348571777344

key: score_time
value: [0.00945544 0.010427   0.00939941 0.00925112 0.00885606 0.00896883
 0.00881791 0.00923467 0.00883985 0.00884008]

mean value: 0.009209036827087402

key: test_mcc
value: [0.51639778 0.51639778 0.76376262 0.60714286 0.56407607 0.73214286
 0.60714286 0.75592895 0.33928571 0.87287156]

mean value: 0.6275149042309094

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.75       0.75       0.86666667 0.8        0.73333333 0.86666667
 0.8        0.86666667 0.66666667 0.93333333]

mean value: 0.8033333333333333

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.77777778 0.77777778 0.875      0.8        0.77777778 0.85714286
 0.8        0.88888889 0.66666667 0.94117647]

mean value: 0.8162208216619982

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.7        0.7        0.77777778 0.75       0.63636364 0.85714286
 0.85714286 0.8        0.71428571 0.88888889]

mean value: 0.7681601731601732

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.875      0.875      1.         0.85714286 1.         0.85714286
 0.75       1.         0.625      1.        ]

mean value: 0.8839285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.75       0.75       0.875      0.80357143 0.75       0.86607143
 0.80357143 0.85714286 0.66964286 0.92857143]

mean value: 0.8053571428571429

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.63636364 0.63636364 0.77777778 0.66666667 0.63636364 0.75
 0.66666667 0.8        0.5        0.88888889]

mean value: 0.6959090909090909

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.05

Accuracy on Blind test: 0.63

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.12276006 1.12847686 1.12388277 1.12958765 1.19054484 1.1342032
 1.2050643  1.20557404 1.12965131 1.12810969]

mean value: 1.1497854709625244

key: score_time
value: [0.08781695 0.08817816 0.09504771 0.09640932 0.08869791 0.09375215
 0.09752321 0.0974381  0.09431314 0.08888292]

mean value: 0.09280595779418946

key: test_mcc
value: [0.8819171  0.8819171  1.         0.60714286 0.875      0.73214286
 0.60714286 1.         0.49099025 0.75592895]

mean value: 0.7832181977854402

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     0.9375     1.         0.8        0.93333333 0.86666667
 0.8        1.         0.73333333 0.86666667]

mean value: 0.8875

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.93333333 0.93333333 1.         0.8        0.93333333 0.85714286
 0.8        1.         0.71428571 0.88888889]

mean value: 0.8860317460317461

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         1.         0.75       0.875      0.85714286
 0.85714286 1.         0.83333333 0.8       ]

mean value: 0.8972619047619048

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.875      0.875      1.         0.85714286 1.         0.85714286
 0.75       1.         0.625      1.        ]

mean value: 0.8839285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.9375     1.         0.80357143 0.9375     0.86607143
 0.80357143 1.         0.74107143 0.85714286]

mean value: 0.8883928571428572

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(

key: test_jcc
value: [0.875      0.875      1.         0.66666667 0.875      0.75
 0.66666667 1.         0.55555556 0.8       ]

mean value: 0.8063888888888889

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.1

Accuracy on Blind test: 0.8

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000...05', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.89038706 0.92174602 0.82130027 0.89599752 0.8950417  0.82764983
 0.90394044 0.864923   0.87637615 0.87166667]

mean value: 0.8769028663635254

key: score_time
value: [0.1252284  0.20888615 0.21349883 0.23671651 0.32386017 0.22654843
 0.23682284 0.18615389 0.17947078 0.20715261]

mean value: 0.2144338607788086

key: test_mcc
value: [0.8819171  0.8819171  0.875      0.73214286 0.875      0.6000992
 0.60714286 0.76376262 0.49099025 0.6000992 ]

mean value: 0.7308071186817022

key: train_mcc
value: [0.98540068 1.         0.98550725 0.98550725 0.98550725 0.97122151
 0.98550418 0.97120941 0.98550418 0.97120941]

mean value: 0.9826571107637088

key: test_accuracy
value: [0.9375     0.9375     0.93333333 0.86666667 0.93333333 0.8
 0.8        0.86666667 0.73333333 0.8       ]

mean value: 0.8608333333333333

key: train_accuracy
value: [0.99264706 1.         0.99270073 0.99270073 0.99270073 0.98540146
 0.99270073 0.98540146 0.99270073 0.98540146]

mean value: 0.991235508802061

key: test_fscore
value: [0.93333333 0.93333333 0.93333333 0.85714286 0.93333333 0.76923077
 0.8        0.85714286 0.71428571 0.82352941]

mean value: 0.8554664942900236

key: train_fscore
value: [0.99259259 1.         0.99270073 0.99270073 0.99270073 0.98529412
 0.99259259 0.98507463 0.99259259 0.98507463]

mean value: 0.9911323338937202

key: test_precision
value: [1.         1.         0.875      0.85714286 0.875      0.83333333
 0.85714286 1.         0.83333333 0.77777778]

mean value: 0.8908730158730158

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.875      0.875      1.         0.85714286 1.         0.71428571
 0.75       0.75       0.625      0.875     ]

mean value: 0.8321428571428572

key: train_recall
value: [0.98529412 1.         0.98550725 0.98550725 0.98550725 0.97101449
 0.98529412 0.97058824 0.98529412 0.97058824]

mean value: 0.982459505541347

key: test_roc_auc
value: [0.9375     0.9375     0.9375     0.86607143 0.9375     0.79464286
 0.80357143 0.875      0.74107143 0.79464286]

mean value: 0.8625

key: train_roc_auc
value: [0.99264706 1.         0.99275362 0.99275362 0.99275362 0.98550725
 0.99264706 0.98529412 0.99264706 0.98529412]

mean value: 0.9912297527706735

key: test_jcc
value: [0.875      0.875      0.875      0.75       0.875      0.625
 0.66666667 0.75       0.55555556 0.7       ]

mean value: 0.7547222222222222

key: train_jcc
value: [0.98529412 1.         0.98550725 0.98550725 0.98550725 0.97101449
 0.98529412 0.97058824 0.98529412 0.97058824]

mean value: 0.982459505541347

MCC on Blind test: 0.09

Accuracy on Blind test: 0.78

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.02329063 0.00864911 0.00860119 0.00866508 0.00866747 0.00861073
 0.00878024 0.0087285  0.00864959 0.00864077]

mean value: 0.010128331184387208

key: score_time
value: [0.01024389 0.00845623 0.00859714 0.0085454  0.008533   0.00846243
 0.00858235 0.00851011 0.00851059 0.00857377]

mean value: 0.008701491355895995

key: test_mcc
value: [ 0.13483997  0.25        0.33928571  0.37796447  0.33928571  0.6000992
 -0.19642857  0.21821789  0.21821789 -0.07142857]

mean value: 0.22100537098371253

key: train_mcc
value: [0.60352881 0.61791438 0.48913043 0.46134236 0.54743126 0.53185944
 0.54864511 0.62041773 0.56235346 0.50525024]

mean value: 0.5487873223363263

key: test_accuracy
value: [0.5625     0.625      0.66666667 0.66666667 0.66666667 0.8
 0.4        0.6        0.6        0.46666667]

mean value: 0.6054166666666667

key: train_accuracy
value: [0.80147059 0.80882353 0.74452555 0.72992701 0.77372263 0.75912409
 0.77372263 0.81021898 0.7810219  0.75182482]

mean value: 0.7734381708887935

key: test_fscore
value: [0.63157895 0.625      0.66666667 0.70588235 0.66666667 0.76923077
 0.4        0.57142857 0.57142857 0.5       ]

mean value: 0.6107882545730843

key: train_fscore
value: [0.79699248 0.80597015 0.74452555 0.72180451 0.77697842 0.73170732
 0.76335878 0.80882353 0.7826087  0.73846154]

mean value: 0.7671230965670979

key: test_precision
value: [0.54545455 0.625      0.625      0.6        0.625      0.83333333
 0.42857143 0.66666667 0.66666667 0.5       ]

mean value: 0.611569264069264

key: train_precision
value: [0.81538462 0.81818182 0.75       0.75       0.77142857 0.83333333
 0.79365079 0.80882353 0.77142857 0.77419355]

mean value: 0.7886424781206565

key: test_recall
value: [0.75       0.625      0.71428571 0.85714286 0.71428571 0.71428571
 0.375      0.5        0.5        0.5       ]

mean value: 0.625

key: train_recall
value: [0.77941176 0.79411765 0.73913043 0.69565217 0.7826087  0.65217391
 0.73529412 0.80882353 0.79411765 0.70588235]

mean value: 0.7487212276214834

key: test_roc_auc
value: [0.5625     0.625      0.66964286 0.67857143 0.66964286 0.79464286
 0.40178571 0.60714286 0.60714286 0.46428571]

mean value: 0.6080357142857142

key: train_roc_auc
value: [0.80147059 0.80882353 0.74456522 0.73017903 0.77365729 0.75991049
 0.77344416 0.81020887 0.78111679 0.7514919 ]

mean value: 0.7734867860187553

key: test_jcc
value: [0.46153846 0.45454545 0.5        0.54545455 0.5        0.625
 0.25       0.4        0.4        0.33333333]

mean value: 0.4469871794871795

key: train_jcc
value: [0.6625     0.675      0.59302326 0.56470588 0.63529412 0.57692308
 0.61728395 0.67901235 0.64285714 0.58536585]

mean value: 0.6231965625549006

MCC on Blind test: 0.05

Accuracy on Blind test: 0.61

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.081218   0.03844333 0.06259561 0.03818011 0.03826547 0.04282951
 0.04249716 0.04346681 0.0452919  0.04476476]

mean value: 0.04775526523590088

key: score_time
value: [0.01017928 0.010149   0.01186872 0.01010132 0.0101397  0.01045871
 0.01019597 0.01062679 0.01143193 0.01149797]

mean value: 0.010664939880371094

key: test_mcc
value: [0.8819171  1.         1.         1.         0.76376262 0.875
 1.         1.         1.         1.        ]

mean value: 0.952067971951417

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     1.         1.         1.         0.86666667 0.93333333
 1.         1.         1.         1.        ]

mean value: 0.97375

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94117647 1.         1.         1.         0.875      0.93333333
 1.         1.         1.         1.        ]

mean value: 0.9749509803921569

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.88888889 1.         1.         1.         0.77777778 0.875
 1.         1.         1.         1.        ]

mean value: 0.9541666666666666

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375 1.     1.     1.     0.875  0.9375 1.     1.     1.     1.    ]

mean value: 0.975

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.88888889 1.         1.         1.         0.77777778 0.875
 1.         1.         1.         1.        ]

mean value: 0.9541666666666666

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.12

Accuracy on Blind test: 0.84

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.02603173 0.06835532 0.02540469 0.02657223 0.04874587 0.04817772
 0.04732275 0.02142692 0.04089713 0.04774785]

mean value: 0.04006822109222412

key: score_time
value: [0.021065   0.01183319 0.01171088 0.02026939 0.01557493 0.02242637
 0.01170111 0.01167488 0.02084661 0.02236056]

mean value: 0.016946291923522948

key: test_mcc
value: [0.67419986 0.62994079 0.49099025 0.49099025 0.66143783 0.26189246
 0.18898224 0.75592895 0.19642857 0.46428571]

mean value: 0.481507691618567

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.8125     0.8125     0.73333333 0.73333333 0.8        0.6
 0.6        0.86666667 0.6        0.73333333]

mean value: 0.7291666666666666

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.84210526 0.82352941 0.75       0.75       0.82352941 0.66666667
 0.66666667 0.88888889 0.625      0.75      ]

mean value: 0.7586386308909528

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.72727273 0.77777778 0.66666667 0.66666667 0.7        0.54545455
 0.6        0.8        0.625      0.75      ]

mean value: 0.6858838383838384

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.875      0.85714286 0.85714286 1.         0.85714286
 0.75       1.         0.625      0.75      ]

mean value: 0.8571428571428571

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.8125     0.8125     0.74107143 0.74107143 0.8125     0.61607143
 0.58928571 0.85714286 0.59821429 0.73214286]

mean value: 0.73125

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.72727273 0.7        0.6        0.6        0.7        0.5
 0.5        0.8        0.45454545 0.6       ]

mean value: 0.6181818181818182

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.04

Accuracy on Blind test: 0.56

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.0242095  0.00912499 0.00889945 0.0089488  0.00893569 0.00852466
 0.0086081  0.00871897 0.00848079 0.00855207]

mean value: 0.010300302505493164

key: score_time
value: [0.01155972 0.00883555 0.00887346 0.0088973  0.00890279 0.00845218
 0.00838852 0.00840425 0.00845933 0.0083673 ]

mean value: 0.008914041519165038

key: test_mcc
value: [ 0.62994079  0.12598816  0.46428571  0.34247476  0.19642857  0.21821789
  0.49099025  0.33928571 -0.07142857  0.07142857]

mean value: 0.28076118489962165

key: train_mcc
value: [0.36800537 0.33827187 0.37278745 0.37278745 0.41939006 0.29950971
 0.34357013 0.31379238 0.34429809 0.40309474]

mean value: 0.3575507254340045

key: test_accuracy
value: [0.8125     0.5625     0.73333333 0.66666667 0.6        0.6
 0.73333333 0.66666667 0.46666667 0.53333333]

mean value: 0.6375

key: train_accuracy
value: [0.68382353 0.66911765 0.68613139 0.68613139 0.7080292  0.64963504
 0.67153285 0.65693431 0.67153285 0.70072993]

mean value: 0.6783598110777158

key: test_fscore
value: [0.82352941 0.58823529 0.71428571 0.54545455 0.57142857 0.625
 0.71428571 0.66666667 0.5        0.53333333]

mean value: 0.6282219251336898

key: train_fscore
value: [0.67669173 0.66666667 0.68148148 0.68148148 0.69230769 0.64705882
 0.67625899 0.65185185 0.68085106 0.68217054]

mean value: 0.6736820325913095

key: test_precision
value: [0.77777778 0.55555556 0.71428571 0.75       0.57142857 0.55555556
 0.83333333 0.71428571 0.5        0.57142857]

mean value: 0.6543650793650794

key: train_precision
value: [0.69230769 0.67164179 0.6969697  0.6969697  0.73770492 0.65671642
 0.66197183 0.65671642 0.65753425 0.72131148]

mean value: 0.6849844184116639

key: test_recall
value: [0.875      0.625      0.71428571 0.42857143 0.57142857 0.71428571
 0.625      0.625      0.5        0.5       ]

mean value: 0.6178571428571429

key: train_recall
value: [0.66176471 0.66176471 0.66666667 0.66666667 0.65217391 0.63768116
 0.69117647 0.64705882 0.70588235 0.64705882]

mean value: 0.6637894288150042

key: test_roc_auc
value: [0.8125     0.5625     0.73214286 0.65178571 0.59821429 0.60714286
 0.74107143 0.66964286 0.46428571 0.53571429]

mean value: 0.6375000000000001

key: train_roc_auc
value: [0.68382353 0.66911765 0.68627451 0.68627451 0.7084399  0.64972293
 0.67167519 0.65686275 0.67178176 0.70034101]

mean value: 0.6784313725490195

key: test_jcc
value: [0.7        0.41666667 0.55555556 0.375      0.4        0.45454545
 0.55555556 0.5        0.33333333 0.36363636]

mean value: 0.4654292929292929

key: train_jcc
value: [0.51136364 0.5        0.51685393 0.51685393 0.52941176 0.47826087
 0.51086957 0.48351648 0.51612903 0.51764706]

mean value: 0.5080906275618744

MCC on Blind test: 0.1

Accuracy on Blind test: 0.58

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01075029 0.0149982  0.0139091  0.01420832 0.01531124 0.01442099
 0.01410794 0.01402354 0.0139544  0.01422453]

mean value: 0.01399085521697998

key: score_time
value: [0.00854301 0.01132798 0.01137161 0.01131582 0.01130557 0.01135063
 0.01129627 0.01128507 0.01138473 0.01128197]

mean value: 0.011046266555786133

key: test_mcc
value: [0.62994079 0.16012815 0.64465837 0.41931393 0.76376262 0.75592895
 0.56407607 0.64465837 0.21821789 0.87287156]

mean value: 0.5673556707125331

key: train_mcc
value: [0.84942274 0.83666003 0.79573573 0.62625207 0.98550418 0.95630861
 0.77016909 0.85060976 0.88920184 0.98550418]

mean value: 0.8545368225433598

key: test_accuracy
value: [0.8125     0.5625     0.8        0.66666667 0.86666667 0.86666667
 0.73333333 0.8        0.6        0.93333333]

mean value: 0.7641666666666667

key: train_accuracy
value: [0.91911765 0.91176471 0.89051095 0.7810219  0.99270073 0.97810219
 0.87591241 0.91970803 0.94160584 0.99270073]

mean value: 0.9203145126663804

key: test_fscore
value: [0.8        0.66666667 0.72727273 0.44444444 0.875      0.83333333
 0.66666667 0.84210526 0.57142857 0.94117647]

mean value: 0.736809414355854

key: train_fscore
value: [0.912      0.91891892 0.88       0.72222222 0.99280576 0.97810219
 0.85950413 0.92517007 0.9375     0.99259259]

mean value: 0.9118815879169054

key: test_precision
value: [0.85714286 0.53846154 1.         1.         0.77777778 1.
 1.         0.72727273 0.66666667 0.88888889]

mean value: 0.8456210456210456

key: train_precision
value: [1.         0.85       0.98214286 1.         0.98571429 0.98529412
 0.98113208 0.86075949 1.         1.        ]

mean value: 0.9645042829646786

key: test_recall
value: [0.75       0.875      0.57142857 0.28571429 1.         0.71428571
 0.5        1.         0.5        1.        ]

mean value: 0.7196428571428571

key: train_recall
value: [0.83823529 1.         0.79710145 0.56521739 1.         0.97101449
 0.76470588 1.         0.88235294 0.98529412]

mean value: 0.8803921568627451

key: test_roc_auc
value: [0.8125     0.5625     0.78571429 0.64285714 0.875      0.85714286
 0.75       0.78571429 0.60714286 0.92857143]

mean value: 0.7607142857142857

key: train_roc_auc
value: [0.91911765 0.91176471 0.89119778 0.7826087  0.99264706 0.97815431
 0.87510656 0.92028986 0.94117647 0.99264706]

mean value: 0.9204710144927536

key: test_jcc
value: [0.66666667 0.5        0.57142857 0.28571429 0.77777778 0.71428571
 0.5        0.72727273 0.4        0.88888889]

mean value: 0.6032034632034632

key: train_jcc
value: [0.83823529 0.85       0.78571429 0.56521739 0.98571429 0.95714286
 0.75362319 0.86075949 0.88235294 0.98529412]

mean value: 0.8464053854893636

MCC on Blind test: 0.1

Accuracy on Blind test: 0.56

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01437926 0.01255703 0.01282096 0.01237679 0.01261806 0.01318884
 0.01328611 0.01358056 0.01302099 0.01280618]

mean value: 0.013063478469848632

key: score_time
value: [0.01148844 0.01132059 0.01194882 0.0113225  0.01146626 0.01135039
 0.01135278 0.01143408 0.0113225  0.01133156]

mean value: 0.011433792114257813

key: test_mcc
value: [0.62994079 0.51639778 0.66143783 0.46428571 0.60714286 0.87287156
 0.875      0.53452248 0.37796447 0.37796447]

mean value: 0.5917527957825026

key: train_mcc
value: [1.         0.79405762 0.9001543  0.82106272 0.82629176 0.92709446
 0.97122151 0.60385237 0.87609014 0.66978117]

mean value: 0.8389606031963133

key: test_accuracy
value: [0.8125     0.75       0.8        0.73333333 0.8        0.93333333
 0.93333333 0.73333333 0.66666667 0.66666667]

mean value: 0.7829166666666667

key: train_accuracy
value: [1.         0.88970588 0.94890511 0.90510949 0.91240876 0.96350365
 0.98540146 0.76642336 0.93430657 0.81021898]

mean value: 0.9115983254615715

key: test_fscore
value: [0.82352941 0.71428571 0.82352941 0.71428571 0.8        0.92307692
 0.93333333 0.8        0.61538462 0.61538462]

mean value: 0.7762809739280327

key: train_fscore
value: [1.         0.87804878 0.95104895 0.8976378  0.91044776 0.96402878
 0.98550725 0.80952381 0.92913386 0.76363636]

mean value: 0.9089013342789495

key: test_precision
value: [0.77777778 0.83333333 0.7        0.71428571 0.75       1.
 1.         0.66666667 0.8        0.8       ]

mean value: 0.8042063492063493

key: train_precision
value: [1.         0.98181818 0.91891892 0.98275862 0.93846154 0.95714286
 0.97142857 0.68       1.         1.        ]

mean value: 0.9430528688459723

key: test_recall
value: [0.875      0.625      1.         0.71428571 0.85714286 0.85714286
 0.875      1.         0.5        0.5       ]

mean value: 0.7803571428571429

key: train_recall
value: [1.         0.79411765 0.98550725 0.82608696 0.88405797 0.97101449
 1.         1.         0.86764706 0.61764706]

mean value: 0.8946078431372549

key: test_roc_auc
value: [0.8125     0.75       0.8125     0.73214286 0.80357143 0.92857143
 0.9375     0.71428571 0.67857143 0.67857143]

mean value: 0.7848214285714286

key: train_roc_auc
value: [1.         0.88970588 0.94863598 0.90569054 0.91261722 0.96344842
 0.98550725 0.76811594 0.93382353 0.80882353]

mean value: 0.9116368286445012

key: test_jcc
value: [0.7        0.55555556 0.7        0.55555556 0.66666667 0.85714286
 0.875      0.66666667 0.44444444 0.44444444]

mean value: 0.646547619047619

key: train_jcc
value: [1.         0.7826087  0.90666667 0.81428571 0.83561644 0.93055556
 0.97142857 0.68       0.86764706 0.61764706]

mean value: 0.8406455759591905

MCC on Blind test: 0.11

Accuracy on Blind test: 0.63

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.1090014  0.09479475 0.09645629 0.09737253 0.09540439 0.09643078
 0.09812617 0.09583139 0.09525943 0.09584975]

mean value: 0.09745268821716309

key: score_time
value: [0.01493382 0.01502466 0.01518154 0.01484466 0.01519299 0.01544476
 0.01564693 0.01502728 0.01509666 0.01492715]

mean value: 0.01513204574584961

key: test_mcc
value: [0.8819171  0.8819171  1.         0.875      0.76376262 0.73214286
 1.         1.         0.76376262 0.87287156]

mean value: 0.8771373857115167

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     0.9375     1.         0.93333333 0.86666667 0.86666667
 1.         1.         0.86666667 0.93333333]

mean value: 0.9341666666666667

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94117647 0.93333333 1.         0.93333333 0.875      0.85714286
 1.         1.         0.85714286 0.94117647]

mean value: 0.9338305322128851

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.88888889 1.         1.         0.875      0.77777778 0.85714286
 1.         1.         1.         0.88888889]

mean value: 0.9287698412698413

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.875      1.         1.         1.         0.85714286
 1.         1.         0.75       1.        ]

mean value: 0.9482142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.9375     1.         0.9375     0.875      0.86607143
 1.         1.         0.875      0.92857143]

mean value: 0.9357142857142857

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.88888889 0.875      1.         0.875      0.77777778 0.75
 1.         1.         0.75       0.88888889]

mean value: 0.8805555555555555

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.09

Accuracy on Blind test: 0.79

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.03332615 0.03480697 0.04867983 0.03268933 0.03992987 0.04574323
 0.03273129 0.03599477 0.03098392 0.04712749]

mean value: 0.03820128440856933

key: score_time
value: [0.01736021 0.0297308  0.04027843 0.02309608 0.03837204 0.02011347
 0.02397585 0.02971005 0.03037763 0.02972603]

mean value: 0.028274059295654297

key: test_mcc
value: [0.8819171  0.8819171  1.         1.         0.60714286 0.73214286
 1.         1.         0.60714286 1.        ]

mean value: 0.8710262778804965

key: train_mcc
value: [1.         1.         1.         0.98550725 0.98550725 0.98550725
 1.         0.98550418 0.98550418 1.        ]

mean value: 0.9927530097707291

key: test_accuracy
value: [0.9375     0.9375     1.         1.         0.8        0.86666667
 1.         1.         0.8        1.        ]

mean value: 0.9341666666666667

key: train_accuracy
value: [1.         1.         1.         0.99270073 0.99270073 0.99270073
 1.         0.99270073 0.99270073 1.        ]

mean value: 0.9963503649635037

key: test_fscore
value: [0.94117647 0.93333333 1.         1.         0.8        0.85714286
 1.         1.         0.8        1.        ]

mean value: 0.9331652661064426

key: train_fscore
value: [1.         1.         1.         0.99270073 0.99270073 0.99270073
 1.         0.99259259 0.99259259 1.        ]

mean value: 0.9963287374966208

key: test_precision
value: [0.88888889 1.         1.         1.         0.75       0.85714286
 1.         1.         0.85714286 1.        ]

mean value: 0.9353174603174603

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.875      1.         1.         0.85714286 0.85714286
 1.         1.         0.75       1.        ]

mean value: 0.9339285714285714

key: train_recall
value: [1.         1.         1.         0.98550725 0.98550725 0.98550725
 1.         0.98529412 0.98529412 1.        ]

mean value: 0.9927109974424553

key: test_roc_auc
value: [0.9375     0.9375     1.         1.         0.80357143 0.86607143
 1.         1.         0.80357143 1.        ]

mean value: 0.9348214285714286

key: train_roc_auc
value: [1.         1.         1.         0.99275362 0.99275362 0.99275362
 1.         0.99264706 0.99264706 1.        ]

mean value: 0.9963554987212276

key: test_jcc
value: [0.88888889 0.875      1.         1.         0.66666667 0.75
 1.         1.         0.66666667 1.        ]

mean value: 0.8847222222222222

key: train_jcc
value: [1.         1.         1.         0.98550725 0.98550725 0.98550725
 1.         0.98529412 0.98529412 1.        ]

mean value: 0.9927109974424553

MCC on Blind test: 0.13

Accuracy on Blind test: 0.86

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.0346148  0.05789852 0.04053473 0.04679036 0.02014375 0.02017975
 0.02011824 0.0213201  0.04365182 0.05004883]

mean value: 0.03553009033203125

key: score_time
value: [0.02155566 0.02402639 0.01512194 0.02006745 0.01230836 0.01237655
 0.01236105 0.02265072 0.02061892 0.0222733 ]

mean value: 0.018336033821105956

key: test_mcc
value: [0.62994079 0.40451992 0.875      0.875      0.76376262 0.46428571
 0.49099025 0.87287156 0.19642857 0.6000992 ]

mean value: 0.6172898619490848

key: train_mcc
value: [0.98540068 0.98540068 0.98550418 0.98550418 0.98550418 0.98550418
 0.98550725 0.98550725 1.         0.98550725]

mean value: 0.9869339808783808

key: test_accuracy
value: [0.8125     0.6875     0.93333333 0.93333333 0.86666667 0.73333333
 0.73333333 0.93333333 0.6        0.8       ]

mean value: 0.8033333333333333

key: train_accuracy
value: [0.99264706 0.99264706 0.99270073 0.99270073 0.99270073 0.99270073
 0.99270073 0.99270073 1.         0.99270073]

mean value: 0.993419922713611

key: test_fscore
value: [0.82352941 0.73684211 0.93333333 0.93333333 0.875      0.71428571
 0.71428571 0.94117647 0.625      0.82352941]

mean value: 0.81203154946189

key: train_fscore
value: [0.99270073 0.99270073 0.99280576 0.99280576 0.99280576 0.99280576
 0.99270073 0.99270073 1.         0.99270073]

mean value: 0.9934726671217771

key: test_precision
value: [0.77777778 0.63636364 0.875      0.875      0.77777778 0.71428571
 0.83333333 0.88888889 0.625      0.77777778]

mean value: 0.7781204906204906

key: train_precision
value: [0.98550725 0.98550725 0.98571429 0.98571429 0.98571429 0.98571429
 0.98550725 0.98550725 1.         0.98550725]

mean value: 0.9870393374741201

key: test_recall
value: [0.875      0.875      1.         1.         1.         0.71428571
 0.625      1.         0.625      0.875     ]

mean value: 0.8589285714285715

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.8125     0.6875     0.9375     0.9375     0.875      0.73214286
 0.74107143 0.92857143 0.59821429 0.79464286]

mean value: 0.8044642857142857

key: train_roc_auc
value: [0.99264706 0.99264706 0.99264706 0.99264706 0.99264706 0.99264706
 0.99275362 0.99275362 1.         0.99275362]

mean value: 0.9934143222506394

key: test_jcc
value: [0.7        0.58333333 0.875      0.875      0.77777778 0.55555556
 0.55555556 0.88888889 0.45454545 0.7       ]

mean value: 0.6965656565656566

key: train_jcc
value: [0.98550725 0.98550725 0.98571429 0.98571429 0.98571429 0.98571429
 0.98550725 0.98550725 1.         0.98550725]

mean value: 0.9870393374741201

MCC on Blind test: -0.06

Accuracy on Blind test: 0.67

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.2400794  0.23439431 0.23425388 0.2315309  0.26206779 0.24720073
 0.23333836 0.23660707 0.22880101 0.19629478]

mean value: 0.23445682525634765

key: score_time
value: [0.00921178 0.00896645 0.00896502 0.00888395 0.00920105 0.00898576
 0.00909066 0.00913382 0.00911546 0.00896382]

mean value: 0.009051775932312012

key: test_mcc
value: [0.8819171  0.75       1.         1.         0.875      0.73214286
 1.         1.         0.60714286 0.87287156]

mean value: 0.871907437891788

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     0.875      1.         1.         0.93333333 0.86666667
 1.         1.         0.8        0.93333333]

mean value: 0.9345833333333333

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94117647 0.875      1.         1.         0.93333333 0.85714286
 1.         1.         0.8        0.94117647]

mean value: 0.9347829131652661

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.88888889 0.875      1.         1.         0.875      0.85714286
 1.         1.         0.85714286 0.88888889]

mean value: 0.9242063492063491

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.875      1.         1.         1.         0.85714286
 1.         1.         0.75       1.        ]

mean value: 0.9482142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.875      1.         1.         0.9375     0.86607143
 1.         1.         0.80357143 0.92857143]

mean value: 0.9348214285714286

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.88888889 0.77777778 1.         1.         0.875      0.75
 1.         1.         0.66666667 0.88888889]

mean value: 0.8847222222222222

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.11

Accuracy on Blind test: 0.83

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.01696777 0.01680088 0.02019072 0.01669979 0.02372169 0.01697087
 0.01702762 0.016783   0.01727152 0.01699018]

mean value: 0.017942404747009276

key: score_time
value: [0.01224136 0.01189041 0.01206803 0.01199365 0.01204085 0.01464152
 0.0147686  0.01463866 0.01480031 0.0144105 ]

mean value: 0.013349390029907227

key: test_mcc
value: [0.77459667 0.8819171  0.75592895 0.75592895 0.75592895 0.75592895
 0.46770717 1.         0.56407607 0.66143783]

mean value: 0.7373450632934155

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.875      0.9375     0.86666667 0.86666667 0.86666667 0.86666667
 0.66666667 1.         0.73333333 0.8       ]

mean value: 0.8479166666666667

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.85714286 0.93333333 0.83333333 0.83333333 0.83333333 0.83333333
 0.54545455 1.         0.66666667 0.76923077]

mean value: 0.8105161505161504

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75       0.875      0.71428571 0.71428571 0.71428571 0.71428571
 0.375      1.         0.5        0.625     ]

mean value: 0.6982142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.875      0.9375     0.85714286 0.85714286 0.85714286 0.85714286
 0.6875     1.         0.75       0.8125    ]

mean value: 0.8491071428571428

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.75       0.875      0.71428571 0.71428571 0.71428571 0.71428571
 0.375      1.         0.5        0.625     ]

mean value: 0.6982142857142857

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.0

Accuracy on Blind test: 0.99

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.0292964  0.04019451 0.03581357 0.03268957 0.03282976 0.0328927
 0.02812839 0.0422287  0.03068423 0.03335238]

mean value: 0.03381102085113526

key: score_time
value: [0.02089906 0.02192187 0.02337551 0.02003455 0.02072692 0.0224824
 0.02318764 0.01997781 0.02314067 0.01798964]

mean value: 0.021373605728149413

key: test_mcc
value: [0.62994079 0.75       1.         0.76376262 0.66143783 0.87287156
 1.         1.         0.46428571 0.87287156]

mean value: 0.8015170068114487

key: train_mcc
value: [0.97058824 0.97058824 0.97080136 0.97080136 0.97080136 0.97080136
 0.97080136 0.97080136 0.98550418 0.98550418]

mean value: 0.9736993013308314

key: test_accuracy
value: [0.8125     0.875      1.         0.86666667 0.8        0.93333333
 1.         1.         0.73333333 0.93333333]

mean value: 0.8954166666666666

key: train_accuracy
value: [0.98529412 0.98529412 0.98540146 0.98540146 0.98540146 0.98540146
 0.98540146 0.98540146 0.99270073 0.99270073]

mean value: 0.986839845427222

key: test_fscore
value: [0.82352941 0.875      1.         0.875      0.82352941 0.92307692
 1.         1.         0.75       0.94117647]

mean value: 0.901131221719457

key: train_fscore
value: [0.98529412 0.98529412 0.98550725 0.98550725 0.98550725 0.98550725
 0.98529412 0.98529412 0.99259259 0.99259259]

mean value: 0.9868390641280668

key: test_precision
value: [0.77777778 0.875      1.         0.77777778 0.7        1.
 1.         1.         0.75       0.88888889]

mean value: 0.8769444444444444

key: train_precision
value: [0.98529412 0.98529412 0.98550725 0.98550725 0.98550725 0.98550725
 0.98529412 0.98529412 1.         1.        ]

mean value: 0.9883205456095482

key: test_recall
value: [0.875      0.875      1.         1.         1.         0.85714286
 1.         1.         0.75       1.        ]

mean value: 0.9357142857142857

key: train_recall
value: /home/tanu/git/LSHTM_analysis/scripts/ml/./gid_orig.py:195: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_orig.py:198: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
[0.98529412 0.98529412 0.98550725 0.98550725 0.98550725 0.98550725
 0.98529412 0.98529412 0.98529412 0.98529412]

mean value: 0.98537936913896

key: test_roc_auc
value: [0.8125     0.875      1.         0.875      0.8125     0.92857143
 1.         1.         0.73214286 0.92857143]

mean value: 0.8964285714285715

key: train_roc_auc
value: [0.98529412 0.98529412 0.98540068 0.98540068 0.98540068 0.98540068
 0.98540068 0.98540068 0.99264706 0.99264706]

mean value: 0.9868286445012788

key: test_jcc
value: [0.7        0.77777778 1.         0.77777778 0.7        0.85714286
 1.         1.         0.6        0.88888889]

mean value: 0.8301587301587301

key: train_jcc
value: [0.97101449 0.97101449 0.97142857 0.97142857 0.97142857 0.97142857
 0.97101449 0.97101449 0.98529412 0.98529412]

mean value: 0.9740360492022896

MCC on Blind test: 0.12

Accuracy on Blind test: 0.67

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.26786852 0.21878886 0.17795444 0.20230484 0.09808445 0.19803524
 0.20462227 0.25141764 0.24799275 0.22460699]

mean value: 0.20916759967803955

key: score_time
value: [0.0210011  0.01702762 0.02308965 0.01346898 0.02083993 0.01430988
 0.01592922 0.01662874 0.02179027 0.02321339]

mean value: 0.01872987747192383

key: test_mcc
value: [0.62994079 0.75       1.         0.76376262 0.76376262 0.87287156
 1.         1.         0.46428571 0.87287156]

mean value: 0.8117494856174312

key: train_mcc
value: [0.97058824 0.97058824 0.97080136 0.97080136 0.98550418 0.97080136
 0.97080136 0.97080136 0.98550418 0.98550418]

mean value: 0.975169582857287

key: test_accuracy
value: [0.8125     0.875      1.         0.86666667 0.86666667 0.93333333
 1.         1.         0.73333333 0.93333333]

mean value: 0.9020833333333333

key: train_accuracy
value: [0.98529412 0.98529412 0.98540146 0.98540146 0.99270073 0.98540146
 0.98540146 0.98540146 0.99270073 0.99270073]

mean value: 0.9875697724345213

key: test_fscore
value: [0.82352941 0.875      1.         0.875      0.875      0.92307692
 1.         1.         0.75       0.94117647]

mean value: 0.9062782805429864

key: train_fscore
value: [0.98529412 0.98529412 0.98550725 0.98550725 0.99280576 0.98550725
 0.98529412 0.98529412 0.99259259 0.99259259]

mean value: 0.9875689150299539

key: test_precision
value: [0.77777778 0.875      1.         0.77777778 0.77777778 1.
 1.         1.         0.75       0.88888889]

mean value: 0.8847222222222222

key: train_precision
value: [0.98529412 0.98529412 0.98550725 0.98550725 0.98571429 0.98550725
 0.98529412 0.98529412 1.         1.        ]

mean value: 0.9883412495432956

key: test_recall
value: [0.875      0.875      1.         1.         1.         0.85714286
 1.         1.         0.75       1.        ]

mean value: 0.9357142857142857

key: train_recall
value: [0.98529412 0.98529412 0.98550725 0.98550725 1.         0.98550725
 0.98529412 0.98529412 0.98529412 0.98529412]

mean value: 0.9868286445012788

key: test_roc_auc
value: [0.8125     0.875      1.         0.875      0.875      0.92857143
 1.         1.         0.73214286 0.92857143]

mean value: 0.9026785714285714

key: train_roc_auc
value: [0.98529412 0.98529412 0.98540068 0.98540068 0.99264706 0.98540068
 0.98540068 0.98540068 0.99264706 0.99264706]

mean value: 0.9875532821824382

key: test_jcc
value: [0.7        0.77777778 1.         0.77777778 0.77777778 0.85714286
 1.         1.         0.6        0.88888889]

mean value: 0.837936507936508

key: train_jcc
value: [0.97101449 0.97101449 0.97142857 0.97142857 0.98571429 0.97142857
 0.97101449 0.97101449 0.98529412 0.98529412]

mean value: 0.975464620630861

MCC on Blind test: 0.12

Accuracy on Blind test: 0.67