LSHTM_analysis/scripts/ml/log_katg_7030.txt

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_7030.py:548: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
1.22.4
1.4.1

aaindex_df contains non-numerical data

Total no. of non-numerial columns: 2

Selecting numerical data only

PASS: successfully selected numerical columns only for aaindex_df

Now checking for NA in the remaining aaindex_cols

Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127

Revised df ncols: 123

Checking NA in revised df...

PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df

PASS: ncols match
Expected ncols: 123
Got: 123

Total no. of columns in clean aa_df: 123

Proceeding to merge, expected nrows in merged_df: 817

PASS: my_features_df and aa_df successfully combined
nrows: 817
ncols: 269
count of NULL values before imputation

or_mychisq          244
log10_or_mychisq    244
dtype: int64
count of NULL values AFTER imputation

mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64

PASS: OR values imputed, data ready for ML

Total no. of features for aaindex: 123

No. of numerical features: 168
No. of categorical features: 7

PASS: x_features has no target variable

No. of columns for x_features: 175

-------------------------------------------------------------
Successfully split data with stratification: 70/30
Input features data size: (467, 175)
Train data size: (312, 175)
Test data size: (155, 175)
y_train numbers: Counter({1: 206, 0: 106})
y_train ratio: 0.5145631067961165

y_test_numbers: Counter({1: 103, 0: 52})
y_test ratio: 0.5048543689320388
-------------------------------------------------------------

index: 0
ind: 1

Mask count check: True

index: 1
ind: 2

Mask count check: True
Original Data
 Counter({1: 206, 0: 106}) Data dim: (312, 175)

Simple Random OverSampling
 Counter({1: 206, 0: 206})
(412, 175)

Simple Random UnderSampling
 Counter({0: 106, 1: 106})
(212, 175)

Simple Combined Over and UnderSampling
 Counter({0: 206, 1: 206})
(412, 175)

SMOTE_NC OverSampling
 Counter({1: 206, 0: 206})
(412, 175)

#####################################################################

Running ML analysis: 70/30 split
Gene name: katG
Drug name: isoniazid

Output directory: /home/tanu/git/Data/isoniazid/output/ml/tts_7030/

Sanity checks:
Total input features: 175

Training data size: (312, 175)
Test data size: (155, 175)

Target feature numbers (training data): Counter({1: 206, 0: 106})
Target features ratio (training data: 0.5145631067961165

Target feature numbers (test data): Counter({1: 103, 0: 52})
Target features ratio (test data): 0.5048543689320388

#####################################################################


================================================================

Strucutral features (n): 36
These are:
Common stablity features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================

AAindex features (n): 123
These are:
 ['ALTS910101', 'AZAE970101', 'AZAE970102', 'BASU010101', 'BENS940101', 'BENS940102', 'BENS940103', 'BENS940104', 'BETM990101', 'BLAJ010101', 'BONM030101', 'BONM030102', 'BONM030103', 'BONM030104', 'BONM030105', 'BONM030106', 'BRYS930101', 'CROG050101', 'CSEM940101', 'DAYM780301', 'DAYM780302', 'DOSZ010101', 'DOSZ010102', 'DOSZ010103', 'DOSZ010104', 'FEND850101', 'FITW660101', 'GEOD900101', 'GIAG010101', 'GONG920101', 'GRAR740104', 'HENS920101', 'HENS920102', 'HENS920103', 'HENS920104', 'JOHM930101', 'JOND920103', 'JOND940101', 'KANM000101', 'KAPO950101', 'KESO980101', 'KESO980102', 'KOLA920101', 'KOLA930101', 'KOSJ950100_RSA_SST', 'KOSJ950100_SST', 'KOSJ950110_RSA', 'KOSJ950115', 'LEVJ860101', 'LINK010101', 'LIWA970101', 'LUTR910101', 'LUTR910102', 'LUTR910103', 'LUTR910104', 'LUTR910105', 'LUTR910106', 'LUTR910107', 'LUTR910108', 'LUTR910109', 'MCLA710101', 'MCLA720101', 'MEHP950102', 'MICC010101', 'MIRL960101', 'MIYS850102', 'MIYS850103', 'MIYS930101', 'MIYS960101', 'MIYS960102', 'MIYS960103', 'MIYS990106', 'MIYS990107', 'MIYT790101', 'MOHR870101', 'MOOG990101', 'MUET010101', 'MUET020101', 'MUET020102', 'NAOD960101', 'NGPC000101', 'NIEK910101', 'NIEK910102', 'OGAK980101', 'OVEJ920100_RSA', 'OVEJ920101', 'OVEJ920102', 'OVEJ920103', 'PRLA000101', 'PRLA000102', 'QUIB020101', 'QU_C930101', 'QU_C930102', 'QU_C930103', 'RIER950101', 'RISJ880101', 'RUSR970101', 'RUSR970102', 'RUSR970103', 'SIMK990101', 'SIMK990102', 'SIMK990103', 'SIMK990104', 'SIMK990105', 'SKOJ000101', 'SKOJ000102', 'SKOJ970101', 'TANS760101', 'TANS760102', 'THOP960101', 'TOBD000101', 'TOBD000102', 'TUDE900101', 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106']
================================================================

Evolutionary features (n): 3
These are:
 ['consurf_score', 'snap2_score', 'provean_score']
================================================================

Genomic features (n): 6
These are:
 ['maf', 'logorI']
 ['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================

Categorical features (n): 7
These are:
 ['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================


Pass: No. of features match

#####################################################################


Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.03233504 0.0333972  0.05516076 0.03526402 0.05350041 0.03630829
 0.0339694  0.04734373 0.03495312 0.03409076]

mean value: 0.03963227272033691

key: score_time
value: [0.01238108 0.01217294 0.01510072 0.01515746 0.0121963  0.01514292
 0.01524949 0.01520634 0.01533437 0.01805663]

mean value: 0.014599823951721191

key: test_mcc
value: [0.79844727 0.8643122  0.85465477 0.78262379 0.61758068 0.69695062
 0.78625916 0.41684569 0.71818182 0.64116449]

mean value: 0.717702048077937

key: train_mcc
value: [0.81579012 0.85703608 0.86517173 0.84883567 0.81654681 0.84252828
 0.856474   0.8816558  0.83375042 0.87296384]

mean value: 0.8490752749523592

key: test_accuracy
value: [0.90625    0.9375     0.93548387 0.90322581 0.83870968 0.87096774
 0.90322581 0.74193548 0.87096774 0.83870968]

mean value: 0.8746975806451613

key: train_accuracy
value: [0.91785714 0.93571429 0.93950178 0.93238434 0.91814947 0.92882562
 0.93594306 0.94661922 0.9252669  0.9430605 ]

mean value: 0.9323322318251144

key: test_fscore
value: [0.92682927 0.95454545 0.95454545 0.93333333 0.88888889 0.90909091
 0.92682927 0.80952381 0.9        0.88372093]

mean value: 0.9087307316745774

key: train_fscore
value: [0.94025974 0.953125   0.95538058 0.95013123 0.93994778 0.94818653
 0.953125   0.96103896 0.94601542 0.95833333]

mean value: 0.9505543578996442

key: test_precision
value: [0.95       0.91304348 0.91304348 0.875      0.83333333 0.86956522
 0.9047619  0.77272727 0.9        0.82608696]

mean value: 0.8757561641257293

key: train_precision
value: [0.905      0.91959799 0.92857143 0.92346939 0.90909091 0.91044776
 0.92424242 0.92964824 0.90640394 0.92929293]

mean value: 0.9185765012189302

key: test_recall
value: [0.9047619  1.         1.         1.         0.95238095 0.95238095
 0.95       0.85       0.9        0.95      ]

mean value: 0.9459523809523809

key: train_recall
value: [0.97837838 0.98918919 0.98378378 0.97837838 0.97297297 0.98918919
 0.98387097 0.99462366 0.98924731 0.98924731]

mean value: 0.984888113920372

key: test_roc_auc
value: [0.90692641 0.90909091 0.9        0.85       0.77619048 0.82619048
 0.88409091 0.69772727 0.85909091 0.79318182]

mean value: 0.8402489177489177

key: train_roc_auc
value: [0.88918919 0.91038407 0.91897523 0.91106419 0.89273649 0.90084459
 0.91298812 0.92362762 0.89462366 0.92093945]

mean value: 0.907537258714572

key: test_jcc
value: [0.86363636 0.91304348 0.91304348 0.875      0.8        0.83333333
 0.86363636 0.68       0.81818182 0.79166667]

mean value: 0.8351541501976285

key: train_jcc
value: [0.8872549  0.91044776 0.91457286 0.905      0.88669951 0.90147783
 0.91044776 0.925      0.89756098 0.92      ]

mean value: 0.9058461604181686

MCC on Blind test: 0.75

Accuracy on Blind test: 0.89

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [1.05972672 0.93337655 0.85979128 1.0101614  0.9706459  1.00876999
 0.84041023 0.88820314 0.87697887 0.80039692]

mean value: 0.92484610080719

key: score_time
value: [0.02311707 0.0123024  0.01528835 0.0165627  0.01583934 0.01552248
 0.01548195 0.01584959 0.01543498 0.01588607]

mean value: 0.01612849235534668

key: test_mcc
value: [0.93435318 0.8643122  0.85238095 0.69695062 1.         0.86831345
 0.78625916 0.68174942 0.93048421 0.71390814]

mean value: 0.8328711336410171

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.96875    0.9375     0.93548387 0.87096774 1.         0.93548387
 0.90322581 0.83870968 0.96774194 0.87096774]

mean value: 0.922883064516129

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97560976 0.95454545 0.95238095 0.90909091 1.         0.95
 0.92682927 0.86486486 0.97560976 0.9047619 ]

mean value: 0.941369286613189

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.91304348 0.95238095 0.86956522 1.         1.
 0.9047619  0.94117647 0.95238095 0.86363636]

mean value: 0.9396945339400582

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.95238095 1.         0.95238095 0.95238095 1.         0.9047619
 0.95       0.8        1.         0.95      ]

mean value: 0.9461904761904761

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97619048 0.90909091 0.92619048 0.82619048 1.         0.95238095
 0.88409091 0.85454545 0.95454545 0.83863636]

mean value: 0.9121861471861472

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.95238095 0.91304348 0.90909091 0.83333333 1.         0.9047619
 0.86363636 0.76190476 0.95238095 0.82608696]

mean value: 0.8916619612271786

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.8

Accuracy on Blind test: 0.91

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01394224 0.01148248 0.01090741 0.01061201 0.0105617  0.01047516
 0.0093956  0.0100913  0.009547   0.00971317]

mean value: 0.010672807693481445

key: score_time
value: [0.01587558 0.01017666 0.00976372 0.00977969 0.00974298 0.00903916
 0.00903893 0.00935555 0.00905657 0.00896811]

mean value: 0.010079693794250489

key: test_mcc
value: [0.47306844 0.47306844 0.77484502 0.26190476 0.85465477 0.62281846
 0.51793973 0.46277515 0.64116449 0.41684569]

mean value: 0.5499084954077021

key: train_mcc
value: [0.53347571 0.54579802 0.6239648  0.58476019 0.5592603  0.59365116
 0.57062818 0.57127446 0.6024335  0.59689593]

mean value: 0.5782142268403786

key: test_accuracy
value: [0.75       0.75       0.90322581 0.67741935 0.93548387 0.83870968
 0.77419355 0.74193548 0.83870968 0.74193548]

mean value: 0.7951612903225806

key: train_accuracy
value: [0.79285714 0.78214286 0.83274021 0.81494662 0.80071174 0.81850534
 0.80782918 0.80427046 0.82206406 0.82206406]

mean value: 0.8098131672597865

key: test_fscore
value: [0.8        0.8        0.93023256 0.76190476 0.95454545 0.88372093
 0.82051282 0.78947368 0.88372093 0.80952381]

mean value: 0.8433634949302024

key: train_fscore
value: [0.84491979 0.8252149  0.87466667 0.86096257 0.84782609 0.86327078
 0.85483871 0.84931507 0.8655914  0.86772487]

mean value: 0.8554330827502625

key: test_precision
value: [0.84210526 0.84210526 0.90909091 0.76190476 0.91304348 0.86363636
 0.84210526 0.83333333 0.82608696 0.77272727]

mean value: 0.8406138864948933

key: train_precision
value: [0.83597884 0.87804878 0.86315789 0.85185185 0.85245902 0.85638298
 0.85483871 0.86592179 0.8655914  0.85416667]

mean value: 0.8578397920075227

key: test_recall
value: [0.76190476 0.76190476 0.95238095 0.76190476 1.         0.9047619
 0.8        0.75       0.95       0.85      ]

mean value: 0.8492857142857143

key: train_recall
value: [0.85405405 0.77837838 0.88648649 0.87027027 0.84324324 0.87027027
 0.85483871 0.83333333 0.8655914  0.88172043]

mean value: 0.8538186573670445

key: test_roc_auc
value: [0.74458874 0.74458874 0.87619048 0.63095238 0.9        0.80238095
 0.76363636 0.73863636 0.79318182 0.69772727]

mean value: 0.7691883116883117

key: train_roc_auc
value: [0.76386913 0.78392603 0.80782658 0.7893018  0.78099662 0.79451014
 0.78531409 0.79035088 0.80121675 0.79349179]

mean value: 0.7890803813151012

key: test_jcc
value: [0.66666667 0.66666667 0.86956522 0.61538462 0.91304348 0.79166667
 0.69565217 0.65217391 0.79166667 0.68      ]

mean value: 0.7342486064659978

key: train_jcc
value: [0.73148148 0.70243902 0.77725118 0.75586854 0.73584906 0.75943396
 0.74647887 0.73809524 0.76303318 0.76635514]

mean value: 0.7476285681051753

MCC on Blind test: 0.46

Accuracy on Blind test: 0.75

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00983787 0.00968504 0.0097971  0.00964427 0.00997996 0.00982451
 0.01080513 0.01073122 0.01083469 0.01091099]

mean value: 0.010205078125

key: score_time
value: [0.00884271 0.00899935 0.00897145 0.0091238  0.00884986 0.00881553
 0.00971413 0.00975108 0.00984645 0.00914001]

mean value: 0.009205436706542969

key: test_mcc
value: [0.58441558 0.41281273 0.55714286 0.78262379 0.69695062 0.55714286
 0.64203411 0.48992888 0.56537691 0.40572206]

mean value: 0.5694150392239126

key: train_mcc
value: [0.63966715 0.64841162 0.63494589 0.63445555 0.63494589 0.65151226
 0.64020793 0.67456536 0.64134835 0.66575682]

mean value: 0.6465816821661099

key: test_accuracy
value: [0.8125     0.75       0.80645161 0.90322581 0.87096774 0.80645161
 0.83870968 0.77419355 0.80645161 0.74193548]

mean value: 0.8110887096774193

key: train_accuracy
value: [0.84285714 0.84642857 0.83985765 0.83985765 0.83985765 0.84697509
 0.84341637 0.85765125 0.84341637 0.85409253]

mean value: 0.8454410269445857

key: test_fscore
value: [0.85714286 0.82608696 0.85714286 0.93333333 0.90909091 0.85714286
 0.87804878 0.84444444 0.85714286 0.81818182]

mean value: 0.8637757670631477

key: train_fscore
value: [0.88717949 0.88888889 0.88311688 0.88372093 0.88311688 0.88831169
 0.8877551  0.89637306 0.88601036 0.89460154]

mean value: 0.8879074824992776

key: test_precision
value: [0.85714286 0.76       0.85714286 0.875      0.86956522 0.85714286
 0.85714286 0.76       0.81818182 0.75      ]

mean value: 0.8261318464144551

key: train_precision
value: [0.84390244 0.85148515 0.85       0.84653465 0.85       0.855
 0.84466019 0.865      0.855      0.85714286]

mean value: 0.8518725292322202

key: test_recall
value: [0.85714286 0.9047619  0.85714286 1.         0.95238095 0.85714286
 0.9        0.95       0.9        0.9       ]

mean value: 0.9078571428571428

key: train_recall
value: [0.93513514 0.92972973 0.91891892 0.92432432 0.91891892 0.92432432
 0.93548387 0.93010753 0.91935484 0.93548387]

mean value: 0.9271781458878233

key: test_roc_auc
value: [0.79220779 0.67965368 0.77857143 0.85       0.82619048 0.77857143
 0.81363636 0.70227273 0.76818182 0.67727273]

mean value: 0.7666558441558441

key: train_roc_auc
value: [0.79914651 0.80697013 0.80320946 0.80070383 0.80320946 0.8111205
 0.79932088 0.8229485  0.80704584 0.81511036]

mean value: 0.8068785466281222

key: test_jcc
value: [0.75       0.7037037  0.75       0.875      0.83333333 0.75
 0.7826087  0.73076923 0.75       0.69230769]

mean value: 0.7617722655766134

key: train_jcc
value: [0.79723502 0.8        0.79069767 0.79166667 0.79069767 0.79906542
 0.79816514 0.81220657 0.79534884 0.80930233]

mean value: 0.7984385332281427

MCC on Blind test: 0.52

Accuracy on Blind test: 0.79

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00937772 0.0105772  0.01056385 0.01048541 0.01017141 0.01007199
 0.01010633 0.0100348  0.01004696 0.00996089]

mean value: 0.010139656066894532

key: score_time
value: [0.0642364  0.01338673 0.01227045 0.01247025 0.01164246 0.0162518
 0.01173186 0.01165223 0.01177526 0.01154733]

mean value: 0.017696475982666014

key: test_mcc
value: [0.44588745 0.52223297 0.38154231 0.40952381 0.69695062 0.28749445
 0.46277515 0.04139187 0.40572206 0.40572206]

mean value: 0.4059242743051412

key: train_mcc
value: [0.60529458 0.64881553 0.64252679 0.61776405 0.60874123 0.63405604
 0.62561626 0.62320108 0.64874151 0.59936179]

mean value: 0.6254118846679162

key: test_accuracy
value: [0.75       0.78125    0.74193548 0.74193548 0.87096774 0.70967742
 0.74193548 0.61290323 0.74193548 0.74193548]

mean value: 0.7434475806451613

key: train_accuracy
value: [0.82857143 0.84642857 0.84341637 0.83274021 0.82918149 0.83985765
 0.83629893 0.83629893 0.84697509 0.82562278]

mean value: 0.8365391459074734

key: test_fscore
value: [0.80952381 0.85714286 0.81818182 0.80952381 0.90909091 0.8
 0.78947368 0.73913043 0.81818182 0.81818182]

mean value: 0.8168430958819974

key: train_fscore
value: [0.87817259 0.88831169 0.88717949 0.87855297 0.87692308 0.88491049
 0.88020833 0.88265306 0.89002558 0.87338501]

mean value: 0.8820322281681761

key: test_precision
value: [0.80952381 0.75       0.7826087  0.80952381 0.86956522 0.75
 0.83333333 0.65384615 0.75       0.75      ]

mean value: 0.7758401019270584

key: train_precision
value: [0.8277512  0.855      0.84390244 0.84158416 0.83414634 0.83980583
 0.85353535 0.83980583 0.84878049 0.84079602]

mean value: 0.8425107646802061

key: test_recall
value: [0.80952381 1.         0.85714286 0.80952381 0.95238095 0.85714286
 0.75       0.85       0.9        0.9       ]

mean value: 0.8685714285714285

key: train_recall
value: [0.93513514 0.92432432 0.93513514 0.91891892 0.92432432 0.93513514
 0.90860215 0.93010753 0.93548387 0.90860215]

mean value: 0.9255768671897704

key: test_roc_auc
value: [0.72294372 0.68181818 0.67857143 0.7047619  0.82619048 0.62857143
 0.73863636 0.51590909 0.67727273 0.67727273]

mean value: 0.6851948051948051

key: train_roc_auc
value: [0.77809388 0.80953058 0.8009009  0.79279279 0.78507883 0.79569257
 0.8016695  0.79136955 0.80458404 0.78588002]

mean value: 0.7945592669282185

key: test_jcc
value: [0.68       0.75       0.69230769 0.68       0.83333333 0.66666667
 0.65217391 0.5862069  0.69230769 0.69230769]

mean value: 0.6925303886518279

key: train_jcc
value: [0.78280543 0.79906542 0.79723502 0.78341014 0.78082192 0.79357798
 0.78604651 0.78995434 0.80184332 0.77522936]

mean value: 0.7889989436472885

MCC on Blind test: 0.38

Accuracy on Blind test: 0.73

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01544476 0.01519156 0.01600599 0.01545453 0.01499534 0.01443052
 0.014117   0.01572728 0.01578069 0.01611376]

mean value: 0.015326142311096191

key: score_time
value: [0.01091766 0.01000953 0.01041842 0.01047444 0.01041555 0.01081181
 0.01059771 0.01020598 0.01113415 0.01049471]

mean value: 0.010547995567321777

key: test_mcc
value: [0.64764278 0.59458839 0.85465477 0.70992957 0.53924646 0.53924646
 0.79524277 0.48992888 0.56537691 0.48992888]

mean value: 0.6225785881062988

key: train_mcc
value: [0.71193719 0.71193719 0.67923746 0.71282383 0.71427799 0.69833241
 0.6770099  0.73511086 0.68521564 0.70157594]

mean value: 0.7027458404320319

key: test_accuracy
value: [0.84375    0.8125     0.93548387 0.87096774 0.80645161 0.80645161
 0.90322581 0.77419355 0.80645161 0.77419355]

mean value: 0.833366935483871

key: train_accuracy
value: [0.87142857 0.87142857 0.85765125 0.87188612 0.87188612 0.86476868
 0.85765125 0.88256228 0.86120996 0.8683274 ]

mean value: 0.8678800203355364

key: test_fscore
value: [0.88372093 0.875      0.95454545 0.91304348 0.86363636 0.86363636
 0.93023256 0.84444444 0.85714286 0.84444444]

mean value: 0.8829846894482891

key: train_fscore
value: [0.90954774 0.90954774 0.89949749 0.90909091 0.90954774 0.905
 0.9        0.9164557  0.90225564 0.90680101]

mean value: 0.9067743955465448

key: test_precision
value: [0.86363636 0.77777778 0.91304348 0.84       0.82608696 0.82608696
 0.86956522 0.76       0.81818182 0.76      ]

mean value: 0.8254378568291612

key: train_precision
value: [0.84976526 0.84976526 0.84037559 0.85308057 0.84976526 0.84186047
 0.8411215  0.86602871 0.84507042 0.85308057]

mean value: 0.848991359005567

key: test_recall
value: [0.9047619 1.        1.        1.        0.9047619 0.9047619 1.
 0.95      0.9       0.95     ]

mean value: 0.9514285714285714

key: train_recall
value: [0.97837838 0.97837838 0.96756757 0.97297297 0.97837838 0.97837838
 0.96774194 0.97311828 0.96774194 0.96774194]

mean value: 0.973039814007556

key: test_roc_auc
value: [0.81601732 0.72727273 0.9        0.8        0.75238095 0.75238095
 0.86363636 0.70227273 0.76818182 0.70227273]

mean value: 0.7784415584415585

key: train_roc_auc
value: [0.82076814 0.82076814 0.80670045 0.82502815 0.82252252 0.81210586
 0.8049236  0.83919072 0.81018676 0.82071307]

mean value: 0.8182907403371114

key: test_jcc
value: [0.79166667 0.77777778 0.91304348 0.84       0.76       0.76
 0.86956522 0.73076923 0.75       0.73076923]

mean value: 0.7923591601635079

key: train_jcc
value: [0.83410138 0.83410138 0.8173516  0.83333333 0.83410138 0.82648402
 0.81818182 0.84579439 0.82191781 0.82949309]

mean value: 0.8294860203719092

MCC on Blind test: 0.64

Accuracy on Blind test: 0.85

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [1.38122225 1.35782647 1.14616299 1.49619937 1.3146069  1.23372722
 1.31832862 1.14557314 1.35891151 1.14897418]

mean value: 1.2901532649993896

key: score_time
value: [0.02321529 0.01550245 0.01513743 0.01537633 0.02473903 0.01361775
 0.01516891 0.0152483  0.01541519 0.01537943]

mean value: 0.016880011558532713

key: test_mcc
value: [0.87496729 0.73112616 0.77484502 0.69695062 0.70992957 0.72664126
 0.78625916 0.43636364 0.85909091 0.51793973]

mean value: 0.7114113350240479

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     0.875      0.90322581 0.87096774 0.87096774 0.87096774
 0.90322581 0.74193548 0.93548387 0.77419355]

mean value: 0.8683467741935483

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.95       0.91304348 0.93023256 0.90909091 0.91304348 0.9
 0.92682927 0.8        0.95       0.82051282]

mean value: 0.9012752512557687

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.84       0.90909091 0.86956522 0.84       0.94736842
 0.9047619  0.8        0.95       0.84210526]

mean value: 0.8902891715454644

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.9047619  1.         0.95238095 0.95238095 1.         0.85714286
 0.95       0.8        0.95       0.8       ]

mean value: 0.9166666666666666

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.95238095 0.81818182 0.87619048 0.82619048 0.8        0.87857143
 0.88409091 0.71818182 0.92954545 0.76363636]

mean value: 0.8446969696969697

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.9047619  0.84       0.86956522 0.83333333 0.84       0.81818182
 0.86363636 0.66666667 0.9047619  0.69565217]

mean value: 0.8236559382646339

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.68

Accuracy on Blind test: 0.86

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.02104592 0.01962471 0.01438522 0.01636267 0.01385927 0.01335001
 0.01686931 0.01633525 0.01711345 0.0134964 ]

mean value: 0.016244220733642577

key: score_time
value: [0.01261473 0.00928545 0.00903654 0.00882006 0.00878167 0.00882888
 0.00888181 0.00890875 0.0089407  0.00888491]

mean value: 0.009298348426818847

key: test_mcc
value: [0.87496729 0.87496729 0.93048421 0.78625916 1.         0.86831345
 0.85909091 0.85909091 0.79476958 0.72821908]

mean value: 0.8576161893707721

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9375     0.9375     0.96774194 0.90322581 1.         0.93548387
 0.93548387 0.93548387 0.90322581 0.87096774]

mean value: 0.9326612903225806

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.95       0.95       0.97560976 0.92682927 1.         0.95
 0.95       0.95       0.92307692 0.90909091]

mean value: 0.9484606856558077

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         1.         0.95       1.         1.
 0.95       0.95       0.94736842 0.83333333]

mean value: 0.9630701754385965

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.9047619  0.9047619  0.95238095 0.9047619  1.         0.9047619
 0.95       0.95       0.9        1.        ]

mean value: 0.9371428571428572

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.95238095 0.95238095 0.97619048 0.90238095 1.         0.95238095
 0.92954545 0.92954545 0.90454545 0.81818182]

mean value: 0.9317532467532467

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.9047619  0.9047619  0.95238095 0.86363636 1.         0.9047619
 0.9047619  0.9047619  0.85714286 0.83333333]

mean value: 0.9030303030303031

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.86

Accuracy on Blind test: 0.94

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.10296011 0.1014595  0.10357857 0.10675001 0.1063571  0.10523701
 0.10918045 0.10397196 0.10834646 0.10044217]

mean value: 0.1048283338546753

key: score_time
value: [0.01882553 0.01874995 0.01823163 0.01881695 0.01868272 0.01878595
 0.01914263 0.01873136 0.01930165 0.01734304]

mean value: 0.018661141395568848

key: test_mcc
value: [0.93154098 0.73112616 0.85465477 0.78262379 0.69695062 0.62281846
 0.78625916 0.33300791 0.85909091 0.56697057]

mean value: 0.7165043330318461

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.96875    0.875      0.93548387 0.90322581 0.87096774 0.83870968
 0.90322581 0.70967742 0.93548387 0.80645161]

mean value: 0.8746975806451612

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97674419 0.91304348 0.95454545 0.93333333 0.90909091 0.88372093
 0.92682927 0.79069767 0.95       0.86363636]

mean value: 0.9101641597857287

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.95454545 0.84       0.91304348 0.875      0.86956522 0.86363636
 0.9047619  0.73913043 0.95       0.79166667]

mean value: 0.8701349520045172

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         0.95238095 0.9047619
 0.95       0.85       0.95       0.95      ]

mean value: 0.9557142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.95454545 0.81818182 0.9        0.85       0.82619048 0.80238095
 0.88409091 0.65227273 0.92954545 0.74772727]

mean value: 0.8364935064935064

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.95454545 0.84       0.91304348 0.875      0.83333333 0.79166667
 0.86363636 0.65384615 0.9047619  0.76      ]

mean value: 0.8389833355050746

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.75

Accuracy on Blind test: 0.89

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00974774 0.00936413 0.00939369 0.00947523 0.00935888 0.0093565
 0.00939965 0.00949359 0.00945926 0.00952435]

mean value: 0.00945730209350586

key: score_time
value: [0.00867534 0.008955   0.00860119 0.00857806 0.00861835 0.00871873
 0.00860906 0.00866818 0.00875998 0.00864625]

mean value: 0.008683013916015624

key: test_mcc
value: [0.71797362 0.57163505 0.31876536 0.36059915 0.30162467 0.26190476
 0.78625916 0.33300791 0.78625916 0.24110987]

mean value: 0.4679138713781422

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.875      0.8125     0.70967742 0.74193548 0.67741935 0.67741935
 0.90322581 0.70967742 0.90322581 0.64516129]

mean value: 0.7655241935483871

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.90909091 0.86363636 0.79069767 0.82608696 0.75       0.76190476
 0.92682927 0.79069767 0.92682927 0.71794872]

mean value: 0.8263721594525066

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.86956522 0.82608696 0.77272727 0.76       0.78947368 0.76190476
 0.9047619  0.73913043 0.9047619  0.73684211]

mean value: 0.806525424232518

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.95238095 0.9047619  0.80952381 0.9047619  0.71428571 0.76190476
 0.95       0.85       0.95       0.7       ]

mean value: 0.8497619047619047

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.83982684 0.77056277 0.6547619  0.65238095 0.65714286 0.63095238
 0.88409091 0.65227273 0.88409091 0.62272727]

mean value: 0.7248809523809524

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.83333333 0.76       0.65384615 0.7037037  0.6        0.61538462
 0.86363636 0.65384615 0.86363636 0.56      ]

mean value: 0.7107386687386688

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.36

Accuracy on Blind test: 0.69

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.40060687 1.47927785 1.49632025 1.38435006 1.42914748 1.39843106
 1.60974932 1.43152332 1.42072606 1.414078  ]

mean value: 1.4464210271835327

key: score_time
value: [0.09651136 0.09766483 0.09662008 0.09672141 0.09631658 0.09969044
 0.11077142 0.09088016 0.09754944 0.0947938 ]

mean value: 0.09775195121765137

key: test_mcc
value: [1.         0.8643122  0.85465477 0.78262379 0.92687157 0.85238095
 0.86243936 0.71390814 0.93048421 0.79524277]

mean value: 0.8582917769162332

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.9375     0.93548387 0.90322581 0.96774194 0.93548387
 0.93548387 0.87096774 0.96774194 0.90322581]

mean value: 0.9356854838709677

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.95454545 0.95454545 0.93333333 0.97674419 0.95238095
 0.95238095 0.9047619  0.97560976 0.93023256]

mean value: 0.9534534552231659

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.91304348 0.91304348 0.875      0.95454545 0.95238095
 0.90909091 0.86363636 0.95238095 0.86956522]

mean value: 0.9202686805947675

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         1.         0.95238095
 1.         0.95       1.         1.        ]

mean value: 0.9902380952380953

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.90909091 0.9        0.85       0.95       0.92619048
 0.90909091 0.83863636 0.95454545 0.86363636]

mean value: 0.9101190476190476

key: train_roc_auc
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.91304348 0.91304348 0.875      0.95454545 0.90909091
 0.90909091 0.82608696 0.95238095 0.86956522]

mean value: 0.9121847355543008

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.91

Accuracy on Blind test: 0.96

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000...05', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(

key: fit_time
value: [1.8423326  0.92857409 1.05132771 0.89933038 0.9478097  0.90112305
 0.93446851 1.02135634 0.94023633 0.90203786]

mean value: 1.036859655380249

key: score_time
value: [0.26423621 0.21858096 0.18165159 0.18472934 0.222363   0.13671899
 0.13786745 0.13727784 0.21312284 0.21497941]

mean value: 0.19115276336669923

key: test_mcc
value: [0.93154098 0.8643122  0.85465477 0.69695062 0.85465477 0.69695062
 0.86243936 0.71390814 0.86243936 0.72821908]

mean value: 0.8066069905007348

key: train_mcc
value: [0.93692544 0.95258202 0.9296276  0.94513672 0.94513672 0.94513672
 0.95266247 0.95266247 0.96050414 0.944838  ]

mean value: 0.9465212313571784

key: test_accuracy
value: [0.96875    0.9375     0.93548387 0.87096774 0.93548387 0.87096774
 0.93548387 0.87096774 0.93548387 0.87096774]

mean value: 0.9132056451612903

key: train_accuracy
value: [0.97142857 0.97857143 0.96797153 0.97508897 0.97508897 0.97508897
 0.97864769 0.97864769 0.98220641 0.97508897]

mean value: 0.9757829181494662

key: test_fscore
value: [0.97674419 0.95454545 0.95454545 0.90909091 0.95454545 0.90909091
 0.95238095 0.9047619  0.95238095 0.90909091]

mean value: 0.9377177086479411

key: train_fscore
value: [0.97883598 0.98404255 0.9762533  0.98143236 0.98143236 0.98143236
 0.98412698 0.98412698 0.9867374  0.98153034]

mean value: 0.9819950624201007

key: test_precision
value: [0.95454545 0.91304348 0.91304348 0.86956522 0.91304348 0.86956522
 0.90909091 0.86363636 0.90909091 0.83333333]

mean value: 0.8947957839262187

key: train_precision
value: [0.95854922 0.96858639 0.95360825 0.96354167 0.96354167 0.96354167
 0.96875    0.96875    0.97382199 0.96373057]

mean value: 0.9646421417132145

key: test_recall
value: [1.         1.         1.         0.95238095 1.         0.95238095
 1.         0.95       1.         1.        ]

mean value: 0.9854761904761905

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.95454545 0.90909091 0.9        0.82619048 0.9        0.82619048
 0.90909091 0.83863636 0.90909091 0.81818182]

mean value: 0.8791017316017316

key: train_roc_auc
value: [0.95789474 0.96842105 0.953125   0.96354167 0.96354167 0.96354167
 0.96842105 0.96842105 0.97368421 0.96315789]

mean value: 0.964375

key: test_jcc
value: [0.95454545 0.91304348 0.91304348 0.83333333 0.91304348 0.83333333
 0.90909091 0.82608696 0.90909091 0.83333333]

mean value: 0.8837944664031621

key: train_jcc
value: [0.95854922 0.96858639 0.95360825 0.96354167 0.96354167 0.96354167
 0.96875    0.96875    0.97382199 0.96373057]

mean value: 0.9646421417132145

MCC on Blind test: 0.9

Accuracy on Blind test: 0.95

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.02401733 0.00943089 0.00983357 0.01020646 0.00944304 0.01021004
 0.0097549  0.00990272 0.00948167 0.00955868]

mean value: 0.011183929443359376

key: score_time
value: [0.01194978 0.00885653 0.00887227 0.00913167 0.00884962 0.00915885
 0.0091846  0.00890708 0.00913548 0.00879622]

mean value: 0.009284210205078126

key: test_mcc
value: [0.58441558 0.41281273 0.55714286 0.78262379 0.69695062 0.55714286
 0.64203411 0.48992888 0.56537691 0.40572206]

mean value: 0.5694150392239126

key: train_mcc
value: [0.63966715 0.64841162 0.63494589 0.63445555 0.63494589 0.65151226
 0.64020793 0.67456536 0.64134835 0.66575682]

mean value: 0.6465816821661099

key: test_accuracy
value: [0.8125     0.75       0.80645161 0.90322581 0.87096774 0.80645161
 0.83870968 0.77419355 0.80645161 0.74193548]

mean value: 0.8110887096774193

key: train_accuracy
value: [0.84285714 0.84642857 0.83985765 0.83985765 0.83985765 0.84697509
 0.84341637 0.85765125 0.84341637 0.85409253]

mean value: 0.8454410269445857

key: test_fscore
value: [0.85714286 0.82608696 0.85714286 0.93333333 0.90909091 0.85714286
 0.87804878 0.84444444 0.85714286 0.81818182]

mean value: 0.8637757670631477

key: train_fscore
value: [0.88717949 0.88888889 0.88311688 0.88372093 0.88311688 0.88831169
 0.8877551  0.89637306 0.88601036 0.89460154]

mean value: 0.8879074824992776

key: test_precision
value: [0.85714286 0.76       0.85714286 0.875      0.86956522 0.85714286
 0.85714286 0.76       0.81818182 0.75      ]

mean value: 0.8261318464144551

key: train_precision
value: [0.84390244 0.85148515 0.85       0.84653465 0.85       0.855
 0.84466019 0.865      0.855      0.85714286]

mean value: 0.8518725292322202

key: test_recall
value: [0.85714286 0.9047619  0.85714286 1.         0.95238095 0.85714286
 0.9        0.95       0.9        0.9       ]

mean value: 0.9078571428571428

key: train_recall
value: [0.93513514 0.92972973 0.91891892 0.92432432 0.91891892 0.92432432
 0.93548387 0.93010753 0.91935484 0.93548387]

mean value: 0.9271781458878233

key: test_roc_auc
value: [0.79220779 0.67965368 0.77857143 0.85       0.82619048 0.77857143
 0.81363636 0.70227273 0.76818182 0.67727273]

mean value: 0.7666558441558441

key: train_roc_auc
value: [0.79914651 0.80697013 0.80320946 0.80070383 0.80320946 0.8111205
 0.79932088 0.8229485  0.80704584 0.81511036]

mean value: 0.8068785466281222

key: test_jcc
value: [0.75       0.7037037  0.75       0.875      0.83333333 0.75
 0.7826087  0.73076923 0.75       0.69230769]

mean value: 0.7617722655766134

key: train_jcc
value: [0.79723502 0.8        0.79069767 0.79166667 0.79069767 0.79906542
 0.79816514 0.81220657 0.79534884 0.80930233]

mean value: 0.7984385332281427

MCC on Blind test: 0.52

Accuracy on Blind test: 0.79

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.09754491 0.05398941 0.05331159 0.07382202 0.05403852 0.07611632
 0.05666065 0.05597878 0.05435252 0.06311226]

mean value: 0.06389269828796387

key: score_time
value: [0.01084304 0.01037145 0.01035023 0.01094937 0.01031685 0.01092744
 0.0106082  0.01048756 0.01035881 0.01045704]

mean value: 0.010566997528076171

key: test_mcc
value: [1.         1.         1.         0.85238095 0.93048421 0.93048421
 0.85909091 0.93048421 0.85909091 0.72821908]

mean value: 0.9090234483012603

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         1.         1.         0.93548387 0.96774194 0.96774194
 0.93548387 0.96774194 0.93548387 0.87096774]

mean value: 0.9580645161290322

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         1.         1.         0.95238095 0.97560976 0.97560976
 0.95       0.97560976 0.95       0.90909091]

mean value: 0.9688301129764545

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         1.         0.95238095 1.         1.
 0.95       0.95238095 0.95       0.83333333]

mean value: 0.9638095238095238

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         0.95238095 0.95238095 0.95238095
 0.95       1.         0.95       1.        ]

mean value: 0.9757142857142856

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         1.         1.         0.92619048 0.97619048 0.97619048
 0.92954545 0.95454545 0.92954545 0.81818182]

mean value: 0.951038961038961

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         1.         1.         0.90909091 0.95238095 0.95238095
 0.9047619  0.95238095 0.9047619  0.83333333]

mean value: 0.9409090909090909

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.88

Accuracy on Blind test: 0.95

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.04327488 0.05089211 0.07965779 0.06433558 0.03197861 0.06568432
 0.03882575 0.05931139 0.10036445 0.07465005]

mean value: 0.06089749336242676

key: score_time
value: [0.01218843 0.02063107 0.02385974 0.01202154 0.01201749 0.01207352
 0.02089715 0.02120233 0.02075076 0.02160144]

mean value: 0.01772434711456299

key: test_mcc
value: [0.93154098 0.79844727 0.78625916 0.77484502 0.78625916 0.64203411
 0.79476958 0.54627358 0.79524277 0.72821908]

mean value: 0.7583890697728666

key: train_mcc
value: [0.9760722  0.9760722  0.98417793 0.98417793 0.98417793 0.99210029
 0.97611544 0.98409734 0.98409734 0.98409734]

mean value: 0.982518594350976

key: test_accuracy
value: [0.96875    0.90625    0.90322581 0.90322581 0.90322581 0.83870968
 0.90322581 0.77419355 0.90322581 0.87096774]

mean value: 0.8875

key: train_accuracy
value: [0.98928571 0.98928571 0.99288256 0.99288256 0.99288256 0.99644128
 0.98932384 0.99288256 0.99288256 0.99288256]

mean value: 0.9921631926792069

key: test_fscore
value: [0.97674419 0.92682927 0.92682927 0.93023256 0.92682927 0.87804878
 0.92307692 0.81081081 0.93023256 0.90909091]

mean value: 0.9138724530670078

key: train_fscore
value: [0.99191375 0.99191375 0.99459459 0.99459459 0.99459459 0.99730458
 0.9919571  0.99462366 0.99462366 0.99462366]

mean value: 0.9940743931555058

key: test_precision
value: [0.95454545 0.95       0.95       0.90909091 0.95       0.9
 0.94736842 0.88235294 0.86956522 0.83333333]

mean value: 0.9146256276590103

key: train_precision
value: [0.98924731 0.98924731 0.99459459 0.99459459 0.99459459 0.99462366
 0.98930481 0.99462366 0.99462366 0.99462366]

mean value: 0.9930077843929837

key: test_recall
value: [1.         0.9047619  0.9047619  0.95238095 0.9047619  0.85714286
 0.9        0.75       1.         1.        ]

mean value: 0.9173809523809524

key: train_recall
value: [0.99459459 0.99459459 0.99459459 0.99459459 0.99459459 1.
 0.99462366 0.99462366 0.99462366 0.99462366]

mean value: 0.9951467596628887

key: test_roc_auc
value: [0.95454545 0.90692641 0.90238095 0.87619048 0.90238095 0.82857143
 0.90454545 0.78409091 0.86363636 0.81818182]

mean value: 0.8741450216450216

key: train_roc_auc
value: [0.98677098 0.98677098 0.99208896 0.99208896 0.99208896 0.99479167
 0.98678551 0.99204867 0.99204867 0.99204867]

mean value: 0.990753204392848

key: test_jcc
value: [0.95454545 0.86363636 0.86363636 0.86956522 0.86363636 0.7826087
 0.85714286 0.68181818 0.86956522 0.83333333]

mean value: 0.8439488048183701

key: train_jcc
value: [0.98395722 0.98395722 0.98924731 0.98924731 0.98924731 0.99462366
 0.98404255 0.98930481 0.98930481 0.98930481]

mean value: 0.9882237021594686

MCC on Blind test: 0.8

Accuracy on Blind test: 0.91

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.02474356 0.00953603 0.0093832  0.00936699 0.00912547 0.00907516
 0.00918412 0.00904965 0.00911117 0.0090394 ]

mean value: 0.010761475563049317

key: score_time
value: [0.00934124 0.00899363 0.00895071 0.00882339 0.0085876  0.00857353
 0.00863934 0.00857663 0.00860953 0.00857806]

mean value: 0.008767366409301758

key: test_mcc
value: [0.44588745 0.39072951 0.85465477 0.62281846 0.69695062 0.64203411
 0.71390814 0.4870862  0.64203411 0.48992888]

mean value: 0.5986032237928653

key: train_mcc
value: [0.67373058 0.63158537 0.67624759 0.63494589 0.65206677 0.64299145
 0.6421061  0.65286643 0.69099047 0.6421061 ]

mean value: 0.6539636746974534

key: test_accuracy
value: [0.75       0.71875    0.93548387 0.83870968 0.87096774 0.83870968
 0.87096774 0.77419355 0.83870968 0.77419355]

mean value: 0.8210685483870968

key: train_accuracy
value: [0.85714286 0.83928571 0.85765125 0.83985765 0.84697509 0.84341637
 0.84341637 0.84697509 0.86476868 0.84341637]

mean value: 0.8482905439755973

key: test_fscore
value: [0.80952381 0.7804878  0.95454545 0.88372093 0.90909091 0.87804878
 0.9047619  0.8372093  0.87804878 0.84444444]

mean value: 0.867988212077832

key: train_fscore
value: [0.89637306 0.88372093 0.89637306 0.88311688 0.88772846 0.88601036
 0.88541667 0.88654354 0.9025641  0.88541667]

mean value: 0.8893263721080894

key: test_precision
value: [0.80952381 0.8        0.91304348 0.86363636 0.86956522 0.9
 0.86363636 0.7826087  0.85714286 0.76      ]

mean value: 0.8419156785243742

key: train_precision
value: [0.86069652 0.84653465 0.86069652 0.85       0.85858586 0.85074627
 0.85858586 0.87046632 0.8627451  0.85858586]

mean value: 0.8577642951988248

key: test_recall
value: [0.80952381 0.76190476 1.         0.9047619  0.95238095 0.85714286
 0.95       0.9        0.9        0.95      ]

mean value: 0.8985714285714286

key: train_recall
value: [0.93513514 0.92432432 0.93513514 0.91891892 0.91891892 0.92432432
 0.91397849 0.90322581 0.94623656 0.91397849]

mean value: 0.9234176111595467

key: test_roc_auc
value: [0.72294372 0.6991342  0.9        0.80238095 0.82619048 0.82857143
 0.83863636 0.72272727 0.81363636 0.70227273]

mean value: 0.7856493506493506

key: train_roc_auc
value: [0.82019915 0.79900427 0.82173423 0.80320946 0.81362613 0.80591216
 0.80962083 0.82003396 0.82574986 0.80962083]

mean value: 0.8128710862815277

key: test_jcc
value: [0.68       0.64       0.91304348 0.79166667 0.83333333 0.7826087
 0.82608696 0.72       0.7826087  0.73076923]

mean value: 0.7700117056856187

key: train_jcc
value: [0.81220657 0.79166667 0.81220657 0.79069767 0.79812207 0.79534884
 0.79439252 0.79620853 0.82242991 0.79439252]

mean value: 0.8007671873638894

MCC on Blind test: 0.57

Accuracy on Blind test: 0.81

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01220751 0.01751304 0.01987648 0.01620412 0.01534367 0.02069497
 0.01594377 0.0216136  0.01784015 0.0183599 ]

mean value: 0.01755971908569336

key: score_time
value: [0.00858617 0.01100492 0.01094866 0.01182079 0.01149249 0.01157546
 0.01156306 0.01166034 0.01157475 0.01156259]

mean value: 0.011178922653198243

key: test_mcc
value: [0.93154098 0.8643122  0.76041521 0.32857143 0.71269665 0.6681531
 0.71818182 0.73603286 0.79476958 0.57727273]

mean value: 0.709194655123876

key: train_mcc
value: [0.92133213 0.98411246 0.94698073 0.52715368 0.74626689 0.83439572
 0.8969264  1.         0.95325088 0.96021134]

mean value: 0.8770630246730166

key: test_accuracy
value: [0.96875    0.9375     0.87096774 0.58064516 0.83870968 0.80645161
 0.87096774 0.87096774 0.90322581 0.80645161]

mean value: 0.8454637096774194

key: train_accuracy
value: [0.96428571 0.99285714 0.97508897 0.69039146 0.86476868 0.91459075
 0.95373665 1.         0.97864769 0.98220641]

mean value: 0.9316573462125064

key: test_fscore
value: [0.97674419 0.95454545 0.89473684 0.58064516 0.86486486 0.83333333
 0.9        0.89473684 0.92307692 0.85      ]

mean value: 0.8672683607367936

key: train_fscore
value: [0.97368421 0.99462366 0.98071625 0.69257951 0.88690476 0.93063584
 0.96495957 1.         0.98369565 0.98666667]

mean value: 0.9394466112812958

key: test_precision
value: [0.95454545 0.91304348 1.         0.9        1.         1.
 0.9        0.94444444 0.94736842 0.85      ]

mean value: 0.9409401798303401

key: train_precision
value: [0.94871795 0.98930481 1.         1.         0.98675497 1.
 0.96756757 1.         0.99450549 0.97883598]

mean value: 0.9865686769348632

key: test_recall
value: [1.         1.         0.80952381 0.42857143 0.76190476 0.71428571
 0.9        0.85       0.9        0.85      ]

mean value: 0.8214285714285714

key: train_recall
value: [1.         1.         0.96216216 0.52972973 0.80540541 0.87027027
 0.96236559 1.         0.97311828 0.99462366]

mean value: 0.9097675094449288

key: test_roc_auc
value: [0.95454545 0.90909091 0.9047619  0.66428571 0.88095238 0.85714286
 0.85909091 0.87954545 0.90454545 0.78863636]

mean value: 0.8602597402597403

key: train_roc_auc
value: [0.94736842 0.98947368 0.98108108 0.76486486 0.89228604 0.93513514
 0.94960385 1.         0.98129598 0.9762592 ]

mean value: 0.941736824897903

key: test_jcc
value: [0.95454545 0.91304348 0.80952381 0.40909091 0.76190476 0.71428571
 0.81818182 0.80952381 0.85714286 0.73913043]

mean value: 0.7786373047242613

key: train_jcc
value: [0.94871795 0.98930481 0.96216216 0.52972973 0.79679144 0.87027027
 0.93229167 1.         0.96791444 0.97368421]

mean value: 0.8970866683260259

MCC on Blind test: 0.57

Accuracy on Blind test: 0.75

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01702142 0.01537132 0.01510477 0.01628637 0.01565433 0.0167551
 0.0152967  0.0162673  0.01603389 0.01621771]

mean value: 0.016000890731811525

key: score_time
value: [0.0117135  0.01158285 0.01149917 0.01162124 0.01162434 0.01160312
 0.0116601  0.01162314 0.01154995 0.01157784]

mean value: 0.011605525016784668

key: test_mcc
value: [0.93154098 0.79772404 0.67215385 0.85465477 0.51176632 0.78625916
 0.78625916 0.5375332  0.85909091 0.79524277]

mean value: 0.7532225152872797

key: train_mcc
value: [0.9284967  0.83638515 0.85422716 0.9218965  0.68288051 0.98417793
 0.91545327 0.89775184 0.93617969 0.9136949 ]

mean value: 0.8871143639501542

key: test_accuracy
value: [0.96875    0.90625    0.83870968 0.93548387 0.67741935 0.90322581
 0.90322581 0.74193548 0.93548387 0.90322581]

mean value: 0.8713709677419355

key: train_accuracy
value: [0.96785714 0.925      0.92882562 0.96441281 0.81494662 0.99288256
 0.96085409 0.95017794 0.97153025 0.96085409]

mean value: 0.9437341128622267

key: test_fscore
value: [0.97674419 0.93333333 0.87179487 0.95454545 0.6875     0.92682927
 0.92682927 0.76470588 0.95       0.93023256]

mean value: 0.8922514822798013

key: train_fscore
value: [0.97612732 0.94629156 0.94350282 0.97368421 0.83647799 0.99459459
 0.96986301 0.96089385 0.97860963 0.97127937]

mean value: 0.9551324365942089

key: test_precision
value: [0.95454545 0.875      0.94444444 0.91304348 1.         0.95
 0.9047619  0.92857143 0.95       0.86956522]

mean value: 0.9289931927975406

key: train_precision
value: [0.95833333 0.89805825 0.98816568 0.94871795 1.         0.99459459
 0.98882682 1.         0.97340426 0.94416244]

mean value: 0.9694263317056264

key: test_recall
value: [1.         1.         0.80952381 1.         0.52380952 0.9047619
 0.95       0.65       0.95       1.        ]

mean value: 0.8788095238095238

key: train_recall
value: [0.99459459 1.         0.9027027  1.         0.71891892 0.99459459
 0.9516129  0.92473118 0.98387097 1.        ]

mean value: 0.9471025864574252

key: test_roc_auc
value: [0.95454545 0.86363636 0.8547619  0.9        0.76190476 0.90238095
 0.88409091 0.77954545 0.92954545 0.86363636]

mean value: 0.8694047619047619

key: train_roc_auc
value: [0.95519203 0.88947368 0.94093468 0.94791667 0.85945946 0.99208896
 0.96528014 0.96236559 0.96561969 0.94210526]

mean value: 0.9420436177901161

key: test_jcc
value: [0.95454545 0.875      0.77272727 0.91304348 0.52380952 0.86363636
 0.86363636 0.61904762 0.9047619  0.86956522]

mean value: 0.8159773197816677

key: train_jcc
value: [0.95336788 0.89805825 0.89304813 0.94871795 0.71891892 0.98924731
 0.94148936 0.92473118 0.95811518 0.94416244]

mean value: 0.9169856600174047

MCC on Blind test: 0.7

Accuracy on Blind test: 0.86

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.14682198 0.13001108 0.12968946 0.130548   0.13054299 0.13108706
 0.13012147 0.14023495 0.13049984 0.13098478]

mean value: 0.13305416107177734

key: score_time
value: [0.01479316 0.0149045  0.01485991 0.01515651 0.01488829 0.01515222
 0.01562381 0.01490235 0.0150342  0.01483464]

mean value: 0.015014958381652833

key: test_mcc
value: [0.93435318 0.93435318 1.         0.92687157 0.93048421 0.93048421
 0.93048421 0.85909091 0.93048421 0.72821908]

mean value: 0.9104824771523825

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.96875    0.96875    1.         0.96774194 0.96774194 0.96774194
 0.96774194 0.93548387 0.96774194 0.87096774]

mean value: 0.958266129032258

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97560976 0.97560976 1.         0.97674419 0.97560976 0.97560976
 0.97560976 0.95       0.97560976 0.90909091]

mean value: 0.9689493631722786

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         1.         0.95454545 1.         1.
 0.95238095 0.95       0.95238095 0.83333333]

mean value: 0.9642640692640693

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.95238095 0.95238095 1.         1.         0.95238095 0.95238095
 1.         0.95       1.         1.        ]

mean value: 0.9759523809523809

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97619048 0.97619048 1.         0.95       0.97619048 0.97619048
 0.95454545 0.92954545 0.95454545 0.81818182]

mean value: 0.9511580086580086

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.95238095 0.95238095 1.         0.95454545 0.95238095 0.95238095
 0.95238095 0.9047619  0.95238095 0.83333333]

mean value: 0.9406926406926407

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.9

Accuracy on Blind test: 0.95

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.04166436 0.03705072 0.04192185 0.03513145 0.04678941 0.04064345
 0.04421425 0.05140901 0.0332458  0.04983377]

mean value: 0.04219040870666504

key: score_time
value: [0.01687479 0.01946425 0.02428985 0.01744056 0.02443838 0.02561164
 0.02391243 0.02650714 0.01740432 0.01979828]

mean value: 0.02157416343688965

key: test_mcc
value: [1.         0.93435318 1.         0.92687157 0.93048421 0.93048421
 0.93048421 0.79476958 0.79476958 0.72821908]

mean value: 0.8970435636403364

key: train_mcc
value: [0.9920858  0.98411246 0.99213963 1.         1.         0.99210029
 1.         1.         0.99205967 0.99205967]

mean value: 0.9944557518725569

key: test_accuracy
value: [1.         0.96875    1.         0.96774194 0.96774194 0.96774194
 0.96774194 0.90322581 0.90322581 0.87096774]

mean value: 0.9517137096774193

key: train_accuracy
value: [0.99642857 0.99285714 0.99644128 1.         1.         0.99644128
 1.         1.         0.99644128 0.99644128]

mean value: 0.9975050838840874

key: test_fscore
value: [1.         0.97560976 1.         0.97674419 0.97560976 0.97560976
 0.97560976 0.92307692 0.92307692 0.90909091]

mean value: 0.9634427965681511

key: train_fscore
value: [0.99728997 0.99462366 0.99728997 1.         1.         0.99730458
 1.         1.         0.99731903 0.99731903]

mean value: 0.9981146253628773

key: test_precision
value: [1.         1.         1.         0.95454545 1.         1.
 0.95238095 0.94736842 0.94736842 0.83333333]

mean value: 0.9634996582365003

key: train_precision
value: [1.         0.98930481 1.         1.         1.         0.99462366
 1.         1.         0.99465241 0.99465241]

mean value: 0.9973233281582428

key: test_recall
value: [1.         0.95238095 1.         1.         0.95238095 0.95238095
 1.         0.9        0.9        1.        ]

mean value: 0.9657142857142857

key: train_recall
value: [0.99459459 1.         0.99459459 1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9989189189189189

key: test_roc_auc
value: [1.         0.97619048 1.         0.95       0.97619048 0.97619048
 0.95454545 0.90454545 0.90454545 0.81818182]

mean value: 0.946038961038961

key: train_roc_auc
value: [0.9972973  0.98947368 0.9972973  1.         1.         0.99479167
 1.         1.         0.99473684 0.99473684]

mean value: 0.9968333629682314

key: test_jcc
value: [1.         0.95238095 1.         0.95454545 0.95238095 0.95238095
 0.95238095 0.85714286 0.85714286 0.83333333]

mean value: 0.9311688311688311

key: train_jcc
value: [0.99459459 0.98930481 0.99459459 1.         1.         0.99462366
 1.         1.         0.99465241 0.99465241]

mean value: 0.9962422470771617

MCC on Blind test: 0.84

Accuracy on Blind test: 0.93

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.03371096 0.06597567 0.09045339 0.05891442 0.08815241 0.07290912
 0.1127162  0.09216666 0.08394241 0.04576588]

mean value: 0.07447071075439453

key: score_time
value: [0.01326203 0.01323938 0.01386213 0.01348066 0.02233791 0.02737832
 0.02631879 0.02382731 0.01327252 0.01324177]

mean value: 0.01802208423614502

key: test_mcc
value: [0.41281273 0.49517597 0.40952381 0.38154231 0.61758068 0.44786837
 0.51793973 0.14863011 0.64116449 0.40572206]

mean value: 0.44779602562273146

key: train_mcc
value: [0.96830875 0.99204533 0.97623798 0.97636634 0.98422269 0.98422269
 0.97624243 0.97624243 0.97624243 0.97624243]

mean value: 0.9786373500460502

key: test_accuracy
value: [0.75       0.78125    0.74193548 0.74193548 0.83870968 0.77419355
 0.77419355 0.64516129 0.83870968 0.74193548]

mean value: 0.7628024193548387

key: train_accuracy
value: [0.98571429 0.99642857 0.98932384 0.98932384 0.99288256 0.99288256
 0.98932384 0.98932384 0.98932384 0.98932384]

mean value: 0.9903851042196238

key: test_fscore
value: [0.82608696 0.85106383 0.80952381 0.81818182 0.88888889 0.85106383
 0.82051282 0.75555556 0.88372093 0.81818182]

mean value: 0.8322780257173477

key: train_fscore
value: [0.98930481 0.99730458 0.99191375 0.9919571  0.99462366 0.99462366
 0.992      0.992      0.992      0.992     ]

mean value: 0.9927727558060793

key: test_precision
value: [0.76       0.76923077 0.80952381 0.7826087  0.83333333 0.76923077
 0.84210526 0.68       0.82608696 0.75      ]

mean value: 0.782211959665049

key: train_precision
value: [0.97883598 0.99462366 0.98924731 0.98404255 0.98930481 0.98930481
 0.98412698 0.98412698 0.98412698 0.98412698]

mean value: 0.9861867061945789

key: test_recall
value: [0.9047619  0.95238095 0.80952381 0.85714286 0.95238095 0.95238095
 0.8        0.85       0.95       0.9       ]

mean value: 0.8928571428571428

key: train_recall
value: [1.         1.         0.99459459 1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9994594594594595

key: test_roc_auc
value: [0.67965368 0.7034632  0.7047619  0.67857143 0.77619048 0.67619048
 0.76363636 0.56136364 0.79318182 0.67727273]

mean value: 0.7014285714285714

key: train_roc_auc
value: [0.97894737 0.99473684 0.98688063 0.984375   0.98958333 0.98958333
 0.98421053 0.98421053 0.98421053 0.98421053]

mean value: 0.9860948613086771

key: test_jcc
value: [0.7037037  0.74074074 0.68       0.69230769 0.8        0.74074074
 0.69565217 0.60714286 0.79166667 0.69230769]

mean value: 0.7144262267523137

key: train_jcc
value: [0.97883598 0.99462366 0.98395722 0.98404255 0.98930481 0.98930481
 0.98412698 0.98412698 0.98412698 0.98412698]

mean value: 0.9856576969369168

MCC on Blind test: 0.46

Accuracy on Blind test: 0.77

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.53377724 0.52432537 0.47125101 0.45838308 0.44565415 0.44443941
 0.44144034 0.46182108 0.45414662 0.43897247]

mean value: 0.4674210786819458

key: score_time
value: [0.01514387 0.01880264 0.00988698 0.00940108 0.00947022 0.0095377
 0.00945544 0.0099566  0.00943661 0.00944352]

mean value: 0.011053466796875

key: test_mcc
value: [1.         1.         1.         0.85238095 1.         0.93048421
 0.93048421 0.93048421 0.85909091 0.72821908]

mean value: 0.9231143573921693

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         1.         1.         0.93548387 1.         0.96774194
 0.96774194 0.96774194 0.93548387 0.87096774]

mean value: 0.964516129032258

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         1.         1.         0.95238095 1.         0.97560976
 0.97560976 0.97560976 0.95       0.90909091]

mean value: 0.9738301129764544

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         1.         0.95238095 1.         1.
 0.95238095 0.95238095 0.95       0.83333333]

mean value: 0.964047619047619

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         0.95238095 1.         0.95238095
 1.         1.         0.95       1.        ]

mean value: 0.9854761904761905

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         1.         1.         0.92619048 1.         0.97619048
 0.95454545 0.95454545 0.92954545 0.81818182]

mean value: 0.9559199134199134

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         1.         1.         0.90909091 1.         0.95238095
 0.95238095 0.95238095 0.9047619  0.83333333]

mean value: 0.9504329004329004

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.87

Accuracy on Blind test: 0.94

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.02589202 0.02847075 0.0404582  0.02659345 0.02673697 0.03226662
 0.03573513 0.02714109 0.05389953 0.03467345]

mean value: 0.033186721801757815

key: score_time
value: [0.01283598 0.01288366 0.01362491 0.01272559 0.01264453 0.01271176
 0.01251125 0.01660681 0.01272869 0.01582789]

mean value: 0.01351010799407959

key: test_mcc
value: [ 0.21867346  0.0849412  -0.26560636  0.09967105 -0.05976143  0.00752923
  0.01363636 -0.23927198  0.14863011  0.22469871]

mean value: 0.0233140353744436

key: train_mcc
value: [0.36577134 0.33200663 0.37383194 0.35226764 0.35226764 0.34110438
 0.34382047 0.39766525 0.32040778 0.35507261]

mean value: 0.3534215663451066

key: test_accuracy
value: [0.6875     0.65625    0.5483871  0.67741935 0.61290323 0.64516129
 0.5483871  0.48387097 0.64516129 0.67741935]

mean value: 0.6182459677419354

key: train_accuracy
value: [0.725      0.71428571 0.72597865 0.71886121 0.71886121 0.71530249
 0.71886121 0.7366548  0.71174377 0.72241993]

mean value: 0.7207968988307066

key: test_fscore
value: [0.8        0.78431373 0.70833333 0.8        0.75       0.7755102
 0.65       0.63636364 0.75555556 0.7826087 ]

mean value: 0.7442685150476528

key: train_fscore
value: [0.82774049 0.82222222 0.82774049 0.82405345 0.82405345 0.82222222
 0.8248337  0.83408072 0.82119205 0.82666667]

mean value: 0.8254805473034187

key: test_precision
value: [0.68965517 0.66666667 0.62962963 0.68965517 0.66666667 0.67857143
 0.65       0.58333333 0.68       0.69230769]

mean value: 0.6626485762003004

key: train_precision
value: [0.70610687 0.69811321 0.70610687 0.70075758 0.70075758 0.69811321
 0.70188679 0.71538462 0.69662921 0.70454545]

mean value: 0.7028401382933552

key: test_recall
value: [0.95238095 0.95238095 0.80952381 0.95238095 0.85714286 0.9047619
 0.65       0.7        0.85       0.9       ]

mean value: 0.8528571428571429

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.56709957 0.52164502 0.4047619  0.52619048 0.47857143 0.50238095
 0.50681818 0.39545455 0.56136364 0.58636364]

mean value: 0.505064935064935

key: train_roc_auc
value: [0.59473684 0.57894737 0.59895833 0.58854167 0.58854167 0.58333333
 0.58421053 0.61052632 0.57368421 0.58947368]

mean value: 0.5890953947368421

key: test_jcc
value: [0.66666667 0.64516129 0.5483871  0.66666667 0.6        0.63333333
 0.48148148 0.46666667 0.60714286 0.64285714]

mean value: 0.5958363201911588

key: train_jcc
value: [0.70610687 0.69811321 0.70610687 0.70075758 0.70075758 0.69811321
 0.70188679 0.71538462 0.69662921 0.70454545]

mean value: 0.7028401382933552

MCC on Blind test: 0.19

Accuracy on Blind test: 0.68

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.02688026 0.03328443 0.03506446 0.03774476 0.03506637 0.0350914
 0.03563261 0.03496313 0.03501749 0.02813625]

mean value: 0.0336881160736084

key: score_time
value: [0.02236223 0.0209496  0.02134967 0.02306938 0.02363062 0.02382088
 0.02356005 0.0199604  0.02020645 0.0209074 ]

mean value: 0.02198166847229004

key: test_mcc
value: [1.         0.86147186 0.85238095 0.85465477 1.         0.93048421
 0.78625916 0.49780905 0.85909091 0.64116449]

mean value: 0.828331540294447

key: train_mcc
value: [0.95220382 0.94428471 0.95253998 0.95241514 0.95253998 0.94467837
 0.96021134 0.96021134 0.96021134 0.96021134]

mean value: 0.9539507359002783

key: test_accuracy
value: [1.         0.9375     0.93548387 0.93548387 1.         0.96774194
 0.90322581 0.77419355 0.93548387 0.83870968]

mean value: 0.9227822580645161

key: train_accuracy
value: [0.97857143 0.975      0.97864769 0.97864769 0.97864769 0.97508897
 0.98220641 0.98220641 0.98220641 0.98220641]

mean value: 0.979342907981698

key: test_fscore
value: [1.         0.95238095 0.95238095 0.95454545 1.         0.97560976
 0.92682927 0.82926829 0.95       0.88372093]

mean value: 0.9424735606613088

key: train_fscore
value: [0.98395722 0.98133333 0.98395722 0.98387097 0.98395722 0.98133333
 0.98666667 0.98666667 0.98666667 0.98666667]

mean value: 0.9845075958829279

key: test_precision
value: [1.         0.95238095 0.95238095 0.91304348 1.         1.
 0.9047619  0.80952381 0.95       0.82608696]

mean value: 0.9308178053830227

key: train_precision
value: [0.97354497 0.96842105 0.97354497 0.97860963 0.97354497 0.96842105
 0.97883598 0.97883598 0.97883598 0.97883598]

mean value: 0.9751430566910443

key: test_recall
value: [1.         0.95238095 0.95238095 1.         1.         0.95238095
 0.95       0.85       0.95       0.95      ]

mean value: 0.9557142857142857

key: train_recall
value: [0.99459459 0.99459459 0.99459459 0.98918919 0.99459459 0.99459459
 0.99462366 0.99462366 0.99462366 0.99462366]

mean value: 0.9940656785818076

key: test_roc_auc
value: [1.         0.93073593 0.92619048 0.9        1.         0.97619048
 0.88409091 0.74318182 0.92954545 0.79318182]

mean value: 0.9083116883116883

key: train_roc_auc
value: [0.97098151 0.96571835 0.97125563 0.97376126 0.97125563 0.9660473
 0.9762592  0.9762592  0.9762592  0.9762592 ]

mean value: 0.9724056463084476

key: test_jcc
value: [1.         0.90909091 0.90909091 0.91304348 1.         0.95238095
 0.86363636 0.70833333 0.9047619  0.79166667]

mean value: 0.8952004517221909

key: train_jcc
value: [0.96842105 0.96335079 0.96842105 0.96825397 0.96842105 0.96335079
 0.97368421 0.97368421 0.97368421 0.97368421]

mean value: 0.9694955538934596

MCC on Blind test: 0.82

Accuracy on Blind test: 0.92

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.27341199 0.26561522 0.33326602 0.29521275 0.33132577 0.29219532
 0.31099606 0.33863211 0.34531784 0.34489107]

mean value: 0.3130864143371582

key: score_time
value: [0.01944685 0.02101326 0.02376032 0.02346349 0.02274227 0.02088618
 0.02771783 0.02203941 0.02380085 0.02210379]

mean value: 0.02269742488861084

key: test_mcc
value: [1.         0.86147186 0.85238095 0.85465477 0.78625916 0.78625916
 0.85909091 0.54627358 0.93048421 0.66057826]

mean value: 0.8137452858509517

key: train_mcc
value: [0.95220382 0.94428471 0.97623798 0.95241514 0.98417793 0.96831892
 0.97611544 0.97611544 0.98409734 0.98409734]

mean value: 0.9698064067154124

key: test_accuracy
value: [1.         0.9375     0.93548387 0.93548387 0.90322581 0.90322581
 0.93548387 0.77419355 0.96774194 0.83870968]

mean value: 0.9131048387096774

key: train_accuracy
value: [0.97857143 0.975      0.98932384 0.97864769 0.99288256 0.98576512
 0.98932384 0.98932384 0.99288256 0.99288256]

mean value: 0.986460345704118

key: test_fscore
value: [1.         0.95238095 0.95238095 0.95454545 0.92682927 0.92682927
 0.95       0.81081081 0.97560976 0.88888889]

mean value: 0.9338275351689985

key: train_fscore
value: /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_7030.py:115: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_7030.py:118: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[0.98395722 0.98133333 0.99191375 0.98387097 0.99459459 0.98924731
 0.9919571  0.9919571  0.99462366 0.99462366]

mean value: 0.9898078694323124

key: test_precision
value: [1.         0.95238095 0.95238095 0.91304348 0.95       0.95
 0.95       0.88235294 0.95238095 0.8       ]

mean value: 0.9302539276580197

key: train_precision
value: [0.97354497 0.96842105 0.98924731 0.97860963 0.99459459 0.98395722
 0.98930481 0.98930481 0.99462366 0.99462366]

mean value: 0.9856231715015297

key: test_recall
value: [1.         0.95238095 0.95238095 1.         0.9047619  0.9047619
 0.95       0.75       1.         1.        ]

mean value: 0.9414285714285714

key: train_recall
value: [0.99459459 0.99459459 0.99459459 0.98918919 0.99459459 0.99459459
 0.99462366 0.99462366 0.99462366 0.99462366]

mean value: 0.9940656785818076

key: test_roc_auc
value: [1.         0.93073593 0.92619048 0.9        0.90238095 0.90238095
 0.92954545 0.78409091 0.95454545 0.77272727]

mean value: 0.9002597402597402

key: train_roc_auc
value: [0.97098151 0.96571835 0.98688063 0.97376126 0.99208896 0.9816723
 0.98678551 0.98678551 0.99204867 0.99204867]

mean value: 0.9828771375365178

key: test_jcc
value: [1.         0.90909091 0.90909091 0.91304348 0.86363636 0.86363636
 0.9047619  0.68181818 0.95238095 0.8       ]

mean value: 0.8797459062676454

key: train_jcc
value: [0.96842105 0.96335079 0.98395722 0.96825397 0.98924731 0.9787234
 0.98404255 0.98404255 0.98930481 0.98930481]

mean value: 0.9798648473611902

MCC on Blind test: 0.85

Accuracy on Blind test: 0.94

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.03805447 0.03569889 0.03429389 0.04928493 0.0660429  0.04097486
 0.10463309 0.07833409 0.0353334  0.04143572]

mean value: 0.05240862369537354

key: score_time
value: [0.01335335 0.01258039 0.01363444 0.01591015 0.01217866 0.01498747
 0.01492548 0.01208353 0.01185942 0.01482725]

mean value: 0.013634014129638671

key: test_mcc
value: [0.76277007 0.90889326 0.80817439 0.80817439 0.8047619  0.70714286
 0.90238095 0.65952381 0.8047619  0.8047619 ]

mean value: 0.7971345447661506

key: train_mcc
value: [0.88154484 0.88712176 0.88680616 0.87612986 0.871086   0.89769524
 0.89238376 0.91925359 0.88220797 0.8869027 ]

mean value: 0.8881131867238982

key: test_accuracy
value: [0.88095238 0.95238095 0.90243902 0.90243902 0.90243902 0.85365854
 0.95121951 0.82926829 0.90243902 0.90243902]

mean value: 0.8979674796747967

key: train_accuracy
value: [0.94054054 0.94324324 0.94339623 0.93800539 0.93530997 0.94878706
 0.94609164 0.95956873 0.94070081 0.94339623]

mean value: 0.943903984847381

key: test_fscore
value: [0.88372093 0.95454545 0.90909091 0.90909091 0.9047619  0.85714286
 0.95       0.82926829 0.9        0.9       ]

mean value: 0.8997621257547519

key: train_fscore
value: [0.94148936 0.94429708 0.94339623 0.9383378  0.93617021 0.94906166
 0.94680851 0.96       0.94210526 0.944     ]

mean value: 0.944566612071446

key: test_precision
value: [0.86363636 0.91304348 0.86956522 0.86956522 0.9047619  0.85714286
 0.95       0.80952381 0.9        0.9       ]

mean value: 0.8837238848108413

key: train_precision
value: [0.92670157 0.92708333 0.94086022 0.93085106 0.92146597 0.94148936
 0.93684211 0.95238095 0.92268041 0.93650794]

mean value: 0.9336862919709208

key: test_recall
value: [0.9047619  1.         0.95238095 0.95238095 0.9047619  0.85714286
 0.95       0.85       0.9        0.9       ]

mean value: 0.9171428571428571

key: train_recall
value: [0.95675676 0.96216216 0.94594595 0.94594595 0.95135135 0.95675676
 0.95698925 0.96774194 0.96236559 0.9516129 ]

mean value: 0.9557628596338275

key: test_roc_auc
value: [0.88095238 0.95238095 0.90119048 0.90119048 0.90238095 0.85357143
 0.95119048 0.8297619  0.90238095 0.90238095]

mean value: 0.8977380952380951

key: train_roc_auc
value: [0.94054054 0.94324324 0.94340308 0.93802674 0.9353531  0.94880849
 0.94606219 0.95954664 0.94064226 0.94337402]

mean value: 0.9439000290613194

key: test_jcc
value: [0.79166667 0.91304348 0.83333333 0.83333333 0.82608696 0.75
 0.9047619  0.70833333 0.81818182 0.81818182]

mean value: 0.8196922642574817

key: train_jcc
value: [0.88944724 0.89447236 0.89285714 0.88383838 0.88       0.90306122
 0.8989899  0.92307692 0.89054726 0.89393939]

mean value: 0.8950229828863081

MCC on Blind test: 0.78

Accuracy on Blind test: 0.9

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [1.39968538 1.02785754 1.49144053 1.42981839 0.8560338  1.02810216
 0.97955227 0.91374874 1.23398352 1.15036035]

mean value: 1.151058268547058

key: score_time
value: [0.0152576  0.01563644 0.01735854 0.01563406 0.01553392 0.01554656
 0.01516581 0.01547575 0.01566935 0.01563621]

mean value: 0.015691423416137697

key: test_mcc
value: [0.85811633 0.90889326 0.95238095 0.90238095 1.         0.8047619
 0.95227002 0.70714286 0.95227002 0.85441771]

mean value: 0.8892633994360876

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.92857143 0.95238095 0.97560976 0.95121951 1.         0.90243902
 0.97560976 0.85365854 0.97560976 0.92682927]

mean value: 0.9441927990708479

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.93023256 0.95454545 0.97560976 0.95238095 1.         0.9047619
 0.97435897 0.85       0.97435897 0.92307692]

mean value: 0.9439325497720279

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.90909091 0.91304348 1.         0.95238095 1.         0.9047619
 1.         0.85       1.         0.94736842]

mean value: 0.9476645665547268

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.95238095 1.         0.95238095 0.95238095 1.         0.9047619
 0.95       0.85       0.95       0.9       ]

mean value: 0.9411904761904761

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.92857143 0.95238095 0.97619048 0.95119048 1.         0.90238095
 0.975      0.85357143 0.975      0.92619048]

mean value: 0.944047619047619

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.86956522 0.91304348 0.95238095 0.90909091 1.         0.82608696
 0.95       0.73913043 0.95       0.85714286]

mean value: 0.896644080557124

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.77

Accuracy on Blind test: 0.9

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01401162 0.01052999 0.01121998 0.01657534 0.00958681 0.00979447
 0.01485133 0.01169181 0.00982499 0.00981498]

mean value: 0.011790132522583008

key: score_time
value: [0.01367188 0.00954771 0.01246524 0.01591039 0.00917268 0.00907016
 0.01428008 0.00912118 0.00907564 0.00906348]

mean value: 0.011137843132019043

key: test_mcc
value: [0.57207755 0.53357838 0.66432098 0.46623254 0.86240942 0.61152662
 0.8047619  0.53864117 0.72229808 0.75714286]

mean value: 0.6532989494622181

key: train_mcc
value: [0.64703542 0.67504003 0.716066   0.6942959  0.71499964 0.76337838
 0.67722017 0.72381482 0.7197458  0.71480337]

mean value: 0.7046399531141702

key: test_accuracy
value: [0.78571429 0.76190476 0.82926829 0.73170732 0.92682927 0.80487805
 0.90243902 0.75609756 0.85365854 0.87804878]

mean value: 0.8230545876887341

key: train_accuracy
value: [0.82162162 0.83243243 0.85444744 0.84366577 0.85444744 0.8787062
 0.83557951 0.85983827 0.85714286 0.85444744]

mean value: 0.8492328986668609

key: test_fscore
value: [0.7804878  0.7826087  0.84444444 0.75555556 0.93333333 0.81818182
 0.9        0.7826087  0.86363636 0.87804878]

mean value: 0.8338905491821716

key: train_fscore
value: [0.83076923 0.84577114 0.86363636 0.85353535 0.86294416 0.88549618
 0.84634761 0.86734694 0.86582278 0.86363636]

mean value: 0.8585306132137107

key: test_precision
value: [0.8        0.72       0.79166667 0.70833333 0.875      0.7826087
 0.9        0.69230769 0.79166667 0.85714286]

mean value: 0.791872591176939

key: train_precision
value: [0.7902439  0.78341014 0.81042654 0.80094787 0.81339713 0.83653846
 0.79620853 0.82524272 0.81818182 0.81428571]

mean value: 0.8088882820715697

key: test_recall
value: [0.76190476 0.85714286 0.9047619  0.80952381 1.         0.85714286
 0.9        0.9        0.95       0.9       ]

mean value: 0.8840476190476191

key: train_recall
value: [0.87567568 0.91891892 0.92432432 0.91351351 0.91891892 0.94054054
 0.90322581 0.91397849 0.91935484 0.91935484]

mean value: 0.9147805870386516

key: test_roc_auc
value: [0.78571429 0.76190476 0.82738095 0.7297619  0.925      0.80357143
 0.90238095 0.75952381 0.85595238 0.87857143]

mean value: 0.8229761904761904

key: train_roc_auc
value: [0.82162162 0.83243243 0.85463528 0.84385353 0.85462075 0.87887242
 0.83539669 0.85969195 0.85697472 0.85427201]

mean value: 0.8492371403661726

key: test_jcc
value: [0.64       0.64285714 0.73076923 0.60714286 0.875      0.69230769
 0.81818182 0.64285714 0.76       0.7826087 ]

mean value: 0.7191724579768058

key: train_jcc
value: [0.71052632 0.73275862 0.76       0.74449339 0.75892857 0.79452055
 0.73362445 0.76576577 0.76339286 0.76      ]

mean value: 0.7524010524980485

MCC on Blind test: 0.55

Accuracy on Blind test: 0.8

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01205778 0.01103449 0.01065707 0.01740766 0.01130915 0.01000571
 0.01102018 0.01014924 0.01479101 0.01021481]

mean value: 0.011864709854125976

key: score_time
value: [0.01010084 0.00973463 0.00967646 0.01493192 0.00972748 0.00886583
 0.00895524 0.01499701 0.00933695 0.00921488]

mean value: 0.010554122924804687

key: test_mcc
value: [0.57735027 0.57735027 0.75714286 0.67700771 0.7633652  0.65871309
 0.7633652  0.71121921 0.70714286 0.90238095]

mean value: 0.7095037610692975

key: train_mcc
value: [0.75417724 0.74123391 0.71558817 0.74235478 0.74235478 0.74235478
 0.72167661 0.72031226 0.73194294 0.73194294]

mean value: 0.7343938376347271

key: test_accuracy
value: [0.78571429 0.78571429 0.87804878 0.82926829 0.87804878 0.82926829
 0.87804878 0.85365854 0.85365854 0.95121951]

mean value: 0.8522648083623693

key: train_accuracy
value: [0.87567568 0.87027027 0.85714286 0.87061995 0.87061995 0.87061995
 0.85983827 0.85983827 0.86522911 0.86522911]

mean value: 0.8665083412253224

key: test_fscore
value: [0.8        0.8        0.87804878 0.85106383 0.88888889 0.8372093
 0.86486486 0.85714286 0.85       0.95      ]

mean value: 0.8577218523497231

key: train_fscore
value: [0.88082902 0.87301587 0.86089239 0.87368421 0.87368421 0.87368421
 0.86528497 0.86315789 0.86979167 0.86979167]

mean value: 0.8703816110753745

key: test_precision
value: [0.75       0.75       0.9        0.76923077 0.83333333 0.81818182
 0.94117647 0.81818182 0.85       0.95      ]

mean value: 0.8380104209515974

key: train_precision
value: [0.84577114 0.85492228 0.83673469 0.85128205 0.85128205 0.85128205
 0.835      0.84536082 0.84343434 0.84343434]

mean value: 0.8458503783406013

key: test_recall
value: [0.85714286 0.85714286 0.85714286 0.95238095 0.95238095 0.85714286
 0.8        0.9        0.85       0.95      ]

mean value: 0.8833333333333333

key: train_recall
value: [0.91891892 0.89189189 0.88648649 0.8972973  0.8972973  0.8972973
 0.89784946 0.88172043 0.89784946 0.89784946]

mean value: 0.8964458006393491

key: test_roc_auc
value: [0.78571429 0.78571429 0.87857143 0.82619048 0.87619048 0.82857143
 0.87619048 0.8547619  0.85357143 0.95119048]

mean value: 0.8516666666666667

key: train_roc_auc
value: [0.87567568 0.87027027 0.85722174 0.87069166 0.87069166 0.87069166
 0.85973554 0.85977913 0.86514095 0.86514095]

mean value: 0.8665039232781169

key: test_jcc
value: [0.66666667 0.66666667 0.7826087  0.74074074 0.8        0.72
 0.76190476 0.75       0.73913043 0.9047619 ]

mean value: 0.7532479871175524

key: train_jcc
value: [0.78703704 0.77464789 0.75576037 0.77570093 0.77570093 0.77570093
 0.76255708 0.75925926 0.76958525 0.76958525]

mean value: 0.7705534940560166

MCC on Blind test: 0.59

Accuracy on Blind test: 0.82

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.01108479 0.0093708  0.01117444 0.01171565 0.01109958 0.0105772
 0.01115322 0.01098776 0.00986218 0.01298475]

mean value: 0.011001038551330566

key: score_time
value: [0.03188109 0.01672149 0.02224278 0.01918387 0.01870203 0.01756144
 0.01790571 0.01775837 0.01715755 0.02098441]

mean value: 0.020009875297546387

key: test_mcc
value: [0.43052839 0.62187434 0.56086079 0.51190476 0.71121921 0.65871309
 0.61152662 0.56086079 0.60952381 0.51320273]

mean value: 0.5790214513441565

key: train_mcc
value: [0.76327807 0.80010521 0.73634484 0.75792591 0.75307912 0.74160356
 0.75239189 0.74565731 0.74718674 0.75274878]

mean value: 0.7550321433011067

key: test_accuracy
value: [0.71428571 0.80952381 0.7804878  0.75609756 0.85365854 0.82926829
 0.80487805 0.7804878  0.80487805 0.75609756]

mean value: 0.7889663182346109

key: train_accuracy
value: [0.88108108 0.9        0.86792453 0.8787062  0.87601078 0.87061995
 0.87601078 0.87061995 0.87331536 0.87601078]

mean value: 0.8770299409922051

key: test_fscore
value: [0.7        0.8        0.79069767 0.76190476 0.85       0.8372093
 0.78947368 0.76923077 0.8        0.73684211]

mean value: 0.7835358297353401

key: train_fscore
value: [0.87777778 0.89918256 0.86501377 0.87603306 0.87222222 0.86813187
 0.87830688 0.86363636 0.87123288 0.87362637]

mean value: 0.8745163753677637

key: test_precision
value: [0.73684211 0.84210526 0.77272727 0.76190476 0.89473684 0.81818182
 0.83333333 0.78947368 0.8        0.77777778]

mean value: 0.8027082858661806

key: train_precision
value: [0.90285714 0.90659341 0.88202247 0.89325843 0.89714286 0.88268156
 0.86458333 0.91566265 0.88826816 0.89325843]

mean value: 0.8926328437042237

key: test_recall
value: [0.66666667 0.76190476 0.80952381 0.76190476 0.80952381 0.85714286
 0.75       0.75       0.8        0.7       ]

mean value: 0.7666666666666666

key: train_recall
value: [0.85405405 0.89189189 0.84864865 0.85945946 0.84864865 0.85405405
 0.89247312 0.8172043  0.85483871 0.85483871]

mean value: 0.8576111595466435

key: test_roc_auc
value: [0.71428571 0.80952381 0.7797619  0.75595238 0.8547619  0.82857143
 0.80357143 0.7797619  0.8047619  0.7547619 ]

mean value: 0.7885714285714286

key: train_roc_auc
value: [0.88108108 0.9        0.86787271 0.87865446 0.87593723 0.87057541
 0.87596629 0.87076431 0.8733653  0.876068  ]

mean value: 0.8770284800929963

key: test_jcc
value: [0.53846154 0.66666667 0.65384615 0.61538462 0.73913043 0.72
 0.65217391 0.625      0.66666667 0.58333333]

mean value: 0.6460663322185061

key: train_jcc
value: [0.78217822 0.81683168 0.76213592 0.77941176 0.77339901 0.76699029
 0.78301887 0.76       0.77184466 0.77560976]

mean value: 0.7771420178282804

MCC on Blind test: 0.41

Accuracy on Blind test: 0.72

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01698852 0.01849985 0.01615834 0.01625133 0.01616049 0.0162015
 0.01634336 0.01601434 0.01653647 0.01660895]

mean value: 0.016576313972473146

key: score_time
value: [0.01195359 0.01049948 0.01058865 0.01097417 0.01060414 0.01036382
 0.01065779 0.01078796 0.01075959 0.0114882 ]

mean value: 0.010867738723754882

key: test_mcc
value: [0.58834841 0.78446454 0.81975606 0.65915306 0.57570364 0.65871309
 0.90692382 0.76500781 0.8047619  0.8547619 ]

mean value: 0.7417594238667976

key: train_mcc
value: [0.76980251 0.7855844  0.79896877 0.76934606 0.7608309  0.79108463
 0.76918835 0.81470293 0.78607065 0.7751856 ]

mean value: 0.7820764804941465

key: test_accuracy
value: [0.78571429 0.88095238 0.90243902 0.80487805 0.7804878  0.82926829
 0.95121951 0.87804878 0.90243902 0.92682927]

mean value: 0.8642276422764228

key: train_accuracy
value: [0.88108108 0.88918919 0.89757412 0.88140162 0.87601078 0.89218329
 0.88140162 0.90566038 0.88948787 0.88409704]

mean value: 0.8878086981860567

key: test_fscore
value: [0.80851064 0.89361702 0.91304348 0.84       0.80851064 0.8372093
 0.95238095 0.88372093 0.9        0.92682927]

mean value: 0.8763822229364985

key: train_fscore
value: [0.88888889 0.89620253 0.90206186 0.88832487 0.88442211 0.89847716
 0.88888889 0.91002571 0.89672544 0.89168766]

mean value: 0.8945705111280717

key: test_precision
value: [0.73076923 0.80769231 0.84       0.72413793 0.73076923 0.81818182
 0.90909091 0.82608696 0.9        0.9047619 ]

mean value: 0.8191490288821623

key: train_precision
value: [0.83412322 0.84285714 0.86206897 0.83732057 0.82629108 0.84688995
 0.83809524 0.87192118 0.8436019  0.83886256]

mean value: 0.8442031812588747

key: test_recall
value: [0.9047619  1.         1.         1.         0.9047619  0.85714286
 1.         0.95       0.9        0.95      ]

mean value: 0.9466666666666667

key: train_recall
value: [0.95135135 0.95675676 0.94594595 0.94594595 0.95135135 0.95675676
 0.94623656 0.9516129  0.95698925 0.9516129 ]

mean value: 0.9514559721011334

key: test_roc_auc
value: [0.78571429 0.88095238 0.9        0.8        0.77738095 0.82857143
 0.95238095 0.8797619  0.90238095 0.92738095]

mean value: 0.863452380952381

key: train_roc_auc
value: [0.88108108 0.88918919 0.89770416 0.88157512 0.87621331 0.89235687
 0.88122639 0.90553618 0.88930543 0.88391456]

mean value: 0.8878102295844231

key: test_jcc
value: [0.67857143 0.80769231 0.84       0.72413793 0.67857143 0.72
 0.90909091 0.79166667 0.81818182 0.86363636]

mean value: 0.7831548853445405

key: train_jcc
value: [0.8        0.81192661 0.82159624 0.79908676 0.79279279 0.8156682
 0.8        0.83490566 0.81278539 0.80454545]

mean value: 0.8093307106235347

MCC on Blind test: 0.68

Accuracy on Blind test: 0.86

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [1.47419095 1.40329194 1.42739582 1.49662685 1.34318018 1.49213672
 1.31603861 1.42225814 1.39964223 1.34260798]

mean value: 1.4117369413375855

key: score_time
value: [0.01513195 0.02292132 0.0230453  0.01247382 0.01239777 0.0147655
 0.01543188 0.01502585 0.01510143 0.01240325]

mean value: 0.015869808197021485

key: test_mcc
value: [0.76277007 0.95346259 0.90692382 0.86240942 0.95238095 0.70714286
 0.90649828 0.51320273 0.85441771 0.8047619 ]

mean value: 0.8223970331207965

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.88095238 0.97619048 0.95121951 0.92682927 0.97560976 0.85365854
 0.95121951 0.75609756 0.92682927 0.90243902]

mean value: 0.9101045296167247

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.88372093 0.97674419 0.95       0.93333333 0.97560976 0.85714286
 0.94736842 0.73684211 0.92307692 0.9       ]

mean value: 0.9083838512245533

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.86363636 0.95454545 1.         0.875      1.         0.85714286
 1.         0.77777778 0.94736842 0.9       ]

mean value: 0.9175470874155085

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.9047619  1.         0.9047619  1.         0.95238095 0.85714286
 0.9        0.7        0.9        0.9       ]

mean value: 0.9019047619047619

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.88095238 0.97619048 0.95238095 0.925      0.97619048 0.85357143
 0.95       0.7547619  0.92619048 0.90238095]

mean value: 0.9097619047619048

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.79166667 0.95454545 0.9047619  0.875      0.95238095 0.75
 0.9        0.58333333 0.85714286 0.81818182]

mean value: 0.8387012987012987

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.68

Accuracy on Blind test: 0.85

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.03340149 0.01821494 0.01930451 0.02136612 0.01852131 0.01884985
 0.02576661 0.01696181 0.01858449 0.01900506]

mean value: 0.02099761962890625

key: score_time
value: [0.01251078 0.00937796 0.01435971 0.00949097 0.00899172 0.00927854
 0.01029158 0.01122713 0.00910568 0.00921607]

mean value: 0.01038501262664795

key: test_mcc
value: [0.71754731 0.85811633 1.         0.86240942 0.90692382 0.8547619
 0.86240942 0.86333169 0.85441771 0.90238095]

mean value: 0.8682298554829512

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.85714286 0.92857143 1.         0.92682927 0.95121951 0.92682927
 0.92682927 0.92682927 0.92682927 0.95121951]

mean value: 0.9322299651567945

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.85       0.92682927 1.         0.93333333 0.95       0.92682927
 0.91891892 0.93023256 0.92307692 0.95      ]

mean value: 0.9309220270054076

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.89473684 0.95       1.         0.875      1.         0.95
 1.         0.86956522 0.94736842 0.95      ]

mean value: 0.9436670480549199

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.80952381 0.9047619  1.         1.         0.9047619  0.9047619
 0.85       1.         0.9        0.95      ]

mean value: 0.9223809523809524

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.85714286 0.92857143 1.         0.925      0.95238095 0.92738095
 0.925      0.92857143 0.92619048 0.95119048]

mean value: 0.9321428571428572

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.73913043 0.86363636 1.         0.875      0.9047619  0.86363636
 0.85       0.86956522 0.85714286 0.9047619 ]

mean value: 0.8727635046113307

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.9

Accuracy on Blind test: 0.95

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.12875319 0.12269926 0.12715673 0.13435197 0.13244057 0.13502121
 0.12934875 0.12360287 0.13284302 0.13551545]

mean value: 0.13017330169677735

key: score_time
value: [0.01931357 0.01789641 0.01895833 0.02062035 0.02520704 0.02190733
 0.01751018 0.01830721 0.02011347 0.01819706]

mean value: 0.0198030948638916

key: test_mcc
value: [0.64597519 0.8660254  0.80817439 0.80817439 0.8547619  0.80907152
 1.         0.71121921 0.80817439 0.95238095]

mean value: 0.8263957356100416

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.80952381 0.92857143 0.90243902 0.90243902 0.92682927 0.90243902
 1.         0.85365854 0.90243902 0.97560976]

mean value: 0.9103948896631824

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.83333333 0.93333333 0.90909091 0.90909091 0.92682927 0.9
 1.         0.85714286 0.89473684 0.97560976]

mean value: 0.9139167208486849

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.74074074 0.875      0.86956522 0.86956522 0.95       0.94736842
 1.         0.81818182 0.94444444 0.95238095]

mean value: 0.8967246811583196

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.95238095 1.         0.95238095 0.95238095 0.9047619  0.85714286
 1.         0.9        0.85       1.        ]

mean value: 0.9369047619047619

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.80952381 0.92857143 0.90119048 0.90119048 0.92738095 0.90357143
 1.         0.8547619  0.90119048 0.97619048]

mean value: 0.9103571428571429

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.71428571 0.875      0.83333333 0.83333333 0.86363636 0.81818182
 1.         0.75       0.80952381 0.95238095]

mean value: 0.8449675324675325

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.71

Accuracy on Blind test: 0.87

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00980496 0.00954533 0.00995851 0.01084757 0.0097611  0.00990272
 0.00961041 0.00956297 0.01108313 0.01018977]

mean value: 0.010026645660400391

key: score_time
value: [0.00882721 0.00858307 0.00972176 0.00881481 0.00876117 0.00892162
 0.0086     0.00859857 0.00930023 0.00956869]

mean value: 0.008969712257385253

key: test_mcc
value: [0.47673129 0.49029034 0.51190476 0.56190476 0.6133669  0.6133669
 0.66432098 0.46428571 0.51966679 0.46428571]

mean value: 0.5380124156067321

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.73809524 0.73809524 0.75609756 0.7804878  0.80487805 0.80487805
 0.82926829 0.73170732 0.75609756 0.73170732]

mean value: 0.7671312427409989

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.73170732 0.76595745 0.76190476 0.7804878  0.8        0.8
 0.81081081 0.73170732 0.72222222 0.73170732]

mean value: 0.7636504997843866

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.75       0.69230769 0.76190476 0.8        0.84210526 0.84210526
 0.88235294 0.71428571 0.8125     0.71428571]

mean value: 0.7811847350276143

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.71428571 0.85714286 0.76190476 0.76190476 0.76190476 0.76190476
 0.75       0.75       0.65       0.75      ]

mean value: 0.7519047619047619

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.73809524 0.73809524 0.75595238 0.78095238 0.80595238 0.80595238
 0.82738095 0.73214286 0.75357143 0.73214286]

mean value: 0.7670238095238096

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.57692308 0.62068966 0.61538462 0.64       0.66666667 0.66666667
 0.68181818 0.57692308 0.56521739 0.57692308]

mean value: 0.6187212407782122

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.36

Accuracy on Blind test: 0.7

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.62729216 1.63516164 1.67529535 1.64418316 1.65478873 1.56067038
 1.59187913 1.81973457 1.68453097 1.64458179]

mean value: 1.65381178855896

key: score_time
value: [0.09210157 0.09104776 0.0897491  0.08998299 0.09835529 0.09500146
 0.10104084 0.10681438 0.09972382 0.09916043]

mean value: 0.0962977647781372

key: test_mcc
value: [0.90889326 0.90889326 0.85441771 0.86240942 1.         0.90238095
 0.95227002 0.8547619  0.95238095 0.95238095]

mean value: 0.9148788418773526

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.95238095 0.95238095 0.92682927 0.92682927 1.         0.95121951
 0.97560976 0.92682927 0.97560976 0.97560976]

mean value: 0.9563298490127758

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.95454545 0.95454545 0.93023256 0.93333333 1.         0.95238095
 0.97435897 0.92682927 0.97560976 0.97560976]

mean value: 0.9577445507791509

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.91304348 0.91304348 0.90909091 0.875      1.         0.95238095
 1.         0.9047619  0.95238095 0.95238095]

mean value: 0.937208262751741

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         0.95238095 1.         1.         0.95238095
 0.95       0.95       1.         1.        ]

mean value: 0.9804761904761905

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.95238095 0.95238095 0.92619048 0.925      1.         0.95119048
 0.975      0.92738095 0.97619048 0.97619048]

mean value: 0.9561904761904761

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.91304348 0.91304348 0.86956522 0.875      1.         0.90909091
 0.95       0.86363636 0.95238095 0.95238095]

mean value: 0.919814135140222

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.88

Accuracy on Blind test: 0.95

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000...05', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.991997   0.92924547 0.92070866 0.94155884 0.96088862 0.94280195
 0.9203403  0.98444152 1.03350067 1.02090788]

mean value: 0.9646390914916992

key: score_time
value: [0.24833751 0.24513054 0.12675023 0.24418807 0.22218466 0.22113061
 0.27262068 0.13829207 0.15273929 0.18978667]

mean value: 0.20611603260040284

key: test_mcc
value: [0.95346259 0.90889326 0.85441771 0.86240942 0.95227002 0.90238095
 1.         0.76500781 0.90238095 0.95238095]

mean value: 0.9053603651273313

key: train_mcc
value: [0.96779381 0.96779381 0.978494   0.97317407 0.978494   0.9734012
 0.97317174 0.97849275 0.96787795 0.97849275]

mean value: 0.973718608698558

key: test_accuracy
value: [0.97619048 0.95238095 0.92682927 0.92682927 0.97560976 0.95121951
 1.         0.87804878 0.95121951 0.97560976]

mean value: 0.9513937282229965

key: train_accuracy
value: [0.98378378 0.98378378 0.98921833 0.98652291 0.98921833 0.98652291
 0.98652291 0.98921833 0.98382749 0.98921833]

mean value: 0.9867837109346543

key: test_fscore
value: [0.97674419 0.95454545 0.93023256 0.93333333 0.97674419 0.95238095
 1.         0.88372093 0.95       0.97560976]

mean value: 0.9533311356822418

key: train_fscore
value: [0.98395722 0.98395722 0.98924731 0.98659517 0.98924731 0.98666667
 0.98666667 0.98930481 0.98404255 0.98930481]

mean value: 0.9868989748614594

key: test_precision
value: [0.95454545 0.91304348 0.90909091 0.875      0.95454545 0.95238095
 1.         0.82608696 0.95       0.95238095]

mean value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
0.9287074157726332

key: train_precision
value: [0.97354497 0.97354497 0.98395722 0.9787234  0.98395722 0.97368421
 0.97883598 0.98404255 0.97368421 0.98404255]

mean value: 0.9788017296119529

key: test_recall
value: [1.         1.         0.95238095 1.         1.         0.95238095
 1.         0.95       0.95       1.        ]

mean value: 0.9804761904761905

key: train_recall
value: [0.99459459 0.99459459 0.99459459 0.99459459 0.99459459 1.
 0.99462366 0.99462366 0.99462366 0.99462366]

mean value: 0.9951467596628887

key: test_roc_auc
value: [0.97619048 0.95238095 0.92619048 0.925      0.975      0.95119048
 1.         0.8797619  0.95119048 0.97619048]

mean value: 0.9513095238095238

key: train_roc_auc
value: [0.98378378 0.98378378 0.98923278 0.98654461 0.98923278 0.98655914
 0.98650102 0.98920372 0.98379831 0.98920372]

mean value: 0.9867843650101715

key: test_jcc
value: [0.95454545 0.91304348 0.86956522 0.875      0.95454545 0.90909091
 1.         0.79166667 0.9047619  0.95238095]

mean value: 0.9124600037643515

key: train_jcc
value: [0.96842105 0.96842105 0.9787234  0.97354497 0.9787234  0.97368421
 0.97368421 0.97883598 0.96858639 0.97883598]

mean value: 0.9741460653477914

MCC on Blind test: 0.9

Accuracy on Blind test: 0.95

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.02595615 0.00980139 0.010777   0.00991344 0.01007867 0.01106524
 0.01026869 0.010144   0.01002526 0.01101446]

mean value: 0.011904430389404298

key: score_time
value: [0.01006055 0.00894856 0.01005125 0.0089395  0.00916576 0.00964999
 0.0100596  0.00945497 0.00976562 0.0090816 ]

mean value: 0.009517741203308106

key: test_mcc
value: [0.57735027 0.57735027 0.75714286 0.67700771 0.7633652  0.65871309
 0.7633652  0.71121921 0.70714286 0.90238095]

mean value: 0.7095037610692975

key: train_mcc
value: [0.75417724 0.74123391 0.71558817 0.74235478 0.74235478 0.74235478
 0.72167661 0.72031226 0.73194294 0.73194294]

mean value: 0.7343938376347271

key: test_accuracy
value: [0.78571429 0.78571429 0.87804878 0.82926829 0.87804878 0.82926829
 0.87804878 0.85365854 0.85365854 0.95121951]

mean value: 0.8522648083623693

key: train_accuracy
value: [0.87567568 0.87027027 0.85714286 0.87061995 0.87061995 0.87061995
 0.85983827 0.85983827 0.86522911 0.86522911]

mean value: 0.8665083412253224

key: test_fscore
value: [0.8        0.8        0.87804878 0.85106383 0.88888889 0.8372093
 0.86486486 0.85714286 0.85       0.95      ]

mean value: 0.8577218523497231

key: train_fscore
value: [0.88082902 0.87301587 0.86089239 0.87368421 0.87368421 0.87368421
 0.86528497 0.86315789 0.86979167 0.86979167]

mean value: 0.8703816110753745

key: test_precision
value: [0.75       0.75       0.9        0.76923077 0.83333333 0.81818182
 0.94117647 0.81818182 0.85       0.95      ]

mean value: 0.8380104209515974

key: train_precision
value: [0.84577114 0.85492228 0.83673469 0.85128205 0.85128205 0.85128205
 0.835      0.84536082 0.84343434 0.84343434]

mean value: 0.8458503783406013

key: test_recall
value: [0.85714286 0.85714286 0.85714286 0.95238095 0.95238095 0.85714286
 0.8        0.9        0.85       0.95      ]

mean value: 0.8833333333333333

key: train_recall
value: [0.91891892 0.89189189 0.88648649 0.8972973  0.8972973  0.8972973
 0.89784946 0.88172043 0.89784946 0.89784946]

mean value: 0.8964458006393491

key: test_roc_auc
value: [0.78571429 0.78571429 0.87857143 0.82619048 0.87619048 0.82857143
 0.87619048 0.8547619  0.85357143 0.95119048]

mean value: 0.8516666666666667

key: train_roc_auc
value: [0.87567568 0.87027027 0.85722174 0.87069166 0.87069166 0.87069166
 0.85973554 0.85977913 0.86514095 0.86514095]

mean value: 0.8665039232781169

key: test_jcc
value: [0.66666667 0.66666667 0.7826087  0.74074074 0.8        0.72
 0.76190476 0.75       0.73913043 0.9047619 ]

mean value: 0.7532479871175524

key: train_jcc
value: [0.78703704 0.77464789 0.75576037 0.77570093 0.77570093 0.77570093
 0.76255708 0.75925926 0.76958525 0.76958525]

mean value: 0.7705534940560166

MCC on Blind test: 0.59

Accuracy on Blind test: 0.82

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.12928247 0.29347372 0.07189345 0.06359673 0.07391357 0.07862544
 0.0591352  0.05616426 0.06642604 0.08556604]

mean value: 0.09780769348144532

key: score_time
value: [0.01163054 0.01359677 0.01183581 0.01162839 0.01110625 0.01166606
 0.01094699 0.01067758 0.01141667 0.01115537]

mean value: 0.01156604290008545

key: test_mcc
value: [1.         0.95346259 1.         0.80817439 0.95238095 0.95238095
 0.90649828 0.86333169 0.90238095 0.95238095]

mean value: 0.9290990764599092

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.97619048 1.         0.90243902 0.97560976 0.97560976
 0.95121951 0.92682927 0.95121951 0.97560976]

mean value: 0.963472706155633

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.97674419 1.         0.90909091 0.97560976 0.97560976
 0.94736842 0.93023256 0.95       0.97560976]

mean value: 0.964026534262227

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.95454545 1.         0.86956522 1.         1.
 1.         0.86956522 0.95       0.95238095]

mean value: 0.9596056841709015

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         0.95238095 0.95238095 0.95238095
 0.9        1.         0.95       1.        ]

mean value: 0.9707142857142856

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.97619048 1.         0.90119048 0.97619048 0.97619048
 0.95       0.92857143 0.95119048 0.97619048]

mean value: 0.9635714285714285

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.95454545 1.         0.83333333 0.95238095 0.95238095
 0.9        0.86956522 0.9047619  0.95238095]

mean value: 0.9319348767174854

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.91

Accuracy on Blind test: 0.96

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.05552244 0.11581016 0.08573031 0.08327365 0.06015062 0.04256988
 0.04870296 0.08871055 0.0668242  0.07970572]

mean value: 0.07270004749298095

key: score_time
value: [0.02346492 0.02336931 0.0217948  0.02209783 0.01211452 0.01214552
 0.02227259 0.02448988 0.02263021 0.01856112]

mean value: 0.02029407024383545

key: test_mcc
value: [1.         0.90889326 0.8547619  0.90238095 0.90692382 0.80907152
 0.86240942 0.65952381 0.95227002 0.7565654 ]

mean value: 0.861280009770891

key: train_mcc
value: [0.98379816 0.99460913 0.98921825 0.98921825 0.98921825 1.
 0.97849275 0.98384144 0.98921825 0.99462366]

mean value: 0.9892238129029853

key: test_accuracy
value: [1.         0.95238095 0.92682927 0.95121951 0.95121951 0.90243902
 0.92682927 0.82926829 0.97560976 0.87804878]

mean value: 0.9293844367015098

key: train_accuracy
value: [0.99189189 0.9972973  0.99460916 0.99460916 0.99460916 1.
 0.98921833 0.99191375 0.99460916 0.99730458]

mean value: 0.994606250455307

key: test_fscore
value: [1.         0.95       0.92682927 0.95238095 0.95       0.9
 0.91891892 0.82926829 0.97435897 0.87179487]

mean value: 0.9273551278429327

key: train_fscore
value: [0.99191375 0.99728997 0.99459459 0.99459459 0.99459459 1.
 0.98930481 0.9919571  0.99462366 0.99730458]

mean value: 0.9946177658830327

key: test_precision
value: [1.         1.         0.95       0.95238095 1.         0.94736842
 1.         0.80952381 1.         0.89473684]

mean value: 0.9554010025062657

key: train_precision
value: [0.98924731 1.         0.99459459 0.99459459 0.99459459 1.
 0.98404255 0.98930481 0.99462366 1.        ]

mean value: 0.9941002117551433

key: test_recall
value: [1.         0.9047619  0.9047619  0.95238095 0.9047619  0.85714286
 0.85       0.85       0.95       0.85      ]

mean value: 0.9023809523809524

key: train_recall
value: [0.99459459 0.99459459 0.99459459 0.99459459 0.99459459 1.
 0.99462366 0.99462366 0.99462366 0.99462366]

mean value: 0.9951467596628887

key: test_roc_auc
value: [1.         0.95238095 0.92738095 0.95119048 0.95238095 0.90357143
 0.925      0.8297619  0.975      0.87738095]

mean value: 0.9294047619047618

key: train_roc_auc
value: [0.99189189 0.9972973  0.99460913 0.99460913 0.99460913 1.
 0.98920372 0.99190642 0.99460913 0.99731183]

mean value: 0.9946047660563789

key: test_jcc
value: [1.         0.9047619  0.86363636 0.90909091 0.9047619  0.81818182
 0.85       0.70833333 0.95       0.77272727]

mean value: 0.8681493506493506

key: train_jcc
value: [0.98395722 0.99459459 0.98924731 0.98924731 0.98924731 1.
 0.97883598 0.98404255 0.98930481 0.99462366]

mean value: 0.9893100750105474

MCC on Blind test: 0.78

Accuracy on Blind test: 0.9

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.01387548 0.01265335 0.01496863 0.0118649  0.01619577 0.01203036
 0.01009297 0.01275992 0.01414347 0.01146722]

mean value: 0.013005208969116212

key: score_time
value: [0.01421428 0.0107317  0.01549006 0.00962782 0.01458716 0.01043081
 0.00931191 0.01516914 0.01056194 0.01001906]

mean value: 0.012014389038085938

key: test_mcc
value: [0.66742381 0.66742381 0.86240942 0.57570364 0.56527676 0.60952381
 0.8047619  0.71121921 0.8047619  0.90238095]

mean value: 0.7170885216941957

key: train_mcc
value: [0.73041298 0.74267016 0.74754216 0.7583352  0.77584118 0.78734807
 0.73148107 0.75792591 0.75307912 0.75412101]

mean value: 0.7538756859843339

key: test_accuracy
value: [0.83333333 0.83333333 0.92682927 0.7804878  0.7804878  0.80487805
 0.90243902 0.85365854 0.90243902 0.95121951]

mean value: 0.8569105691056911

key: train_accuracy
value: [0.86486486 0.87027027 0.87331536 0.8787062  0.88679245 0.89218329
 0.86522911 0.8787062  0.87601078 0.87601078]

mean value: 0.8762089313032709

key: test_fscore
value: [0.8372093  0.82926829 0.93333333 0.80851064 0.8        0.80952381
 0.9        0.85714286 0.9        0.95      ]

mean value: 0.8624988233306381

key: train_fscore
value: [0.86772487 0.875      0.87598945 0.88126649 0.890625   0.89637306
 0.86910995 0.88126649 0.87958115 0.88082902]

mean value: 0.87977654671808

key: test_precision
value: [0.81818182 0.85       0.875      0.73076923 0.75       0.80952381
 0.9        0.81818182 0.9        0.95      ]

mean value: 0.8401656676656677

key: train_precision
value: [0.84974093 0.84422111 0.8556701  0.86082474 0.85929648 0.86069652
 0.84693878 0.86528497 0.85714286 0.85      ]

mean value: 0.8549816490102271

key: test_recall
value: [0.85714286 0.80952381 1.         0.9047619  0.85714286 0.80952381
 0.9        0.9        0.9        0.95      ]

mean value: 0.8888095238095238

key: train_recall
value: [0.88648649 0.90810811 0.8972973  0.9027027  0.92432432 0.93513514
 0.89247312 0.89784946 0.90322581 0.91397849]

mean value: 0.9061580935774485

key: test_roc_auc
value: [0.83333333 0.83333333 0.925      0.77738095 0.77857143 0.8047619
 0.90238095 0.8547619  0.90238095 0.95119048]

mean value: 0.8563095238095237

key: train_roc_auc
value: [0.86486486 0.87027027 0.87337983 0.87877071 0.88689334 0.89229875
 0.86515548 0.87865446 0.87593723 0.87590817]

mean value: 0.8762133100842778

key: test_jcc
value: [0.72       0.70833333 0.875      0.67857143 0.66666667 0.68
 0.81818182 0.75       0.81818182 0.9047619 ]

mean value: 0.761969696969697

key: train_jcc
value: [0.76635514 0.77777778 0.77934272 0.78773585 0.8028169  0.81220657
 0.76851852 0.78773585 0.78504673 0.78703704]

mean value: 0.7854573097788518

MCC on Blind test: 0.62

Accuracy on Blind test: 0.83

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.02087641 0.01695037 0.02051234 0.02458811 0.01624537 0.02017117
 0.0228157  0.018888   0.01997352 0.02516794]

mean value: 0.020618891716003417

key: score_time
value: [0.01020479 0.01029229 0.01238561 0.01190782 0.01196027 0.01182055
 0.01182747 0.01213884 0.01230955 0.0123229 ]

mean value: 0.011717009544372558

key: test_mcc
value: [0.82462113 0.90889326 0.90238095 0.90649828 0.77831178 0.7633652
 0.90649828 0.72229808 0.90649828 0.85441771]

mean value: 0.8473782943864396

key: train_mcc
value: [0.96285167 0.94733093 0.96788166 0.98395676 0.77640338 0.93222307
 0.97849275 0.87315103 0.92718645 0.98927544]

mean value: 0.9338753151800493

key: test_accuracy
value: [0.9047619  0.95238095 0.95121951 0.95121951 0.87804878 0.87804878
 0.95121951 0.85365854 0.95121951 0.92682927]

mean value: 0.9198606271777003

key: train_accuracy
value: [0.98108108 0.97297297 0.98382749 0.99191375 0.87601078 0.96495957
 0.98921833 0.93261456 0.96226415 0.99460916]

mean value: 0.9649471843811467

key: test_fscore
value: [0.91304348 0.95454545 0.95238095 0.95454545 0.89361702 0.88888889
 0.94736842 0.86363636 0.94736842 0.92307692]

mean value: 0.9238471378716766

key: train_fscore
value: [0.98143236 0.97368421 0.98395722 0.9919571  0.88942308 0.96605744
 0.98930481 0.93702771 0.96089385 0.99465241]

mean value: 0.9668390195062844

key: test_precision
value: [0.84       0.91304348 0.95238095 0.91304348 0.80769231 0.83333333
 1.         0.79166667 1.         0.94736842]

mean value: 0.899852863764763

key: train_precision
value: [0.96354167 0.94871795 0.97354497 0.98404255 0.8008658  0.93434343
 0.98404255 0.88151659 1.         0.9893617 ]

mean value: 0.9459977220327187

key: test_recall
value: [1.         1.         0.95238095 1.         1.         0.95238095
 0.9        0.95       0.9        0.9       ]

mean value: 0.9554761904761905

key: train_recall
value: [1.         1.         0.99459459 1.         1.         1.
 0.99462366 1.         0.92473118 1.        ]

mean value: 0.9913949433304272

key: test_roc_auc
value: [0.9047619  0.95238095 0.95119048 0.95       0.875      0.87619048
 0.95       0.85595238 0.95       0.92619048]

mean value: 0.9191666666666667

key: train_roc_auc
value: [0.98108108 0.97297297 0.98385644 0.99193548 0.87634409 0.96505376
 0.98920372 0.93243243 0.96236559 0.99459459]

mean value: 0.9649840162743388

key: test_jcc
value: [0.84       0.91304348 0.90909091 0.91304348 0.80769231 0.8
 0.9        0.76       0.9        0.85714286]

mean value: 0.8600013030447813

key: train_jcc
value: [0.96354167 0.94871795 0.96842105 0.98404255 0.8008658  0.93434343
 0.97883598 0.88151659 0.92473118 0.9893617 ]

mean value: 0.9374377907853981

MCC on Blind test: 0.75

Accuracy on Blind test: 0.88

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01795006 0.0186522  0.01706672 0.01866984 0.01695228 0.01767445
 0.01763296 0.01684928 0.01647019 0.01648283]

mean value: 0.017440080642700195

key: score_time
value: [0.01651978 0.01239181 0.01181412 0.01196337 0.01207924 0.01195359
 0.01196837 0.01217914 0.01193547 0.01189327]

mean value: 0.012469816207885741

key: test_mcc
value: [0.8660254  0.5956834  0.86333169 0.86240942 0.95227002 0.75714286
 0.95227002 0.38363297 0.8547619  0.7197263 ]

mean value: 0.7807253975409502

key: train_mcc
value: [0.97837838 0.57110846 0.9048433  0.97866529 0.93728335 0.98384191
 0.95737027 0.77195645 0.92138789 0.93618785]

mean value: 0.8941023146571723

key: test_accuracy
value: [0.92857143 0.76190476 0.92682927 0.92682927 0.97560976 0.87804878
 0.97560976 0.68292683 0.92682927 0.85365854]

mean value: 0.8836817653890825

key: train_accuracy
value: [0.98918919 0.74594595 0.95148248 0.98921833 0.96765499 0.99191375
 0.97843666 0.87331536 0.95956873 0.96765499]

mean value: 0.9414380418154003

key: test_fscore
value: [0.93333333 0.80769231 0.92307692 0.93333333 0.97674419 0.87804878
 0.97435897 0.60606061 0.92682927 0.83333333]

mean value: 0.879281104601581

key: train_fscore
value: [0.98918919 0.79741379 0.94972067 0.98930481 0.96858639 0.99191375
 0.97883598 0.85538462 0.96103896 0.96703297]

mean value: 0.9448421121875729

key: test_precision
value: [0.875      0.67741935 1.         0.875      0.95454545 0.9
 1.         0.76923077 0.9047619  0.9375    ]

mean value: 0.8893457483376839

key: train_precision
value: [0.98918919 0.66308244 0.98265896 0.97883598 0.93908629 0.98924731
 0.96354167 1.         0.92964824 0.98876404]

mean value: 0.9424054123899444

key: test_recall
value: [1.         1.         0.85714286 1.         1.         0.85714286
 0.95       0.5        0.95       0.75      ]

mean value: 0.8864285714285715

key: train_recall
value: [0.98918919 1.         0.91891892 1.         1.         0.99459459
 0.99462366 0.74731183 0.99462366 0.94623656]

mean value: 0.9585498401627434

key: test_roc_auc
value: [0.92857143 0.76190476 0.92857143 0.925      0.975      0.87857143
 0.975      0.67857143 0.92738095 0.85119048]

mean value: 0.8829761904761905

key: train_roc_auc
value: [0.98918919 0.74594595 0.95139494 0.98924731 0.96774194 0.99192095
 0.97839291 0.87365591 0.95947399 0.96771287]

mean value: 0.941467596628887

key: test_jcc
value: [0.875      0.67741935 0.85714286 0.875      0.95454545 0.7826087
 0.95       0.43478261 0.86363636 0.71428571]

mean value: 0.7984421048796926

key: train_jcc
value: [0.97860963 0.66308244 0.90425532 0.97883598 0.93908629 0.98395722
 0.95854922 0.74731183 0.925      0.93617021]

mean value: 0.9014858138117805

MCC on Blind test: 0.67

Accuracy on Blind test: 0.83

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.20878315 0.1760838  0.18187737 0.17748427 0.20082045 0.19634724
 0.19840193 0.17302775 0.16932535 0.17095423]

mean value: 0.18531055450439454

key: score_time
value: [0.01604414 0.01630116 0.01627135 0.01687264 0.01652217 0.02443409
 0.014992   0.01500082 0.01513028 0.01563883]

mean value: 0.01672074794769287

key: test_mcc
value: [1.         0.95346259 1.         0.7633652  0.95238095 0.90238095
 1.         0.70714286 1.         0.95238095]

mean value: 0.9231113500473697

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.97619048 1.         0.87804878 0.97560976 0.95121951
 1.         0.85365854 1.         0.97560976]

mean value: 0.9610336817653891

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.97674419 1.         0.88888889 0.97560976 0.95238095
 1.         0.85       1.         0.97560976]

mean value: 0.9619233539511475

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.95454545 1.         0.83333333 1.         0.95238095
 1.         0.85       1.         0.95238095]

mean value: 0.9542640692640693

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         0.95238095 0.95238095 0.95238095
 1.         0.85       1.         1.        ]

mean value: 0.9707142857142856

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.97619048 1.         0.87619048 0.97619048 0.95119048
 1.         0.85357143 1.         0.97619048]

mean value: 0.960952380952381

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.95454545 1.         0.8        0.95238095 0.90909091
 1.         0.73913043 1.         0.95238095]

mean value: 0.9307528703180877

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.87

Accuracy on Blind test: 0.94

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.06194448 0.06513834 0.06491184 0.06082225 0.07302451 0.05870032
 0.05307627 0.06909204 0.07765031 0.0584054 ]

mean value: 0.06427657604217529

key: score_time
value: [0.01991296 0.02684236 0.03728247 0.02274609 0.03553915 0.02886772
 0.03747249 0.0306232  0.03088164 0.0238471 ]

mean value: 0.029401516914367674

key: test_mcc
value: [1.         0.9047619  0.95238095 0.81975606 0.90692382 0.8547619
 0.90649828 0.86333169 0.90238095 0.95238095]

mean value: 0.9063176525865053

key: train_mcc
value: [0.98918919 0.989247   0.98921825 0.9946235  0.99462366 0.98921825
 1.         0.9946235  0.99462366 0.99462366]

mean value: 0.9929990657442348

key: test_accuracy
value: [1.         0.95238095 0.97560976 0.90243902 0.95121951 0.92682927
 0.95121951 0.92682927 0.95121951 0.97560976]

mean value: 0.951335656213705

key: train_accuracy
value: [0.99459459 0.99459459 0.99460916 0.99730458 0.99730458 0.99460916
 1.         0.99730458 0.99730458 0.99730458]

mean value: 0.9964930429081372

key: test_fscore
value: [1.         0.95238095 0.97560976 0.91304348 0.95       0.92682927
 0.94736842 0.93023256 0.95       0.97560976]

mean value: 0.9521074190321793

key: train_fscore
value: [0.99459459 0.99462366 0.99459459 0.99728997 0.99730458 0.99459459
 1.         0.99731903 0.99730458 0.99730458]

mean value: 0.9964930194080766

key: test_precision
value: [1.         0.95238095 1.         0.84       1.         0.95
 1.         0.86956522 0.95       0.95238095]

mean value: 0.9514327122153209

key: train_precision
value: [0.99459459 0.98930481 0.99459459 1.         0.99462366 0.99459459
 1.         0.99465241 1.         1.        ]

mean value: 0.9962364658949099

key: test_recall
value: [1.         0.95238095 0.95238095 1.         0.9047619  0.9047619
 0.9        1.         0.95       1.        ]

mean value: 0.9564285714285714

key: train_recall
value: [0.99459459 1.         0.99459459 0.99459459 1.         0.99459459
 1.         1.         0.99462366 0.99462366]

mean value: 0.9967625690206335

key: test_roc_auc
value: [1.         0.95238095 0.97619048 0.9        0.95238095 0.92738095
 0.95       0.92857143 0.95119048 0.97619048]

mean value: 0.9514285714285714

key: train_roc_auc
value: [0.99459459 0.99459459 0.99460913 0.9972973  0.99731183 0.99460913
 1.         0.9972973  0.99731183 0.99731183]

mean value: 0.9964937518163325

key: test_jcc
value: [1.         0.90909091 0.95238095 0.84       0.9047619  0.86363636
 0.9        0.86956522 0.9047619  0.95238095]

mean value: 0.9096578204404291

key: train_jcc
value: [0.98924731 0.98930481 0.98924731 0.99459459 0.99462366 0.98924731
 1.         0.99465241 0.99462366 0.99462366]

mean value: 0.9930164717071738

MCC on Blind test: 0.91

Accuracy on Blind test: 0.96

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.18906832 0.13127685 0.1263175  0.1358242  0.13411975 0.15867543
 0.19670749 0.14597034 0.14964938 0.18066359]

mean value: 0.15482728481292723

key: score_time
value: [0.02278852 0.02293873 0.02779603 0.02318168 0.02501178 0.0236268
 0.02434683 0.0294075  0.02881098 0.0234859 ]

mean value: 0.025139474868774415

key: test_mcc
value: [0.66742381 0.66742381 0.56190476 0.60952381 0.85441771 0.56190476
 0.66432098 0.56086079 0.70714286 0.8047619 ]

mean value: 0.6659685192278054

key: train_mcc
value: [0.98379816 0.99460913 0.98395537 0.98395537 0.98921825 1.
 0.9946235  0.978494   0.98927606 0.98384191]

mean value: 0.9881771737535285

key: test_accuracy
value: [0.83333333 0.83333333 0.7804878  0.80487805 0.92682927 0.7804878
 0.82926829 0.7804878  0.85365854 0.90243902]

mean value: 0.832520325203252

key: train_accuracy
value: [0.99189189 0.9972973  0.99191375 0.99191375 0.99460916 1.
 0.99730458 0.98921833 0.99460916 0.99191375]

mean value: 0.9940671668973555

key: test_fscore
value: [0.8372093  0.82926829 0.7804878  0.80952381 0.93023256 0.7804878
 0.81081081 0.76923077 0.85       0.9       ]

mean value: 0.829725115246953

key: train_fscore
value: [0.99186992 0.99728997 0.99182561 0.99182561 0.99459459 1.
 0.99731903 0.98918919 0.99459459 0.99191375]

mean value: 0.9940422277618608

key: test_precision
value: [0.81818182 0.85       0.8        0.80952381 0.90909091 0.8
 0.88235294 0.78947368 0.85       0.9       ]

mean value: 0.8408623162183534

key: train_precision
value: [0.99456522 1.         1.         1.         0.99459459 1.
 0.99465241 0.99456522 1.         0.99459459]

mean value: 0.997297203038891

key: test_recall
value: [0.85714286 0.80952381 0.76190476 0.80952381 0.95238095 0.76190476
 0.75       0.75       0.85       0.9       ]

mean value: 0.8202380952380952

key: train_recall
value: [0.98918919 0.99459459 0.98378378 0.98378378 0.99459459 1.
 1.         0.98387097 0.98924731 0.98924731]

mean value: 0.9908311537343796

key: test_roc_auc
value: [0.83333333 0.83333333 0.78095238 0.8047619  0.92619048 0.78095238
 0.82738095 0.7797619  0.85357143 0.90238095]

mean value: 0.8322619047619048

key: train_roc_auc
value: [0.99189189 0.9972973  0.99189189 0.99189189 0.99460913 1.
 0.9972973  0.98923278 0.99462366 0.99192095]

mean value: 0.9940656785818076

key: test_jcc
value: [0.72       0.70833333 0.64       0.68       0.86956522 0.64
 0.68181818 0.625      0.73913043 0.81818182]

mean value: 0.7122028985507246

key: train_jcc
value: [0.98387097 0.99459459 0.98378378 0.98378378 0.98924731 1.
 0.99465241 0.97860963 0.98924731 0.98395722]

mean value: 0.988174700489691

MCC on Blind test: 0.46

Accuracy on Blind test: 0.75

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.75021386 0.74320316 0.67933774 0.65103602 0.65639186 0.61880779
 0.64705563 0.64216733 0.64391065 0.64939094]

mean value: 0.6681514978408813

key: score_time
value: [0.01284075 0.00922751 0.00918388 0.00923729 0.00981402 0.00916934
 0.00927424 0.00915313 0.0092957  0.00910234]

mean value: 0.00962982177734375

key: test_mcc
value: [1.         0.95346259 1.         0.80817439 0.95238095 0.95238095
 0.90649828 0.86333169 0.90238095 0.95238095]

mean value: 0.9290990764599092

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.97619048 1.         0.90243902 0.97560976 0.97560976
 0.95121951 0.92682927 0.95121951 0.97560976]

mean value: 0.963472706155633

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.97674419 1.         0.90909091 0.97560976 0.97560976
 0.94736842 0.93023256 0.95       0.97560976]

mean value: 0.964026534262227

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.95454545 1.         0.86956522 1.         1.
 1.         0.86956522 0.95       0.95238095]

mean value: 0.9596056841709015

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         0.95238095 0.95238095 0.95238095
 0.9        1.         0.95       1.        ]

mean value: 0.9707142857142856

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.97619048 1.         0.90119048 0.97619048 0.97619048
 0.95       0.92857143 0.95119048 0.97619048]

mean value: 0.9635714285714285

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.95454545 1.         0.83333333 0.95238095 0.95238095
 0.9        0.86956522 0.9047619  0.95238095]

mean value: 0.9319348767174854

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.91

Accuracy on Blind test: 0.96

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.03049493 0.03051066 0.03970337 0.02893615 0.02855396 0.02817822
 0.02765751 0.02851939 0.02899027 0.02894664]

mean value: 0.03004910945892334

key: score_time
value: [0.0124836  0.01509047 0.02614284 0.01539373 0.02472258 0.01539445
 0.01547289 0.01532865 0.01533842 0.01527309]

mean value: 0.01706407070159912

key: test_mcc
value: [0.76980036 0.76277007 0.57570364 0.58066054 0.62325386 0.61152662
 0.75714286 0.60952381 0.8547619  0.7098505 ]

mean value: 0.6854994165877901

key: train_mcc
value: [0.91196665 0.93710863 0.96294605 0.91714558 0.91217304 0.88278505
 1.         0.98395676 0.97866529 0.94236768]

mean value: 0.9429114712067205

key: test_accuracy
value: [0.88095238 0.88095238 0.7804878  0.7804878  0.80487805 0.80487805
 0.87804878 0.80487805 0.92682927 0.85365854]

mean value: 0.8396051103368176

key: train_accuracy
value: [0.95405405 0.96756757 0.98113208 0.95687332 0.9541779  0.93800539
 1.         0.99191375 0.98921833 0.9703504 ]

mean value: 0.9703292780651271

key: test_fscore
value: [0.87179487 0.88372093 0.80851064 0.75675676 0.78947368 0.81818182
 0.87804878 0.8        0.92682927 0.84210526]

mean value: 0.8375422011412786

key: train_fscore
value: [0.95184136 0.96648045 0.98071625 0.95480226 0.95184136 0.93371758
 1.         0.99186992 0.98913043 0.96952909]

mean value: 0.9689928698409741

key: test_precision
value: [0.94444444 0.86363636 0.73076923 0.875      0.88235294 0.7826087
 0.85714286 0.8        0.9047619  0.88888889]

mean value: 0.8529605326472334

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.80952381 0.9047619  0.9047619  0.66666667 0.71428571 0.85714286
 0.9        0.8        0.95       0.8       ]

mean value: 0.8307142857142857

key: train_recall
value: [0.90810811 0.93513514 0.96216216 0.91351351 0.90810811 0.87567568
 1.         0.98387097 0.97849462 0.94086022]

mean value: 0.9405928509154315

key: test_roc_auc
value: [0.88095238 0.88095238 0.77738095 0.78333333 0.80714286 0.80357143
 0.87857143 0.8047619  0.92738095 0.85238095]

mean value: 0.8396428571428571

key: train_roc_auc
value: [0.95405405 0.96756757 0.98108108 0.95675676 0.95405405 0.93783784
 1.         0.99193548 0.98924731 0.97043011]

mean value: 0.9702964254577158

key: test_jcc
value: [0.77272727 0.79166667 0.67857143 0.60869565 0.65217391 0.69230769
 0.7826087  0.66666667 0.86363636 0.72727273]

mean value: 0.7236327078718383

key: train_jcc
value: [0.90810811 0.93513514 0.96216216 0.91351351 0.90810811 0.87567568
 1.         0.98387097 0.97849462 0.94086022]

mean value: 0.9405928509154315

MCC on Blind test: 0.21

Accuracy on Blind test: 0.66

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.01553535 0.01531935 0.01529002 0.0152123  0.01528549 0.01499939
 0.03626537 0.03717756 0.02597475 0.01628184]

mean value: 0.020734143257141114

key: score_time
value: [0.01195645 0.01196551 0.01195502 0.01191282 0.01187205 0.01197076
 0.02354455 0.02118874 0.01196527 0.01193905]

mean value: 0.014027023315429687

key: test_mcc
value: [0.81322028 0.95346259 0.95238095 0.90649828 0.95227002 0.85441771
 0.95227002 0.75714286 0.90649828 0.90238095]

mean value: 0.8950541932543747

key: train_mcc
value: [0.97310093 0.96779381 0.96788166 0.96239138 0.96261632 0.9734012
 0.96261094 0.97849275 0.96787795 0.96787795]

mean value: 0.9684044882348298

key: test_accuracy
value: [0.9047619  0.97619048 0.97560976 0.95121951 0.97560976 0.92682927
 0.97560976 0.87804878 0.95121951 0.95121951]

mean value: 0.9466318234610918

key: train_accuracy
value: [0.98648649 0.98378378 0.98382749 0.98113208 0.98113208 0.98652291
 0.98113208 0.98921833 0.98382749 0.98382749]

mean value: 0.9840890216361915

key: test_fscore
value: [0.90909091 0.97674419 0.97560976 0.95454545 0.97674419 0.93023256
 0.97435897 0.87804878 0.94736842 0.95      ]

mean value: 0.9472743225865894

key: train_fscore
value: /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_7030.py:136: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_7030.py:139: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
[0.98659517 0.98395722 0.98395722 0.98123324 0.98133333 0.98666667
 0.98143236 0.98930481 0.98404255 0.98404255]

mean value: 0.9842565136693145

key: test_precision
value: [0.86956522 0.95454545 1.         0.91304348 0.95454545 0.90909091
 1.         0.85714286 1.         0.95      ]

mean value: 0.9407933370976849

key: train_precision
value: [0.9787234  0.97354497 0.97354497 0.97340426 0.96842105 0.97368421
 0.96858639 0.98404255 0.97368421 0.97368421]

mean value: 0.9741320231500986

key: test_recall
value: [0.95238095 1.         0.95238095 1.         1.         0.95238095
 0.95       0.9        0.9        0.95      ]

mean value: 0.9557142857142857

key: train_recall
value: [0.99459459 0.99459459 0.99459459 0.98918919 0.99459459 1.
 0.99462366 0.99462366 0.99462366 0.99462366]

mean value: 0.9946062191223481

key: test_roc_auc
value: [0.9047619  0.97619048 0.97619048 0.95       0.975      0.92619048
 0.975      0.87857143 0.95       0.95119048]

mean value: 0.9463095238095238

key: train_roc_auc
value: [0.98648649 0.98378378 0.98385644 0.98115373 0.98116827 0.98655914
 0.98109561 0.98920372 0.98379831 0.98379831]

mean value: 0.9840903807032839

key: test_jcc
value: [0.83333333 0.95454545 0.95238095 0.91304348 0.95454545 0.86956522
 0.95       0.7826087  0.9        0.9047619 ]

mean value: 0.9014784490871447

key: train_jcc
value: [0.97354497 0.96842105 0.96842105 0.96315789 0.96335079 0.97368421
 0.96354167 0.97883598 0.96858639 0.96858639]

mean value: 0.9690130389783359

MCC on Blind test: 0.81

Accuracy on Blind test: 0.92

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.16026402 0.28309011 0.2706182  0.31864095 0.2611227  0.25655651
 0.25506783 0.29842615 0.24984217 0.2597127 ]

mean value: 0.26133413314819337

key: score_time
value: [0.02092099 0.02027893 0.02040696 0.02240181 0.02012062 0.02204394
 0.02170444 0.02185845 0.02282166 0.02177119]

mean value: 0.021432900428771974

key: test_mcc
value: [0.95346259 0.9047619  0.95238095 0.90238095 0.95238095 0.90238095
 0.95227002 0.60952381 0.95227002 0.80817439]

mean value: 0.8889986535315402

key: train_mcc
value: [0.97843556 0.98918919 0.96788166 0.98384191 0.98384191 0.98395676
 0.97849275 0.98921825 0.97849275 0.98384144]

mean value: 0.9817192164189735

key: test_accuracy
value: [0.97619048 0.95238095 0.97560976 0.95121951 0.97560976 0.95121951
 0.97560976 0.80487805 0.97560976 0.90243902]

mean value: 0.9440766550522648

key: train_accuracy
value: [0.98918919 0.99459459 0.98382749 0.99191375 0.99191375 0.99191375
 0.98921833 0.99460916 0.98921833 0.99191375]

mean value: 0.9908312085670576

key: test_fscore
value: [0.97674419 0.95238095 0.97560976 0.95238095 0.97560976 0.95238095
 0.97435897 0.8        0.97435897 0.89473684]

mean value: 0.9428561346207702

key: train_fscore
value: [0.98924731 0.99459459 0.98395722 0.99191375 0.99191375 0.9919571
 0.98930481 0.99462366 0.98930481 0.9919571 ]

mean value: 0.9908774109633054

key: test_precision
value: [0.95454545 0.95238095 1.         0.95238095 1.         0.95238095
 1.         0.8        1.         0.94444444]

mean value: 0.9556132756132756

key: train_precision
value: [0.98395722 0.99459459 0.97354497 0.98924731 0.98924731 0.98404255
 0.98404255 0.99462366 0.98404255 0.98930481]

mean value: 0.9866647539369491

key: test_recall
value: [1.         0.95238095 0.95238095 0.95238095 0.95238095 0.95238095
 0.95       0.8        0.95       0.85      ]

mean value: 0.9311904761904761

key: train_recall
value: [0.99459459 0.99459459 0.99459459 0.99459459 0.99459459 1.
 0.99462366 0.99462366 0.99462366 0.99462366]

mean value: 0.9951467596628887

key: test_roc_auc
value: [0.97619048 0.95238095 0.97619048 0.95119048 0.97619048 0.95119048
 0.975      0.8047619  0.975      0.90119048]

mean value: 0.9439285714285715

key: train_roc_auc
value: [0.98918919 0.99459459 0.98385644 0.99192095 0.99192095 0.99193548
 0.98920372 0.99460913 0.98920372 0.99190642]

mean value: 0.9908340598663179

key: test_jcc
value: [0.95454545 0.90909091 0.95238095 0.90909091 0.95238095 0.90909091
 0.95       0.66666667 0.95       0.80952381]

mean value: 0.8962770562770562

key: train_jcc
value: [0.9787234  0.98924731 0.96842105 0.98395722 0.98395722 0.98404255
 0.97883598 0.98930481 0.97883598 0.98404255]

mean value: 0.981936808410669

MCC on Blind test: 0.81

Accuracy on Blind test: 0.92

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.02619624 0.03526211 0.03441119 0.03048706 0.0443604  0.03741527
 0.0345211  0.0420258  0.04124093 0.04728723]

mean value: 0.037320733070373535

key: score_time
value: [0.00984049 0.01518893 0.01197791 0.01190042 0.01453376 0.01460385
 0.01199198 0.01444697 0.01451516 0.01206589]

mean value: 0.013106536865234376

key: test_mcc
value: [0.85811633 0.90889326 0.90238095 0.80817439 0.90692382 0.8047619
 0.90238095 0.65952381 0.71121921 0.7565654 ]

mean value: 0.8218940034262068

key: train_mcc
value: [0.9135669  0.8974153  0.90317988 0.87612986 0.89239625 0.91380162
 0.92993112 0.92452775 0.89264025 0.90846996]

mean value: 0.9052058897197073

key: test_accuracy
value: [0.92857143 0.95238095 0.95121951 0.90243902 0.95121951 0.90243902
 0.95121951 0.82926829 0.85365854 0.87804878]

mean value: 0.9100464576074332

key: train_accuracy
value: [0.95675676 0.94864865 0.95148248 0.93800539 0.94609164 0.95687332
 0.96495957 0.96226415 0.94609164 0.9541779 ]

mean value: 0.952535149704961

key: test_fscore
value: [0.92682927 0.95454545 0.95238095 0.90909091 0.95       0.9047619
 0.95       0.82926829 0.85714286 0.87179487]

mean value: 0.9105814510692559

key: train_fscore
value: [0.95698925 0.94906166 0.95187166 0.9383378  0.94652406 0.95698925
 0.96514745 0.96236559 0.94708995 0.95466667]

mean value: 0.9529043338593334

key: test_precision
value: [0.95       0.91304348 0.95238095 0.86956522 1.         0.9047619
 0.95       0.80952381 0.81818182 0.89473684]

mean value: 0.9062194022605922

key: train_precision
value: [0.95187166 0.94148936 0.94179894 0.93085106 0.93650794 0.95187166
 0.96256684 0.96236559 0.93229167 0.94708995]

mean value: 0.9458704669421064

key: test_recall
value: [0.9047619  1.         0.95238095 0.95238095 0.9047619  0.9047619
 0.95       0.85       0.9        0.85      ]

mean value: 0.9169047619047619

key: train_recall
value: [0.96216216 0.95675676 0.96216216 0.94594595 0.95675676 0.96216216
 0.96774194 0.96236559 0.96236559 0.96236559]

mean value: 0.9600784655623366

key: test_roc_auc
value: [0.92857143 0.95238095 0.95119048 0.90119048 0.95238095 0.90238095
 0.95119048 0.8297619  0.8547619  0.87738095]

mean value: 0.9101190476190476

key: train_roc_auc
value: [0.95675676 0.94864865 0.95151119 0.93802674 0.94612031 0.95688753
 0.96495205 0.96226388 0.94604766 0.95415577]

mean value: 0.9525370531822145

key: test_jcc
value: [0.86363636 0.91304348 0.90909091 0.83333333 0.9047619  0.82608696
 0.9047619  0.70833333 0.75       0.77272727]

mean value: 0.838577545642763

key: train_jcc
value: [0.91752577 0.90306122 0.90816327 0.88383838 0.89847716 0.91752577
 0.93264249 0.92746114 0.89949749 0.91326531]

mean value: 0.9101457997889101

MCC on Blind test: 0.78

Accuracy on Blind test: 0.9

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.94551182 1.15909123 1.07203174 1.16645551 0.98121929 1.43863583
 1.14272809 1.11828685 1.16250324 1.51433372]

mean value: 1.170079731941223

key: score_time
value: [0.01489878 0.01527166 0.01541162 0.01551175 0.01527882 0.0123558
 0.01570058 0.01530385 0.01655555 0.01252007]

mean value: 0.014880847930908204

key: test_mcc
value: [0.95346259 0.95346259 0.95238095 1.         1.         0.8047619
 0.95227002 0.61969655 0.90649828 0.86240942]

mean value: 0.9004942292006795

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97619048 0.97619048 0.97560976 1.         1.         0.90243902
 0.97560976 0.80487805 0.95121951 0.92682927]

mean value: 0.9488966318234611

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97560976 0.97674419 0.97560976 1.         1.         0.9047619
 0.97435897 0.77777778 0.94736842 0.91891892]

mean value: 0.9451149695111841

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.95454545 1.         1.         1.         0.9047619
 1.         0.875      1.         1.        ]

mean value: 0.9734307359307359

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.95238095 1.         0.95238095 1.         1.         0.9047619
 0.95       0.7        0.9        0.85      ]

mean value: 0.920952380952381

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97619048 0.97619048 0.97619048 1.         1.         0.90238095
 0.975      0.80238095 0.95       0.925     ]

mean value: 0.9483333333333333

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.95238095 0.95454545 0.95238095 1.         1.         0.82608696
 0.95       0.63636364 0.9        0.85      ]

mean value: 0.9021757952192735

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.78

Accuracy on Blind test: 0.9

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01404858 0.01055026 0.01084566 0.00979495 0.01662517 0.01392055
 0.01371741 0.00998712 0.01174212 0.01603532]

mean value: 0.012726712226867675

key: score_time
value: [0.01750827 0.00957966 0.00959706 0.00905037 0.01484847 0.01271057
 0.01184988 0.00949311 0.01281643 0.01073074]

mean value: 0.011818456649780273

key: test_mcc
value: [0.47673129 0.43656413 0.57570364 0.36718832 0.86240942 0.66432098
 0.38060103 0.37171226 0.58066054 0.6133669 ]

mean value: 0.5329258503356614

key: train_mcc
value: [0.60428805 0.56402679 0.57872595 0.56557984 0.58046478 0.60326615
 0.60448508 0.57118216 0.6277874  0.64661665]

mean value: 0.5946422853951159

key: test_accuracy
value: [0.73809524 0.71428571 0.7804878  0.68292683 0.92682927 0.82926829
 0.68292683 0.68292683 0.7804878  0.80487805]

mean value: 0.7623112659698026

key: train_accuracy
value: [0.8        0.77567568 0.78706199 0.77897574 0.78706199 0.79784367
 0.80053908 0.78167116 0.81132075 0.81940701]

mean value: 0.7939557077292926

key: test_fscore
value: [0.74418605 0.73913043 0.80851064 0.71111111 0.93333333 0.84444444
 0.71111111 0.69767442 0.8        0.80952381]

mean value: 0.779902534772057

key: train_fscore
value: [0.81122449 0.79706601 0.79898219 0.795      0.80100756 0.81203008
 0.81122449 0.79900744 0.82323232 0.83291771]

mean value: 0.808169228755668

key: test_precision
value: [0.72727273 0.68       0.73076923 0.66666667 0.875      0.79166667
 0.64       0.65217391 0.72       0.77272727]

mean value: 0.7256276477146042

key: train_precision
value: [0.76811594 0.72767857 0.75480769 0.73953488 0.75       0.75700935
 0.77184466 0.74193548 0.77619048 0.77674419]

mean value: 0.7563861241582702

key: test_recall
value: [0.76190476 0.80952381 0.9047619  0.76190476 1.         0.9047619
 0.8        0.75       0.9        0.85      ]

mean value: 0.8442857142857143

key: train_recall
value: [0.85945946 0.88108108 0.84864865 0.85945946 0.85945946 0.87567568
 0.85483871 0.8655914  0.87634409 0.89784946]

mean value: 0.8678407439697762

key: test_roc_auc
value: [0.73809524 0.71428571 0.77738095 0.68095238 0.925      0.82738095
 0.68571429 0.68452381 0.78333333 0.80595238]

mean value: 0.7622619047619048

key: train_roc_auc
value: [0.8        0.77567568 0.78722755 0.7791921  0.78725661 0.79805289
 0.80039233 0.78144435 0.81114502 0.819195  ]

mean value: 0.7939581517000872

key: test_jcc
value: [0.59259259 0.5862069  0.67857143 0.55172414 0.875      0.73076923
 0.55172414 0.53571429 0.66666667 0.68      ]

mean value: 0.6448969376727998

key: train_jcc
value: [0.68240343 0.66260163 0.66525424 0.65975104 0.66806723 0.6835443
 0.68240343 0.66528926 0.69957082 0.71367521]

mean value: 0.6782560583614013

MCC on Blind test: 0.47

Accuracy on Blind test: 0.76

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00981188 0.01756334 0.01116681 0.00998116 0.01008105 0.01002264
 0.01777577 0.01780438 0.01079798 0.00976753]

mean value: 0.012477254867553711

key: score_time
value: [0.00883985 0.01474833 0.0101068  0.00893354 0.00892401 0.00922704
 0.01536894 0.00982237 0.00946569 0.00879431]

mean value: 0.010423088073730468

key: test_mcc
value: [0.52620136 0.4472136  0.66668392 0.7197263  0.80907152 0.41963703
 0.46428571 0.6133669  0.51190476 0.6133669 ]

mean value: 0.5791458007819006

key: train_mcc
value: [0.6606283  0.64358181 0.66088006 0.6661434  0.65009172 0.67407311
 0.66454603 0.65225276 0.67687355 0.64866961]

mean value: 0.6597740339245326

key: test_accuracy
value: [0.76190476 0.71428571 0.82926829 0.85365854 0.90243902 0.70731707
 0.73170732 0.80487805 0.75609756 0.80487805]

mean value: 0.7866434378629501

key: train_accuracy
value: [0.82972973 0.82162162 0.83018868 0.8328841  0.82479784 0.83557951
 0.83018868 0.82479784 0.83827493 0.82210243]

mean value: 0.8290165367523858

key: test_fscore
value: [0.77272727 0.75       0.82051282 0.86956522 0.9        0.73913043
 0.73170732 0.80952381 0.75       0.80952381]

mean value: 0.7952690681534796

key: train_fscore
value: [0.83464567 0.82446809 0.83289125 0.83510638 0.82758621 0.84237726
 0.83969466 0.83290488 0.84126984 0.83248731]

mean value: 0.8343431543661086

key: test_precision
value: [0.73913043 0.66666667 0.88888889 0.8        0.94736842 0.68
 0.71428571 0.77272727 0.75       0.77272727]

mean value: 0.7731794671131056

key: train_precision
value: [0.81122449 0.81151832 0.81770833 0.82198953 0.8125     0.80693069
 0.79710145 0.79802956 0.828125   0.78846154]

mean value: 0.8093588913988847

key: test_recall
value: [0.80952381 0.85714286 0.76190476 0.95238095 0.85714286 0.80952381
 0.75       0.85       0.75       0.85      ]

mean value: 0.8247619047619047

key: train_recall
value: [0.85945946 0.83783784 0.84864865 0.84864865 0.84324324 0.88108108
 0.88709677 0.87096774 0.85483871 0.88172043]

mean value: 0.8613542574832898

key: test_roc_auc
value: [0.76190476 0.71428571 0.83095238 0.85119048 0.90357143 0.7047619
 0.73214286 0.80595238 0.75595238 0.80595238]

mean value: 0.7866666666666666

key: train_roc_auc
value: [0.82972973 0.82162162 0.8302383  0.83292647 0.82484743 0.83570183
 0.83003487 0.82467306 0.83823017 0.8219413 ]

mean value: 0.828994478349317

key: test_jcc
value: [0.62962963 0.6        0.69565217 0.76923077 0.81818182 0.5862069
 0.57692308 0.68       0.6        0.68      ]

mean value: 0.6635824364430062

key: train_jcc
value: [0.71621622 0.70135747 0.71363636 0.71689498 0.70588235 0.72767857
 0.72368421 0.71365639 0.7260274  0.71304348]

mean value: 0.7158077421167284

MCC on Blind test: 0.6

Accuracy on Blind test: 0.81

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00926328 0.0102694  0.01025152 0.01143193 0.01130676 0.01162267
 0.01116467 0.01086497 0.01144218 0.01224303]

mean value: 0.010986042022705079

key: score_time
value: [0.01719236 0.01383495 0.01737165 0.0197041  0.02015567 0.01965952
 0.01807666 0.01768923 0.02050591 0.02003956]

mean value: 0.018422961235046387

key: test_mcc
value: [0.38138504 0.33485541 0.52420964 0.51551459 0.65871309 0.66668392
 0.53206577 0.36515617 0.27179142 0.2681441 ]

mean value: 0.451851915569425

key: train_mcc
value: [0.65140475 0.63994485 0.65225276 0.65953152 0.62870716 0.6516517
 0.65692704 0.65817862 0.60723254 0.62568858]

mean value: 0.6431519525144919

key: test_accuracy
value: [0.69047619 0.66666667 0.75609756 0.75609756 0.82926829 0.82926829
 0.75609756 0.68292683 0.63414634 0.63414634]

mean value: 0.7235191637630662

key: train_accuracy
value: [0.82432432 0.81891892 0.82479784 0.82749326 0.81132075 0.82479784
 0.82749326 0.82749326 0.8032345  0.81132075]

mean value: 0.8201194725723028

key: test_fscore
value: [0.68292683 0.65       0.73684211 0.75       0.8372093  0.82051282
 0.70588235 0.66666667 0.57142857 0.59459459]

mean value: 0.7016063243000862

key: train_fscore
value: [0.81586402 0.81126761 0.81586402 0.81609195 0.79651163 0.81690141
 0.82122905 0.81920904 0.79889807 0.80225989]

mean value: 0.8114096689798598

key: test_precision
value: [0.7        0.68421053 0.82352941 0.78947368 0.81818182 0.88888889
 0.85714286 0.68421053 0.66666667 0.64705882]

mean value: 0.7559363203016454

key: train_precision
value: [0.85714286 0.84705882 0.85714286 0.87116564 0.86163522 0.85294118
 0.85465116 0.86309524 0.81920904 0.8452381 ]

mean value: 0.8529280114255333

key: test_recall
value: [0.66666667 0.61904762 0.66666667 0.71428571 0.85714286 0.76190476
 0.6        0.65       0.5        0.55      ]

mean value: 0.6585714285714286

key: train_recall
value: [0.77837838 0.77837838 0.77837838 0.76756757 0.74054054 0.78378378
 0.79032258 0.77956989 0.77956989 0.76344086]

mean value: 0.7739930252833479

key: test_roc_auc
value: [0.69047619 0.66666667 0.75833333 0.75714286 0.82857143 0.83095238
 0.75238095 0.68214286 0.63095238 0.63214286]

mean value: 0.7229761904761904

key: train_roc_auc
value: [0.82432432 0.81891892 0.82467306 0.82733217 0.81113049 0.82468759
 0.82759372 0.82762278 0.80329846 0.81145016]

mean value: 0.8201031676838129

key: test_jcc
value: [0.51851852 0.48148148 0.58333333 0.6        0.72       0.69565217
 0.54545455 0.5        0.4        0.42307692]

mean value: 0.5467516975777845

key: train_jcc
value: [0.68899522 0.68246445 0.68899522 0.68932039 0.66183575 0.69047619
 0.69668246 0.6937799  0.66513761 0.66981132]

mean value: 0.6827498517411101

MCC on Blind test: 0.33

Accuracy on Blind test: 0.66

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01881528 0.01888156 0.01720691 0.01661062 0.02828383 0.01732945
 0.01692677 0.01839519 0.01678467 0.01690912]

mean value: 0.01861433982849121

key: score_time
value: [0.01185322 0.01096296 0.01071048 0.01054764 0.01315546 0.01148176
 0.0106287  0.01134944 0.01050234 0.01056862]

mean value: 0.011176061630249024

key: test_mcc
value: [0.63059263 0.74535599 0.7098505  0.61152662 0.8547619  0.61969655
 0.8213423  0.56190476 0.66668392 0.6133669 ]

mean value: 0.6835082074285657

key: train_mcc
value: [0.80625522 0.84682617 0.8025478  0.81479313 0.78510219 0.78137629
 0.81394491 0.83137227 0.8097008  0.8097008 ]

mean value: 0.8101619588671509

key: test_accuracy
value: [0.80952381 0.85714286 0.85365854 0.80487805 0.92682927 0.80487805
 0.90243902 0.7804878  0.82926829 0.80487805]

mean value: 0.8373983739837398

key: train_accuracy
value: [0.9        0.92162162 0.90026954 0.90566038 0.88948787 0.88679245
 0.90566038 0.91374663 0.90296496 0.90296496]

mean value: 0.9029168791432942

key: test_fscore
value: [0.82608696 0.875      0.86363636 0.81818182 0.92682927 0.82608696
 0.90909091 0.7804878  0.8372093  0.80952381]

mean value: 0.8472133188972693

key: train_fscore
value: [0.90585242 0.9250646  0.90339426 0.90956072 0.8956743  0.89393939
 0.90956072 0.91794872 0.90769231 0.90769231]

mean value: 0.9076379747216281

key: test_precision
value: [0.76       0.77777778 0.82608696 0.7826087  0.95       0.76
 0.83333333 0.76190476 0.7826087  0.77272727]

mean value: 0.8007047493569233

key: train_precision
value: [0.85576923 0.88613861 0.87373737 0.87128713 0.84615385 0.83886256
 0.87562189 0.87745098 0.86764706 0.86764706]

mean value: 0.8660315741062894

key: test_recall
value: [0.9047619  1.         0.9047619  0.85714286 0.9047619  0.9047619
 1.         0.8        0.9        0.85      ]

mean value: 0.9026190476190477

key: train_recall
value: [0.96216216 0.96756757 0.93513514 0.95135135 0.95135135 0.95675676
 0.94623656 0.96236559 0.9516129  0.9516129 ]

mean value: 0.9536152281313572

key: test_roc_auc
value: [0.80952381 0.85714286 0.85238095 0.80357143 0.92738095 0.80238095
 0.9047619  0.78095238 0.83095238 0.80595238]

mean value: 0.8374999999999999

key: train_roc_auc
value: [0.9        0.92162162 0.90036327 0.9057832  0.88965417 0.88698053
 0.90555071 0.91361523 0.90283348 0.90283348]

mean value: 0.9029235687300203

key: test_jcc
value: [0.7037037  0.77777778 0.76       0.69230769 0.86363636 0.7037037
 0.83333333 0.64       0.72       0.68      ]

mean value: 0.7374462574462575

key: train_jcc
value: [0.82790698 0.86057692 0.82380952 0.83412322 0.81105991 0.80821918
 0.83412322 0.84834123 0.83098592 0.83098592]

mean value: 0.831013201825796

MCC on Blind test: 0.72

Accuracy on Blind test: 0.88

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [1.42117834 1.50192738 1.74208856 1.39239717 1.81255579 1.65309405
 2.01920605 2.15930629 2.04894686 1.53648686]

mean value: 1.7287187337875367

key: score_time
value: [0.01234365 0.02283001 0.01450133 0.01485205 0.01476645 0.01467419
 0.01476502 0.03691173 0.01803923 0.01702905]

mean value: 0.01807126998901367

key: test_mcc
value: [0.85811633 1.         0.75714286 0.8547619  1.         0.75714286
 0.90649828 0.66432098 0.90649828 0.81975606]

mean value: 0.852423754766439

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.92857143 1.         0.87804878 0.92682927 1.         0.87804878
 0.95121951 0.82926829 0.95121951 0.90243902]

mean value: 0.9245644599303136

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.92682927 1.         0.87804878 0.92682927 1.         0.87804878
 0.94736842 0.81081081 0.94736842 0.88888889]

mean value: 0.9204192639365938

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.95       1.         0.9        0.95       1.         0.9
 1.         0.88235294 1.         1.        ]

mean value: 0.9582352941176471

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.9047619  1.         0.85714286 0.9047619  1.         0.85714286
 0.9        0.75       0.9        0.8       ]

mean value: 0.8873809523809524

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.92857143 1.         0.87857143 0.92738095 1.         0.87857143
 0.95       0.82738095 0.95       0.9       ]

mean value: 0.924047619047619

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.86363636 1.         0.7826087  0.86363636 1.         0.7826087
 0.9        0.68181818 0.9        0.8       ]

mean value: 0.8574308300395257

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.73

Accuracy on Blind test: 0.87

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.02964616 0.01735234 0.02308035 0.01720619 0.01867843 0.01552367
 0.02575731 0.01813507 0.01478839 0.01864028]

mean value: 0.01988081932067871

key: score_time
value: [0.01226211 0.00949502 0.01432991 0.00976801 0.00871301 0.00870228
 0.01428056 0.00927043 0.0092001  0.00945854]

mean value: 0.010547995567321777

key: test_mcc
value: [0.90889326 0.9047619  1.         0.90238095 0.90692382 0.95238095
 0.95227002 0.8047619  0.95227002 0.95227002]

mean value: 0.9236912842968128

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.95238095 0.95238095 1.         0.95121951 0.95121951 0.97560976
 0.97560976 0.90243902 0.97560976 0.97560976]

mean value: 0.9612078977932637

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.95       0.95238095 1.         0.95238095 0.95       0.97560976
 0.97435897 0.9        0.97435897 0.97435897]

mean value: 0.9603448583936389

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.95238095 1.         0.95238095 1.         1.
 1.         0.9        1.         1.        ]

mean value: 0.9804761904761905

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.9047619  0.95238095 1.         0.95238095 0.9047619  0.95238095
 0.95       0.9        0.95       0.95      ]

mean value: 0.9416666666666667

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.95238095 0.95238095 1.         0.95119048 0.95238095 0.97619048
 0.975      0.90238095 0.975      0.975     ]

mean value: 0.9611904761904762

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.9047619  0.90909091 1.         0.90909091 0.9047619  0.95238095
 0.95       0.81818182 0.95       0.95      ]

mean value: 0.9248268398268398

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.82

Accuracy on Blind test: 0.92

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.11855721 0.12169838 0.12727523 0.12549639 0.12631321 0.12417126
 0.13257003 0.12751269 0.13062501 0.1096518 ]

mean value: 0.12438712120056153

key: score_time
value: [0.02752948 0.02243638 0.01942682 0.01884174 0.01758575 0.01780415
 0.01768422 0.01769805 0.01859951 0.01751542]

mean value: 0.019512152671813963

key: test_mcc
value: [0.90889326 0.8660254  0.90649828 0.90238095 0.95238095 0.75714286
 1.         0.8547619  0.85441771 1.        ]

mean value: 0.9002501316790256

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.95238095 0.92857143 0.95121951 0.95121951 0.97560976 0.87804878
 1.         0.92682927 0.92682927 1.        ]

mean value: 0.9490708478513357

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.95454545 0.93333333 0.95454545 0.95238095 0.97560976 0.87804878
 1.         0.92682927 0.92307692 1.        ]

mean value: 0.9498369922760166

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.91304348 0.875      0.91304348 0.95238095 1.         0.9
 1.         0.9047619  0.94736842 1.        ]

mean value: 0.9405598234717227

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         0.95238095 0.95238095 0.85714286
 1.         0.95       0.9        1.        ]

mean value: 0.9611904761904762

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.95238095 0.92857143 0.95       0.95119048 0.97619048 0.87857143
 1.         0.92738095 0.92619048 1.        ]

mean value: 0.949047619047619

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.91304348 0.875      0.91304348 0.90909091 0.95238095 0.7826087
 1.         0.86363636 0.85714286 1.        ]

mean value: 0.9065946734424996

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.77

Accuracy on Blind test: 0.9

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00960135 0.00979376 0.00971293 0.00982785 0.00957465 0.00961232
 0.00974131 0.00961852 0.00998235 0.01082802]

mean value: 0.009829306602478027

key: score_time
value: [0.00871539 0.00880337 0.00862288 0.00876284 0.00870943 0.00873494
 0.00865674 0.00874162 0.00948977 0.00937748]

mean value: 0.008861446380615234

key: test_mcc
value: [0.71754731 0.58834841 0.62325386 0.59982886 0.6806903  0.56190476
 0.65871309 0.7633652  0.81975606 0.65915306]

mean value: 0.6672560912173882

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.85714286 0.78571429 0.80487805 0.7804878  0.82926829 0.7804878
 0.82926829 0.87804878 0.90243902 0.80487805]

mean value: 0.8252613240418119

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.85       0.75675676 0.78947368 0.74285714 0.81081081 0.7804878
 0.82051282 0.86486486 0.88888889 0.75      ]

mean value: 0.805465277377986

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.89473684 0.875      0.88235294 0.92857143 0.9375     0.8
 0.84210526 0.94117647 1.         1.        ]

mean value: 0.9101442945599292

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.80952381 0.66666667 0.71428571 0.61904762 0.71428571 0.76190476
 0.8        0.8        0.8        0.6       ]

mean value: 0.7285714285714285

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.85714286 0.78571429 0.80714286 0.78452381 0.83214286 0.78095238
 0.82857143 0.87619048 0.9        0.8       ]

mean value: 0.8252380952380952

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.73913043 0.60869565 0.65217391 0.59090909 0.68181818 0.64
 0.69565217 0.76190476 0.8        0.6       ]

mean value: 0.6770284208545078

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.42

Accuracy on Blind test: 0.75

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.48892093 1.47670484 1.47442937 1.47451377 1.4959383  1.47596216
 1.51960182 1.68420792 1.62056398 1.53635812]

mean value: 1.5247201204299927

key: score_time
value: [0.09144354 0.09082866 0.09075522 0.09119344 0.09084129 0.09078074
 0.09067225 0.10559177 0.09852886 0.0922277 ]

mean value: 0.09328634738922119

key: test_mcc
value: [1.         0.95346259 0.90649828 0.95227002 1.         0.90238095
 1.         0.90692382 0.95238095 1.        ]

mean value: 0.9573916612509499

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.97619048 0.95121951 0.97560976 1.         0.95121951
 1.         0.95121951 0.97560976 1.        ]

mean value: 0.9781068524970964

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.97674419 0.95454545 0.97674419 1.         0.95238095
 1.         0.95238095 0.97560976 1.        ]

mean value: 0.9788405487497943

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.95454545 0.91304348 0.95454545 1.         0.95238095
 1.         0.90909091 0.95238095 1.        ]

mean value: 0.9635987201204592

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         1.         0.95238095
 1.         1.         1.         1.        ]

mean value: 0.9952380952380953

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.97619048 0.95       0.975      1.         0.95119048
 1.         0.95238095 0.97619048 1.        ]

mean value: 0.9780952380952381

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.95454545 0.91304348 0.95454545 1.         0.90909091
 1.         0.90909091 0.95238095 1.        ]

mean value: 0.9592697157914549

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.91

Accuracy on Blind test: 0.96

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000...05', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.9516983  0.93607759 0.93637466 0.98142076 1.0112915  0.97042155
 0.91884851 1.02991462 1.1556201  0.91718888]

mean value: 0.980885648727417

key: score_time
value: [0.19083285 0.22782016 0.21047091 0.20850635 0.26981235 0.23706317
 0.16170216 0.16438246 0.1959908  0.26817775]

mean value: 0.21347589492797853

key: test_mcc
value: [0.95346259 0.95346259 0.85441771 0.95238095 1.         0.90238095
 0.95238095 0.80907152 1.         0.95238095]

mean value: 0.9329938212534088

key: train_mcc
value: [0.96807684 0.96779381 0.97866529 0.96788166 0.96788166 0.9734012
 0.97866283 0.96787795 0.97339739 0.98395537]

mean value: 0.9727593997295828

key: test_accuracy
value: [0.97619048 0.97619048 0.92682927 0.97560976 1.         0.95121951
 0.97560976 0.90243902 1.         0.97560976]

mean value: 0.9659698025551684

key: train_accuracy
value: [0.98378378 0.98378378 0.98921833 0.98382749 0.98382749 0.98652291
 0.98921833 0.98382749 0.98652291 0.99191375]

mean value: 0.9862446273767029

key: test_fscore
value: [0.97674419 0.97674419 0.93023256 0.97560976 1.         0.95238095
 0.97560976 0.9047619  1.         0.97560976]

mean value: 0.9667693055668098

key: train_fscore
value: [0.98404255 0.98395722 0.98930481 0.98395722 0.98395722 0.98666667
 0.9893617  0.98404255 0.9867374  0.992     ]

mean value: 0.9864027346296044

key: test_precision
value: [0.95454545 0.95454545 0.90909091 1.         1.         0.95238095
 0.95238095 0.86363636 1.         0.95238095]

mean value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
0.9538961038961039

key: train_precision
value: [0.96858639 0.97354497 0.97883598 0.97354497 0.97354497 0.97368421
 0.97894737 0.97368421 0.97382199 0.98412698]

mean value: 0.9752322050034918

key: test_recall
value: [1.         1.         0.95238095 0.95238095 1.         0.95238095
 1.         0.95       1.         1.        ]

mean value: 0.9807142857142856

key: train_recall
value: [1.         0.99459459 1.         0.99459459 0.99459459 1.
 1.         0.99462366 1.         1.        ]

mean value: 0.9978407439697763

key: test_roc_auc
value: [0.97619048 0.97619048 0.92619048 0.97619048 1.         0.95119048
 0.97619048 0.90357143 1.         0.97619048]

mean value: 0.9661904761904762

key: train_roc_auc
value: [0.98378378 0.98378378 0.98924731 0.98385644 0.98385644 0.98655914
 0.98918919 0.98379831 0.98648649 0.99189189]

mean value: 0.9862452775356001

key: test_jcc
value: [0.95454545 0.95454545 0.86956522 0.95238095 1.         0.90909091
 0.95238095 0.82608696 1.         0.95238095]

mean value: 0.9370976849237719

key: train_jcc
value: [0.96858639 0.96842105 0.97883598 0.96842105 0.96842105 0.97368421
 0.97894737 0.96858639 0.97382199 0.98412698]

mean value: 0.9731852464202974

MCC on Blind test: 0.91

Accuracy on Blind test: 0.96

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.0236361  0.00996685 0.00973201 0.00977063 0.01012659 0.00971603
 0.00966501 0.00985694 0.00972986 0.00974369]

mean value: 0.011194372177124023

key: score_time
value: [0.01239753 0.00912547 0.00883961 0.00879169 0.00874281 0.00875664
 0.00875926 0.00876093 0.00876856 0.00884151]

mean value: 0.009178400039672852

key: test_mcc
value: [0.52620136 0.4472136  0.66668392 0.7197263  0.80907152 0.41963703
 0.46428571 0.6133669  0.51190476 0.6133669 ]

mean value: 0.5791458007819006

key: train_mcc
value: [0.6606283  0.64358181 0.66088006 0.6661434  0.65009172 0.67407311
 0.66454603 0.65225276 0.67687355 0.64866961]

mean value: 0.6597740339245326

key: test_accuracy
value: [0.76190476 0.71428571 0.82926829 0.85365854 0.90243902 0.70731707
 0.73170732 0.80487805 0.75609756 0.80487805]

mean value: 0.7866434378629501

key: train_accuracy
value: [0.82972973 0.82162162 0.83018868 0.8328841  0.82479784 0.83557951
 0.83018868 0.82479784 0.83827493 0.82210243]

mean value: 0.8290165367523858

key: test_fscore
value: [0.77272727 0.75       0.82051282 0.86956522 0.9        0.73913043
 0.73170732 0.80952381 0.75       0.80952381]

mean value: 0.7952690681534796

key: train_fscore
value: [0.83464567 0.82446809 0.83289125 0.83510638 0.82758621 0.84237726
 0.83969466 0.83290488 0.84126984 0.83248731]

mean value: 0.8343431543661086

key: test_precision
value: [0.73913043 0.66666667 0.88888889 0.8        0.94736842 0.68
 0.71428571 0.77272727 0.75       0.77272727]

mean value: 0.7731794671131056

key: train_precision
value: [0.81122449 0.81151832 0.81770833 0.82198953 0.8125     0.80693069
 0.79710145 0.79802956 0.828125   0.78846154]

mean value: 0.8093588913988847

key: test_recall
value: [0.80952381 0.85714286 0.76190476 0.95238095 0.85714286 0.80952381
 0.75       0.85       0.75       0.85      ]

mean value: 0.8247619047619047

key: train_recall
value: [0.85945946 0.83783784 0.84864865 0.84864865 0.84324324 0.88108108
 0.88709677 0.87096774 0.85483871 0.88172043]

mean value: 0.8613542574832898

key: test_roc_auc
value: [0.76190476 0.71428571 0.83095238 0.85119048 0.90357143 0.7047619
 0.73214286 0.80595238 0.75595238 0.80595238]

mean value: 0.7866666666666666

key: train_roc_auc
value: [0.82972973 0.82162162 0.8302383  0.83292647 0.82484743 0.83570183
 0.83003487 0.82467306 0.83823017 0.8219413 ]

mean value: 0.828994478349317

key: test_jcc
value: [0.62962963 0.6        0.69565217 0.76923077 0.81818182 0.5862069
 0.57692308 0.68       0.6        0.68      ]

mean value: 0.6635824364430062

key: train_jcc
value: [0.71621622 0.70135747 0.71363636 0.71689498 0.70588235 0.72767857
 0.72368421 0.71365639 0.7260274  0.71304348]

mean value: 0.7158077421167284

MCC on Blind test: 0.6

Accuracy on Blind test: 0.81

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.13729072 0.0542841  0.18945885 0.05321908 0.05603051 0.0593791
 0.06020927 0.05859399 0.06958175 0.06030631]

mean value: 0.07983536720275879

key: score_time
value: [0.01096654 0.01153803 0.01101851 0.01092124 0.01061153 0.01044726
 0.01049089 0.01065683 0.01051927 0.01184368]

mean value: 0.010901379585266113

key: test_mcc
value: [1.         0.95346259 1.         0.95227002 0.90692382 0.95238095
 0.90649828 0.90692382 0.95227002 1.        ]

mean value: 0.9530729499296946

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.97619048 1.         0.97560976 0.95121951 0.97560976
 0.95121951 0.95121951 0.97560976 1.        ]

mean value: 0.9756678281068525

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.97674419 1.         0.97674419 0.95       0.97560976
 0.94736842 0.95238095 0.97435897 1.        ]

mean value: 0.9753206475983143

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.95454545 1.         0.95454545 1.         1.
 1.         0.90909091 1.         1.        ]

mean value: 0.9818181818181818

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         0.9047619  0.95238095
 0.9        1.         0.95       1.        ]

mean value: 0.9707142857142858

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.97619048 1.         0.975      0.95238095 0.97619048
 0.95       0.95238095 0.975      1.        ]

mean value: 0.9757142857142856

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.95454545 1.         0.95454545 0.9047619  0.95238095
 0.9        0.90909091 0.95       1.        ]

mean value: 0.9525324675324676

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.88

Accuracy on Blind test: 0.95

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.03848815 0.0564611  0.04007053 0.07608032 0.04673767 0.09112334
 0.07648802 0.09771609 0.07483935 0.0377171 ]

mean value: 0.06357216835021973

key: score_time
value: [0.0230732  0.01212573 0.02329516 0.01226211 0.02332783 0.03573036
 0.02186012 0.03916335 0.01241779 0.02289486]

mean value: 0.02261505126953125

key: test_mcc
value: [0.9047619  0.90889326 0.90692382 0.8547619  0.90692382 0.76500781
 0.85441771 0.65871309 0.90649828 0.7633652 ]

mean value: 0.8430266801146862

key: train_mcc
value: [0.98379816 0.99460913 0.98921825 0.98921825 0.98921825 0.99462366
 0.98384144 0.99462366 0.98921825 0.98384144]

mean value: 0.9892210469904663

key: test_accuracy
value: [0.95238095 0.95238095 0.95121951 0.92682927 0.95121951 0.87804878
 0.92682927 0.82926829 0.95121951 0.87804878]

mean value: 0.9197444831591173

key: train_accuracy
value: [0.99189189 0.9972973  0.99460916 0.99460916 0.99460916 0.99730458
 0.99191375 0.99730458 0.99460916 0.99191375]

mean value: 0.994606250455307

key: test_fscore
value: [0.95238095 0.95       0.95       0.92682927 0.95       0.87179487
 0.92307692 0.82051282 0.94736842 0.86486486]

mean value: 0.9156828121975747

key: train_fscore
value: [0.99191375 0.99728997 0.99459459 0.99459459 0.99459459 0.99730458
 0.9919571  0.99730458 0.99462366 0.9919571 ]

mean value: 0.9946134532763986

key: test_precision
value: [0.95238095 1.         1.         0.95       1.         0.94444444
 0.94736842 0.84210526 1.         0.94117647]

mean value: 0.9577475551624158

key: train_precision
value: [0.98924731 1.         0.99459459 0.99459459 0.99459459 0.99462366
 0.98930481 1.         0.99462366 0.98930481]

mean value: 0.9940888033108147

key: test_recall
value: [0.95238095 0.9047619  0.9047619  0.9047619  0.9047619  0.80952381
 0.9        0.8        0.9        0.8       ]

mean value: 0.8780952380952382

key: train_recall
value: [0.99459459 0.99459459 0.99459459 0.99459459 0.99459459 1.
 0.99462366 0.99462366 0.99462366 0.99462366]

mean value: 0.9951467596628887

key: test_roc_auc
value: [0.95238095 0.95238095 0.95238095 0.92738095 0.95238095 0.8797619
 0.92619048 0.82857143 0.95       0.87619048]

mean value: 0.9197619047619047

key: train_roc_auc
value: [0.99189189 0.9972973  0.99460913 0.99460913 0.99460913 0.99731183
 0.99190642 0.99731183 0.99460913 0.99190642]

mean value: 0.9946062191223481

key: test_jcc
value: [0.90909091 0.9047619  0.9047619  0.86363636 0.9047619  0.77272727
 0.85714286 0.69565217 0.9        0.76190476]

mean value: 0.8474440052700922

key: train_jcc
value: [0.98395722 0.99459459 0.98924731 0.98924731 0.98924731 0.99462366
 0.98404255 0.99462366 0.98930481 0.98404255]

mean value: 0.9892930980374963

MCC on Blind test: 0.78

Accuracy on Blind test: 0.9

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.03393626 0.01116276 0.01050591 0.01051927 0.01106477 0.01092601
 0.0112021  0.01075196 0.01055264 0.0097971 ]

mean value: 0.013041877746582031

key: score_time
value: [0.0211122  0.01032758 0.0091815  0.00963449 0.00966692 0.00959134
 0.00962043 0.0095799  0.00896215 0.00962305]

mean value: 0.01072995662689209

key: test_mcc
value: [0.42857143 0.52380952 0.7633652  0.51966679 0.75714286 0.46623254
 0.51190476 0.46300848 0.58066054 0.7565654 ]

mean value: 0.5770927520886003

key: train_mcc
value: [0.63036031 0.63110621 0.65040473 0.639995   0.63987066 0.65892307
 0.60747259 0.64549275 0.67228752 0.65953152]

mean value: 0.6435444364715565

key: test_accuracy
value: [0.71428571 0.76190476 0.87804878 0.75609756 0.87804878 0.73170732
 0.75609756 0.73170732 0.7804878  0.87804878]

mean value: 0.7866434378629501

key: train_accuracy
value: [0.81351351 0.81351351 0.82479784 0.81940701 0.81671159 0.82749326
 0.8032345  0.82210243 0.83557951 0.82749326]

mean value: 0.8203846434035114

key: test_fscore
value: [0.71428571 0.76190476 0.88888889 0.7826087  0.87804878 0.75555556
 0.75       0.71794872 0.8        0.87179487]

mean value: 0.7921035986518489

key: train_fscore
value: [0.82262211 0.82352941 0.82849604 0.82414698 0.82828283 0.83589744
 0.80939948 0.828125   0.84073107 0.83756345]

mean value: 0.8278793807837299

key: test_precision
value: [0.71428571 0.76190476 0.83333333 0.72       0.9        0.70833333
 0.75       0.73684211 0.72       0.89473684]

mean value: 0.7739436090225564

key: train_precision
value: [0.78431373 0.7815534  0.80927835 0.80102041 0.77725118 0.79512195
 0.78680203 0.8030303  0.81725888 0.79326923]

mean value: 0.794889946578593

key: test_recall
value: [0.71428571 0.76190476 0.95238095 0.85714286 0.85714286 0.80952381
 0.75       0.7        0.9        0.85      ]

mean value: 0.8152380952380952

key: train_recall
value: [0.86486486 0.87027027 0.84864865 0.84864865 0.88648649 0.88108108
 0.83333333 0.85483871 0.8655914  0.88709677]

mean value: 0.8640860215053764

key: test_roc_auc
value: [0.71428571 0.76190476 0.87619048 0.75357143 0.87857143 0.7297619
 0.75595238 0.73095238 0.78333333 0.87738095]

mean value: 0.7861904761904762

key: train_roc_auc
value: [0.81351351 0.81351351 0.82486196 0.81948561 0.81689916 0.82763731
 0.80315315 0.82201395 0.8354984  0.82733217]

mean value: 0.8203908747457135

key: test_jcc
value: [0.55555556 0.61538462 0.8        0.64285714 0.7826087  0.60714286
 0.6        0.56       0.66666667 0.77272727]

mean value: 0.6602942805986285

key: train_jcc
value: [0.69868996 0.7        0.70720721 0.70089286 0.70689655 0.71806167
 0.67982456 0.70666667 0.72522523 0.72052402]

mean value: 0.7063988717177541

MCC on Blind test: 0.62

Accuracy on Blind test: 0.83

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.018049   0.02355123 0.02099872 0.02084327 0.02586102 0.02243018
 0.02418661 0.02225685 0.02579045 0.02315402]

mean value: 0.022712135314941408

key: score_time
value: [0.00903296 0.01131964 0.01169229 0.01225495 0.01198363 0.0118804
 0.01187754 0.0114584  0.01192617 0.01193905]

mean value: 0.011536502838134765

key: test_mcc
value: [0.8660254  1.         0.95238095 0.90238095 1.         0.80817439
 0.95227002 0.72229808 0.86240942 0.81975606]

mean value: 0.8885695274349608

key: train_mcc
value: [0.95247913 0.99460913 0.9946235  0.97849275 1.         0.94236768
 0.99462366 0.94234975 0.96816407 0.98921825]

mean value: 0.9756927908121855

key: test_accuracy
value: [0.92857143 1.         0.97560976 0.95121951 1.         0.90243902
 0.97560976 0.85365854 0.92682927 0.90243902]

mean value: 0.9416376306620209

key: train_accuracy
value: [0.97567568 0.9972973  0.99730458 0.98921833 1.         0.9703504
 0.99730458 0.9703504  0.98382749 0.99460916]

mean value: 0.9875937932541706

key: test_fscore
value: [0.93333333 1.         0.97560976 0.95238095 1.         0.90909091
 0.97435897 0.86363636 0.91891892 0.88888889]

mean value: 0.9416218096705902

key: train_fscore
value: [0.9762533  0.99728997 0.99728997 0.98913043 1.         0.97112861
 0.99730458 0.97127937 0.98360656 0.99462366]

mean value: 0.9877906456528402

key: test_precision
value: [0.875      1.         1.         0.95238095 1.         0.86956522
 1.         0.79166667 1.         1.        ]

mean value: 0.9488612836438923

key: train_precision
value: [0.95360825 1.         1.         0.99453552 1.         0.94387755
 1.         0.94416244 1.         0.99462366]

mean value: 0.9830807410030974

key: test_recall
value: [1.         1.         0.95238095 0.95238095 1.         0.95238095
 0.95       0.95       0.85       0.8       ]

mean value: 0.9407142857142857

key: train_recall
value: [1.         0.99459459 0.99459459 0.98378378 1.         1.
 0.99462366 1.         0.96774194 0.99462366]

mean value: 0.9929962220284801

key: test_roc_auc
value: [0.92857143 1.         0.97619048 0.95119048 1.         0.90119048
 0.975      0.85595238 0.925      0.9       ]

mean value: 0.9413095238095238

key: train_roc_auc
value: [0.97567568 0.9972973  0.9972973  0.98920372 1.         0.97043011
 0.99731183 0.97027027 0.98387097 0.99460913]

mean value: 0.9875966288869515

key: test_jcc
value: [0.875      1.         0.95238095 0.90909091 1.         0.83333333
 0.95       0.76       0.85       0.8       ]

mean value: 0.8929805194805195

key: train_jcc
value: [0.95360825 0.99459459 0.99459459 0.97849462 1.         0.94387755
 0.99462366 0.94416244 0.96774194 0.98930481]

mean value: 0.9761002452068489

MCC on Blind test: 0.73

Accuracy on Blind test: 0.86

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01773119 0.01894855 0.0170114  0.01634312 0.01716518 0.01672268
 0.01967621 0.01650691 0.01855946 0.01665854]

mean value: 0.01753232479095459

key: score_time
value: [0.01205349 0.01189566 0.0119071  0.01189089 0.01292086 0.01191783
 0.01203084 0.01201177 0.01197481 0.01218939]

mean value: 0.012079262733459472

key: test_mcc
value: [0.80952381 0.78446454 0.95227002 0.86333169 0.78072006 0.74124932
 0.8547619  0.76500781 0.95227002 0.73786479]

mean value: 0.8241463949364545

key: train_mcc
value: [0.97298719 0.83968394 0.9214168  0.93057445 0.88164335 0.7451756
 0.97339739 0.95737027 0.98395537 0.81247091]

mean value: 0.9018675260545876

key: test_accuracy
value: [0.9047619  0.88095238 0.97560976 0.92682927 0.87804878 0.85365854
 0.92682927 0.87804878 0.97560976 0.85365854]

mean value: 0.9054006968641115

key: train_accuracy
value: [0.98648649 0.91351351 0.95956873 0.96495957 0.93800539 0.85714286
 0.98652291 0.97843666 0.99191375 0.89757412]

mean value: 0.9474123989218328

key: test_fscore
value: [0.9047619  0.89361702 0.97674419 0.92307692 0.86486486 0.83333333
 0.92682927 0.88372093 0.97435897 0.82352941]

mean value: 0.9004836818009054

key: train_fscore
value: [0.98644986 0.92039801 0.96083551 0.96418733 0.93409742 0.83280757
 0.9867374  0.97883598 0.992      0.88622754]

mean value: 0.9442576627868985

key: test_precision
value: [0.9047619  0.80769231 0.95454545 1.         1.         1.
 0.9047619  0.82608696 1.         1.        ]

mean value: 0.9397848528283311

key: train_precision
value: [0.98913043 0.85253456 0.92929293 0.98314607 0.99390244 1.
 0.97382199 0.96354167 0.98412698 1.        ]

mean value: 0.9669497073050086

key: test_recall
value: [0.9047619  1.         1.         0.85714286 0.76190476 0.71428571
 0.95       0.95       0.95       0.7       ]

mean value: 0.8788095238095238

key: train_recall
value: [0.98378378 1.         0.99459459 0.94594595 0.88108108 0.71351351
 1.         0.99462366 1.         0.79569892]

mean value: 0.930924149956408

key: test_roc_auc
value: [0.9047619  0.88095238 0.975      0.92857143 0.88095238 0.85714286
 0.92738095 0.8797619  0.975      0.85      ]

mean value: 0.905952380952381

key: train_roc_auc
value: [0.98648649 0.91351351 0.95966289 0.96490846 0.93785237 0.85675676
 0.98648649 0.97839291 0.99189189 0.89784946]

mean value: 0.9473801220575414

key: test_jcc
value: [0.82608696 0.80769231 0.95454545 0.85714286 0.76190476 0.71428571
 0.86363636 0.79166667 0.95       0.7       ]

mean value: 0.8226961082395865

key: train_jcc
value: [0.97326203 0.85253456 0.92462312 0.93085106 0.87634409 0.71351351
 0.97382199 0.95854922 0.98412698 0.79569892]

mean value: 0.8983325494425128

MCC on Blind test: 0.81

Accuracy on Blind test: 0.92

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.17113066 0.15320778 0.16114116 0.16019368 0.15601373 0.155586
 0.15632606 0.16162801 0.15992212 0.15480018]

mean value: 0.1589949369430542

key: score_time
value: [0.01620245 0.0164144  0.01639581 0.01671028 0.01555824 0.01625705
 0.0161562  0.01611733 0.01684666 0.01635766]

mean value: 0.016301608085632323

key: test_mcc
value: [1.         0.95346259 1.         0.95227002 0.95238095 0.95238095
 0.95227002 0.8547619  0.95227002 1.        ]

mean value: 0.9569796444320845

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.97619048 1.         0.97560976 0.97560976 0.97560976
 0.97560976 0.92682927 0.97560976 1.        ]

mean value: 0.9781068524970964

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.97674419 1.         0.97674419 0.97560976 0.97560976
 0.97435897 0.92682927 0.97435897 1.        ]

mean value: 0.9780255101298777

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.95454545 1.         0.95454545 1.         1.
 1.         0.9047619  1.         1.        ]

mean value: 0.9813852813852814

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         0.95238095 0.95238095
 0.95       0.95       0.95       1.        ]

mean value: 0.9754761904761905

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.97619048 1.         0.975      0.97619048 0.97619048
 0.975      0.92738095 0.975      1.        ]

mean value: 0.9780952380952381

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.95454545 1.         0.95454545 0.95238095 0.95238095
 0.95       0.86363636 0.95       1.        ]

mean value: 0.9577489177489178

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.9

Accuracy on Blind test: 0.95

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.06097746 0.05407095 0.05890322 0.05291629 0.05129123 0.04869723
 0.05514884 0.06428218 0.07081532 0.0492568 ]

mean value: 0.05663595199584961

key: score_time
value: [0.02573156 0.02581716 0.02943945 0.02417064 0.01993871 0.02304506
 0.02579427 0.03719068 0.029531   0.02628541]

mean value: 0.026694393157958983

key: test_mcc
value: [0.90889326 0.9047619  0.95238095 0.95227002 0.90692382 0.95238095
 0.95227002 0.8547619  1.         0.95227002]

mean value: 0.9336912842968128

key: train_mcc
value: [1.         0.99460913 0.99462366 0.9946235  0.99462366 0.9946235
 0.99462366 1.         1.         0.99462366]

mean value: 0.9962350748975697

key: test_accuracy
value: [0.95238095 0.95238095 0.97560976 0.97560976 0.95121951 0.97560976
 0.97560976 0.92682927 1.         0.97560976]

mean value: 0.9660859465737515

key: train_accuracy
value: [1.         0.9972973  0.99730458 0.99730458 0.99730458 0.99730458
 0.99730458 1.         1.         0.99730458]

mean value: 0.9981124790558753

key: test_fscore
value: [0.95       0.95238095 0.97560976 0.97674419 0.95       0.97560976
 0.97435897 0.92682927 1.         0.97435897]

mean value: 0.9655891867633217

key: train_fscore
value: [1.         0.99728997 0.99730458 0.99728997 0.99730458 0.99728997
 0.99730458 1.         1.         0.99730458]

mean value: 0.9981088247540157

key: test_precision
value: [1.         0.95238095 1.         0.95454545 1.         1.
 1.         0.9047619  1.         1.        ]

mean value: 0.9811688311688311

key: train_precision
value: [1.         1.         0.99462366 1.         0.99462366 1.
 1.         1.         1.         1.        ]

mean value: 0.9989247311827957

key: test_recall
value: [0.9047619  0.95238095 0.95238095 1.         0.9047619  0.95238095
 0.95       0.95       1.         0.95      ]

mean value: 0.9516666666666667

key: train_recall
value: [1.         0.99459459 1.         0.99459459 1.         0.99459459
 0.99462366 1.         1.         0.99462366]

mean value: 0.9973031095611741

key: test_roc_auc
value: [0.95238095 0.95238095 0.97619048 0.975      0.95238095 0.97619048
 0.975      0.92738095 1.         0.975     ]

mean value: 0.9661904761904762

key: train_roc_auc
value: [1.         0.9972973  0.99731183 0.9972973  0.99731183 0.9972973
 0.99731183 1.         1.         0.99731183]

mean value: 0.9981139203719849

key: test_jcc
value: [0.9047619  0.90909091 0.95238095 0.95454545 0.9047619  0.95238095
 0.95       0.86363636 1.         0.95      ]

mean value: 0.9341558441558442

key: train_jcc
value: [1.         0.99459459 0.99462366 0.99459459 0.99462366 0.99459459
 0.99462366 1.         1.         0.99462366]

mean value: 0.9962278407439698

MCC on Blind test: 0.87

Accuracy on Blind test: 0.94

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.10364151 0.0689404  0.09549975 0.11759543 0.12540889 0.12004828
 0.07170916 0.10064197 0.09738994 0.06381583]

mean value: 0.0964691162109375

key: score_time
value: [0.02268863 0.01398158 0.02261162 0.03039312 0.02353644 0.02690196
 0.0140779  0.02318907 0.01394606 0.01409626]

mean value: 0.020542263984680176

key: test_mcc
value: [0.71754731 0.80952381 0.66668392 0.71121921 0.8547619  0.85441771
 0.81975606 0.7633652  0.7633652  0.81975606]

mean value: 0.7780396377492421

key: train_mcc
value: [0.97843556 0.97310093 0.97317174 0.97317174 0.97317174 0.98395537
 0.98921825 0.978494   0.98921825 0.97317407]

mean value: 0.9785111637414805

key: test_accuracy
value: [0.85714286 0.9047619  0.82926829 0.85365854 0.92682927 0.92682927
 0.90243902 0.87804878 0.87804878 0.90243902]

mean value: 0.8859465737514518

key: train_accuracy
value: [0.98918919 0.98648649 0.98652291 0.98652291 0.98652291 0.99191375
 0.99460916 0.98921833 0.99460916 0.98652291]

mean value: 0.9892117724193196

key: test_fscore
value: [0.85       0.9047619  0.82051282 0.85       0.92682927 0.93023256
 0.88888889 0.86486486 0.86486486 0.88888889]

mean value: 0.8789844059214451

key: train_fscore
value: [0.98913043 0.98637602 0.98637602 0.98637602 0.98637602 0.99182561
 0.99462366 0.98918919 0.99462366 0.98644986]

mean value: 0.989134650057088

key: test_precision
value: [0.89473684 0.9047619  0.88888889 0.89473684 0.95       0.90909091
 1.         0.94117647 0.94117647 1.        ]

mean value: 0.93245683281287

key: train_precision
value: [0.99453552 0.99450549 0.99450549 0.99450549 0.99450549 1.
 0.99462366 0.99456522 0.99462366 0.99453552]

mean value: 0.9950905545492605

key: test_recall
value: [0.80952381 0.9047619  0.76190476 0.80952381 0.9047619  0.95238095
 0.8        0.8        0.8        0.8       ]

mean value: 0.8342857142857143

key: train_recall
value: [0.98378378 0.97837838 0.97837838 0.97837838 0.97837838 0.98378378
 0.99462366 0.98387097 0.99462366 0.97849462]

mean value: 0.9832693984306887

key: test_roc_auc
value: [0.85714286 0.9047619  0.83095238 0.8547619  0.92738095 0.92619048
 0.9        0.87619048 0.87619048 0.9       ]

mean value: 0.8853571428571428

key: train_roc_auc
value: [0.98918919 0.98648649 0.98650102 0.98650102 0.98650102 0.99189189
 0.99460913 0.98923278 0.99460913 0.98654461]

mean value: 0.9892066259808195

key: test_jcc
value: [0.73913043 0.82608696 0.69565217 0.73913043 0.86363636 0.86956522
 0.8        0.76190476 0.76190476 0.8       ]

mean value: 0.7857011104837192

key: train_jcc
value: [0.97849462 0.97311828 0.97311828 0.97311828 0.97311828 0.98378378
 0.98930481 0.97860963 0.98930481 0.97326203]

mean value: 0.9785232809141727

MCC on Blind test: 0.37

Accuracy on Blind test: 0.71

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.54057002 0.54626584 0.54693365 0.54073143 0.54698849 0.52059269
 0.54536033 0.54451895 0.55396438 0.55362821]

mean value: 0.543955397605896

key: score_time
value: [0.00982523 0.00935054 0.00933266 0.00924683 0.00945854 0.0092802
 0.00935721 0.00927353 0.00961375 0.00918651]

mean value: 0.009392499923706055

key: test_mcc
value: [1.         0.95346259 1.         0.90238095 0.95238095 0.95238095
 0.95227002 0.90692382 1.         1.        ]

mean value: 0.9619799285556849

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.97619048 1.         0.95121951 0.97560976 0.97560976
 0.97560976 0.95121951 1.         1.        ]

mean value: 0.9805458768873403

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.97674419 1.         0.95238095 0.97560976 0.97560976
 0.97435897 0.95238095 1.         1.        ]

mean value: 0.9807084577362513

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.95454545 1.         0.95238095 1.         1.
 1.         0.90909091 1.         1.        ]

mean value: 0.9816017316017316

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         0.95238095 0.95238095 0.95238095
 0.95       1.         1.         1.        ]

mean value: 0.9807142857142856

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.97619048 1.         0.95119048 0.97619048 0.97619048
 0.975      0.95238095 1.         1.        ]

mean value: 0.9807142857142856

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.95454545 1.         0.90909091 0.95238095 0.95238095
 0.95       0.90909091 1.         1.        ]

mean value: 0.9627489177489177

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.86

Accuracy on Blind test: 0.94

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.02883434 0.0289309  0.02810097 0.05655909 0.03880763 0.04022145
 0.03861618 0.02902699 0.02924132 0.03623962]

mean value: 0.03545784950256348

key: score_time
value: [0.01243997 0.01541114 0.01505589 0.01314878 0.02119303 0.02428484
 0.01450896 0.02024794 0.01529074 0.01999283]

mean value: 0.01715741157531738

key: test_mcc
value: [0.68640647 0.74535599 0.63496528 0.7197263  0.7565654  0.73786479
 0.58066054 0.6133669  0.86333169 0.8047619 ]

mean value: 0.7143005278179001

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.83333333 0.85714286 0.80487805 0.85365854 0.87804878 0.85365854
 0.7804878  0.80487805 0.92682927 0.90243902]

mean value: 0.8495354239256678

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.85106383 0.875      0.83333333 0.86956522 0.88372093 0.875
 0.8        0.80952381 0.93023256 0.9       ]

mean value: 0.8627439678407774

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.76923077 0.77777778 0.74074074 0.8        0.86363636 0.77777778
 0.72       0.77272727 0.86956522 0.9       ]

mean value: 0.7991455919282007

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.95238095 1.         0.95238095 0.95238095 0.9047619  1.
 0.9        0.85       1.         0.9       ]

mean value: 0.9411904761904761

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.83333333 0.85714286 0.80119048 0.85119048 0.87738095 0.85
 0.78333333 0.80595238 0.92857143 0.90238095]

mean value: 0.849047619047619

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.74074074 0.77777778 0.71428571 0.76923077 0.79166667 0.77777778
 0.66666667 0.68       0.86956522 0.81818182]

mean value: 0.7605893148719236

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.21

Accuracy on Blind test: 0.69

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.02349377 0.03713441 0.03637457 0.03698802 0.03131008 0.03687334
 0.03689528 0.03683734 0.03825593 0.03702378]

mean value: 0.03511865139007568

key: score_time
value: [0.02268529 0.02422953 0.0209794  0.02236128 0.02284622 0.02414441
 0.02387094 0.02138376 0.02061534 0.02167439]

mean value: 0.02247905731201172

key: test_mcc
value: [0.9047619  0.95346259 0.95238095 0.95227002 0.95238095 0.90238095
 0.90238095 0.75714286 0.85441771 0.80817439]

mean value: 0.8939753275609796

key: train_mcc
value: [0.97843556 0.96251377 0.97317407 0.94659116 0.97317407 0.97866529
 0.96787795 0.97305937 0.95709306 0.96787795]

mean value: 0.9678462240831375

key: test_accuracy
value: [0.95238095 0.97619048 0.97560976 0.97560976 0.97560976 0.95121951
 0.95121951 0.87804878 0.92682927 0.90243902]

mean value: 0.9465156794425087

key: train_accuracy
value: [0.98918919 0.98108108 0.98652291 0.97304582 0.98652291 0.98921833
 0.98382749 0.98652291 0.97843666 0.98382749]

mean value: 0.9838194798572157

key: test_fscore
value: [0.95238095 0.97674419 0.97560976 0.97674419 0.97560976 0.95238095
 0.95       0.87804878 0.92307692 0.89473684]

mean value: 0.9455332334720041

key: train_fscore
value: [0.98924731 0.98133333 0.98659517 0.97340426 0.98659517 0.98930481
 0.98404255 0.98659517 0.9787234  0.98404255]

mean value: 0.9839883746741166

key: test_precision
value: [0.95238095 0.95454545 1.         0.95454545 1.         0.95238095
 0.95       0.85714286 0.94736842 0.94444444]

mean value: 0.9512808536492747

key: train_precision
value: [0.98395722 0.96842105 0.9787234  0.95811518 0.9787234  0.97883598
 0.97368421 0.98395722 0.96842105 0.97368421]

mean value: 0.9746522935411154

key: test_recall
value: [0.95238095 1.         0.95238095 1.         0.95238095 0.95238095
 0.95       0.9        0.9        0.85      ]

mean value: 0.940952380952381

key: train_recall /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_7030.py:156: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_7030.py:159: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)

value: [0.99459459 0.99459459 0.99459459 0.98918919 0.99459459 1.
 0.99462366 0.98924731 0.98924731 0.99462366]

mean value: 0.9935309503051439

key: test_roc_auc
value: [0.95238095 0.97619048 0.97619048 0.975      0.97619048 0.95119048
 0.95119048 0.87857143 0.92619048 0.90119048]

mean value: 0.9464285714285714

key: train_roc_auc
value: [0.98918919 0.98108108 0.98654461 0.97308922 0.98654461 0.98924731
 0.98379831 0.98651555 0.97840744 0.98379831]

mean value: 0.9838215634989829

key: test_jcc
value: [0.90909091 0.95454545 0.95238095 0.95454545 0.95238095 0.90909091
 0.9047619  0.7826087  0.85714286 0.80952381]

mean value: 0.8986071899115378

key: train_jcc
value: [0.9787234  0.96335079 0.97354497 0.94818653 0.97354497 0.97883598
 0.96858639 0.97354497 0.95833333 0.96858639]

mean value: 0.9685237725766385

MCC on Blind test: 0.84

Accuracy on Blind test: 0.93

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.26893139 0.27967191 0.2897954  0.31818628 0.27074409 0.26609921
 0.26971841 0.26205802 0.26608682 0.26403761]

mean value: 0.2755329132080078

key: score_time
value: [0.02258849 0.02014399 0.02077603 0.02329016 0.02250028 0.01875901
 0.01647425 0.02191854 0.02195883 0.02291036]

mean value: 0.02113199234008789

key: test_mcc
value: [0.9047619  0.95346259 0.95238095 0.95227002 0.95238095 0.90238095
 0.90238095 0.75714286 0.85441771 0.80817439]

mean value: 0.8939753275609796

key: train_mcc
value: [0.97843556 0.96251377 0.97317407 0.94659116 0.97317407 0.97866529
 0.98384144 0.97305937 0.95709306 0.96787795]

mean value: 0.9694425730208168

key: test_accuracy
value: [0.95238095 0.97619048 0.97560976 0.97560976 0.97560976 0.95121951
 0.95121951 0.87804878 0.92682927 0.90243902]

mean value: 0.9465156794425087

key: train_accuracy
value: [0.98918919 0.98108108 0.98652291 0.97304582 0.98652291 0.98921833
 0.99191375 0.98652291 0.97843666 0.98382749]

mean value: 0.9846281051941429

key: test_fscore
value: [0.95238095 0.97674419 0.97560976 0.97674419 0.97560976 0.95238095
 0.95       0.87804878 0.92307692 0.89473684]

mean value: 0.9455332334720041

key: train_fscore
value: [0.98924731 0.98133333 0.98659517 0.97340426 0.98659517 0.98930481
 0.9919571  0.98659517 0.9787234  0.98404255]

mean value: 0.9847798298107318

key: test_precision
value: [0.95238095 0.95454545 1.         0.95454545 1.         0.95238095
 0.95       0.85714286 0.94736842 0.94444444]

mean value: 0.9512808536492747

key: train_precision
value: [0.98395722 0.96842105 0.9787234  0.95811518 0.9787234  0.97883598
 0.98930481 0.98395722 0.96842105 0.97368421]

mean value: 0.9762143537719062

key: test_recall
value: [0.95238095 1.         0.95238095 1.         0.95238095 0.95238095
 0.95       0.9        0.9        0.85      ]

mean value: 0.940952380952381

key: train_recall
value: [0.99459459 0.99459459 0.99459459 0.98918919 0.99459459 1.
 0.99462366 0.98924731 0.98924731 0.99462366]

mean value: 0.9935309503051439

key: test_roc_auc
value: [0.95238095 0.97619048 0.97619048 0.975      0.97619048 0.95119048
 0.95119048 0.87857143 0.92619048 0.90119048]

mean value: 0.9464285714285714

key: train_roc_auc
value: [0.98918919 0.98108108 0.98654461 0.97308922 0.98654461 0.98924731
 0.99190642 0.98651555 0.97840744 0.98379831]

mean value: 0.9846323743097937

key: test_jcc
value: [0.90909091 0.95454545 0.95238095 0.95454545 0.95238095 0.90909091
 0.9047619  0.7826087  0.85714286 0.80952381]

mean value: 0.8986071899115378

key: train_jcc
value: [0.9787234  0.96335079 0.97354497 0.94818653 0.97354497 0.97883598
 0.98404255 0.97354497 0.95833333 0.96858639]

mean value: 0.970069389152332

MCC on Blind test: 0.84

Accuracy on Blind test: 0.93

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.02979517 0.03099942 0.02958393 0.02999544 0.0327394  0.03862596
 0.02719903 0.02743649 0.02765322 0.02869177]

mean value: 0.03027198314666748

key: score_time
value: [0.01283884 0.0125618  0.01267099 0.01301289 0.01287889 0.01229525
 0.01248908 0.01229692 0.01221228 0.01231456]

mean value: 0.012557148933410645

key: test_mcc
value: [0.73029674 0.46225016 0.80909091 0.63305416 0.74795759 0.52727273
 0.71562645 0.82572282 0.82275335 0.33028913]

mean value: 0.6604314052101561

key: train_mcc
value: [0.87387789 0.84252546 0.86391052 0.87508285 0.87452017 0.8852317
 0.87454765 0.89549293 0.86509383 0.89609412]

mean value: 0.874637713340964

key: test_accuracy
value: [0.86363636 0.72727273 0.9047619  0.80952381 0.85714286 0.76190476
 0.85714286 0.9047619  0.9047619  0.66666667]

mean value: 0.8257575757575757

key: train_accuracy
value: [0.93684211 0.92105263 0.93193717 0.93717277 0.93717277 0.94240838
 0.93717277 0.94764398 0.93193717 0.94764398]

mean value: 0.9370983742077708

key: test_fscore
value: [0.86956522 0.75       0.9        0.81818182 0.86956522 0.76190476
 0.86956522 0.9        0.91666667 0.69565217]

mean value: 0.8351101072840204

key: train_fscore
value: [0.9375     0.92227979 0.93264249 0.93877551 0.93814433 0.94358974
 0.9375     0.94791667 0.93333333 0.94845361]

mean value: 0.9380135471730902

key: test_precision
value: [0.83333333 0.69230769 0.9        0.75       0.76923077 0.72727273
 0.83333333 1.         0.84615385 0.66666667]

mean value: 0.8018298368298369

key: train_precision
value: [0.92783505 0.90816327 0.92783505 0.92       0.92857143 0.92929293
 0.92783505 0.93814433 0.91       0.92929293]

mean value: 0.9246970036999492

key: test_recall
value: [0.90909091 0.81818182 0.9        0.9        1.         0.8
 0.90909091 0.81818182 1.         0.72727273]

mean value: 0.8781818181818182

key: train_recall
value: [0.94736842 0.93684211 0.9375     0.95833333 0.94791667 0.95833333
 0.94736842 0.95789474 0.95789474 0.96842105]

mean value: 0.9517872807017543

key: test_roc_auc
value: [0.86363636 0.72727273 0.90454545 0.81363636 0.86363636 0.76363636
 0.85454545 0.90909091 0.9        0.66363636]

mean value: 0.8263636363636364

key: train_roc_auc
value: [0.93684211 0.92105263 0.93190789 0.9370614  0.93711623 0.94232456
 0.93722588 0.94769737 0.93207237 0.94775219]

mean value: 0.9371052631578948

key: test_jcc
value: [0.76923077 0.6        0.81818182 0.69230769 0.76923077 0.61538462
 0.76923077 0.81818182 0.84615385 0.53333333]

mean value: 0.7231235431235431

key: train_jcc
value: [0.88235294 0.85576923 0.87378641 0.88461538 0.88349515 0.89320388
 0.88235294 0.9009901  0.875      0.90196078]

mean value: 0.8833526817954387

MCC on Blind test: 0.76

Accuracy on Blind test: 0.89

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.72123003 0.84316516 0.7306025  0.74526048 0.81987715 0.71120024
 0.72252727 0.80323362 0.73404384 0.85311699]

mean value: 0.768425726890564

key: score_time
value: [0.01541853 0.01553202 0.01258135 0.01577401 0.01313853 0.01267266
 0.01245832 0.01556325 0.01253867 0.01255989]

mean value: 0.013823723793029786

key: test_mcc
value: [1.         0.64715023 0.74161985 0.71818182 0.71818182 0.71818182
 0.71818182 0.67419986 0.90829511 0.42727273]

mean value: 0.727126504633149

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.81818182 0.85714286 0.85714286 0.85714286 0.85714286
 0.85714286 0.80952381 0.95238095 0.71428571]

mean value: 0.858008658008658

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.83333333 0.82352941 0.85714286 0.85714286 0.85714286
 0.85714286 0.77777778 0.95652174 0.72727273]

mean value: 0.8547006417850408

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.76923077 1.         0.81818182 0.81818182 0.81818182
 0.9        1.         0.91666667 0.72727273]

mean value: 0.8767715617715618

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.90909091 0.7        0.9        0.9        0.9
 0.81818182 0.63636364 1.         0.72727273]

mean value: 0.8490909090909091

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.81818182 0.85       0.85909091 0.85909091 0.85909091
 0.85909091 0.81818182 0.95       0.71363636]

mean value: 0.8586363636363636

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.71428571 0.7        0.75       0.75       0.75
 0.75       0.63636364 0.91666667 0.57142857]

mean value: 0.7538744588744588

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.75

Accuracy on Blind test: 0.88

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01317358 0.011415   0.01040435 0.00997591 0.00988245 0.00988269
 0.01001906 0.00998998 0.00979877 0.00988674]

mean value: 0.010442852973937988

key: score_time
value: [0.01445031 0.01000452 0.00979805 0.00970507 0.00962663 0.00961113
 0.00959015 0.00951719 0.00951576 0.00954485]

mean value: 0.01013636589050293

key: test_mcc
value: [0.37796447 0.40824829 0.74795759 0.53300179 0.67419986 0.63305416
 0.35527986 0.66332496 0.33709993 0.33709993]

mean value: 0.5067230850517317

key: train_mcc
value: [0.50728584 0.59286988 0.53138382 0.59435593 0.53227428 0.4917695
 0.60438034 0.52466542 0.52683546 0.52870448]

mean value: 0.5434524940733954

key: test_accuracy
value: [0.68181818 0.68181818 0.85714286 0.71428571 0.80952381 0.80952381
 0.66666667 0.80952381 0.66666667 0.66666667]

mean value: 0.7363636363636363

key: train_accuracy
value: [0.73684211 0.78421053 0.7486911  0.78534031 0.7539267  0.73298429
 0.80104712 0.7486911  0.7539267  0.7486911 ]

mean value: 0.7594351060898319

key: test_fscore
value: [0.72       0.74074074 0.86956522 0.76923077 0.83333333 0.81818182
 0.63157895 0.84615385 0.72       0.72      ]

mean value: 0.7668784672400233

key: train_fscore
value: [0.77678571 0.81105991 0.78761062 0.81278539 0.78733032 0.77130045
 0.80808081 0.78181818 0.78139535 0.78378378]

mean value: 0.7901950517409254

key: test_precision
value: [0.64285714 0.625      0.76923077 0.625      0.71428571 0.75
 0.75       0.73333333 0.64285714 0.64285714]

mean value: 0.6895421245421246

key: train_precision
value: [0.6744186  0.72131148 0.68461538 0.72357724 0.696      0.67716535
 0.77669903 0.688      0.7        0.68503937]

mean value: 0.7026826453984404

key: test_recall
value: [0.81818182 0.90909091 1.         1.         1.         0.9
 0.54545455 1.         0.81818182 0.81818182]

mean value: 0.8809090909090909

key: train_recall
value: [0.91578947 0.92631579 0.92708333 0.92708333 0.90625    0.89583333
 0.84210526 0.90526316 0.88421053 0.91578947]

mean value: 0.9045723684210526

key: test_roc_auc
value: [0.68181818 0.68181818 0.86363636 0.72727273 0.81818182 0.81363636
 0.67272727 0.8        0.65909091 0.65909091]

mean value: 0.7377272727272728

key: train_roc_auc
value: [0.73684211 0.78421053 0.74775219 0.7845943  0.753125   0.73212719
 0.80126096 0.74950658 0.75460526 0.7495614 ]

mean value: 0.7593585526315789

key: test_jcc
value: [0.5625     0.58823529 0.76923077 0.625      0.71428571 0.69230769
 0.46153846 0.73333333 0.5625     0.5625    ]

mean value: 0.6271431264813618

key: train_jcc
value: [0.6350365  0.68217054 0.64963504 0.68461538 0.64925373 0.62773723
 0.6779661  0.64179104 0.64122137 0.64444444]

mean value: 0.6533871382679696

MCC on Blind test: 0.42

Accuracy on Blind test: 0.74

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.0090909  0.00899744 0.00900245 0.00919318 0.01037669 0.00928235
 0.00912166 0.00934029 0.00910783 0.00934601]

mean value: 0.009285879135131837

key: score_time
value: [0.00865531 0.00874138 0.00873184 0.00945687 0.00972223 0.00895834
 0.00879216 0.00884581 0.00873852 0.00886703]

mean value: 0.008950948715209961

key: test_mcc
value: [0.63636364 0.45454545 0.61818182 0.33636364 0.63305416 0.71562645
 0.55161872 0.4719399  0.23373675 0.43007562]

mean value: 0.5081506148469601

key: train_mcc
value: [0.70920321 0.72063664 0.67566396 0.73823885 0.6859713  0.6859713
 0.69638158 0.69638158 0.62349105 0.7403031 ]

mean value: 0.697224255471689

key: test_accuracy
value: [0.81818182 0.72727273 0.80952381 0.66666667 0.80952381 0.85714286
 0.76190476 0.71428571 0.61904762 0.71428571]

mean value: 0.7497835497835498

key: train_accuracy
value: [0.85263158 0.85789474 0.83769634 0.86910995 0.84293194 0.84293194
 0.84816754 0.84816754 0.81151832 0.86910995]

mean value: 0.8480159823642877

key: test_fscore
value: [0.81818182 0.72727273 0.8        0.66666667 0.81818182 0.84210526
 0.73684211 0.66666667 0.66666667 0.75      ]

mean value: 0.7492583732057416

key: train_fscore
value: [0.86       0.86567164 0.84102564 0.87046632 0.84536082 0.84536082
 0.84816754 0.84816754 0.80645161 0.87309645]

mean value: 0.850376839168251

key: test_precision
value: [0.81818182 0.72727273 0.8        0.63636364 0.75       0.88888889
 0.875      0.85714286 0.61538462 0.69230769]

mean value: 0.7660542235542236

key: train_precision
value: [0.81904762 0.82075472 0.82828283 0.86597938 0.83673469 0.83673469
 0.84375    0.84375    0.82417582 0.84313725]

mean value: 0.8362347012587765

key: test_recall
value: [0.81818182 0.72727273 0.8        0.7        0.9        0.8
 0.63636364 0.54545455 0.72727273 0.81818182]

mean value: 0.7472727272727273

key: train_recall
value: [0.90526316 0.91578947 0.85416667 0.875      0.85416667 0.85416667
 0.85263158 0.85263158 0.78947368 0.90526316]

mean value: 0.8658552631578947

key: test_roc_auc
value: [0.81818182 0.72727273 0.80909091 0.66818182 0.81363636 0.85454545
 0.76818182 0.72272727 0.61363636 0.70909091]

mean value: 0.7504545454545455

key: train_roc_auc
value: [0.85263158 0.85789474 0.83760965 0.86907895 0.84287281 0.84287281
 0.84819079 0.84819079 0.81140351 0.86929825]

mean value: 0.8480043859649123

key: test_jcc
value: [0.69230769 0.57142857 0.66666667 0.5        0.69230769 0.72727273
 0.58333333 0.5        0.5        0.6       ]

mean value: 0.6033316683316683

key: train_jcc
value: [0.75438596 0.76315789 0.72566372 0.7706422  0.73214286 0.73214286
 0.73636364 0.73636364 0.67567568 0.77477477]

mean value: 0.7401313215761582

MCC on Blind test: 0.55

Accuracy on Blind test: 0.78

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00881052 0.00952911 0.00968528 0.00963211 0.00966859 0.00922441
 0.00979471 0.00978827 0.00979757 0.00972414]

mean value: 0.009565472602844238

key: score_time
value: [0.01085305 0.01080704 0.01071835 0.0106647  0.01091552 0.01068163
 0.01075578 0.01077008 0.01102042 0.01079583]

mean value: 0.010798239707946777

key: test_mcc
value: [0.18257419 0.         0.90909091 0.52295779 0.63305416 0.23636364
 0.05504819 0.39196475 0.33028913 0.23373675]

mean value: 0.34950794979197125

key: train_mcc
value: [0.6344324  0.7264768  0.60269927 0.66509486 0.65445773 0.70692117
 0.64512756 0.69662073 0.65445773 0.6871103 ]

mean value: 0.6673398564766175

key: test_accuracy
value: [0.59090909 0.5        0.95238095 0.76190476 0.80952381 0.61904762
 0.52380952 0.66666667 0.66666667 0.61904762]

mean value: 0.670995670995671

key: train_accuracy
value: [0.81578947 0.86315789 0.80104712 0.83246073 0.82722513 0.85340314
 0.82198953 0.84816754 0.82722513 0.84293194]

mean value: 0.8333397630201157

key: test_fscore
value: [0.57142857 0.42105263 0.95238095 0.73684211 0.81818182 0.6
 0.5        0.58823529 0.69565217 0.66666667]

mean value: 0.6550440213530804

key: train_fscore
value: [0.80662983 0.86170213 0.79787234 0.83157895 0.82901554 0.8556701
 0.81521739 0.84491979 0.82539683 0.83695652]

mean value: 0.8304959421378466

key: test_precision
value: [0.6        0.5        0.90909091 0.77777778 0.75       0.6
 0.55555556 0.83333333 0.66666667 0.61538462]

mean value: 0.6807808857808858

key: train_precision
value: [0.84883721 0.87096774 0.81521739 0.84042553 0.82474227 0.84693878
 0.84269663 0.85869565 0.82978723 0.86516854]

mean value: 0.8443476972764284

key: test_recall
value: [0.54545455 0.36363636 1.         0.7        0.9        0.6
 0.45454545 0.45454545 0.72727273 0.72727273]

mean value: 0.6472727272727272

key: train_recall
value: [0.76842105 0.85263158 0.78125    0.82291667 0.83333333 0.86458333
 0.78947368 0.83157895 0.82105263 0.81052632]

mean value: 0.8175767543859649

key: test_roc_auc
value: [0.59090909 0.5        0.95454545 0.75909091 0.81363636 0.61818182
 0.52727273 0.67727273 0.66363636 0.61363636]

mean value: 0.6718181818181818

key: train_roc_auc
value: [0.81578947 0.86315789 0.80115132 0.83251096 0.82719298 0.8533443
 0.82182018 0.84808114 0.82719298 0.84276316]

mean value: 0.8333004385964912

key: test_jcc
value: [0.4        0.26666667 0.90909091 0.58333333 0.69230769 0.42857143
 0.33333333 0.41666667 0.53333333 0.5       ]

mean value: 0.5063303363303363

key: train_jcc
value: [0.67592593 0.75700935 0.66371681 0.71171171 0.7079646  0.74774775
 0.68807339 0.73148148 0.7027027  0.71962617]

mean value: 0.7105959894012878

MCC on Blind test: 0.4

Accuracy on Blind test: 0.68

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01154423 0.01183224 0.01161528 0.01154399 0.01124811 0.01126695
 0.01114488 0.01137328 0.01132202 0.01127124]

mean value: 0.011416220664978027

key: score_time
value: [0.00947404 0.00990534 0.00957632 0.00936294 0.00929713 0.00936723
 0.0095048  0.00931263 0.00929332 0.00933337]

mean value: 0.009442710876464843

key: test_mcc
value: [0.64715023 0.37796447 0.82572282 0.39196475 0.67419986 0.44038551
 0.62641448 0.90909091 0.82275335 0.23373675]

mean value: 0.5949383133354442

key: train_mcc
value: [0.79818857 0.79172691 0.77786752 0.76212373 0.79905587 0.82557489
 0.8047179  0.83546371 0.81125858 0.80327722]

mean value: 0.8009254890182891

key: test_accuracy
value: [0.81818182 0.68181818 0.9047619  0.66666667 0.80952381 0.71428571
 0.80952381 0.95238095 0.9047619  0.61904762]

mean value: 0.7880952380952381

key: train_accuracy
value: [0.89473684 0.88947368 0.88481675 0.87434555 0.89528796 0.91099476
 0.90052356 0.91623037 0.90052356 0.90052356]

mean value: 0.8967456599614219

key: test_fscore
value: [0.83333333 0.72       0.90909091 0.72       0.83333333 0.72727273
 0.83333333 0.95238095 0.91666667 0.66666667]

mean value: 0.8112077922077922

key: train_fscore
value: [0.90196078 0.89855072 0.89320388 0.88571429 0.90291262 0.91542289
 0.90452261 0.91919192 0.90731707 0.9035533 ]

mean value: 0.9032350090012564

key: test_precision
value: [0.76923077 0.64285714 0.83333333 0.6        0.71428571 0.66666667
 0.76923077 1.         0.84615385 0.61538462]

mean value: 0.7457142857142858

key: train_precision
value: [0.8440367  0.83035714 0.83636364 0.81578947 0.84545455 0.87619048
 0.86538462 0.88349515 0.84545455 0.87254902]

mean value: 0.8515075297875789

key: test_recall
value: [0.90909091 0.81818182 1.         0.9        1.         0.8
 0.90909091 0.90909091 1.         0.72727273]

mean value: 0.8972727272727272

key: train_recall
value: [0.96842105 0.97894737 0.95833333 0.96875    0.96875    0.95833333
 0.94736842 0.95789474 0.97894737 0.93684211]

mean value: 0.9622587719298246

key: test_roc_auc
value: [0.81818182 0.68181818 0.90909091 0.67727273 0.81818182 0.71818182
 0.80454545 0.95454545 0.9        0.61363636]

mean value: 0.7895454545454546

key: train_roc_auc
value: [0.89473684 0.88947368 0.88442982 0.87384868 0.89490132 0.91074561
 0.90076754 0.91644737 0.90093202 0.90071272]

mean value: 0.8966995614035087

key: test_jcc
value: [0.71428571 0.5625     0.83333333 0.5625     0.71428571 0.57142857
 0.71428571 0.90909091 0.84615385 0.5       ]

mean value: 0.6927863802863803

key: train_jcc
value: [0.82142857 0.81578947 0.80701754 0.79487179 0.82300885 0.8440367
 0.82568807 0.85046729 0.83035714 0.82407407]

mean value: 0.8236739510694793

MCC on Blind test: 0.66

Accuracy on Blind test: 0.85

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.84304023 0.7079587  0.69341254 0.83001733 0.68674278 0.71935916
 0.85144925 0.70927286 0.68605232 0.81782484]

mean value: 0.7545130014419555

key: score_time
value: [0.01477885 0.0163033  0.01992774 0.01476359 0.01474428 0.01230025
 0.01460433 0.01469612 0.01476789 0.0146749 ]

mean value: 0.015156126022338868

key: test_mcc
value: [0.63636364 0.36514837 0.71562645 0.55161872 0.63305416 0.44038551
 0.35527986 0.55161872 0.80909091 0.52295779]

mean value: 0.5581144128303206

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.81818182 0.68181818 0.85714286 0.76190476 0.80952381 0.71428571
 0.66666667 0.76190476 0.9047619  0.76190476]

mean value: 0.7738095238095238

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.81818182 0.69565217 0.84210526 0.7826087  0.81818182 0.72727273
 0.63157895 0.73684211 0.90909091 0.7826087 ]

mean value: 0.7744123153734137

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.81818182 0.66666667 0.88888889 0.69230769 0.75       0.66666667
 0.75       0.875      0.90909091 0.75      ]

mean value: 0.7766802641802641

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81818182 0.72727273 0.8        0.9        0.9        0.8
 0.54545455 0.63636364 0.90909091 0.81818182]

mean value: 0.7854545454545455

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.81818182 0.68181818 0.85454545 0.76818182 0.81363636 0.71818182
 0.67272727 0.76818182 0.90454545 0.75909091]

mean value: 0.7759090909090909

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.69230769 0.53333333 0.72727273 0.64285714 0.69230769 0.57142857
 0.46153846 0.58333333 0.83333333 0.64285714]

mean value: 0.638056943056943

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.66

Accuracy on Blind test: 0.85

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01818252 0.01459813 0.01283693 0.01229072 0.01137853 0.01279497
 0.01171303 0.01191187 0.01186252 0.01267433]

mean value: 0.013024353981018066

key: score_time
value: [0.0117681  0.00920272 0.00893354 0.00869656 0.00866723 0.00870919
 0.00860786 0.0086484  0.00879455 0.00891566]

mean value: 0.00909438133239746

key: test_mcc
value: [1.         0.73029674 0.80909091 0.90829511 0.90909091 0.80909091
 0.90829511 0.90909091 0.71818182 0.43007562]

mean value: 0.8131508025748061

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.86363636 0.9047619  0.95238095 0.95238095 0.9047619
 0.95238095 0.95238095 0.85714286 0.71428571]

mean value: 0.9054112554112554

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.86956522 0.9        0.94736842 0.95238095 0.9
 0.95652174 0.95238095 0.85714286 0.75      ]

mean value: 0.9085360139479133

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.83333333 0.9        1.         0.90909091 0.9
 0.91666667 1.         0.9        0.69230769]

mean value: 0.9051398601398601

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.90909091 0.9        0.9        1.         0.9
 1.         0.90909091 0.81818182 0.81818182]

mean value: 0.9154545454545455

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.86363636 0.90454545 0.95       0.95454545 0.90454545
 0.95       0.95454545 0.85909091 0.70909091]

mean value: 0.905

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.76923077 0.81818182 0.9        0.90909091 0.81818182
 0.91666667 0.90909091 0.75       0.6       ]

mean value: 0.8390442890442891

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.85

Accuracy on Blind test: 0.93

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.09295607 0.09225941 0.091506   0.09173632 0.0931685  0.09369612
 0.09518909 0.09360838 0.09597731 0.09335709]

mean value: 0.09334542751312255

key: score_time
value: [0.01729226 0.01723695 0.01731944 0.01766443 0.01851225 0.01764464
 0.01801538 0.01791477 0.01864171 0.01731777]

mean value: 0.017755961418151854

key: test_mcc
value: [0.63636364 0.63636364 1.         0.52727273 0.74795759 0.52727273
 0.4719399  1.         0.71562645 0.33028913]

mean value: 0.6593085799873805

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.81818182 0.81818182 1.         0.76190476 0.85714286 0.76190476
 0.71428571 1.         0.85714286 0.66666667]

mean value: 0.8255411255411256

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.81818182 0.81818182 1.         0.76190476 0.86956522 0.76190476
 0.66666667 1.         0.86956522 0.69565217]

mean value: 0.826162243553548

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.81818182 0.81818182 1.         0.72727273 0.76923077 0.72727273
 0.85714286 1.         0.83333333 0.66666667]

mean value: 0.8217282717282718

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81818182 0.81818182 1.         0.8        1.         0.8
 0.54545455 1.         0.90909091 0.72727273]

mean value: 0.8418181818181818

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.81818182 0.81818182 1.         0.76363636 0.86363636 0.76363636
 0.72272727 1.         0.85454545 0.66363636]

mean value: 0.8268181818181818

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.69230769 0.69230769 1.         0.61538462 0.76923077 0.61538462
 0.5        1.         0.76923077 0.53333333]

mean value: 0.7187179487179487

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.69

Accuracy on Blind test: 0.85

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.01018524 0.01018262 0.01032281 0.00936413 0.01030827 0.0103271
 0.00937414 0.00911331 0.00936627 0.00902772]

mean value: 0.009757161140441895

key: score_time
value: [0.00935912 0.00933194 0.00907087 0.00914145 0.00938082 0.00943184
 0.00886512 0.00865507 0.00879741 0.00870585]

mean value: 0.009073948860168457

key: test_mcc
value: [0.2773501  0.54772256 0.71818182 0.52295779 0.33636364 0.35527986
 0.55161872 0.52727273 0.52727273 0.13483997]

mean value: 0.4498859905944563

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.63636364 0.77272727 0.85714286 0.76190476 0.66666667 0.66666667
 0.76190476 0.76190476 0.76190476 0.57142857]

mean value: 0.7218614718614719

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.66666667 0.76190476 0.85714286 0.73684211 0.66666667 0.69565217
 0.73684211 0.76190476 0.76190476 0.64      ]

mean value: 0.7285526860629835

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.61538462 0.8        0.81818182 0.77777778 0.63636364 0.61538462
 0.875      0.8        0.8        0.57142857]

mean value: 0.7309521034521035

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.72727273 0.72727273 0.9        0.7        0.7        0.8
 0.63636364 0.72727273 0.72727273 0.72727273]

mean value: 0.7372727272727273

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.63636364 0.77272727 0.85909091 0.75909091 0.66818182 0.67272727
 0.76818182 0.76363636 0.76363636 0.56363636]

mean value: 0.7227272727272727

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.5        0.61538462 0.75       0.58333333 0.5        0.53333333
 0.58333333 0.61538462 0.61538462 0.47058824]

mean value: 0.5766742081447964

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.48

Accuracy on Blind test: 0.75

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.30615759 1.29314303 1.21581531 1.25626373 1.22632051 1.20627904
 1.20618749 1.20896769 1.21818185 1.19756627]

mean value: 1.2334882497787476

key: score_time
value: [0.09819245 0.09016585 0.09647751 0.09425235 0.0900619  0.08902097
 0.08858657 0.08892465 0.09014058 0.09470963]

mean value: 0.09205324649810791

key: test_mcc
value: [0.73029674 0.75592895 1.         0.90909091 0.90909091 0.71818182
 0.80909091 0.90909091 0.90829511 0.43007562]

mean value: 0.8079141865537268

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.86363636 0.86363636 1.         0.95238095 0.95238095 0.85714286
 0.9047619  0.95238095 0.95238095 0.71428571]

mean value: 0.9012987012987013

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.86956522 0.88       1.         0.95238095 0.95238095 0.85714286
 0.90909091 0.95238095 0.95652174 0.75      ]

mean value: 0.9079463579898363

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.83333333 0.78571429 1.         0.90909091 0.90909091 0.81818182
 0.90909091 1.         0.91666667 0.69230769]

mean value: 0.8773476523476523

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 1.         1.         1.         1.         0.9
 0.90909091 0.90909091 1.         0.81818182]

mean value: 0.9445454545454546

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.86363636 0.86363636 1.         0.95454545 0.95454545 0.85909091
 0.90454545 0.95454545 0.95       0.70909091]

mean value: 0.9013636363636364

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.76923077 0.78571429 1.         0.90909091 0.90909091 0.75
 0.83333333 0.90909091 0.91666667 0.6       ]

mean value: 0.8382217782217782

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.84

Accuracy on Blind test: 0.93

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000...05', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.90209532 0.89312887 0.89125085 0.87585258 0.90369177 0.9013927
 0.9555521  0.85647655 0.92886138 0.84741616]

mean value: 0.8955718278884888

key: score_time
value: [0.25639296 0.21086621 0.20298195 0.16946197 0.25704312 0.1911087
 0.12228298 0.24127936 0.19275093 0.20414257]

mean value: 0.20483107566833497

key: test_mcc
value: [0.83205029 0.75592895 1.         0.90909091 0.90909091 0.63305416
 0.71818182 0.90909091 0.82275335 0.43007562]

mean value: 0.7919316917369833

key: train_mcc
value: [0.95874497 0.96890428 0.9690588  0.95894679 0.9690588  0.9690588
 0.95896444 0.96906883 0.95896444 0.97927405]

mean value: 0.9660044199886203

key: test_accuracy
value: [0.90909091 0.86363636 1.         0.95238095 0.95238095 0.80952381
 0.85714286 0.95238095 0.9047619  0.71428571]

mean value: 0.8915584415584416

key: train_accuracy
value: [0.97894737 0.98421053 0.98429319 0.97905759 0.98429319 0.98429319
 0.97905759 0.98429319 0.97905759 0.9895288 ]

mean value: 0.9827032240286581

key: test_fscore
value: [0.91666667 0.88       1.         0.95238095 0.95238095 0.81818182
 0.85714286 0.95238095 0.91666667 0.75      ]

mean value: 0.8995800865800866

key: train_fscore
value: [0.97938144 0.98445596 0.98461538 0.97959184 0.98461538 0.98461538
 0.97938144 0.98445596 0.97938144 0.98958333]

mean value: 0.9830077570909534

key: test_precision
value: [0.84615385 0.78571429 1.         0.90909091 0.90909091 0.75
 0.9        1.         0.84615385 0.69230769]

mean value: 0.8638511488511489

key: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
train_precision
value: [0.95959596 0.96938776 0.96969697 0.96       0.96969697 0.96969697
 0.95959596 0.96938776 0.95959596 0.97938144]

mean value: 0.9666035741381839

key: test_recall
value: [1.         1.         1.         1.         1.         0.9
 0.81818182 0.90909091 1.         0.81818182]

mean value: 0.9445454545454546

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.90909091 0.86363636 1.         0.95454545 0.95454545 0.81363636
 0.85909091 0.95454545 0.9        0.70909091]

mean value: 0.8918181818181818

key: train_roc_auc
value: [0.97894737 0.98421053 0.98421053 0.97894737 0.98421053 0.98421053
 0.97916667 0.984375   0.97916667 0.98958333]

mean value: 0.982702850877193

key: test_jcc
value: [0.84615385 0.78571429 1.         0.90909091 0.90909091 0.69230769
 0.75       0.90909091 0.84615385 0.6       ]

mean value: 0.8247602397602397

key: train_jcc
value: [0.95959596 0.96938776 0.96969697 0.96       0.96969697 0.96969697
 0.95959596 0.96938776 0.95959596 0.97938144]

mean value: 0.9666035741381839

MCC on Blind test: 0.87

Accuracy on Blind test: 0.94

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.02339101 0.00911975 0.01014471 0.00981402 0.00911069 0.0097158
 0.0100708  0.00990605 0.00931025 0.01011109]

mean value: 0.011069416999816895

key: score_time
value: [0.01295066 0.0090971  0.00953102 0.00900054 0.00888133 0.01002598
 0.00940371 0.00870204 0.00876236 0.00936818]

mean value: 0.009572291374206543

key: test_mcc
value: [0.63636364 0.45454545 0.61818182 0.33636364 0.63305416 0.71562645
 0.55161872 0.4719399  0.23373675 0.43007562]

mean value: 0.5081506148469601

key: train_mcc
value: [0.70920321 0.72063664 0.67566396 0.73823885 0.6859713  0.6859713
 0.69638158 0.69638158 0.62349105 0.7403031 ]

mean value: 0.697224255471689

key: test_accuracy
value: [0.81818182 0.72727273 0.80952381 0.66666667 0.80952381 0.85714286
 0.76190476 0.71428571 0.61904762 0.71428571]

mean value: 0.7497835497835498

key: train_accuracy
value: [0.85263158 0.85789474 0.83769634 0.86910995 0.84293194 0.84293194
 0.84816754 0.84816754 0.81151832 0.86910995]

mean value: 0.8480159823642877

key: test_fscore
value: [0.81818182 0.72727273 0.8        0.66666667 0.81818182 0.84210526
 0.73684211 0.66666667 0.66666667 0.75      ]

mean value: 0.7492583732057416

key: train_fscore
value: [0.86       0.86567164 0.84102564 0.87046632 0.84536082 0.84536082
 0.84816754 0.84816754 0.80645161 0.87309645]

mean value: 0.850376839168251

key: test_precision
value: [0.81818182 0.72727273 0.8        0.63636364 0.75       0.88888889
 0.875      0.85714286 0.61538462 0.69230769]

mean value: 0.7660542235542236

key: train_precision
value: [0.81904762 0.82075472 0.82828283 0.86597938 0.83673469 0.83673469
 0.84375    0.84375    0.82417582 0.84313725]

mean value: 0.8362347012587765

key: test_recall
value: [0.81818182 0.72727273 0.8        0.7        0.9        0.8
 0.63636364 0.54545455 0.72727273 0.81818182]

mean value: 0.7472727272727273

key: train_recall
value: [0.90526316 0.91578947 0.85416667 0.875      0.85416667 0.85416667
 0.85263158 0.85263158 0.78947368 0.90526316]

mean value: 0.8658552631578947

key: test_roc_auc
value: [0.81818182 0.72727273 0.80909091 0.66818182 0.81363636 0.85454545
 0.76818182 0.72272727 0.61363636 0.70909091]

mean value: 0.7504545454545455

key: train_roc_auc
value: [0.85263158 0.85789474 0.83760965 0.86907895 0.84287281 0.84287281
 0.84819079 0.84819079 0.81140351 0.86929825]

mean value: 0.8480043859649123

key: test_jcc
value: [0.69230769 0.57142857 0.66666667 0.5        0.69230769 0.72727273
 0.58333333 0.5        0.5        0.6       ]

mean value: 0.6033316683316683

key: train_jcc
value: [0.75438596 0.76315789 0.72566372 0.7706422  0.73214286 0.73214286
 0.73636364 0.73636364 0.67567568 0.77477477]

mean value: 0.7401313215761582

MCC on Blind test: 0.55

Accuracy on Blind test: 0.78

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.08275819 0.05508804 0.04714227 0.05396843 0.06033301 0.0590179
 0.05804062 0.05956721 0.05484629 0.0539155 ]

mean value: 0.058467745780944824

key: score_time
value: [0.01187897 0.01033783 0.0107851  0.01045775 0.01141596 0.01055002
 0.01059866 0.01087523 0.01029396 0.01083279]

mean value: 0.010802626609802246

key: test_mcc
value: [1.         0.73029674 0.90909091 1.         0.90909091 0.90909091
 0.90829511 0.90909091 0.90829511 0.80909091]

mean value: 0.8992341501253261

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.86363636 0.95238095 1.         0.95238095 0.95238095
 0.95238095 0.95238095 0.95238095 0.9047619 ]

mean value: 0.9482683982683983

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.86956522 0.95238095 1.         0.95238095 0.95238095
 0.95652174 0.95238095 0.95652174 0.90909091]

mean value: 0.9501223414266893

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.83333333 0.90909091 1.         0.90909091 0.90909091
 0.91666667 1.         0.91666667 0.90909091]

mean value: 0.9303030303030303

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.90909091 1.         1.         1.         1.
 1.         0.90909091 1.         0.90909091]

mean value: 0.9727272727272727

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.86363636 0.95454545 1.         0.95454545 0.95454545
 0.95       0.95454545 0.95       0.90454545]

mean value: 0.9486363636363636

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.76923077 0.90909091 1.         0.90909091 0.90909091
 0.91666667 0.90909091 0.91666667 0.83333333]

mean value: 0.9072261072261072

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.89

Accuracy on Blind test: 0.95

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.03119111 0.06622481 0.02681947 0.0669136  0.05784321 0.0625658
 0.05946803 0.02557802 0.05062222 0.03925514]

mean value: 0.04864814281463623

key: score_time
value: [0.02157354 0.01243377 0.01237917 0.02732563 0.02043533 0.02117467
 0.0124836  0.01243639 0.02221417 0.01236629]

mean value: 0.01748225688934326

key: test_mcc
value: [ 0.73029674  0.73029674  0.42727273  0.52295779  0.80909091  0.71562645
  0.63305416  0.39196475  0.52727273 -0.05504819]

mean value: 0.5432784810726394

key: train_mcc
value: [1.         1.         1.         1.         1.         0.9895822
 0.97927405 0.98958333 1.         1.        ]

mean value: 0.9958439579407851

key: test_accuracy
value: [0.86363636 0.86363636 0.71428571 0.76190476 0.9047619  0.85714286
 0.80952381 0.66666667 0.76190476 0.47619048]

mean value: 0.767965367965368

key: train_accuracy
value: [1.        1.        1.        1.        1.        0.9947644 0.9895288
 0.9947644 1.        1.       ]

mean value: 0.9979057591623037

key: test_fscore
value: [0.86956522 0.86956522 0.7        0.73684211 0.9        0.84210526
 0.8        0.58823529 0.76190476 0.52173913]

mean value: 0.7589956989660853

key: train_fscore
value: [1.         1.         1.         1.         1.         0.99481865
 0.98958333 0.9947644  1.         1.        ]

mean value: 0.9979166384088833

key: test_precision
value: [0.83333333 0.83333333 0.7        0.77777778 0.9        0.88888889
 0.88888889 0.83333333 0.8        0.5       ]

mean value: 0.7955555555555556

key: train_precision
value: [1.         1.         1.         1.         1.         0.98969072
 0.97938144 0.98958333 1.         1.        ]

mean value: 0.9958655498281787

key: test_recall
value: [0.90909091 0.90909091 0.7        0.7        0.9        0.8
 0.72727273 0.45454545 0.72727273 0.54545455]

mean value: 0.7372727272727273

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.86363636 0.86363636 0.71363636 0.75909091 0.90454545 0.85454545
 0.81363636 0.67727273 0.76363636 0.47272727]

mean value: 0.7686363636363636

key: train_roc_auc
value: [1.         1.         1.         1.         1.         0.99473684
 0.98958333 0.99479167 1.         1.        ]

mean value: 0.9979111842105263

key: test_jcc
value: [0.76923077 0.76923077 0.53846154 0.58333333 0.81818182 0.72727273
 0.66666667 0.41666667 0.61538462 0.35294118]

mean value: 0.6257370080899493

key: train_jcc
value: [1.         1.         1.         1.         1.         0.98969072
 0.97938144 0.98958333 1.         1.        ]

mean value: 0.9958655498281787

MCC on Blind test: 0.7

Accuracy on Blind test: 0.86

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.0188179  0.01023531 0.01005864 0.00986576 0.00988483 0.01002455
 0.0098505  0.01014495 0.00988603 0.00996375]

mean value: 0.010873222351074218

key: score_time
value: [0.01108694 0.00969553 0.00944519 0.00939345 0.00933337 0.00927711
 0.00928974 0.00942636 0.00939441 0.0093236 ]

mean value: 0.009566569328308105

key: test_mcc
value: [0.54772256 0.46225016 0.90909091 0.4719399  0.82572282 0.52295779
 0.44038551 0.90909091 0.52727273 0.13762047]

mean value: 0.5754053759176136

key: train_mcc
value: [0.6344324  0.67824625 0.62776058 0.7105481  0.6460861  0.66711224
 0.6548207  0.63650874 0.61435486 0.69958718]

mean value: 0.6569457148583459

key: test_accuracy
value: [0.77272727 0.72727273 0.95238095 0.71428571 0.9047619  0.76190476
 0.71428571 0.95238095 0.76190476 0.57142857]

mean value: 0.7833333333333333

key: train_accuracy
value: [0.81578947 0.83684211 0.81151832 0.85340314 0.82198953 0.83246073
 0.82722513 0.81675393 0.80628272 0.84816754]

mean value: 0.8270432626067787

key: test_fscore
value: [0.7826087  0.75       0.95238095 0.75       0.90909091 0.73684211
 0.7        0.95238095 0.76190476 0.60869565]

mean value: 0.7903904028846821

key: train_fscore
value: [0.8241206  0.84577114 0.82352941 0.86138614 0.83       0.84
 0.82901554 0.8241206  0.81218274 0.85427136]

mean value: 0.8344397542629447

key: test_precision
value: [0.75       0.69230769 0.90909091 0.64285714 0.83333333 0.77777778
 0.77777778 1.         0.8        0.58333333]

mean value: 0.7766477966477967

key: train_precision
value: [0.78846154 0.80188679 0.77777778 0.82075472 0.79807692 0.80769231
 0.81632653 0.78846154 0.78431373 0.81730769]

mean value: 0.8001059543314181

key: test_recall
value: [0.81818182 0.81818182 1.         0.9        1.         0.7
 0.63636364 0.90909091 0.72727273 0.63636364]

mean value: 0.8145454545454546

key: train_recall
value: [0.86315789 0.89473684 0.875      0.90625    0.86458333 0.875
 0.84210526 0.86315789 0.84210526 0.89473684]

mean value: 0.8720833333333333

key: test_roc_auc
value: [0.77272727 0.72727273 0.95454545 0.72272727 0.90909091 0.75909091
 0.71818182 0.95454545 0.76363636 0.56818182]

mean value: 0.785

key: train_roc_auc
value: [0.81578947 0.83684211 0.81118421 0.853125   0.82176535 0.83223684
 0.82730263 0.81699561 0.8064693  0.84841009]

mean value: 0.8270120614035088

key: test_jcc
value: [0.64285714 0.6        0.90909091 0.6        0.83333333 0.58333333
 0.53846154 0.90909091 0.61538462 0.4375    ]

mean value: 0.6669051781551781

key: train_jcc
value: [0.7008547  0.73275862 0.7        0.75652174 0.70940171 0.72413793
 0.7079646  0.7008547  0.68376068 0.74561404]

mean value: 0.7161868722583998

MCC on Blind test: 0.61

Accuracy on Blind test: 0.82

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01093888 0.01521778 0.01604748 0.0163331  0.0161612  0.01637101
 0.01448059 0.01658916 0.01551127 0.01330352]

mean value: 0.015095400810241699

key: score_time
value: [0.00858784 0.01158476 0.01148224 0.01158595 0.01156044 0.01155162
 0.01152015 0.01150441 0.01155853 0.01151228]

mean value: 0.011244821548461913

key: test_mcc
value: [0.73029674 0.63636364 0.90829511 0.71818182 0.74795759 0.63305416
 0.71562645 0.46249729 0.66332496 0.45226702]

mean value: 0.6667864773402087

key: train_mcc
value: [0.88073886 0.92884073 0.95831967 0.95831967 0.92917291 0.94893045
 0.91949402 0.94893045 0.83546973 0.73122789]

mean value: 0.9039444374117305

key: test_accuracy
value: [0.86363636 0.81818182 0.95238095 0.85714286 0.85714286 0.80952381
 0.85714286 0.66666667 0.80952381 0.71428571]

mean value: 0.8205627705627705

key: train_accuracy
value: [0.93684211 0.96315789 0.97905759 0.97905759 0.96335079 0.97382199
 0.95811518 0.97382199 0.91099476 0.84816754]

mean value: 0.9486387434554974

key: test_fscore
value: [0.85714286 0.81818182 0.94736842 0.85714286 0.86956522 0.81818182
 0.86956522 0.53333333 0.84615385 0.76923077]

mean value: 0.818586615520254

key: train_fscore
value: [0.93258427 0.96174863 0.97938144 0.97938144 0.96482412 0.97461929
 0.95959596 0.97297297 0.9178744  0.86757991]

mean value: 0.9510562437463755

key: test_precision
value: [0.9        0.81818182 1.         0.81818182 0.76923077 0.75
 0.83333333 1.         0.73333333 0.66666667]

mean value: 0.8288927738927739

key: train_precision
value: [1.         1.         0.96938776 0.96938776 0.93203883 0.95049505
 0.9223301  1.         0.84821429 0.76612903]

mean value: 0.9357982809720218

key: test_recall
value: [0.81818182 0.81818182 0.9        0.9        1.         0.9
 0.90909091 0.36363636 1.         0.90909091]

mean value: 0.8518181818181818

key: train_recall
value: [0.87368421 0.92631579 0.98958333 0.98958333 1.         1.
 1.         0.94736842 1.         1.        ]

mean value: 0.9726535087719298

key: test_roc_auc
value: [0.86363636 0.81818182 0.95       0.85909091 0.86363636 0.81363636
 0.85454545 0.68181818 0.8        0.70454545]

mean value: 0.8209090909090909

key: train_roc_auc
value: [0.93684211 0.96315789 0.97900219 0.97900219 0.96315789 0.97368421
 0.95833333 0.97368421 0.91145833 0.84895833]

mean value: 0.9487280701754386

key: test_jcc
value: [0.75       0.69230769 0.9        0.75       0.76923077 0.69230769
 0.76923077 0.36363636 0.73333333 0.625     ]

mean value: 0.704504662004662

key: train_jcc
value: [0.87368421 0.92631579 0.95959596 0.95959596 0.93203883 0.95049505
 0.9223301  0.94736842 0.84821429 0.76612903]

mean value: 0.9085767639760687

MCC on Blind test: 0.47

Accuracy on Blind test: 0.68

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01468706 0.01549625 0.01435971 0.01410246 0.01430082 0.01447487
 0.01353121 0.01427817 0.01515913 0.01476002]

mean value: 0.014514970779418945

key: score_time
value: [0.01159954 0.0116024  0.01148248 0.01152968 0.01152968 0.01156878
 0.01153922 0.01157212 0.01158714 0.01147294]

mean value: 0.011548399925231934

key: test_mcc
value: [0.68313005 0.56694671 0.82275335 0.82275335 0.67419986 0.52727273
 0.71562645 0.46249729 0.80909091 0.33709993]

mean value: 0.6421370630446213

key: train_mcc
value: [0.65465367 0.74657689 0.91643821 0.82648365 0.74743893 0.92674636
 0.92922547 0.75599417 0.9690588  0.85363527]

mean value: 0.8326251416033537

key: test_accuracy
value: [0.81818182 0.77272727 0.9047619  0.9047619  0.80952381 0.76190476
 0.85714286 0.66666667 0.9047619  0.66666667]

mean value: 0.8067099567099567

key: train_accuracy
value: [0.8        0.85789474 0.95811518 0.90575916 0.85863874 0.96335079
 0.96335079 0.86387435 0.98429319 0.92146597]

mean value: 0.9076742904381372

key: test_fscore
value: [0.77777778 0.73684211 0.88888889 0.88888889 0.83333333 0.76190476
 0.86956522 0.53333333 0.90909091 0.72      ]

mean value: 0.7919625215872356

key: train_fscore
value: [0.75       0.83435583 0.95789474 0.89655172 0.87671233 0.96373057
 0.96446701 0.84146341 0.98395722 0.92682927]

mean value: 0.8995962095170513

key: test_precision
value: [1.         0.875      1.         1.         0.71428571 0.72727273
 0.83333333 1.         0.90909091 0.64285714]

mean value: 0.8701839826839827

key: train_precision
value: [1.         1.         0.96808511 1.         0.7804878  0.95876289
 0.93137255 1.         1.         0.86363636]

mean value: 0.9502344710514937

key: test_recall
value: [0.63636364 0.63636364 0.8        0.8        1.         0.8
 0.90909091 0.36363636 0.90909091 0.81818182]

mean value: 0.7672727272727273

key: train_recall
value: [0.6        0.71578947 0.94791667 0.8125     1.         0.96875
 1.         0.72631579 0.96842105 1.        ]

mean value: 0.873969298245614

key: test_roc_auc
value: [0.81818182 0.77272727 0.9        0.9        0.81818182 0.76363636
 0.85454545 0.68181818 0.90454545 0.65909091]

mean value: 0.8072727272727273

key: train_roc_auc
value: [0.8        0.85789474 0.95816886 0.90625    0.85789474 0.96332237
 0.96354167 0.86315789 0.98421053 0.921875  ]

mean value: 0.9076315789473685

key: test_jcc
value: [0.63636364 0.58333333 0.8        0.8        0.71428571 0.61538462
 0.76923077 0.36363636 0.83333333 0.5625    ]

mean value: 0.6678067765567766

key: train_jcc
value: [0.6        0.71578947 0.91919192 0.8125     0.7804878  0.93
 0.93137255 0.72631579 0.96842105 0.86363636]

mean value: 0.8247714952515414

MCC on Blind test: 0.73

Accuracy on Blind test: 0.88

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.12453985 0.11010838 0.11050749 0.11169887 0.11265731 0.11060405
 0.11088371 0.10958815 0.11039591 0.11021256]

mean value: 0.11211962699890136

key: score_time
value: [0.01472282 0.01498485 0.01611757 0.01495147 0.01522636 0.01489305
 0.0149107  0.01497507 0.01493835 0.01501346]

mean value: 0.015073370933532716

key: test_mcc
value: [1.         0.83205029 0.90909091 0.82275335 0.90909091 0.90909091
 0.90829511 1.         0.90829511 0.90829511]

mean value: 0.9106961691505756

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.90909091 0.95238095 0.9047619  0.95238095 0.95238095
 0.95238095 1.         0.95238095 0.95238095]

mean value: 0.9528138528138528

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.91666667 0.95238095 0.88888889 0.95238095 0.95238095
 0.95652174 1.         0.95652174 0.95652174]

mean value: 0.9532263630089717

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.84615385 0.90909091 1.         0.90909091 0.90909091
 0.91666667 1.         0.91666667 0.91666667]

mean value: 0.9323426573426573

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.  1.  1.  0.8 1.  1.  1.  1.  1.  1. ]

mean value: 0.98

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.90909091 0.95454545 0.9        0.95454545 0.95454545
 0.95       1.         0.95       0.95      ]

mean value: 0.9522727272727273

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.84615385 0.90909091 0.8        0.90909091 0.90909091
 0.91666667 1.         0.91666667 0.91666667]

mean value: 0.9123426573426573

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.9

Accuracy on Blind test: 0.95

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.03981161 0.04514956 0.04036641 0.04546928 0.03787112 0.03224587
 0.04925346 0.03796482 0.03647232 0.0328362 ]

mean value: 0.03974406719207764

key: score_time
value: [0.01733565 0.03619933 0.03100276 0.02404404 0.01759744 0.03142309
 0.02289605 0.02225947 0.02215791 0.02506065]

mean value: 0.024997639656066894

key: test_mcc
value: [1.         0.83205029 0.80909091 1.         0.90909091 1.
 0.90829511 0.90909091 0.80909091 0.80909091]

mean value: 0.8985799946021636

key: train_mcc
value: [0.98952851 0.98952851 1.         0.9895822  1.         0.9895822
 1.         0.98958333 1.         1.        ]

mean value: 0.9947804741799376

key: test_accuracy
value: [1.         0.90909091 0.9047619  1.         0.95238095 1.
 0.95238095 0.95238095 0.9047619  0.9047619 ]

mean value: 0.948051948051948

key: train_accuracy
value: [0.99473684 0.99473684 1.         0.9947644  1.         0.9947644
 1.         0.9947644  1.         1.        ]

mean value: 0.9973766877927804

key: test_fscore
value: [1.         0.91666667 0.9        1.         0.95238095 1.
 0.95652174 0.95238095 0.90909091 0.90909091]

mean value: 0.9496132128740824

key: train_fscore
value: [0.9947644  0.9947644  1.         0.99481865 1.         0.99481865
 1.         0.9947644  1.         1.        ]

mean value: 0.9973930499416759

key: test_precision
value: [1.         0.84615385 0.9        1.         0.90909091 1.
 0.91666667 1.         0.90909091 0.90909091]

mean value: 0.939009324009324

key: train_precision
value: [0.98958333 0.98958333 1.         0.98969072 1.         0.98969072
 1.         0.98958333 1.         1.        ]

mean value: 0.9948131443298969

key: test_recall
value: [1.         1.         0.9        1.         1.         1.
 1.         0.90909091 0.90909091 0.90909091]

mean value: 0.9627272727272728

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.90909091 0.90454545 1.         0.95454545 1.
 0.95       0.95454545 0.90454545 0.90454545]

mean value: 0.9481818181818181

key: train_roc_auc
value: [0.99473684 0.99473684 1.         0.99473684 1.         0.99473684
 1.         0.99479167 1.         1.        ]

mean value: 0.997373903508772

key: test_jcc
value: [1.         0.84615385 0.81818182 1.         0.90909091 1.
 0.91666667 0.90909091 0.83333333 0.83333333]

mean value: 0.9065850815850816

key: train_jcc
value: [0.98958333 0.98958333 1.         0.98969072 1.         0.98969072
 1.         0.98958333 1.         1.        ]

mean value: 0.9948131443298969

MCC on Blind test: 0.93

Accuracy on Blind test: 0.97

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.05713534 0.08317208 0.0825665  0.08128595 0.05665421 0.06311941
 0.07721305 0.05506492 0.0598321  0.06101608]

mean value: 0.06770596504211426

key: score_time
value: [0.02309966 0.02279806 0.02322483 0.02260709 0.01263595 0.02153444
 0.02136874 0.02311778 0.0216713  0.02307773]

mean value: 0.02151355743408203

key: test_mcc
value: [0.46225016 0.36514837 0.90909091 0.43007562 0.82572282 0.42727273
 0.06741999 0.53300179 0.82275335 0.13762047]

mean value: 0.49803562097870197

key: train_mcc
value: [0.98952851 1.         1.         1.         1.         0.98958333
 1.         1.         0.9895822  0.9895822 ]

mean value: 0.9958276234546217

key: test_accuracy
value: [0.72727273 0.68181818 0.95238095 0.71428571 0.9047619  0.71428571
 0.52380952 0.71428571 0.9047619  0.57142857]

mean value: 0.740909090909091

key: train_accuracy
value: [0.99473684 1.         1.         1.         1.         0.9947644
 1.         1.         0.9947644  0.9947644 ]

mean value: 0.997903003582254

key: test_fscore
value: [0.7        0.69565217 0.95238095 0.66666667 0.90909091 0.7
 0.44444444 0.625      0.91666667 0.60869565]

mean value: 0.7218597465336596

key: train_fscore
value: [0.9947644  1.         1.         1.         1.         0.9947644
 1.         1.         0.99470899 0.99470899]

mean value: 0.9978946785229508

key: test_precision
value: [0.77777778 0.66666667 0.90909091 0.75       0.83333333 0.7
 0.57142857 1.         0.84615385 0.58333333]

mean value: 0.7637784437784437

key: train_precision
value: [0.98958333 1.         1.         1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9989583333333334

key: test_recall
value: [0.63636364 0.72727273 1.         0.6        1.         0.7
 0.36363636 0.45454545 1.         0.63636364]

mean value: 0.7118181818181818

key: train_recall
value: [1.         1.         1.         1.         1.         0.98958333
 1.         1.         0.98947368 0.98947368]

mean value: 0.9968530701754386

key: test_roc_auc
value: [0.72727273 0.68181818 0.95454545 0.70909091 0.90909091 0.71363636
 0.53181818 0.72727273 0.9        0.56818182]

mean value: 0.7422727272727272

key: train_roc_auc
value: [0.99473684 1.         1.         1.         1.         0.99479167
 1.         1.         0.99473684 0.99473684]

mean value: 0.9979002192982456

key: test_jcc
value: [0.53846154 0.53333333 0.90909091 0.5        0.83333333 0.53846154
 0.28571429 0.45454545 0.84615385 0.4375    ]

mean value: 0.5876594239094239

key: train_jcc
value: [0.98958333 1.         1.         1.         1.         0.98958333
 1.         1.         0.98947368 0.98947368]

mean value: 0.995811403508772

MCC on Blind test: 0.43

Accuracy on Blind test: 0.72

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.35024142 0.32633758 0.34231949 0.32462835 0.33578348 0.33862019
 0.33136415 0.35129523 0.33722878 0.32974505]

mean value: 0.33675637245178225

key: score_time
value: [0.00945258 0.00955868 0.0095489  0.00915074 0.0095396  0.00941062
 0.00953364 0.01018429 0.00923729 0.00993609]

mean value: 0.009555244445800781

key: test_mcc
value: [1.         0.73029674 0.90909091 0.90829511 0.90909091 0.90909091
 0.90829511 0.90909091 0.90829511 0.43007562]

mean value: 0.852162131379549

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.86363636 0.95238095 0.95238095 0.95238095 0.95238095
 0.95238095 0.95238095 0.95238095 0.71428571]

mean value: 0.9244588744588744

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.86956522 0.95238095 0.94736842 0.95238095 0.95238095
 0.95652174 0.95238095 0.95652174 0.75      ]

mean value: 0.9289500926228615

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.83333333 0.90909091 1.         0.90909091 0.90909091
 0.91666667 1.         0.91666667 0.69230769]

mean value: 0.9086247086247086

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.90909091 1.         0.9        1.         1.
 1.         0.90909091 1.         0.81818182]

mean value: 0.9536363636363636

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.86363636 0.95454545 0.95       0.95454545 0.95454545
 0.95       0.95454545 0.95       0.70909091]

mean value: 0.9240909090909091

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.76923077 0.90909091 0.9        0.90909091 0.90909091
 0.91666667 0.90909091 0.91666667 0.6       ]

mean value: 0.8738927738927739

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.86

Accuracy on Blind test: 0.94

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.02344608 0.02274013 0.02399993 0.0233686  0.02374434 0.02332354
 0.02358508 0.02412319 0.02367616 0.02417731]

mean value: 0.023618435859680174

key: score_time
value: [0.01234913 0.01238394 0.01759195 0.01474524 0.01687384 0.01484942
 0.01608396 0.02134705 0.01809335 0.01308537]

mean value: 0.015740323066711425

key: test_mcc
value: [0.27272727 0.         0.33028913 0.05504819 0.44038551 0.33028913
 0.15894099 0.23373675 0.43007562 0.15894099]

mean value: 0.24104335656188314

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.63636364 0.5        0.66666667 0.52380952 0.71428571 0.66666667
 0.57142857 0.61904762 0.71428571 0.57142857]

mean value: 0.6183982683982684

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.63636364 0.52173913 0.63157895 0.54545455 0.72727273 0.63157895
 0.52631579 0.66666667 0.75       0.52631579]

mean value: 0.6163286179876569

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.63636364 0.5        0.66666667 0.5        0.66666667 0.66666667
 0.625      0.61538462 0.69230769 0.625     ]

mean value: 0.6194055944055944

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.63636364 0.54545455 0.6        0.6        0.8        0.6
 0.45454545 0.72727273 0.81818182 0.45454545]

mean value: 0.6236363636363637

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.63636364 0.5        0.66363636 0.52727273 0.71818182 0.66363636
 0.57727273 0.61363636 0.70909091 0.57727273]

mean value: 0.6186363636363637

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.46666667 0.35294118 0.46153846 0.375      0.57142857 0.46153846
 0.35714286 0.5        0.6        0.35714286]

mean value: 0.4503399051928464

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: -0.01

Accuracy on Blind test: 0.45

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.03454781 0.03430343 0.03426123 0.03424621 0.03509831 0.03485656
 0.0354929  0.03958178 0.04269385 0.03967881]

mean value: 0.03647608757019043

key: score_time
value: [0.03377104 0.02306747 0.02031469 0.02201653 0.02278471 0.02362514
 0.0235343  0.02275467 0.02374935 0.02373099]

mean value: 0.02393488883972168

key: test_mcc
value: [0.83205029 0.63636364 0.80909091 0.71818182 0.90909091 0.80909091
 0.90829511 0.63305416 0.71562645 0.42727273]

mean value: 0.7398116921937792

key: train_mcc
value: [0.94784115 0.95810708 0.95894679 0.9690588  0.95894679 0.94810203
 0.95896444 0.95832877 0.95896444 0.97927405]

mean value: 0.9596534335646937

key: test_accuracy
value: [0.90909091 0.81818182 0.9047619  0.85714286 0.95238095 0.9047619
 0.95238095 0.80952381 0.85714286 0.71428571]

mean value: 0.8679653679653679

key: train_accuracy
value: [0.97368421 0.97894737 0.97905759 0.98429319 0.97905759 0.97382199
 0.97905759 0.97905759 0.97905759 0.9895288 ]

mean value: 0.9795563516120144

key: test_fscore
value: [0.91666667 0.81818182 0.9        0.85714286 0.95238095 0.9
 0.95652174 0.8        0.86956522 0.72727273]

mean value: 0.8697731978166761

key: train_fscore
value: [0.97409326 0.97916667 0.97959184 0.98461538 0.97959184 0.97435897
 0.97938144 0.97916667 0.97938144 0.98958333]

mean value: 0.9798930849957056

key: test_precision
value: [0.84615385 0.81818182 0.9        0.81818182 0.90909091 0.9
 0.91666667 0.88888889 0.83333333 0.72727273]

mean value: 0.8557770007770008

key: train_precision
value: [0.95918367 0.96907216 0.96       0.96969697 0.96       0.95959596
 0.95959596 0.96907216 0.95959596 0.97938144]

mean value: 0.9645194295150112

key: test_recall
value: [1.         0.81818182 0.9        0.9        1.         0.9
 1.         0.72727273 0.90909091 0.72727273]

mean value: 0.8881818181818182

key: train_recall /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_7030.py:176: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_7030.py:179: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)

value: [0.98947368 0.98947368 1.         1.         1.         0.98958333
 1.         0.98947368 1.         1.        ]

mean value: 0.9958004385964913

key: test_roc_auc
value: [0.90909091 0.81818182 0.90454545 0.85909091 0.95454545 0.90454545
 0.95       0.81363636 0.85454545 0.71363636]

mean value: 0.8681818181818182

key: train_roc_auc
value: [0.97368421 0.97894737 0.97894737 0.98421053 0.97894737 0.97373904
 0.97916667 0.97911184 0.97916667 0.98958333]

mean value: 0.9795504385964913

key: test_jcc
value: [0.84615385 0.69230769 0.81818182 0.75       0.90909091 0.81818182
 0.91666667 0.66666667 0.76923077 0.57142857]

mean value: 0.7757908757908758

key: train_jcc
value: [0.94949495 0.95918367 0.96       0.96969697 0.96       0.95
 0.95959596 0.95918367 0.95959596 0.97938144]

mean value: 0.9606132628621583

MCC on Blind test: 0.75

Accuracy on Blind test: 0.88

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.26996922 0.23063183 0.27355933 0.23035693 0.261343   0.21444583
 0.22415519 0.27059078 0.30639648 0.24822783]

mean value: 0.25296764373779296

key: score_time
value: [0.02385116 0.02095103 0.02128196 0.02257371 0.02106357 0.02217174
 0.02236676 0.02201104 0.02247214 0.02237701]

mean value: 0.022112011909484863

key: test_mcc
value: [0.83205029 0.63636364 0.80909091 0.71818182 0.90909091 0.80909091
 0.90829511 0.63305416 0.71562645 0.42727273]

mean value: 0.7398116921937792

key: train_mcc
value: [0.94784115 0.95810708 0.95894679 0.9690588  0.95894679 0.94810203
 0.95896444 0.95832877 0.95896444 0.97927405]

mean value: 0.9596534335646937

key: test_accuracy
value: [0.90909091 0.81818182 0.9047619  0.85714286 0.95238095 0.9047619
 0.95238095 0.80952381 0.85714286 0.71428571]

mean value: 0.8679653679653679

key: train_accuracy
value: [0.97368421 0.97894737 0.97905759 0.98429319 0.97905759 0.97382199
 0.97905759 0.97905759 0.97905759 0.9895288 ]

mean value: 0.9795563516120144

key: test_fscore
value: [0.91666667 0.81818182 0.9        0.85714286 0.95238095 0.9
 0.95652174 0.8        0.86956522 0.72727273]

mean value: 0.8697731978166761

key: train_fscore
value: [0.97409326 0.97916667 0.97959184 0.98461538 0.97959184 0.97435897
 0.97938144 0.97916667 0.97938144 0.98958333]

mean value: 0.9798930849957056

key: test_precision
value: [0.84615385 0.81818182 0.9        0.81818182 0.90909091 0.9
 0.91666667 0.88888889 0.83333333 0.72727273]

mean value: 0.8557770007770008

key: train_precision
value: [0.95918367 0.96907216 0.96       0.96969697 0.96       0.95959596
 0.95959596 0.96907216 0.95959596 0.97938144]

mean value: 0.9645194295150112

key: test_recall
value: [1.         0.81818182 0.9        0.9        1.         0.9
 1.         0.72727273 0.90909091 0.72727273]

mean value: 0.8881818181818182

key: train_recall
value: [0.98947368 0.98947368 1.         1.         1.         0.98958333
 1.         0.98947368 1.         1.        ]

mean value: 0.9958004385964913

key: test_roc_auc
value: [0.90909091 0.81818182 0.90454545 0.85909091 0.95454545 0.90454545
 0.95       0.81363636 0.85454545 0.71363636]

mean value: 0.8681818181818182

key: train_roc_auc
value: [0.97368421 0.97894737 0.97894737 0.98421053 0.97894737 0.97373904
 0.97916667 0.97911184 0.97916667 0.98958333]

mean value: 0.9795504385964913

key: test_jcc
value: [0.84615385 0.69230769 0.81818182 0.75       0.90909091 0.81818182
 0.91666667 0.66666667 0.76923077 0.57142857]

mean value: 0.7757908757908758

key: train_jcc
value: [0.94949495 0.95918367 0.96       0.96969697 0.96       0.95
 0.95959596 0.95918367 0.95959596 0.97938144]

mean value: 0.9606132628621583

MCC on Blind test: 0.75

Accuracy on Blind test: 0.88

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.0495255  0.04171205 0.05164695 0.03768301 0.03714681 0.04049611
 0.05802965 0.06608534 0.03809094 0.03966403]

mean value: 0.04600803852081299

key: score_time
value: [0.01200128 0.01423216 0.0186379  0.01211715 0.01423645 0.01453924
 0.01463461 0.01525044 0.01495218 0.01509023]

mean value: 0.01456916332244873

key: test_mcc
value: [0.81322028 0.82462113 0.61152662 0.90692382 0.80817439 0.90692382
 0.85441771 0.7197263  0.7098505  0.86240942]

mean value: 0.8017793998138952

key: train_mcc
value: [0.91372712 0.91893234 0.91395353 0.90846996 0.90846996 0.91379661
 0.89790701 0.91380162 0.913746   0.93536575]

mean value: 0.9138169912072902

key: test_accuracy
value: [0.9047619  0.9047619  0.80487805 0.95121951 0.90243902 0.95121951
 0.92682927 0.85365854 0.85365854 0.92682927]

mean value: 0.8980255516840883

key: train_accuracy
value: [0.95675676 0.95945946 0.95687332 0.9541779  0.9541779  0.95687332
 0.94878706 0.95687332 0.95687332 0.96765499]

mean value: 0.956850732133751

key: test_fscore
value: [0.9        0.89473684 0.78947368 0.95238095 0.89473684 0.95238095
 0.93023256 0.86956522 0.86363636 0.93333333]

mean value: 0.8980476745683493

key: train_fscore
value: [0.95721925 0.95934959 0.95744681 0.95466667 0.95466667 0.95721925
 0.94933333 0.95698925 0.95675676 0.96774194]

mean value: 0.9571389510899493

key: test_precision
value: [0.94736842 1.         0.83333333 0.90909091 0.94444444 0.90909091
 0.90909091 0.8        0.82608696 0.875     ]

mean value: 0.8953505882624876

key: train_precision
value: [0.94708995 0.96195652 0.94736842 0.94708995 0.94708995 0.95212766
 0.93684211 0.95187166 0.95675676 0.96256684]

mean value: 0.9510759808329783

key: test_recall
value: [0.85714286 0.80952381 0.75       1.         0.85       1.
 0.95238095 0.95238095 0.9047619  1.        ]

mean value: 0.9076190476190475

key: train_recall
value: [0.96756757 0.95675676 0.96774194 0.96236559 0.96236559 0.96236559
 0.96216216 0.96216216 0.95675676 0.97297297]

mean value: 0.9633217088055798

key: test_roc_auc
value: [0.9047619  0.9047619  0.80357143 0.95238095 0.90119048 0.95238095
 0.92619048 0.85119048 0.85238095 0.925     ]

mean value: 0.8973809523809524

key: train_roc_auc
value: [0.95675676 0.95945946 0.95684394 0.95415577 0.95415577 0.95685847
 0.94882302 0.95688753 0.956873   0.96766928]

mean value: 0.9568482999128161

key: test_jcc
value: [0.81818182 0.80952381 0.65217391 0.90909091 0.80952381 0.90909091
 0.86956522 0.76923077 0.76       0.875     ]

mean value: 0.8181381155076808

key: train_jcc
value: [0.91794872 0.921875   0.91836735 0.91326531 0.91326531 0.91794872
 0.9035533  0.91752577 0.91709845 0.9375    ]

mean value: 0.9178347913365226

MCC on Blind test: 0.78

Accuracy on Blind test: 0.9

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [1.09578729 1.01711416 1.37589264 0.92109752 0.99411583 0.98509765
 1.15172482 1.26849198 1.10450602 0.88261199]

mean value: 1.0796439886093139

key: score_time
value: [0.01467395 0.02109146 0.0124011  0.01224828 0.01224065 0.01213932
 0.0122056  0.0121479  0.01539993 0.01800776]

mean value: 0.014255595207214356

key: test_mcc
value: [0.95346259 0.90889326 0.90649828 0.95227002 0.95227002 0.95227002
 0.95238095 1.         0.80817439 0.90649828]

mean value: 0.9292717796884236

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97619048 0.95238095 0.95121951 0.97560976 0.97560976 0.97560976
 0.97560976 1.         0.90243902 0.95121951]

mean value: 0.963588850174216

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97560976 0.95       0.94736842 0.97435897 0.97435897 0.97435897
 0.97560976 1.         0.90909091 0.95454545]

mean value: 0.963530121996104

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         1.         1.         1.         1.
 1.         1.         0.86956522 0.91304348]

mean value: 0.9782608695652174

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.95238095 0.9047619  0.9        0.95       0.95       0.95
 0.95238095 1.         0.95238095 1.        ]

mean value: 0.9511904761904761

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97619048 0.95238095 0.95       0.975      0.975      0.975
 0.97619048 1.         0.90119048 0.95      ]

mean value: 0.9630952380952381

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.95238095 0.9047619  0.9        0.95       0.95       0.95
 0.95238095 1.         0.83333333 0.91304348]

mean value: 0.9305900621118012

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.81

Accuracy on Blind test: 0.92

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01408195 0.01117468 0.00955725 0.00979543 0.00963759 0.00944567
 0.00941515 0.00972009 0.00966501 0.00958419]

mean value: 0.010207700729370116

key: score_time
value: [0.01482582 0.00992823 0.0092001  0.00906563 0.00890923 0.00876284
 0.00872135 0.00879741 0.00879622 0.00883293]

mean value: 0.00958397388458252

key: test_mcc
value: [0.38490018 0.62187434 0.65871309 0.6806903  0.66668392 0.42516543
 0.36718832 0.42916625 0.61969655 0.46623254]

mean value: 0.5320310918576006

key: train_mcc
value: [0.60203744 0.59709223 0.6477664  0.52948819 0.59363692 0.55833251
 0.55570017 0.59043621 0.6092626  0.58611464]

mean value: 0.5869867305151683

key: test_accuracy
value: [0.69047619 0.80952381 0.82926829 0.82926829 0.82926829 0.70731707
 0.68292683 0.70731707 0.80487805 0.73170732]

mean value: 0.7621951219512195

key: train_accuracy
value: [0.7972973  0.79459459 0.81940701 0.76010782 0.79514825 0.77628032
 0.77358491 0.79245283 0.80053908 0.78706199]

mean value: 0.789647410213448

key: test_fscore
value: [0.71111111 0.81818182 0.82051282 0.84444444 0.8372093  0.72727273
 0.71111111 0.75       0.82608696 0.75555556]

mean value: 0.7801485847036909

key: train_fscore
value: [0.81203008 0.81       0.8337469  0.78132678 0.80612245 0.79197995
 0.79104478 0.80506329 0.815      0.80589681]

mean value: 0.8052211026787506

key: test_precision
value: [0.66666667 0.7826087  0.84210526 0.76       0.7826087  0.66666667
 0.66666667 0.66666667 0.76       0.70833333]

mean value: 0.7302322654462242

key: train_precision
value: [0.75700935 0.75348837 0.77419355 0.71945701 0.76699029 0.74178404
 0.73271889 0.75714286 0.75813953 0.73873874]

mean value: 0.7499662633444528

key: test_recall
value: [0.76190476 0.85714286 0.8        0.95       0.9        0.8
 0.76190476 0.85714286 0.9047619  0.80952381]

mean value: 0.8402380952380952

key: train_recall
value: [0.87567568 0.87567568 0.90322581 0.85483871 0.84946237 0.84946237
 0.85945946 0.85945946 0.88108108 0.88648649]

mean value: 0.8694827085149666

key: test_roc_auc
value: [0.69047619 0.80952381 0.82857143 0.83214286 0.83095238 0.70952381
 0.68095238 0.70357143 0.80238095 0.7297619 ]

mean value: 0.7617857142857143

key: train_roc_auc
value: [0.7972973  0.79459459 0.81918047 0.75985179 0.79500145 0.77608253
 0.77381575 0.79263296 0.80075559 0.78732926]

mean value: 0.7896541702993316

key: test_jcc
value: [0.55172414 0.69230769 0.69565217 0.73076923 0.72       0.57142857
 0.55172414 0.6        0.7037037  0.60714286]

mean value: 0.6424452505127167

key: train_jcc
value: [0.6835443  0.68067227 0.71489362 0.64112903 0.67521368 0.65560166
 0.65432099 0.67372881 0.68776371 0.67489712]

mean value: 0.6741765190584461

MCC on Blind test: 0.47

Accuracy on Blind test: 0.76

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00980854 0.00986242 0.00992036 0.00991201 0.00991249 0.00991249
 0.00982308 0.00993419 0.00997567 0.00992012]

mean value: 0.009898138046264649

key: score_time
value: [0.00885415 0.00884795 0.00889301 0.00889349 0.00893068 0.00886345
 0.00891495 0.00888014 0.00895143 0.00895309]

mean value: 0.008898234367370606

key: test_mcc
value: [0.57207755 0.52620136 0.36718832 0.72229808 0.7098505  0.6806903
 0.65871309 0.41428571 0.51320273 0.51190476]

mean value: 0.5676412422906144

key: train_mcc
value: [0.63807092 0.62969126 0.6558879  0.6516517  0.68216317 0.64716482
 0.68245673 0.68195292 0.68220933 0.66900863]

mean value: 0.662025738279246

key: test_accuracy
value: [0.78571429 0.76190476 0.68292683 0.85365854 0.85365854 0.82926829
 0.82926829 0.70731707 0.75609756 0.75609756]

mean value: 0.7815911730545877

key: train_accuracy
value: [0.81891892 0.81351351 0.82749326 0.82479784 0.84097035 0.82210243
 0.84097035 0.84097035 0.84097035 0.8328841 ]

mean value: 0.8303591462082028

key: test_fscore
value: [0.79069767 0.77272727 0.64864865 0.86363636 0.84210526 0.84444444
 0.8372093  0.71428571 0.77272727 0.76190476]

mean value: 0.7848386718276559

key: train_fscore
value: [0.82133333 0.82170543 0.83246073 0.83204134 0.84350133 0.83076923
 0.84350133 0.84097035 0.84266667 0.84020619]

mean value: 0.8349155922270581

key: test_precision
value: [0.77272727 0.73913043 0.70588235 0.79166667 0.88888889 0.76
 0.81818182 0.71428571 0.73913043 0.76190476]

mean value: 0.7691798345161517

key: train_precision
value: [0.81052632 0.78712871 0.81122449 0.80099502 0.83246073 0.79411765
 0.828125   0.83870968 0.83157895 0.80295567]

mean value: 0.8137822213187824

key: test_recall
value: [0.80952381 0.80952381 0.6        0.95       0.8        0.95
 0.85714286 0.71428571 0.80952381 0.76190476]

mean value: 0.8061904761904761

key: train_recall
value: [0.83243243 0.85945946 0.85483871 0.8655914  0.85483871 0.87096774
 0.85945946 0.84324324 0.85405405 0.88108108]

mean value: 0.8575966288869514

key: test_roc_auc
value: [0.78571429 0.76190476 0.68095238 0.85595238 0.85238095 0.83214286
 0.82857143 0.70714286 0.7547619  0.75595238]

mean value: 0.781547619047619

key: train_roc_auc
value: [0.81891892 0.81351351 0.82741935 0.82468759 0.84093287 0.82197036
 0.84102005 0.84097646 0.84100552 0.83301366]

mean value: 0.8303458297006684

key: test_jcc
value: [0.65384615 0.62962963 0.48       0.76       0.72727273 0.73076923
 0.72       0.55555556 0.62962963 0.61538462]

mean value: 0.6502087542087542

key: train_jcc
value: [0.69683258 0.69736842 0.71300448 0.71238938 0.7293578  0.71052632
 0.7293578  0.7255814  0.7281106  0.72444444]

mean value: 0.716697321606543

MCC on Blind test: 0.6

Accuracy on Blind test: 0.81

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00937152 0.00930285 0.00922108 0.00923443 0.00945258 0.00900841
 0.00912046 0.01025939 0.01043296 0.01040459]

mean value: 0.00958082675933838

key: score_time
value: [0.01154137 0.01131701 0.01124048 0.01131988 0.01132011 0.01105022
 0.01111436 0.01203346 0.01216817 0.01221538]

mean value: 0.011532044410705567

key: test_mcc
value: [0.248452   0.57735027 0.01756821 0.65871309 0.61152662 0.47439956
 0.61152662 0.26730386 0.38060103 0.6133669 ]

mean value: 0.446080816017905

key: train_mcc
value: [0.62220365 0.67248442 0.64889279 0.63017348 0.61050859 0.64734861
 0.68042218 0.6676026  0.61385458 0.6558879 ]

mean value: 0.6449378798016933

key: test_accuracy
value: [0.61904762 0.78571429 0.51219512 0.82926829 0.80487805 0.73170732
 0.80487805 0.63414634 0.68292683 0.80487805]

mean value: 0.7209639953542393

key: train_accuracy
value: [0.81081081 0.83513514 0.82210243 0.81401617 0.8032345  0.82210243
 0.83827493 0.8328841  0.8032345  0.82749326]

mean value: 0.8209288264005246

key: test_fscore
value: [0.55555556 0.76923077 0.41176471 0.82051282 0.78947368 0.68571429
 0.81818182 0.65116279 0.64864865 0.8       ]

mean value: 0.6950245078634452

key: train_fscore
value: [0.80662983 0.82816901 0.81142857 0.80672269 0.79202279 0.81355932
 0.82857143 0.8258427  0.78592375 0.82222222]

mean value: 0.8121092323988096

key: test_precision
value: [0.66666667 0.83333333 0.5        0.84210526 0.83333333 0.8
 0.7826087  0.63636364 0.75       0.84210526]

mean value: 0.7486516191664934

key: train_precision
value: [0.82485876 0.86470588 0.86585366 0.84210526 0.84242424 0.85714286
 0.87878788 0.85964912 0.85897436 0.84571429]

mean value: 0.8540216306960209

key: test_recall
value: [0.47619048 0.71428571 0.35       0.8        0.75       0.6
 0.85714286 0.66666667 0.57142857 0.76190476]

mean value: 0.6547619047619048

key: train_recall
value: [0.78918919 0.79459459 0.76344086 0.77419355 0.74731183 0.77419355
 0.78378378 0.79459459 0.72432432 0.8       ]

mean value: 0.7745626271432723

key: test_roc_auc
value: [0.61904762 0.78571429 0.50833333 0.82857143 0.80357143 0.72857143
 0.80357143 0.63333333 0.68571429 0.80595238]

mean value: 0.7202380952380952

key: train_roc_auc
value: [0.81081081 0.83513514 0.82226097 0.8141238  0.80338564 0.82223191
 0.83812845 0.83278117 0.80302238 0.82741935]

mean value: 0.8209299622202848

key: test_jcc
value: [0.38461538 0.625      0.25925926 0.69565217 0.65217391 0.52173913
 0.69230769 0.48275862 0.48       0.66666667]

mean value: 0.5460172840929962

key: train_jcc
value: [0.67592593 0.70673077 0.68269231 0.67605634 0.65566038 0.68571429
 0.70731707 0.70334928 0.647343   0.69811321]

mean value: 0.6838902562133582

MCC on Blind test: 0.33

Accuracy on Blind test: 0.66

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01808858 0.01731157 0.01881266 0.01991796 0.01996613 0.01853418
 0.01932144 0.01959515 0.01970315 0.01968789]

mean value: 0.019093871116638184

key: score_time
value: [0.01118565 0.01099086 0.01180649 0.01189542 0.01190186 0.01185799
 0.01187539 0.01184416 0.01220989 0.01184678]

mean value: 0.011741447448730468

key: test_mcc
value: [0.80952381 0.58834841 0.51190476 0.78072006 0.70714286 0.72229808
 0.67700771 0.7197263  0.65871309 0.73786479]

mean value: 0.6913249867860977

key: train_mcc
value: [0.84533292 0.7964953  0.83490488 0.79583079 0.81394491 0.80073631
 0.82483989 0.80086091 0.82989307 0.7959641 ]

mean value: 0.8138803081683508

key: test_accuracy
value: [0.9047619  0.78571429 0.75609756 0.87804878 0.85365854 0.85365854
 0.82926829 0.85365854 0.82926829 0.85365854]

mean value: 0.8397793263646922

key: train_accuracy
value: [0.92162162 0.89459459 0.91644205 0.89487871 0.90566038 0.89757412
 0.91105121 0.89757412 0.91374663 0.89487871]

mean value: 0.9048022146135354

key: test_fscore
value: [0.9047619  0.80851064 0.75       0.88888889 0.85       0.86363636
 0.85106383 0.86956522 0.8372093  0.875     ]

mean value: 0.8498636145089149

key: train_fscore
value: [0.92428198 0.90126582 0.91948052 0.90126582 0.90956072 0.9035533
 0.91428571 0.90306122 0.91666667 0.90076336]

mean value: 0.9094185136611744

key: test_precision
value: [0.9047619  0.73076923 0.75       0.8        0.85       0.79166667
 0.76923077 0.8        0.81818182 0.77777778]

mean value: 0.7992388167388167

key: train_precision
value: [0.89393939 0.84761905 0.88944724 0.85167464 0.87562189 0.85576923
 0.88       0.85507246 0.88442211 0.85096154]

mean value: 0.8684527552986584

key: test_recall
value: [0.9047619  0.9047619  0.75       1.         0.85       0.95
 0.95238095 0.95238095 0.85714286 1.        ]

mean value: 0.9121428571428571

key: train_recall
value: [0.95675676 0.96216216 0.9516129  0.95698925 0.94623656 0.95698925
 0.95135135 0.95675676 0.95135135 0.95675676]

mean value: 0.9546963092124383

key: test_roc_auc
value: [0.9047619  0.78571429 0.75595238 0.88095238 0.85357143 0.85595238
 0.82619048 0.85119048 0.82857143 0.85      ]

mean value: 0.8392857142857143

key: train_roc_auc
value: [0.92162162 0.89459459 0.91634699 0.89471084 0.90555071 0.89741354
 0.91115955 0.89773322 0.91384772 0.89504505]

mean value: 0.9048023830281895

key: test_jcc
value: [0.82608696 0.67857143 0.6        0.8        0.73913043 0.76
 0.74074074 0.76923077 0.72       0.77777778]

mean value: 0.7411538107625064

key: train_jcc
value: [0.8592233  0.8202765  0.85096154 0.8202765  0.83412322 0.82407407
 0.84210526 0.82325581 0.84615385 0.81944444]

mean value: 0.833989449935668

MCC on Blind test: 0.72

Accuracy on Blind test: 0.88

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [1.48036051 1.6958499  1.75260854 1.34077454 1.51118517 1.51683521
 1.85760546 2.30665851 1.85421658 1.85729861]

mean value: 1.7173393011093139

key: score_time
value: [0.01311946 0.01507545 0.01247096 0.01482487 0.01506543 0.03141546
 0.01907992 0.01261067 0.01477861 0.01477838]

mean value: 0.01632192134857178

key: test_mcc
value: [0.82462113 0.81322028 0.8047619  0.95238095 0.81975606 0.95227002
 0.8047619  1.         0.80817439 0.95227002]

mean value: 0.8732216655064745

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9047619  0.9047619  0.90243902 0.97560976 0.90243902 0.97560976
 0.90243902 1.         0.90243902 0.97560976]

mean value: 0.9346109175377468

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.89473684 0.9        0.9        0.97560976 0.88888889 0.97435897
 0.9047619  1.         0.90909091 0.97674419]

mean value: 0.9324191461350013

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.94736842 0.9        0.95238095 1.         1.
 0.9047619  1.         0.86956522 0.95454545]

mean value: 0.9528621950132248

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.80952381 0.85714286 0.9        1.         0.8        0.95
 0.9047619  1.         0.95238095 1.        ]

mean value: 0.9173809523809524

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9047619  0.9047619  0.90238095 0.97619048 0.9        0.975
 0.90238095 1.         0.90119048 0.975     ]

mean value: 0.9341666666666666

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.80952381 0.81818182 0.81818182 0.95238095 0.8        0.95
 0.82608696 1.         0.83333333 0.95454545]

mean value: 0.8762234142668925

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.67

Accuracy on Blind test: 0.86

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.02465558 0.016572   0.01520538 0.02612782 0.01691365 0.02268863
 0.02848172 0.02351189 0.01709127 0.02407074]

mean value: 0.02153186798095703

key: score_time
value: [0.01841354 0.01036406 0.01123571 0.00964928 0.00933528 0.00891948
 0.01145577 0.00994396 0.00893211 0.01515102]

mean value: 0.011340022087097168

key: test_mcc
value: [0.8660254  0.95346259 0.90649828 1.         0.85441771 0.95227002
 0.95238095 0.95238095 0.85441771 0.8547619 ]

mean value: 0.9146615512789581

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.92857143 0.97619048 0.95121951 1.         0.92682927 0.97560976
 0.97560976 0.97560976 0.92682927 0.92682927]

mean value: 0.9563298490127758

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.92307692 0.97560976 0.94736842 1.         0.92307692 0.97435897
 0.97560976 0.97560976 0.93023256 0.92682927]

mean value: 0.9551772336290353

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         1.         1.         0.94736842 1.
 1.         1.         0.90909091 0.95      ]

mean value: 0.9806459330143541

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.85714286 0.95238095 0.9        1.         0.9        0.95
 0.95238095 0.95238095 0.95238095 0.9047619 ]

mean value: 0.9321428571428572

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.92857143 0.97619048 0.95       1.         0.92619048 0.975
 0.97619048 0.97619048 0.92619048 0.92738095]

mean value: 0.9561904761904761

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.85714286 0.95238095 0.9        1.         0.85714286 0.95
 0.95238095 0.95238095 0.86956522 0.86363636]

mean value: 0.9154630152456239

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.82

Accuracy on Blind test: 0.92

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.1296804  0.12818766 0.12894821 0.13796115 0.11931705 0.12167764
 0.11896539 0.12175727 0.12500453 0.13481355]

mean value: 0.1266312837600708

key: score_time
value: [0.0192759  0.01999116 0.01901579 0.01934671 0.01817155 0.02717113
 0.03437304 0.02251887 0.02317071 0.01954842]

mean value: 0.02225832939147949

key: test_mcc
value: [0.95346259 0.85811633 0.8047619  0.95238095 0.81975606 1.
 0.80817439 0.86333169 0.7633652  0.85441771]

mean value: 0.867776682928389

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97619048 0.92857143 0.90243902 0.97560976 0.90243902 1.
 0.90243902 0.92682927 0.87804878 0.92682927]

mean value: 0.9319396051103368

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97560976 0.93023256 0.9        0.97560976 0.88888889 1.
 0.90909091 0.92307692 0.88888889 0.93023256]

mean value: 0.9321630238419801

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.90909091 0.9        0.95238095 1.         1.
 0.86956522 1.         0.83333333 0.90909091]

mean value: 0.9373461321287408

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.95238095 0.95238095 0.9        1.         0.8        1.
 0.95238095 0.85714286 0.95238095 0.95238095]

mean value: 0.9319047619047619

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97619048 0.92857143 0.90238095 0.97619048 0.9        1.
 0.90119048 0.92857143 0.87619048 0.92619048]

mean value: 0.9315476190476191

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.95238095 0.86956522 0.81818182 0.95238095 0.8        1.
 0.83333333 0.85714286 0.8        0.86956522]

mean value: 0.8752550348202522

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.77

Accuracy on Blind test: 0.9

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.01090407 0.01682448 0.01031947 0.00970149 0.0096662  0.01756215
 0.00981498 0.00959396 0.01117182 0.01251793]

mean value: 0.011807656288146973

key: score_time
value: [0.01086402 0.01643109 0.00869727 0.0086472  0.00876594 0.01397753
 0.00873542 0.00865102 0.00945759 0.01478791]

mean value: 0.010901498794555663

key: test_mcc
value: [0.71754731 0.64597519 0.46623254 0.77831178 0.81975606 0.85441771
 0.65952381 0.72229808 0.8047619  0.6133669 ]

mean value: 0.7082191294366296

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.85714286 0.80952381 0.73170732 0.87804878 0.90243902 0.92682927
 0.82926829 0.85365854 0.90243902 0.80487805]

mean value: 0.8495934959349594

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.85       0.77777778 0.7027027  0.85714286 0.88888889 0.92307692
 0.82926829 0.84210526 0.9047619  0.8       ]

mean value: 0.8375724610191876

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.89473684 0.93333333 0.76470588 1.         1.         0.94736842
 0.85       0.94117647 0.9047619  0.84210526]

mean value: 0.9078188117352204

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.80952381 0.66666667 0.65       0.75       0.8        0.9
 0.80952381 0.76190476 0.9047619  0.76190476]

mean value: 0.7814285714285715

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.85714286 0.80952381 0.7297619  0.875      0.9        0.92619048
 0.8297619  0.85595238 0.90238095 0.80595238]

mean value: 0.8491666666666666

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.73913043 0.63636364 0.54166667 0.75       0.8        0.85714286
 0.70833333 0.72727273 0.82608696 0.66666667]

mean value: 0.7252663278750235

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.42

Accuracy on Blind test: 0.75

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.71200705 1.73543882 1.69342971 1.76952863 1.73066044 1.48200154
 1.47888637 1.46924496 1.47123814 1.48011613]

mean value: 1.6022551774978637

key: score_time
value: [0.11766553 0.1164763  0.10360074 0.10462189 0.09121513 0.09602308
 0.14463115 0.09650397 0.09187126 0.09328437]

mean value: 0.10558934211730957

key: test_mcc
value: [0.95346259 1.         1.         0.95238095 1.         1.
 0.85441771 1.         0.86240942 0.95227002]

mean value: 0.9574940680202537

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97619048 1.         1.         0.97560976 1.         1.
 0.92682927 1.         0.92682927 0.97560976]

mean value: 0.9781068524970964

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97560976 1.         1.         0.97560976 1.         1.
 0.93023256 1.         0.93333333 0.97674419]

mean value: 0.9791529589714502

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         1.         0.95238095 1.         1.
 0.90909091 1.         0.875      0.95454545]

mean value: 0.9691017316017316

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.95238095 1.         1.         1.         1.         1.
 0.95238095 1.         1.         1.        ]

mean value: 0.9904761904761905

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97619048 1.         1.         0.97619048 1.         1.
 0.92619048 1.         0.925      0.975     ]

mean value: 0.9778571428571429

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.95238095 1.         1.         0.95238095 1.         1.
 0.86956522 1.         0.875      0.95454545]

mean value: 0.9603872576698663

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.91

Accuracy on Blind test: 0.96

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000...05', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.88163686 0.94899797 0.92457771 0.92120004 0.88247681 1.04019213
 0.91119266 0.97453594 0.94692683 0.96164441]

mean value: 0.9393381357192994

key: score_time
value: [0.24661112 0.24717331 0.24636698 0.15798783 0.18983316 0.25734568
 0.26642156 0.22456074 0.2825954  0.12304974]

mean value: 0.22419455051422119

key: test_mcc
value: [0.95346259 0.95346259 0.90238095 0.95238095 1.         1.
 0.85441771 0.90649828 0.90649828 0.90649828]

mean value: 0.9335599629113059

key: train_mcc
value: [0.97310093 0.97332853 0.96261094 0.96787795 0.98395537 0.97339739
 0.97866529 0.9734012  0.98395676 0.97317407]

mean value: 0.9743468410052059

key: test_accuracy
value: [0.97619048 0.97619048 0.95121951 0.97560976 1.         1.
 0.92682927 0.95121951 0.95121951 0.95121951]

mean value: 0.9659698025551684

key: train_accuracy
value: [0.98648649 0.98648649 0.98113208 0.98382749 0.99191375 0.98652291
 0.98921833 0.98652291 0.99191375 0.98652291]

mean value: 0.9870547096962191

key: test_fscore
value: [0.97560976 0.97560976 0.95       0.97560976 1.         1.
 0.93023256 0.95454545 0.95454545 0.95454545]

mean value: 0.9670698190068581

key: train_fscore
value: [0.98659517 0.98666667 0.98143236 0.98404255 0.992      0.9867374
 0.98930481 0.98666667 0.9919571  0.98659517]

mean value: 0.9871997913715367

key: test_precision
value: [1.         1.         0.95       0.95238095 1.         1.
 0.90909091 0.91304348 0.91304348 0.91304348]

mean value: 0.955060229625447

key: train_precision
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
[0.9787234  0.97368421 0.96858639 0.97368421 0.98412698 0.97382199
 0.97883598 0.97368421 0.98404255 0.9787234 ]

mean value: 0.9767913333207389

key: test_recall
value: [0.95238095 0.95238095 0.95       1.         1.         1.
 0.95238095 1.         1.         1.        ]

mean value: 0.9807142857142856

key: train_recall
value: [0.99459459 1.         0.99462366 0.99462366 1.         1.
 1.         1.         1.         0.99459459]

mean value: 0.9978436501017146

key: test_roc_auc
value: [0.97619048 0.97619048 0.95119048 0.97619048 1.         1.
 0.92619048 0.95       0.95       0.95      ]

mean value: 0.9655952380952381

key: train_roc_auc
value: [0.98648649 0.98648649 0.98109561 0.98379831 0.99189189 0.98648649
 0.98924731 0.98655914 0.99193548 0.98654461]

mean value: 0.9870531822144726

key: test_jcc
value: [0.95238095 0.95238095 0.9047619  0.95238095 1.         1.
 0.86956522 0.91304348 0.91304348 0.91304348]

mean value: 0.9370600414078675

key: train_jcc
value: [0.97354497 0.97368421 0.96354167 0.96858639 0.98412698 0.97382199
 0.97883598 0.97368421 0.98404255 0.97354497]

mean value: 0.9747413927927049

MCC on Blind test: 0.9

Accuracy on Blind test: 0.95

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.02322245 0.01115775 0.01000214 0.0100019  0.00992441 0.0100491
 0.00989175 0.00991106 0.01019478 0.01086497]

mean value: 0.011522030830383301

key: score_time
value: [0.01117277 0.00906587 0.00894356 0.00887871 0.00882888 0.00881004
 0.00894785 0.00873399 0.00964856 0.00930524]

mean value: 0.009233546257019044

key: test_mcc
value: [0.57207755 0.52620136 0.36718832 0.72229808 0.7098505  0.6806903
 0.65871309 0.41428571 0.51320273 0.51190476]

mean value: 0.5676412422906144

key: train_mcc
value: [0.63807092 0.62969126 0.6558879  0.6516517  0.68216317 0.64716482
 0.68245673 0.68195292 0.68220933 0.66900863]

mean value: 0.662025738279246

key: test_accuracy
value: [0.78571429 0.76190476 0.68292683 0.85365854 0.85365854 0.82926829
 0.82926829 0.70731707 0.75609756 0.75609756]

mean value: 0.7815911730545877

key: train_accuracy
value: [0.81891892 0.81351351 0.82749326 0.82479784 0.84097035 0.82210243
 0.84097035 0.84097035 0.84097035 0.8328841 ]

mean value: 0.8303591462082028

key: test_fscore
value: [0.79069767 0.77272727 0.64864865 0.86363636 0.84210526 0.84444444
 0.8372093  0.71428571 0.77272727 0.76190476]

mean value: 0.7848386718276559

key: train_fscore
value: [0.82133333 0.82170543 0.83246073 0.83204134 0.84350133 0.83076923
 0.84350133 0.84097035 0.84266667 0.84020619]

mean value: 0.8349155922270581

key: test_precision
value: [0.77272727 0.73913043 0.70588235 0.79166667 0.88888889 0.76
 0.81818182 0.71428571 0.73913043 0.76190476]

mean value: 0.7691798345161517

key: train_precision
value: [0.81052632 0.78712871 0.81122449 0.80099502 0.83246073 0.79411765
 0.828125   0.83870968 0.83157895 0.80295567]

mean value: 0.8137822213187824

key: test_recall
value: [0.80952381 0.80952381 0.6        0.95       0.8        0.95
 0.85714286 0.71428571 0.80952381 0.76190476]

mean value: 0.8061904761904761

key: train_recall
value: [0.83243243 0.85945946 0.85483871 0.8655914  0.85483871 0.87096774
 0.85945946 0.84324324 0.85405405 0.88108108]

mean value: 0.8575966288869514

key: test_roc_auc
value: [0.78571429 0.76190476 0.68095238 0.85595238 0.85238095 0.83214286
 0.82857143 0.70714286 0.7547619  0.75595238]

mean value: 0.781547619047619

key: train_roc_auc
value: [0.81891892 0.81351351 0.82741935 0.82468759 0.84093287 0.82197036
 0.84102005 0.84097646 0.84100552 0.83301366]

mean value: 0.8303458297006684

key: test_jcc
value: [0.65384615 0.62962963 0.48       0.76       0.72727273 0.73076923
 0.72       0.55555556 0.62962963 0.61538462]

mean value: 0.6502087542087542

key: train_jcc
value: [0.69683258 0.69736842 0.71300448 0.71238938 0.7293578  0.71052632
 0.7293578  0.7255814  0.7281106  0.72444444]

mean value: 0.716697321606543

MCC on Blind test: 0.6

Accuracy on Blind test: 0.81

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.25485444 0.65282845 0.07063746 0.06406283 0.07283258 0.06812763
 0.06811261 0.07140255 0.07633281 0.44679737]

mean value: 0.18459887504577638

key: score_time
value: [0.01287341 0.01114225 0.01090765 0.01067328 0.01123691 0.01061821
 0.01061773 0.01126909 0.01094055 0.01281428]

mean value: 0.011309337615966798

key: test_mcc
value: [0.95346259 0.95346259 1.         1.         0.85441771 1.
 0.95238095 1.         0.90649828 0.95227002]

mean value: 0.9572492133248716

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97619048 0.97619048 1.         1.         0.92682927 1.
 0.97560976 1.         0.95121951 0.97560976]

mean value: 0.9781649245063879

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97560976 0.97560976 1.         1.         0.92307692 1.
 0.97560976 1.         0.95454545 0.97674419]

mean value: 0.9781195831961572

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         1.         1.         0.94736842 1.
 1.         1.         0.91304348 0.95454545]

mean value: 0.9814957353858955

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.95238095 0.95238095 1.         1.         0.9        1.
 0.95238095 1.         1.         1.        ]

mean value: 0.9757142857142858

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97619048 0.97619048 1.         1.         0.92619048 1.
 0.97619048 1.         0.95       0.975     ]

mean value: 0.9779761904761904

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.95238095 0.95238095 1.         1.         0.85714286 1.
 0.95238095 1.         0.91304348 0.95454545]

mean value: 0.9581874647092038

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.88

Accuracy on Blind test: 0.95

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.04435015 0.07989097 0.07348156 0.06565762 0.07407832 0.03800893
 0.08862805 0.09641933 0.09125471 0.06087875]

mean value: 0.07126483917236329

key: score_time
value: [0.02085805 0.02275562 0.02071571 0.02121043 0.01224446 0.01214051
 0.02340412 0.02287984 0.02411938 0.0126636 ]

mean value: 0.01929917335510254

key: test_mcc
value: [0.9047619  0.81322028 0.95238095 0.7633652  0.95227002 0.95227002
 0.86333169 0.8547619  0.65952381 0.95227002]

mean value: 0.8668155793025234

key: train_mcc
value: [0.98379816 0.98918919 0.98921825 0.98921825 0.98921825 0.98921825
 0.99462366 0.98384191 0.9946235  0.9946235 ]

mean value: 0.9897572910128184

key: test_accuracy
value: [0.95238095 0.9047619  0.97560976 0.87804878 0.97560976 0.97560976
 0.92682927 0.92682927 0.82926829 0.97560976]

mean value: 0.9320557491289199

key: train_accuracy
value: [0.99189189 0.99459459 0.99460916 0.99460916 0.99460916 0.99460916
 0.99730458 0.99191375 0.99730458 0.99730458]

mean value: 0.9948750637429883

key: test_fscore
value: [0.95238095 0.9        0.97560976 0.86486486 0.97435897 0.97435897
 0.92307692 0.92682927 0.82926829 0.97674419]

mean value: 0.9297492192160371

key: train_fscore
value: [0.99186992 0.99459459 0.99462366 0.99462366 0.99462366 0.99462366
 0.99730458 0.99191375 0.99728997 0.99728997]

mean value: 0.9948757411590124

key: test_precision
value: [0.95238095 0.94736842 0.95238095 0.94117647 1.         1.
 1.         0.95       0.85       0.95454545]

mean value: 0.9547852250948226

key: train_precision
value: [0.99456522 0.99459459 0.99462366 0.99462366 0.99462366 0.99462366
 0.99462366 0.98924731 1.         1.        ]

mean value: 0.9951525403383749

key: test_recall
value: [0.95238095 0.85714286 1.         0.8        0.95       0.95
 0.85714286 0.9047619  0.80952381 1.        ]

mean value: 0.9080952380952381

key: train_recall
value: [0.98918919 0.99459459 0.99462366 0.99462366 0.99462366 0.99462366
 1.         0.99459459 0.99459459 0.99459459]

mean value: 0.9946062191223481

key: test_roc_auc
value: [0.95238095 0.9047619  0.97619048 0.87619048 0.975      0.975
 0.92857143 0.92738095 0.8297619  0.975     ]

mean value: 0.9320238095238095

key: train_roc_auc
value: [0.99189189 0.99459459 0.99460913 0.99460913 0.99460913 0.99460913
 0.99731183 0.99192095 0.9972973  0.9972973 ]

mean value: 0.9948750363266493

key: test_jcc
value: [0.90909091 0.81818182 0.95238095 0.76190476 0.95       0.95
 0.85714286 0.86363636 0.70833333 0.95454545]

mean value: 0.872521645021645

key: train_jcc
value: [0.98387097 0.98924731 0.98930481 0.98930481 0.98930481 0.98930481
 0.99462366 0.98395722 0.99459459 0.99459459]

mean value: 0.9898107595261295

MCC on Blind test: 0.78

Accuracy on Blind test: 0.9

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.01397157 0.01064801 0.01116705 0.00983787 0.01101112 0.01056051
 0.01734877 0.01102638 0.01001835 0.01107717]

mean value: 0.011666679382324218

key: score_time
value: [0.01169801 0.0106709  0.01013255 0.00997734 0.00976014 0.00959802
 0.01371956 0.00910187 0.00948548 0.00973678]

mean value: 0.010388064384460449

key: test_mcc
value: [0.61904762 0.53357838 0.56086079 0.58066054 0.65871309 0.51551459
 0.46428571 0.7197263  0.65871309 0.51966679]

mean value: 0.5830766911453171

key: train_mcc
value: [0.62458505 0.63192977 0.66817939 0.6027138  0.62040699 0.62388021
 0.58080121 0.62919597 0.66395875 0.63891466]

mean value: 0.6284565795023681

key: test_accuracy
value: [0.80952381 0.76190476 0.7804878  0.7804878  0.82926829 0.75609756
 0.73170732 0.85365854 0.82926829 0.75609756]

mean value: 0.7888501742160279

key: train_accuracy
value: [0.81081081 0.81351351 0.8328841  0.80053908 0.80862534 0.81132075
 0.78975741 0.81401617 0.83018868 0.81671159]

mean value: 0.8128367451008961

key: test_fscore
value: [0.80952381 0.7826087  0.76923077 0.8        0.82051282 0.76190476
 0.73170732 0.86956522 0.8372093  0.7826087 ]

mean value: 0.7964871389266566

key: train_fscore
value: [0.81958763 0.82442748 0.84020619 0.80829016 0.81841432 0.81770833
 0.79581152 0.81889764 0.83804627 0.82741117]

mean value: 0.8208800702499555

key: test_precision
value: [0.80952381 0.72       0.78947368 0.72       0.84210526 0.72727273
 0.75       0.8        0.81818182 0.72      ]

mean value: 0.7696557302346776

key: train_precision
value: [0.78325123 0.77884615 0.80693069 0.78       0.7804878  0.79292929
 0.7715736  0.79591837 0.79901961 0.77990431]

mean value: 0.7868861061720982

key: test_recall
value: [0.80952381 0.85714286 0.75       0.9        0.8        0.8
 0.71428571 0.95238095 0.85714286 0.85714286]

mean value: 0.8297619047619047

key: train_recall
value: [0.85945946 0.87567568 0.87634409 0.83870968 0.86021505 0.84408602
 0.82162162 0.84324324 0.88108108 0.88108108]

mean value: 0.858151700087184

key: test_roc_auc
value: [0.80952381 0.76190476 0.7797619  0.78333333 0.82857143 0.75714286
 0.73214286 0.85119048 0.82857143 0.75357143]

mean value: 0.7885714285714286

key: train_roc_auc
value: [0.81081081 0.81351351 0.83276664 0.80043592 0.80848591 0.8112322
 0.78984307 0.81409474 0.83032549 0.81688463]

mean value: 0.812839290903807

key: test_jcc
value: [0.68       0.64285714 0.625      0.66666667 0.69565217 0.61538462
 0.57692308 0.76923077 0.72       0.64285714]

mean value: 0.6634571587832457

key: train_jcc
value: [0.69432314 0.7012987  0.72444444 0.67826087 0.69264069 0.69162996
 0.66086957 0.69333333 0.72123894 0.70562771]

mean value: 0.6963667350232523

MCC on Blind test: 0.62

Accuracy on Blind test: 0.83

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.02573824 0.02512217 0.02389956 0.01943803 0.01827455 0.02354956
 0.02093744 0.02193046 0.02814054 0.01929259]

mean value: 0.022632312774658204

key: score_time
value: [0.01042128 0.01171613 0.01189804 0.01167703 0.01169205 0.01298094
 0.0117383  0.01208544 0.01217604 0.01196671]

mean value: 0.011835193634033203

key: test_mcc
value: [0.90889326 0.8660254  0.85441771 0.90692382 0.81975606 1.
 0.90238095 0.95238095 0.80817439 0.77831178]

mean value: 0.8797264335200057

key: train_mcc
value: [0.97860715 0.98391316 0.9946235  0.93726212 0.94697838 0.97866283
 0.93728335 0.98395537 1.         0.87323811]

mean value: 0.9614523980709841

key: test_accuracy
value: [0.95238095 0.92857143 0.92682927 0.95121951 0.90243902 1.
 0.95121951 0.97560976 0.90243902 0.87804878]

mean value: 0.9368757259001161

key: train_accuracy
value: [0.98918919 0.99189189 0.99730458 0.96765499 0.97304582 0.98921833
 0.96765499 0.99191375 1.         0.93261456]

mean value: 0.9800488089167334

key: test_fscore
value: [0.95       0.92307692 0.92307692 0.95238095 0.88888889 1.
 0.95238095 0.97560976 0.90909091 0.89361702]

mean value: 0.9368122326269706

key: train_fscore
value: [0.98907104 0.99182561 0.99731903 0.96875    0.97252747 0.9893617
 0.96858639 0.99182561 1.         0.93670886]

mean value: 0.9805975722111132

key: test_precision
value: [1.         1.         0.94736842 0.90909091 1.         1.
 0.95238095 1.         0.86956522 0.80769231]

mean value: 0.9486097807608105

key: train_precision
value: [1.         1.         0.99465241 0.93939394 0.99438202 0.97894737
 0.93908629 1.         1.         0.88095238]

mean value: 0.9727414412072639

key: test_recall
value: [0.9047619  0.85714286 0.9        1.         0.8        1.
 0.95238095 0.95238095 0.95238095 1.        ]

mean value: 0.9319047619047619

key: train_recall
value: [0.97837838 0.98378378 1.         1.         0.9516129  1.
 1.         0.98378378 1.         1.        ]

mean value: 0.9897558849171753

key: test_roc_auc
value: [0.95238095 0.92857143 0.92619048 0.95238095 0.9        1.
 0.95119048 0.97619048 0.90119048 0.875     ]

mean value: 0.9363095238095238

key: train_roc_auc
value: [0.98918919 0.99189189 0.9972973  0.96756757 0.97310375 0.98918919
 0.96774194 0.99189189 1.         0.9327957 ]

mean value: 0.980066841034583

key: test_jcc
value: [0.9047619  0.85714286 0.85714286 0.90909091 0.8        1.
 0.90909091 0.95238095 0.83333333 0.80769231]

mean value: 0.8830636030636031

key: train_jcc
value: [0.97837838 0.98378378 0.99465241 0.93939394 0.94652406 0.97894737
 0.93908629 0.98378378 1.         0.88095238]

mean value: 0.9625502399717798

MCC on Blind test: 0.8

Accuracy on Blind test: 0.91

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01839256 0.03484535 0.01788545 0.01615334 0.01706839 0.01675463
 0.02165723 0.01748395 0.01716471 0.01620626]

mean value: 0.019361186027526855

key: score_time
value: [0.01203561 0.01304317 0.0117414  0.01175261 0.01182771 0.01171184
 0.01184654 0.01171041 0.01172614 0.01175642]

mean value: 0.011915183067321778

key: test_mcc
value: [0.95346259 0.70710678 0.67700771 0.90692382 0.85441771 0.74124932
 0.59335232 0.73786479 0.62048368 0.8213423 ]

mean value: 0.7613211022538029

key: train_mcc
value: [0.96807684 0.92195445 0.93618785 0.94103803 0.95709803 0.71475641
 0.72814281 0.8780389  0.62242988 0.74956267]

mean value: 0.8417285864866383

key: test_accuracy
value: [0.97619048 0.83333333 0.82926829 0.95121951 0.92682927 0.85365854
 0.7804878  0.85365854 0.7804878  0.90243902]

mean value: 0.8687572590011614

key: train_accuracy
value: [0.98378378 0.95945946 0.96765499 0.9703504  0.97843666 0.83827493
 0.84636119 0.93530997 0.77897574 0.85983827]

mean value: 0.9118445399577475

key: test_fscore
value: [0.97560976 0.8        0.8        0.95238095 0.92307692 0.86956522
 0.81632653 0.875      0.82352941 0.89473684]

mean value: 0.8730225633428955

key: train_fscore
value: [0.98404255 0.95774648 0.96703297 0.97082228 0.97826087 0.86111111
 0.86651054 0.93908629 0.81858407 0.83647799]

mean value: 0.9179675152216906

key: test_precision
value: [1.         1.         0.93333333 0.90909091 0.94736842 0.76923077
 0.71428571 0.77777778 0.7        1.        ]

mean value: 0.8751086924771135

key: train_precision
value: [0.96858639 1.         0.98876404 0.95811518 0.98901099 0.75609756
 0.76446281 0.88516746 0.6928839  1.        ]

mean value: 0.9003088334774321

key: test_recall
value: [0.95238095 0.66666667 0.7        1.         0.9        1.
 0.95238095 1.         1.         0.80952381]

mean value: 0.8980952380952381

key: train_recall
value: [1.         0.91891892 0.94623656 0.98387097 0.96774194 1.
 1.         1.         1.         0.71891892]

mean value: 0.9535687300203429

key: test_roc_auc
value: [0.97619048 0.83333333 0.82619048 0.95238095 0.92619048 0.85714286
 0.77619048 0.85       0.775      0.9047619 ]

mean value: 0.8677380952380952

key: train_roc_auc
value: [0.98378378 0.95945946 0.96771287 0.97031386 0.97846556 0.83783784
 0.84677419 0.93548387 0.77956989 0.85945946]

mean value: 0.9118860796280152

key: test_jcc
value: [0.95238095 0.66666667 0.66666667 0.90909091 0.85714286 0.76923077
 0.68965517 0.77777778 0.7        0.80952381]

mean value: 0.7798135580894201

key: train_jcc
value: [0.96858639 0.91891892 0.93617021 0.94329897 0.95744681 0.75609756
 0.76446281 0.88516746 0.6928839  0.71891892]

mean value: 0.8541951945760038

MCC on Blind test: 0.75

Accuracy on Blind test: 0.88

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.18290973 0.17475319 0.17721367 0.17609525 0.176162   0.1720438
 0.18470693 0.16746712 0.15189815 0.15740561]

mean value: 0.17206554412841796

key: score_time
value: [0.01708293 0.016047   0.01525402 0.01880813 0.01527667 0.01667857
 0.02105594 0.01652122 0.01655769 0.01626611]

mean value: 0.016954827308654784

key: test_mcc
value: [0.95346259 1.         1.         1.         0.95238095 0.95227002
 0.95238095 0.95238095 0.90649828 0.95227002]

mean value: 0.9621643756089681

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97619048 1.         1.         1.         0.97560976 0.97560976
 0.97560976 0.97560976 0.95121951 0.97560976]

mean value: 0.9805458768873403

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97560976 1.         1.         1.         0.97560976 0.97435897
 0.97560976 0.97560976 0.95454545 0.97674419]

mean value: 0.9808087639341184

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         1.         1.         0.95238095 1.
 1.         1.         0.91304348 0.95454545]

mean value: 0.9819969885187276

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.95238095 1.         1.         1.         1.         0.95
 0.95238095 0.95238095 1.         1.        ]

mean value: 0.9807142857142856

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97619048 1.         1.         1.         0.97619048 0.975
 0.97619048 0.97619048 0.95       0.975     ]

mean value: 0.9804761904761905

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.95238095 1.         1.         1.         0.95238095 0.95
 0.95238095 0.95238095 0.91304348 0.95454545]

mean value: 0.9627112742330133

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.9

Accuracy on Blind test: 0.95

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.06012321 0.05812716 0.06727004 0.06751275 0.05394554 0.05166769
 0.07125807 0.0593648  0.05061197 0.06649494]

mean value: 0.060637617111206056

key: score_time
value: [0.03201938 0.02349854 0.02908587 0.03036594 0.02365637 0.02560067
 0.024652   0.02214336 0.02164698 0.02623129]

mean value: 0.025890040397644042

key: test_mcc
value: [0.95346259 1.         0.95238095 1.         0.90238095 0.95227002
 0.95238095 0.95238095 0.90649828 0.80907152]

mean value: 0.9380826209946751

key: train_mcc
value: [1.         1.         0.98384144 0.99462366 0.99462366 1.
 0.99462366 1.         0.9946235  0.99462366]

mean value: 0.9956959560990601

key: test_accuracy
value: [0.97619048 1.         0.97560976 1.         0.95121951 0.97560976
 0.97560976 0.97560976 0.95121951 0.90243902]

mean value: 0.9683507549361208

key: train_accuracy
value: [1.         1.         0.99191375 0.99730458 0.99730458 1.
 0.99730458 1.         0.99730458 0.99730458]

mean value: 0.997843665768194

key: test_fscore
value: [0.97560976 1.         0.97560976 1.         0.95       0.97435897
 0.97560976 0.97560976 0.95454545 0.9       ]

mean value: 0.9681343453294673

key: train_fscore
value: [1.         1.         0.9919571  0.99730458 0.99730458 1.
 0.99730458 1.         0.99728997 0.99730458]

mean value: 0.997846540629834

key: test_precision
value: [1.         1.         0.95238095 1.         0.95       1.
 1.         1.         0.91304348 0.94736842]

mean value: 0.9762792851694453

key: train_precision
value: [1.         1.         0.98930481 1.         1.         1.
 0.99462366 1.         1.         0.99462366]

mean value: 0.9978552124662181

key: test_recall
value: [0.95238095 1.         1.         1.         0.95       0.95
 0.95238095 0.95238095 1.         0.85714286]

mean value: 0.9614285714285714

key: train_recall
value: [1.         1.         0.99462366 0.99462366 0.99462366 1.
 1.         1.         0.99459459 1.        ]

mean value: 0.997846556233653

key: test_roc_auc
value: [0.97619048 1.         0.97619048 1.         0.95119048 0.975
 0.97619048 0.97619048 0.95       0.90357143]

mean value: 0.968452380952381

key: train_roc_auc
value: [1.         1.         0.99190642 0.99731183 0.99731183 1.
 0.99731183 1.         0.9972973  0.99731183]

mean value: 0.9978451031676838

key: test_jcc
value: [0.95238095 1.         0.95238095 1.         0.9047619  0.95
 0.95238095 0.95238095 0.91304348 0.81818182]

mean value: 0.9395511010728401

key: train_jcc
value: [1.         1.         0.98404255 0.99462366 0.99462366 1.
 0.99462366 1.         0.99459459 0.99462366]

mean value: 0.9957131771441998

MCC on Blind test: 0.87

Accuracy on Blind test: 0.94

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.09891081 0.11998367 0.09824228 0.12802958 0.08986497 0.14036012
 0.14788032 0.11883092 0.14361811 0.19772005]

mean value: 0.12834408283233642

key: score_time
value: [0.0226264  0.02250314 0.01391673 0.02268267 0.01394129 0.02342319
 0.02288437 0.02269959 0.02311373 0.02368975]

mean value: 0.021148085594177246

key: test_mcc
value: [0.68640647 0.71754731 0.57570364 0.90238095 0.7197263  0.75714286
 0.65952381 0.86333169 0.6133669  0.76500781]

mean value: 0.7260137745399882

key: train_mcc
value: [0.97310093 0.97843556 0.978494   0.98927606 0.978494   0.96765475
 0.97849275 0.97849275 0.97317174 0.97305937]

mean value: 0.9768671913757482

key: test_accuracy
value: [0.83333333 0.85714286 0.7804878  0.95121951 0.85365854 0.87804878
 0.82926829 0.92682927 0.80487805 0.87804878]

mean value: 0.8592915214866435

key: train_accuracy
value: [0.98648649 0.98918919 0.98921833 0.99460916 0.98921833 0.98382749
 0.98921833 0.98921833 0.98652291 0.98652291]

mean value: 0.9884031470823924

key: test_fscore
value: [0.81081081 0.85       0.74285714 0.95       0.83333333 0.87804878
 0.82926829 0.92307692 0.8        0.87179487]

mean value: 0.8489190155043814

key: train_fscore
value: [0.98637602 0.98913043 0.98918919 0.99459459 0.98918919 0.98387097
 0.98913043 0.98913043 0.98637602 0.98644986]

mean value: 0.988343715315811

key: test_precision
value: [0.9375     0.89473684 0.86666667 0.95       0.9375     0.85714286
 0.85       1.         0.84210526 0.94444444]

mean value: 0.9080096073517125

key: train_precision
value: [0.99450549 0.99453552 0.99456522 1.         0.99456522 0.98387097
 0.99453552 0.99453552 0.99450549 0.98913043]

mean value: 0.9934749383695191

key: test_recall
value: [0.71428571 0.80952381 0.65       0.95       0.75       0.9
 0.80952381 0.85714286 0.76190476 0.80952381]

mean value: 0.8011904761904762

key: train_recall
value: [0.97837838 0.98378378 0.98387097 0.98924731 0.98387097 0.98387097
 0.98378378 0.98378378 0.97837838 0.98378378]

mean value: 0.9832752106945656

key: test_roc_auc
value: [0.83333333 0.85714286 0.77738095 0.95119048 0.85119048 0.87857143
 0.8297619  0.92857143 0.80595238 0.8797619 ]

mean value: 0.8592857142857143

key: train_roc_auc
value: [0.98648649 0.98918919 0.98923278 0.99462366 0.98923278 0.98382738
 0.98920372 0.98920372 0.98650102 0.98651555]

mean value: 0.9884016274338856

key: test_jcc
value: [0.68181818 0.73913043 0.59090909 0.9047619  0.71428571 0.7826087
 0.70833333 0.85714286 0.66666667 0.77272727]

mean value: 0.7418384152079804

key: train_jcc
value: [0.97311828 0.97849462 0.97860963 0.98924731 0.97860963 0.96825397
 0.97849462 0.97849462 0.97311828 0.97326203]

mean value: 0.9769702993611912

MCC on Blind test: 0.37

Accuracy on Blind test: 0.71

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.60291195 0.56733131 0.5522635  0.54364038 0.55846596 0.54851103
 0.51982498 0.54667163 0.54160666 0.54516959]

mean value: 0.5526396989822387

key: score_time
value: [0.00946116 0.00969267 0.00912714 0.00913095 0.00936699 0.00920463
 0.00918961 0.0092113  0.00943303 0.00921965]

mean value: 0.009303712844848632

key: test_mcc
value: [1.         0.95346259 1.         1.         1.         1.
 0.95238095 1.         0.90649828 0.95227002]

mean value: 0.9764611836143962

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.97619048 1.         1.         1.         1.
 0.97560976 1.         0.95121951 0.97560976]

mean value: 0.987862950058072

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.97560976 1.         1.         1.         1.
 0.97560976 1.         0.95454545 0.97674419]

mean value: 0.9882509152787088

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         1.         1.         1.         1.
 1.         1.         0.91304348 0.95454545]

mean value: 0.9867588932806324

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.95238095 1.         1.         1.         1.
 0.95238095 1.         1.         1.        ]

mean value: 0.9904761904761905

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.97619048 1.         1.         1.         1.
 0.97619048 1.         0.95       0.975     ]

mean value: 0.9877380952380952

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.95238095 1.         1.         1.         1.
 0.95238095 1.         0.91304348 0.95454545]

mean value: 0.9772350837568229

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.86

Accuracy on Blind test: 0.94

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.03106356 0.02730608 0.02864671 0.02820945 0.02958632 0.06434035
 0.02798939 0.09193873 0.05388188 0.03148031]

mean value: 0.0414442777633667

key: score_time
value: [0.01246595 0.01286077 0.01365685 0.01574326 0.0151403  0.01277018
 0.0197835  0.01280165 0.01409435 0.01528525]

mean value: 0.014460206031799316

key: test_mcc
value: [0.67357531 0.72760688 0.74124932 0.86333169 0.78072006 0.70714286
 0.61969655 0.65871309 0.66432098 0.7098505 ]

mean value: 0.7146207238265427

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.83333333 0.85714286 0.85365854 0.92682927 0.87804878 0.85365854
 0.80487805 0.82926829 0.82926829 0.85365854]

mean value: 0.8519744483159117

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.84444444 0.86956522 0.86956522 0.93023256 0.88888889 0.85
 0.82608696 0.8372093  0.84444444 0.86363636]

mean value: 0.8624073393183606

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.79166667 0.8        0.76923077 0.86956522 0.8        0.85
 0.76       0.81818182 0.79166667 0.82608696]

mean value: 0.8076398094658964

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.9047619  0.95238095 1.         1.         1.         0.85
 0.9047619  0.85714286 0.9047619  0.9047619 ]

mean value: 0.9278571428571428

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.83333333 0.85714286 0.85714286 0.92857143 0.88095238 0.85357143
 0.80238095 0.82857143 0.82738095 0.85238095]

mean value: 0.8521428571428571

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.73076923 0.76923077 0.76923077 0.86956522 0.8        0.73913043
 0.7037037  0.72       0.73076923 0.76      ]

mean value: 0.7592399355877617

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.21

Accuracy on Blind test: 0.69

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.02879977 0.03616357 0.03607559 0.03617406 0.03616214 0.03645635
 0.04655337 0.04668188 0.03914762 0.03796554]

mean value: 0.038017988204956055

key: score_time
value: [0.01821136 0.02122855 0.02009559 0.02300406 0.02157331 0.02307343
 0.0226903  0.0217855  0.02382231 0.02646971]

mean value: 0.02219541072845459

key: test_mcc
value: [0.95346259 0.85811633 0.71121921 0.90649828 0.95227002 1.
 0.90692382 0.86240942 0.7633652  0.86240942]

mean value: 0.8776674275412332

key: train_mcc
value: [0.95698047 0.97837838 0.96787795 0.97317174 0.96771006 0.96261094
 0.98384191 0.96261632 0.97317407 0.96788166]

mean value: 0.9694243488227355

key: test_accuracy
value: [0.97619048 0.92857143 0.85365854 0.95121951 0.97560976 1.
 0.95121951 0.92682927 0.87804878 0.92682927]

mean value: 0.9368176538908246

key: train_accuracy
value: [0.97837838 0.98918919 0.98382749 0.98652291 0.98382749 0.98113208
 0.99191375 0.98113208 0.98652291 0.98382749]

mean value: 0.9846273767028484

key: test_fscore
value: [0.97560976 0.92682927 0.85714286 0.94736842 0.97435897 1.
 0.95       0.93333333 0.88888889 0.93333333]

mean value: 0.9386864832500262

key: train_fscore
value: [0.97860963 0.98918919 0.98404255 0.98666667 0.98395722 0.98143236
 0.99191375 0.98133333 0.98659517 0.98395722]

mean value: 0.984769708818797

key: test_precision
value: [1.         0.95       0.81818182 1.         1.         1.
 1.         0.875      0.83333333 0.875     ]

mean value: 0.9351515151515152

key: train_precision
value: [0.96825397 0.98918919 0.97368421 0.97883598 0.9787234  0.96858639
 0.98924731 0.96842105 0.9787234  0.97354497]

mean value: 0.9767209880755154

key: test_recall
value: [0.95238095 0.9047619  0.9        0.9        0.95       1.
 0.9047619  1.         0.95238095 1.        ]

mean value: 0.9464285714285714

key: train_recall
value: /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_7030.py:196: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_7030.py:199: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
[0.98918919 0.98918919 0.99462366 0.99462366 0.98924731 0.99462366
 0.99459459 0.99459459 0.99459459 0.99459459]

mean value: 0.992987503632665

key: test_roc_auc
value: [0.97619048 0.92857143 0.8547619  0.95       0.975      1.
 0.95238095 0.925      0.87619048 0.925     ]

mean value: 0.9363095238095238

key: train_roc_auc
value: [0.97837838 0.98918919 0.98379831 0.98650102 0.98381285 0.98109561
 0.99192095 0.98116827 0.98654461 0.98385644]

mean value: 0.9846265620459169

key: test_jcc
value: [0.95238095 0.86363636 0.75       0.9        0.95       1.
 0.9047619  0.875      0.8        0.875     ]

mean value: 0.8870779220779221

key: train_jcc
value: [0.95811518 0.97860963 0.96858639 0.97368421 0.96842105 0.96354167
 0.98395722 0.96335079 0.97354497 0.96842105]

mean value: 0.9700232156941843

MCC on Blind test: 0.84

Accuracy on Blind test: 0.93

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.27132845 0.3317194  0.22286606 0.19475937 0.26474047 0.25768924
 0.47041345 0.29687142 0.2708199  0.24366689]

mean value: 0.28248746395111085

key: score_time
value: [0.01650262 0.02799678 0.01505399 0.01735926 0.02323914 0.02022839
 0.02487922 0.02377343 0.02366781 0.01204443]

mean value: 0.020474505424499512

key: test_mcc
value: [0.95346259 0.85811633 0.71121921 0.90649828 0.95227002 1.
 0.90692382 0.80817439 0.7633652  0.86240942]

mean value: 0.8722439251771972

key: train_mcc
value: [0.95698047 0.97837838 0.96787795 0.97317174 0.96771006 0.96261094
 0.98384191 0.96788166 0.97317407 0.96788166]

mean value: 0.96995088248731

key: test_accuracy
value: [0.97619048 0.92857143 0.85365854 0.95121951 0.97560976 1.
 0.95121951 0.90243902 0.87804878 0.92682927]

mean value: 0.9343786295005807

key: train_accuracy
value: [0.97837838 0.98918919 0.98382749 0.98652291 0.98382749 0.98113208
 0.99191375 0.98382749 0.98652291 0.98382749]

mean value: 0.9848969184818241

key: test_fscore
value: [0.97560976 0.92682927 0.85714286 0.94736842 0.97435897 1.
 0.95       0.90909091 0.88888889 0.93333333]

mean value: 0.9362622408257838

key: train_fscore
value: [0.97860963 0.98918919 0.98404255 0.98666667 0.98395722 0.98143236
 0.99191375 0.98395722 0.98659517 0.98395722]

mean value: 0.9850320974105973

key: test_precision
value: [1.         0.95       0.81818182 1.         1.         1.
 1.         0.86956522 0.83333333 0.875     ]

mean value: 0.9346080368906455

key: train_precision
value: [0.96825397 0.98918919 0.97368421 0.97883598 0.9787234  0.96858639
 0.98924731 0.97354497 0.9787234  0.97354497]

mean value: 0.9772333801668549

key: test_recall
value: [0.95238095 0.9047619  0.9        0.9        0.95       1.
 0.9047619  0.95238095 0.95238095 1.        ]

mean value: 0.9416666666666667

key: train_recall
value: [0.98918919 0.98918919 0.99462366 0.99462366 0.98924731 0.99462366
 0.99459459 0.99459459 0.99459459 0.99459459]

mean value: 0.992987503632665

key: test_roc_auc
value: [0.97619048 0.92857143 0.8547619  0.95       0.975      1.
 0.95238095 0.90119048 0.87619048 0.925     ]

mean value: 0.9339285714285714

key: train_roc_auc
value: [0.97837838 0.98918919 0.98379831 0.98650102 0.98381285 0.98109561
 0.99192095 0.98385644 0.98654461 0.98385644]

mean value: 0.984895379250218

key: test_jcc
value: [0.95238095 0.86363636 0.75       0.9        0.95       1.
 0.9047619  0.83333333 0.8        0.875     ]

mean value: 0.8829112554112554

key: train_jcc
value: [0.95811518 0.97860963 0.96858639 0.97368421 0.96842105 0.96354167
 0.98395722 0.96842105 0.97354497 0.96842105]

mean value: 0.9705302424233108

MCC on Blind test: 0.84

Accuracy on Blind test: 0.93