LSHTM_analysis/scripts/ml/log_rpob_8020.txt

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_8020.py:549: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
1.22.4
1.4.1

aaindex_df contains non-numerical data

Total no. of non-numerical columns: 2

Selecting numerical data only

PASS: successfully selected numerical columns only for aaindex_df

Now checking for NA in the remaining aaindex_cols

Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127

Revised df ncols: 123

Checking NA in revised df...

PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df

PASS: ncols match
Expected ncols: 123
Got: 123

Total no. of columns in clean aa_df: 123

Proceeding to merge, expected nrows in merged_df: 1133

PASS: my_features_df and aa_df successfully combined
nrows: 1133
ncols: 274
count of NULL values before imputation

or_mychisq          339
log10_or_mychisq    339
dtype: int64
count of NULL values AFTER imputation

mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64

PASS: OR values imputed, data ready for ML

Total no. of features for aaindex: 123

No. of numerical features: 169
No. of categorical features: 7

PASS: x_features has no target variable

No. of columns for x_features: 176

-------------------------------------------------------------
Successfully split data with stratification: 80/20
Train data size: (445, 176)
Test data size: (112, 176)
y_train numbers: Counter({0: 225, 1: 220})
y_train ratio: 1.0227272727272727

y_test_numbers: Counter({0: 57, 1: 55})
y_test ratio: 1.0363636363636364
-------------------------------------------------------------
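
The split summary above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from ml_data_8020.py; the variable names are hypothetical and the counts are copied from this log. It assumes the printed ratio is the class 0 count divided by the class 1 count, which is consistent with the values shown.

```python
from collections import Counter

# Class counts copied from the log above (0 and 1 are the two target classes).
y_train_counts = Counter({0: 225, 1: 220})
y_test_counts = Counter({0: 57, 1: 55})

n_train = sum(y_train_counts.values())
n_test = sum(y_test_counts.values())
n_total = n_train + n_test

# Fraction of rows held out for testing; close to the nominal 0.20.
test_fraction = n_test / n_total

# Assumed ratio definition: count of class 0 divided by count of class 1.
train_ratio = y_train_counts[0] / y_train_counts[1]
test_ratio = y_test_counts[0] / y_test_counts[1]

print(n_train, n_test, n_total)
print(round(test_fraction, 3), round(train_ratio, 4), round(test_ratio, 4))
```

The total matches the "ML source data size" of 557 rows reported further down in this log, and both class ratios stay close to 1, which is what a stratified split is expected to preserve.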

Simple Random OverSampling
 Counter({1: 225, 0: 225})
(450, 176)

Simple Random UnderSampling
 Counter({0: 220, 1: 220})
(440, 176)

Simple Combined Over and UnderSampling
 Counter({0: 225, 1: 225})
(450, 176)

SMOTE_NC OverSampling
 Counter({1: 225, 0: 225})
(450, 176)
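
As a rough illustration of what simple random oversampling does to the class counts above, the following sketch resamples minority-class rows with replacement until both classes are the same size. It uses only the standard library as a stand-in: the resampling library actually used by the script is not shown in this log, and `random_oversample` is a hypothetical helper.

```python
import random
from collections import Counter


def random_oversample(labels, seed=42):
    """Return row indices that balance the classes by resampling minority rows."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    target = max(len(idxs) for idxs in by_class.values())
    selected = []
    for idxs in by_class.values():
        selected.extend(idxs)
        # Draw extra rows (with replacement) until this class reaches the majority size.
        selected.extend(rng.choices(idxs, k=target - len(idxs)))
    return selected


# Training labels with the class counts reported in the log: 225 zeros, 220 ones.
y_train = [0] * 225 + [1] * 220
resampled_idx = random_oversample(y_train)
y_resampled = [y_train[i] for i in resampled_idx]
print(Counter(y_resampled), len(y_resampled))
```

With 225 and 220 rows per class, only five minority rows need to be duplicated, which is why the oversampled shape in the log grows from 445 to 450 rows.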

#####################################################################

Running ML analysis: 80/20 split
Gene name: rpoB
Drug name: rifampicin

Output directory: /home/tanu/git/Data/rifampicin/output/ml/tts_8020/
Sanity checks:
ML source data size: (557, 176)
Total input features: (445, 176)
Target feature numbers: Counter({0: 225, 1: 220})
Target features ratio: 1.0227272727272727

#####################################################################


================================================================

Structural features (n): 37
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================

AAindex features (n): 123
================================================================

Evolutionary features (n): 3
These are:
 ['consurf_score', 'snap2_score', 'provean_score']
================================================================

Genomic features (n): 6
These are:
 ['maf', 'logorI']
 ['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================

Categorical features (n): 7
These are:
 ['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================


Pass: No. of features match
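
The feature-count check reported above can be reproduced by hand from the group sizes listed in this log. The snippet below is a minimal sketch with hypothetical names, not the check implemented in the script itself.

```python
# Feature-group sizes copied from the log above.
feature_groups = {
    "structural": 37,
    "aaindex": 123,
    "evolutionary": 3,
    "genomic": 6,
    "categorical": 7,
}
n_expected = 176  # "No. of columns for x_features" reported earlier in the log

# The structural features are listed as three sub-groups in the log.
structural_parts = {"common_stability": 11, "foldx": 23, "other": 3}

structural_total = sum(structural_parts.values())
feature_total = sum(feature_groups.values())
numerical_total = feature_total - feature_groups["categorical"]

print(structural_total, numerical_total, feature_total)
```

The totals agree with the 169 numerical and 7 categorical features reported earlier, giving 176 input columns.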

#####################################################################


Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.08822989 0.09819555 0.1139729  0.10658669 0.11813283 0.05520678
 0.07728314 0.09269142 0.11131549 0.06736732]

mean value: 0.09289820194244384

key: score_time
value: [0.01899791 0.02099395 0.02197051 0.02467132 0.05310249 0.02295399
 0.02127385 0.02181339 0.0188055  0.01463914]

mean value: 0.023922204971313477

key: test_mcc
value: [0.82506438 0.86732843 0.68911026 0.8360602  0.86758893 0.86452993
 0.86452993 0.77352678 0.77352678 0.77352678]

mean value: 0.8134792424092705
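
Each "mean value" line is consistent with a plain arithmetic mean over the ten cross-validation folds printed just before it. The sketch below recomputes it for test_mcc from the rounded values in this log, so it agrees with the logged mean only to about eight decimal places; the variable names are hypothetical.

```python
from statistics import mean

# Per-fold test MCC values as printed in the log (rounded to 8 decimals).
test_mcc = [0.82506438, 0.86732843, 0.68911026, 0.8360602, 0.86758893,
            0.86452993, 0.86452993, 0.77352678, 0.77352678, 0.77352678]

mean_mcc = mean(test_mcc)
spread = max(test_mcc) - min(test_mcc)  # range across folds

print(len(test_mcc), round(mean_mcc, 6), round(spread, 4))
```

The range across folds (roughly 0.69 to 0.87) is worth reading alongside the mean, since one fold scores noticeably lower than the rest.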

key: train_mcc
value: [0.860043   0.85528899 0.8500425  0.869987   0.85018502 0.86053339
 0.85041172 0.85535874 0.8705095  0.87541359]

mean value: 0.8597773460103459

key: test_accuracy
value: [0.91111111 0.93333333 0.84444444 0.91111111 0.93333333 0.93181818
 0.93181818 0.88636364 0.88636364 0.88636364]

mean value: 0.9056060606060606

key: train_accuracy
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[0.93       0.9275     0.925      0.935      0.925      0.93017456
 0.92518703 0.9276808  0.93516209 0.93765586]

mean value: 0.9298360349127183

key: test_fscore
value: [0.9047619  0.93023256 0.8372093  0.91666667 0.93333333 0.93023256
 0.93333333 0.88888889 0.88888889 0.88888889]

mean value: 0.9052436323366556

key: train_fscore
value: [0.92964824 0.9276808  0.92462312 0.93434343 0.925      0.93
 0.92462312 0.92695214 0.935      0.93734336]

mean value: 0.9295214204164155

key: test_precision
value: [0.95       0.95238095 0.85714286 0.84615385 0.91304348 0.95238095
 0.91304348 0.86956522 0.86956522 0.86956522]

mean value: 0.8992841216754259

key: train_precision
value: [0.925      0.91625616 0.92       0.93434343 0.91584158 0.92079208
 0.92       0.92462312 0.92574257 0.93034826]

mean value: 0.9232947203887022

key: test_recall
value: [0.86363636 0.90909091 0.81818182 1.         0.95454545 0.90909091
 0.95454545 0.90909091 0.90909091 0.90909091]

mean value: 0.9136363636363636

key: train_recall
value: [0.93434343 0.93939394 0.92929293 0.93434343 0.93434343 0.93939394
 0.92929293 0.92929293 0.94444444 0.94444444]

mean value: 0.9358585858585858

key: test_roc_auc
value: [0.91007905 0.93280632 0.84387352 0.91304348 0.93379447 0.93181818
 0.93181818 0.88636364 0.88636364 0.88636364]

mean value: 0.9056324110671937

key: train_roc_auc
value: [0.930043   0.92761776 0.9250425  0.9349935  0.92509251 0.9302881
 0.9252376  0.92770065 0.93527641 0.93773946]

mean value: 0.9299031504135635

key: test_jcc
value: [0.82608696 0.86956522 0.72       0.84615385 0.875      0.86956522
 0.875      0.8        0.8        0.8       ]

mean value: 0.8281371237458194

key: train_jcc
value: [0.8685446  0.86511628 0.85981308 0.87677725 0.86046512 0.86915888
 0.85981308 0.86384977 0.87793427 0.88207547]

mean value: 0.8683547803458409

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9
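
Note on the two blind-test figures above: a minimal sketch of how MCC and accuracy are conventionally derived from binary confusion-matrix counts. The log does not print the confusion matrix, so the counts used below are hypothetical and were chosen only because they happen to give the same rounded values; the helper names `mcc` and `accuracy` are illustrative and are not taken from the script that produced this log.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts (NOT from this run): 100 samples, 5 errors per class.
tp, tn, fp, fn = 45, 45, 5, 5
print(round(mcc(tp, tn, fp, fn), 2), round(accuracy(tp, tn, fp, fn), 2))
```

Because MCC uses all four cells of the matrix, it can fall well below accuracy when the classes are imbalanced, which may be why the log reports both.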

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [1.07467175 1.19143915 3.18981433 1.21566057 1.01042342 0.9827199
 1.26141262 2.24912834 3.37447715 2.48636866]

mean value: 1.8036115884780883

key: score_time
value: [0.01486397 0.01910448 0.01350474 0.01525831 0.02177715 0.02520943
 0.01491761 0.02106428 0.03649759 0.02547741]

mean value: 0.02076749801635742

key: test_mcc
value: [0.82506438 0.82506438 0.64613475 0.79854941 0.91485328 0.81818182
 0.86452993 0.81818182 0.81818182 0.77352678]

mean value: 0.8102268367108156

key: train_mcc
value: [0.89018902 0.900045   0.83510219 0.89002252 0.88510532 0.89536533
 0.90029107 0.89025725 0.89555655 0.90043786]

mean value: 0.8882372108052222

key: test_accuracy
value: [0.91111111 0.91111111 0.82222222 0.88888889 0.95555556 0.90909091
 0.93181818 0.90909091 0.90909091 0.88636364]

mean value: 0.9034343434343434

key: train_accuracy
value: [0.945      0.95       0.9175     0.945      0.9425     0.94763092
 0.95012469 0.94513716 0.94763092 0.95012469]

mean value: 0.9440648379052369

key: test_fscore
value: [0.9047619  0.9047619  0.80952381 0.89795918 0.95652174 0.90909091
 0.93333333 0.90909091 0.90909091 0.88888889]

mean value: 0.9023023491346472

key: train_fscore
value: [0.945      0.94974874 0.91729323 0.94416244 0.94235589 0.94736842
 0.94974874 0.94444444 0.94763092 0.95      ]

mean value: 0.943775283498277

key: test_precision
value: [0.95       0.95       0.85       0.81481481 0.91666667 0.90909091
 0.91304348 0.90909091 0.90909091 0.86956522]

mean value: 0.8991362904406383

key: train_precision
value: [0.93564356 0.945      0.91044776 0.94897959 0.93532338 0.94029851
 0.945      0.94444444 0.93596059 0.94059406]

mean value: 0.9381691902917854

key: test_recall
value: [0.86363636 0.86363636 0.77272727 1.         1.         0.90909091
 0.95454545 0.90909091 0.90909091 0.90909091]

mean value: 0.9090909090909091

key: train_recall
value: [0.95454545 0.95454545 0.92424242 0.93939394 0.94949495 0.95454545
 0.95454545 0.94444444 0.95959596 0.95959596]

mean value: 0.9494949494949495

key: test_roc_auc
value: [0.91007905 0.91007905 0.82114625 0.89130435 0.95652174 0.90909091
 0.93181818 0.90909091 0.90909091 0.88636364]

mean value: 0.9034584980237155

key: train_roc_auc
value: [0.94509451 0.950045   0.91756676 0.94494449 0.94256926 0.94771608
 0.95017913 0.94512863 0.94777828 0.95024133]

mean value: 0.9441263461321502

key: test_jcc
value: [0.82608696 0.82608696 0.68       0.81481481 0.91666667 0.83333333
 0.875      0.83333333 0.83333333 0.8       ]

mean value: 0.823865539452496

key: train_jcc
value: [0.8957346  0.90430622 0.84722222 0.89423077 0.89099526 0.9
 0.90430622 0.89473684 0.90047393 0.9047619 ]

mean value: 0.8936767969980741
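
Note on reading test_jcc against test_fscore: the two track each other fold by fold. Assuming `jcc` is the Jaccard index of the positive class and `fscore` is F1 (the log does not name the scorers, so this is an assumption), they are related by J = F / (2 - F). The sketch below checks that relation against the per-fold values printed above for Logistic RegressionCV; the function name `jaccard_from_f1` is illustrative.

```python
def jaccard_from_f1(f1):
    """Jaccard index implied by an F1 score for the same binary predictions."""
    return f1 / (2.0 - f1)


# Per-fold values copied from the test_fscore and test_jcc entries above.
test_fscore = [0.9047619, 0.9047619, 0.80952381, 0.89795918, 0.95652174,
               0.90909091, 0.93333333, 0.90909091, 0.90909091, 0.88888889]
test_jcc = [0.82608696, 0.82608696, 0.68, 0.81481481, 0.91666667,
            0.83333333, 0.875, 0.83333333, 0.83333333, 0.8]

# Largest disagreement between the implied and the logged Jaccard values.
max_gap = max(abs(jaccard_from_f1(f) - j) for f, j in zip(test_fscore, test_jcc))
print(f"largest per-fold gap: {max_gap:.1e}")
```

If the assumption holds, the gap stays at the level of the 8-digit rounding in the log, so the jcc rows carry no information beyond the fscore rows.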

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.02495861 0.01450539 0.01457644 0.01471186 0.01506543 0.01446843
 0.01499057 0.01478457 0.01477289 0.01486278]

mean value: 0.015769696235656737

key: score_time
value: [0.01284051 0.01264167 0.01298308 0.012573   0.01277876 0.01334071
 0.01299834 0.0127492  0.01270795 0.0127039 ]

mean value: 0.012831711769104004

key: test_mcc
value: [0.56261436 0.66660455 0.74410286 0.51089209 0.60079051 0.72727273
 0.66143783 0.68252363 0.43151697 0.68252363]

mean value: 0.6270279162046954

key: train_mcc
value: [0.68538393 0.67445688 0.70858632 0.68778613 0.64887146 0.64847406
 0.66903696 0.68058469 0.68521411 0.6862916 ]

mean value: 0.6774686130760773

key: test_accuracy
value: [0.77777778 0.82222222 0.86666667 0.75555556 0.8        0.86363636
 0.81818182 0.84090909 0.70454545 0.84090909]

mean value: 0.809040404040404

key: train_accuracy
value: [0.84       0.835      0.8525     0.8425     0.8225     0.82044888
 0.83291771 0.83790524 0.840399   0.84289277]

mean value: 0.8367063591022443

key: test_fscore
value: [0.75       0.78947368 0.85       0.74418605 0.8        0.86363636
 0.78947368 0.8372093  0.64864865 0.84444444]

mean value: 0.7917072173987718

key: train_fscore
value: [0.82702703 0.82258065 0.84266667 0.83289125 0.80965147 0.8021978
 0.82133333 0.82479784 0.82795699 0.84367246]

mean value: 0.8254775485090063

key: test_precision
value: [0.83333333 0.9375     0.94444444 0.76190476 0.7826087  0.86363636
 0.9375     0.85714286 0.8        0.82608696]

mean value: 0.8544157412635673

key: train_precision
value: [0.88953488 0.87931034 0.89265537 0.87709497 0.86285714 0.87951807
 0.8700565  0.88439306 0.88505747 0.82926829]

mean value: 0.8749746107699744

key: test_recall
value: [0.68181818 0.68181818 0.77272727 0.72727273 0.81818182 0.86363636
 0.68181818 0.81818182 0.54545455 0.86363636]

mean value: 0.7454545454545455

key: train_recall
value: [0.77272727 0.77272727 0.7979798  0.79292929 0.76262626 0.73737374
 0.77777778 0.77272727 0.77777778 0.85858586]

mean value: 0.7823232323232323

key: test_roc_auc
value: [0.7756917  0.81916996 0.86462451 0.75494071 0.80039526 0.86363636
 0.81818182 0.84090909 0.70454545 0.84090909]

mean value: 0.808300395256917

key: train_roc_auc
value: [0.83933393 0.83438344 0.8519602  0.8420092  0.82190719 0.81942578
 0.83223864 0.83710255 0.83962781 0.84308603]

mean value: 0.8361074777428482

key: test_jcc
value: [0.6        0.65217391 0.73913043 0.59259259 0.66666667 0.76
 0.65217391 0.72       0.48       0.73076923]

mean value: 0.6593506750898055

key: train_jcc
value: [0.70506912 0.69863014 0.7281106  0.71363636 0.68018018 0.66972477
 0.69683258 0.70183486 0.70642202 0.72961373]

mean value: 0.7030054368772396

MCC on Blind test: 0.68

Accuracy on Blind test: 0.84

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01549268 0.01472116 0.01465893 0.0147357  0.01448822 0.01551938
 0.01875544 0.02495813 0.01451755 0.01468372]

mean value: 0.016253089904785155

key: score_time
value: [0.01308107 0.01281691 0.01316237 0.01281404 0.01322484 0.01289582
 0.02950215 0.01286101 0.01288104 0.01303196]

mean value: 0.01462712287902832

key: test_mcc
value: [0.68911026 0.77821935 0.64426877 0.60079051 0.70780516 0.63636364
 0.73029674 0.63636364 0.5547002  0.77352678]

mean value: 0.6751445052594404

key: train_mcc
value: [0.73006509 0.73513714 0.7700385  0.74497106 0.74497106 0.71074778
 0.7306343  0.75588396 0.69693637 0.75588396]

mean value: 0.7375269233005722

key: test_accuracy
value: [0.84444444 0.88888889 0.82222222 0.8        0.84444444 0.81818182
 0.86363636 0.81818182 0.77272727 0.88636364]

mean value: 0.8359090909090909

key: train_accuracy
value: [0.865      0.8675     0.885      0.8725     0.8725     0.8553616
 0.86533666 0.87780549 0.8478803  0.87780549]

mean value: 0.8686689526184539

key: test_fscore
value: [0.8372093  0.88372093 0.81818182 0.8        0.85714286 0.81818182
 0.85714286 0.81818182 0.75       0.88888889]

mean value: 0.8328650290278198

key: train_fscore
value: [0.8622449  0.86445013 0.88442211 0.87088608 0.87088608 0.85204082
 0.86294416 0.87780549 0.84073107 0.87780549]

mean value: 0.866421631011566

key: test_precision
value: [0.85714286 0.9047619  0.81818182 0.7826087  0.77777778 0.81818182
 0.9        0.81818182 0.83333333 0.86956522]

mean value: 0.8379735240604806

key: train_precision
value: [0.87113402 0.87564767 0.88       0.87309645 0.87309645 0.86082474
 0.86734694 0.86699507 0.87027027 0.86699507]

mean value: 0.8705406681510427

key: test_recall
value: [0.81818182 0.86363636 0.81818182 0.81818182 0.95454545 0.81818182
 0.81818182 0.81818182 0.68181818 0.90909091]

mean value: 0.8318181818181818

key: train_recall
value: [0.85353535 0.85353535 0.88888889 0.86868687 0.86868687 0.84343434
 0.85858586 0.88888889 0.81313131 0.88888889]

mean value: 0.8626262626262626

key: test_roc_auc
value: [0.84387352 0.88833992 0.82213439 0.80039526 0.84683794 0.81818182
 0.86363636 0.81818182 0.77272727 0.88636364]

mean value: 0.8360671936758893

key: train_roc_auc
value: [0.86488649 0.86736174 0.8850385  0.87246225 0.87246225 0.85521471
 0.86525352 0.87794198 0.84745236 0.87794198]

mean value: 0.8686015769064591

key: test_jcc
value: [0.72       0.79166667 0.69230769 0.66666667 0.75       0.69230769
 0.75       0.69230769 0.6        0.8       ]

mean value: 0.715525641025641

key: train_jcc
value: [0.75784753 0.76126126 0.79279279 0.77130045 0.77130045 0.74222222
 0.75892857 0.78222222 0.72522523 0.78222222]

mean value: 0.7645322947867791

MCC on Blind test: 0.75

Accuracy on Blind test: 0.88

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.01377344 0.0139215  0.01400995 0.02145243 0.01363587 0.01715064
 0.01344013 0.01378942 0.03159833 0.03337765]

mean value: 0.018614935874938964

key: score_time
value: [0.09378958 0.03972101 0.03561568 0.05780077 0.05395031 0.0524869
 0.03577709 0.05378652 0.04946804 0.04471517]

mean value: 0.051711106300354005

key: test_mcc
value: [0.48086334 0.42403053 0.29512214 0.19960474 0.55666994 0.64715023
 0.59648091 0.59648091 0.50051733 0.59648091]

mean value: 0.4893400981241138

key: train_mcc
value: [0.68527843 0.70019536 0.70019536 0.70627441 0.68496131 0.7009041
 0.66084236 0.66734561 0.71074778 0.69173625]

mean value: 0.690848097079054

key: test_accuracy
value: [0.73333333 0.71111111 0.64444444 0.6        0.77777778 0.81818182
 0.79545455 0.79545455 0.75       0.79545455]

mean value: 0.7421212121212121

key: train_accuracy
value: [0.8425     0.85       0.85       0.8525     0.8425     0.85037406
 0.83042394 0.83291771 0.8553616  0.84538653]

mean value: 0.8451963840399003

key: test_fscore
value: [0.68421053 0.68292683 0.57894737 0.59090909 0.76190476 0.83333333
 0.7804878  0.7804878  0.74418605 0.80851064]

mean value: 0.7245904204717919

key: train_fscore
value: [0.83804627 0.84615385 0.84615385 0.845953   0.84050633 0.84615385
 0.82653061 0.82414698 0.85204082 0.83854167]

mean value: 0.8404227219545394

key: test_precision
value: [0.8125     0.73684211 0.6875     0.59090909 0.8        0.76923077
 0.84210526 0.84210526 0.76190476 0.76      ]

mean value: 0.760309725362357

key: train_precision
value: [0.85340314 0.859375   0.859375   0.87567568 0.84263959 0.859375
 0.83505155 0.8579235  0.86082474 0.8655914 ]

mean value: 0.8569234594722578

key: test_recall
value: [0.59090909 0.63636364 0.5        0.59090909 0.72727273 0.90909091
 0.72727273 0.72727273 0.72727273 0.86363636]

mean value: 0.7

key: train_recall
value: [0.82323232 0.83333333 0.83333333 0.81818182 0.83838384 0.83333333
 0.81818182 0.79292929 0.84343434 0.81313131]

mean value: 0.8247474747474748

key: test_roc_auc
value: [0.73023715 0.70948617 0.64130435 0.59980237 0.77667984 0.81818182
 0.79545455 0.79545455 0.75       0.79545455]

mean value: 0.7412055335968379

key: train_roc_auc
value: [0.84230923 0.84983498 0.84983498 0.85216022 0.84245925 0.8501642
 0.83027318 0.83242524 0.85521471 0.8449893 ]

mean value: 0.8449665286725717

key: test_jcc
value: [0.52       0.51851852 0.40740741 0.41935484 0.61538462 0.71428571
 0.64       0.64       0.59259259 0.67857143]

mean value: 0.5746115115469954

key: train_jcc
value: [0.72123894 0.73333333 0.73333333 0.73303167 0.72489083 0.73333333
 0.70434783 0.70089286 0.74222222 0.72197309]

mean value: 0.7248597441578004

MCC on Blind test: 0.41

Accuracy on Blind test: 0.71

Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.02649641 0.02651262 0.02658916 0.02641249 0.0266242  0.02697945
 0.02693439 0.02807307 0.02710104 0.0273788 ]

mean value: 0.026910161972045897

key: score_time
value: [0.01609492 0.01617599 0.01605654 0.0158565  0.03611541 0.01599598
 0.01581216 0.0163033  0.01630783 0.01622319]

mean value: 0.018094182014465332

key: test_mcc
value: [0.78405645 0.82506438 0.60000118 0.74605372 0.82574419 0.86452993
 0.90909091 0.72727273 0.81818182 0.77352678]

mean value: 0.7873522091161106

key: train_mcc
value: [0.80528086 0.80528086 0.81500094 0.809981   0.79998    0.79560664
 0.79055651 0.80547816 0.80053238 0.81050825]

mean value: 0.8038205592014417

key: test_accuracy
value: [0.88888889 0.91111111 0.8        0.86666667 0.91111111 0.93181818
 0.95454545 0.86363636 0.90909091 0.88636364]

mean value: 0.8923232323232323

key: train_accuracy
value: [0.9025     0.9025     0.9075     0.905      0.9        0.89775561
 0.89526185 0.90274314 0.90024938 0.90523691]

mean value: 0.9018746882793017

key: test_fscore
value: [0.87804878 0.9047619  0.79069767 0.875      0.91304348 0.93023256
 0.95454545 0.86363636 0.90909091 0.88888889]

mean value: 0.8907946012230334

key: train_fscore
value: [0.90274314 0.90274314 0.90680101 0.9040404  0.8989899  0.89724311
 0.89447236 0.90176322 0.89949749 0.90452261]

mean value: 0.9012816389138596

key: test_precision
value: [0.94736842 0.95       0.80952381 0.80769231 0.875      0.95238095
 0.95454545 0.86363636 0.90909091 0.86956522]

mean value: 0.8938803435313732

key: train_precision
value: [0.89162562 0.89162562 0.90452261 0.9040404  0.8989899  0.89054726
 0.89       0.89949749 0.895      0.9       ]

mean value: 0.8965848898741502

key: test_recall
value: [0.81818182 0.86363636 0.77272727 0.95454545 0.95454545 0.90909091
 0.95454545 0.86363636 0.90909091 0.90909091]

mean value: 0.8909090909090909

key: train_recall
value: [0.91414141 0.91414141 0.90909091 0.9040404  0.8989899  0.9040404
 0.8989899  0.9040404  0.9040404  0.90909091]

mean value: 0.9060606060606061

key: test_roc_auc
value: [0.88735178 0.91007905 0.79940711 0.86857708 0.91205534 0.93181818
 0.95454545 0.86363636 0.90909091 0.88636364]

mean value: 0.8922924901185771

key: train_roc_auc
value: [0.90261526 0.90261526 0.90751575 0.9049905  0.89999    0.89783301
 0.89530776 0.90275912 0.90029606 0.90528437]

mean value: 0.9019207093123105

key: test_jcc
value: [0.7826087  0.82608696 0.65384615 0.77777778 0.84       0.86956522
 0.91304348 0.76       0.83333333 0.8       ]

mean value: 0.8056261612783352

key: train_jcc
value: [0.82272727 0.82272727 0.82949309 0.82488479 0.81651376 0.81363636
 0.80909091 0.82110092 0.8173516  0.82568807]

mean value: 0.8203214048833244

MCC on Blind test: 0.79

Accuracy on Blind test: 0.89

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [2.87617755 3.59716487 3.2788682  3.04321766 2.02100229 0.76258993
 0.71084738 1.37337828 0.62539959 1.43995023]

mean value: 1.9728595972061158

key: score_time
value: [0.02402663 0.0251689  0.02707911 0.05417156 0.02970004 0.01262379
 0.02017164 0.02028584 0.02023172 0.02038431]

mean value: 0.025384354591369628

key: test_mcc
value: [0.86732843 0.77821935 0.60404349 0.76206649 0.86758893 0.51031036
 0.86452993 0.77352678 0.77352678 0.60678804]

mean value: 0.7407928602447988

key: train_mcc
value: [0.99501219 1.         0.99501219 1.         1.         0.63345212
 0.79681808 0.81118415 0.78683326 0.76497588]

mean value: 0.8783287861680027

key: test_accuracy
value: [0.93333333 0.88888889 0.8        0.86666667 0.93333333 0.72727273
 0.93181818 0.88636364 0.88636364 0.79545455]

mean value: 0.8649494949494949

key: train_accuracy
value: [0.9975     1.         0.9975     1.         1.         0.79301746
 0.89775561 0.90523691 0.89276808 0.87531172]

mean value: 0.9359089775561098

key: test_fscore
value: [0.93023256 0.88372093 0.7804878  0.88       0.93333333 0.77777778
 0.93333333 0.88888889 0.88888889 0.81632653]

mean value: 0.8712990046084609

key: train_fscore
value: [0.99748111 1.         0.99748111 1.         1.         0.82377919
 0.8992629  0.90594059 0.89434889 0.88479263]

mean value: 0.940308642422994

key: test_precision
value: [0.95238095 0.9047619  0.84210526 0.78571429 0.91304348 0.65625
 0.91304348 0.86956522 0.86956522 0.74074074]

mean value: 0.8447170538060126

key: train_precision
value: [0.99497487 1.         0.99497487 1.         1.         0.71062271
 0.87559809 0.88834951 0.8708134  0.81355932]

mean value: 0.9148892779217023

key: test_recall
value: [0.90909091 0.86363636 0.72727273 1.         0.95454545 0.95454545
 0.95454545 0.90909091 0.90909091 0.90909091]

mean value: 0.9090909090909091

key: train_recall
value: [1.         1.         1.         1.         1.         0.97979798
 0.92424242 0.92424242 0.91919192 0.96969697]

mean value: 0.9717171717171718

key: test_roc_auc
value: [0.93280632 0.88833992 0.79841897 0.86956522 0.93379447 0.72727273
 0.93181818 0.88636364 0.88636364 0.79545455]

mean value: 0.8650197628458498

key: train_roc_auc
value: [0.99752475 1.         0.99752475 1.         1.         0.79531771
 0.8980818  0.90547097 0.8930935  0.8764741 ]

mean value: 0.9363487580285123

key: test_jcc
value: [0.86956522 0.79166667 0.64       0.78571429 0.875      0.63636364
 0.875      0.8        0.8        0.68965517]

mean value: 0.7762964978549687

key: train_jcc
value: [0.99497487 1.         0.99497487 1.         1.         0.70036101
 0.81696429 0.8280543  0.80888889 0.79338843]

mean value: 0.8937606662571818

MCC on Blind test: 0.73

Accuracy on Blind test: 0.87

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.04109359 0.02866888 0.02903247 0.02609515 0.02982569 0.026232
 0.02595305 0.02627039 0.02766967 0.02752471]

mean value: 0.028836560249328614

key: score_time
value: [0.01269889 0.01240039 0.01267838 0.01267815 0.01238513 0.01267934
 0.01254892 0.01264668 0.01244855 0.01284599]

mean value: 0.012601041793823242

key: test_mcc
value: [0.86732843 0.86758893 0.86732843 0.82574419 1.         0.73029674
 0.86452993 0.87177979 0.82158384 0.81818182]

mean value: 0.8534362113672467

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.93333333 0.93333333 0.93333333 0.91111111 1.         0.86363636
 0.93181818 0.93181818 0.90909091 0.90909091]

mean value: 0.9256565656565656

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.93023256 0.93333333 0.93023256 0.91304348 1.         0.86956522
 0.93333333 0.93617021 0.9047619  0.90909091]

mean value: 0.9259763505216682

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.95238095 0.91304348 0.95238095 0.875      1.         0.83333333
 0.91304348 0.88       0.95       0.90909091]

mean value: 0.9178273103707886

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 0.95454545 0.90909091 0.95454545 1.         0.90909091
 0.95454545 1.         0.86363636 0.90909091]

mean value: 0.9363636363636364

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.93280632 0.93379447 0.93280632 0.91205534 1.         0.86363636
 0.93181818 0.93181818 0.90909091 0.90909091]

mean value: 0.9256916996047431

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.86956522 0.875      0.86956522 0.84       1.         0.76923077
 0.875      0.88       0.82608696 0.83333333]

mean value: 0.863778149386845

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.91

Accuracy on Blind test: 0.96

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.19880676 0.16679668 0.24343967 0.16726899 0.29743075 0.16902733
 0.17551208 0.1705997  0.18266988 0.17188907]

mean value: 0.19434409141540526

key: score_time
value: [0.02430034 0.02431679 0.0246942  0.02460122 0.02689743 0.02486563
 0.02493906 0.02493906 0.02535796 0.02534413]

mean value: 0.025025582313537596

key: test_mcc
value: [0.86732843 0.82506438 0.60000118 0.69583743 0.78530224 0.86452993
 0.7800135  0.7800135  0.77352678 0.77352678]

mean value: 0.7745144142569061

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.93333333 0.91111111 0.8        0.84444444 0.88888889 0.93181818
 0.88636364 0.88636364 0.88636364 0.88636364]

mean value: 0.8855050505050505

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.93023256 0.9047619  0.79069767 0.85106383 0.89361702 0.93333333
 0.87804878 0.89361702 0.88372093 0.88888889]

mean value: 0.8847981942603055

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.95238095 0.95       0.80952381 0.8        0.84       0.91304348
 0.94736842 0.84       0.9047619  0.86956522]

mean value: 0.8826643783371472

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 0.86363636 0.77272727 0.90909091 0.95454545 0.95454545
 0.81818182 0.95454545 0.86363636 0.90909091]

mean value: 0.8909090909090909

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.93280632 0.91007905 0.79940711 0.8458498  0.89031621 0.93181818
 0.88636364 0.88636364 0.88636364 0.88636364]

mean value: 0.8855731225296443

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.86956522 0.82608696 0.65384615 0.74074074 0.80769231 0.875
 0.7826087  0.80769231 0.79166667 0.8       ]

mean value: 0.7954899046203394

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.84

Accuracy on Blind test: 0.92

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.01551032 0.01521516 0.01527309 0.01511455 0.01513052 0.01546001
 0.01533055 0.01527858 0.01553512 0.01524663]

mean value: 0.015309453010559082

key: score_time
value: [0.01284385 0.01274657 0.01290631 0.01287198 0.01276088 0.01284337
 0.01283884 0.01324463 0.01275945 0.01283479]

mean value: 0.012865066528320312

key: test_mcc
value: [0.55666994 0.38112585 0.68972332 0.19881069 0.46930785 0.77352678
 0.32673202 0.45454545 0.54545455 0.50051733]

mean value: 0.48964137935205826

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.77777778 0.68888889 0.84444444 0.6        0.73333333 0.88636364
 0.65909091 0.72727273 0.77272727 0.75      ]

mean value: 0.743989898989899

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.76190476 0.65       0.84444444 0.57142857 0.73913043 0.88372093
 0.61538462 0.72727273 0.77272727 0.74418605]

mean value: 0.7310199804689188

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.8        0.72222222 0.82608696 0.6        0.70833333 0.9047619
 0.70588235 0.72727273 0.77272727 0.76190476]

mean value: 0.7529191531685138

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.72727273 0.59090909 0.86363636 0.54545455 0.77272727 0.86363636
 0.54545455 0.72727273 0.77272727 0.72727273]

mean value: 0.7136363636363636

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.77667984 0.68675889 0.84486166 0.59881423 0.73418972 0.88636364
 0.65909091 0.72727273 0.77272727 0.75      ]

mean value: 0.7436758893280633

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.61538462 0.48148148 0.73076923 0.4        0.5862069  0.79166667
 0.44444444 0.57142857 0.62962963 0.59259259]

mean value: 0.5843604128948956

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.6

Accuracy on Blind test: 0.79

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [2.48509145 1.72775126 1.72003531 1.70660305 1.7010572  1.70190072
 1.67853093 1.70248485 1.73082352 1.76075959]

mean value: 1.7915037870407104

key: score_time
value: [0.10180235 0.10169816 0.09832668 0.10011482 0.10132575 0.09362698
 0.09398794 0.09288573 0.10091448 0.09981847]

mean value: 0.09845013618469238

key: test_mcc
value: [0.95652174 0.91452919 0.91106719 0.86758893 1.         0.90909091
 0.95553309 0.87177979 0.82158384 0.81818182]

mean value: 0.9025876492117845

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97777778 0.95555556 0.95555556 0.93333333 1.         0.95454545
 0.97727273 0.93181818 0.90909091 0.90909091]

mean value: 0.9504040404040404

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97777778 0.95238095 0.95454545 0.93333333 1.         0.95454545
 0.97674419 0.93617021 0.9047619  0.90909091]

mean value: 0.9499350185248255

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.95652174 1.         0.95454545 0.91304348 1.         0.95454545
 1.         0.88       0.95       0.90909091]

mean value: 0.9517747035573123

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.90909091 0.95454545 0.95454545 1.         0.95454545
 0.95454545 1.         0.86363636 0.90909091]

mean value: 0.95

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97826087 0.95454545 0.9555336  0.93379447 1.         0.95454545
 0.97727273 0.93181818 0.90909091 0.90909091]

mean value: 0.9503952569169961

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.95652174 0.90909091 0.91304348 0.875      1.         0.91304348
 0.95454545 0.88       0.82608696 0.83333333]

mean value: 0.906066534914361

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.89

Accuracy on Blind test: 0.95

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC0...05', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [2.09929252 1.04539609 1.14330029 0.98038101 0.95856428 0.94347358
 1.00293994 0.99390554 0.91637397 1.01709938]

mean value: 1.110072660446167

key: score_time
value: [0.2254622  0.14489126 0.16381502 0.14184189 0.19981694 0.18624687
 0.16025591 0.16381979 0.13914704 0.18514252]

mean value: 0.1710439443588257

key: test_mcc
value: [0.91106719 0.91452919 0.86732843 0.73663511 0.91485328 0.90909091
 0.91287093 0.87177979 0.82158384 0.81818182]

mean value: 0.8677920483015767

key: train_mcc
value: [0.95500519 0.949995   0.94500356 0.95017516 0.949995   0.95015803
 0.95011693 0.94513697 0.94513697 0.96009355]

mean value: 0.9500816369848698

key: test_accuracy
value: [0.95555556 0.95555556 0.93333333 0.86666667 0.95555556 0.95454545
 0.95454545 0.93181818 0.90909091 0.90909091]

mean value: 0.9325757575757576

key: train_accuracy
value: [0.9775     0.975      0.9725     0.975      0.975      0.97506234
 0.97506234 0.97256858 0.97256858 0.98004988]

mean value: 0.9750311720698255

key: test_fscore
value: [0.95454545 0.95238095 0.93023256 0.86956522 0.95652174 0.95454545
 0.95238095 0.93617021 0.9047619  0.90909091]

mean value: 0.9320195355132859

key: train_fscore
value: [0.97721519 0.97474747 0.9721519  0.9744898  0.97474747 0.97461929
 0.97474747 0.9721519  0.9721519  0.97979798]

mean value: 0.9746820375374823

key: test_precision
value: [0.95454545 1.         0.95238095 0.83333333 0.91666667 0.95454545
 1.         0.88       0.95       0.90909091]

mean value: 0.935056277056277

key: train_precision
value: [0.97969543 0.97474747 0.97461929 0.98453608 0.97474747 0.97959184
 0.97474747 0.97461929 0.97461929 0.97979798]

mean value: 0.977172162274171

key: test_recall
value: [0.95454545 0.90909091 0.90909091 0.90909091 1.         0.95454545
 0.90909091 1.         0.86363636 0.90909091]

mean value: 0.9318181818181818

key: train_recall
value: [0.97474747 0.97474747 0.96969697 0.96464646 0.97474747 0.96969697
 0.97474747 0.96969697 0.96969697 0.97979798]

mean value: 0.9722222222222222

key: test_roc_auc
value: [0.9555336  0.95454545 0.93280632 0.86758893 0.95652174 0.95454545
 0.95454545 0.93181818 0.90909091 0.90909091]

mean value: 0.932608695652174

key: train_roc_auc
value: [0.97747275 0.9749975  0.97247225 0.97489749 0.9749975  0.97499627
 0.97505847 0.97253321 0.97253321 0.98004677]

mean value: 0.9750005419261139

key: test_jcc
value: [0.91304348 0.90909091 0.86956522 0.76923077 0.91666667 0.91304348
 0.90909091 0.88       0.82608696 0.83333333]

mean value: 0.873915171784737

key: train_jcc
value: [0.95544554 0.95073892 0.94581281 0.95024876 0.95073892 0.95049505
 0.95073892 0.94581281 0.94581281 0.96039604]

mean value: 0.9506240562296064

MCC on Blind test: 0.93

Accuracy on Blind test: 0.96

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01235938 0.01207018 0.01216507 0.01198816 0.01195812 0.01208854
 0.01212764 0.01205707 0.01214314 0.01219463]

mean value: 0.012115192413330079

key: score_time
value: [0.01045775 0.01042986 0.0105226  0.01041794 0.01037836 0.01043344
 0.01043797 0.01044178 0.01040506 0.01040316]

mean value: 0.010432791709899903

key: test_mcc
value: [0.68911026 0.77821935 0.64426877 0.60079051 0.70780516 0.63636364
 0.73029674 0.63636364 0.5547002  0.77352678]

mean value: 0.6751445052594404

key: train_mcc
value: [0.73006509 0.73513714 0.7700385  0.74497106 0.74497106 0.71074778
 0.7306343  0.75588396 0.69693637 0.75588396]

mean value: 0.7375269233005722

key: test_accuracy
value: [0.84444444 0.88888889 0.82222222 0.8        0.84444444 0.81818182
 0.86363636 0.81818182 0.77272727 0.88636364]

mean value: 0.8359090909090909

key: train_accuracy
value: [0.865      0.8675     0.885      0.8725     0.8725     0.8553616
 0.86533666 0.87780549 0.8478803  0.87780549]

mean value: 0.8686689526184539

key: test_fscore
value: [0.8372093  0.88372093 0.81818182 0.8        0.85714286 0.81818182
 0.85714286 0.81818182 0.75       0.88888889]

mean value: 0.8328650290278198

key: train_fscore
value: [0.8622449  0.86445013 0.88442211 0.87088608 0.87088608 0.85204082
 0.86294416 0.87780549 0.84073107 0.87780549]

mean value: 0.866421631011566

key: test_precision
value: [0.85714286 0.9047619  0.81818182 0.7826087  0.77777778 0.81818182
 0.9        0.81818182 0.83333333 0.86956522]

mean value: 0.8379735240604806

key: train_precision
value: [0.87113402 0.87564767 0.88       0.87309645 0.87309645 0.86082474
 0.86734694 0.86699507 0.87027027 0.86699507]

mean value: 0.8705406681510427

key: test_recall
value: [0.81818182 0.86363636 0.81818182 0.81818182 0.95454545 0.81818182
 0.81818182 0.81818182 0.68181818 0.90909091]

mean value: 0.8318181818181818

key: train_recall
value: [0.85353535 0.85353535 0.88888889 0.86868687 0.86868687 0.84343434
 0.85858586 0.88888889 0.81313131 0.88888889]

mean value: 0.8626262626262626

key: test_roc_auc
value: [0.84387352 0.88833992 0.82213439 0.80039526 0.84683794 0.81818182
 0.86363636 0.81818182 0.77272727 0.88636364]

mean value: 0.8360671936758893

key: train_roc_auc
value: [0.86488649 0.86736174 0.8850385  0.87246225 0.87246225 0.85521471
 0.86525352 0.87794198 0.84745236 0.87794198]

mean value: 0.8686015769064591

key: test_jcc
value: [0.72       0.79166667 0.69230769 0.66666667 0.75       0.69230769
 0.75       0.69230769 0.6        0.8       ]

mean value: 0.715525641025641

key: train_jcc
value: [0.75784753 0.76126126 0.79279279 0.77130045 0.77130045 0.74222222
 0.75892857 0.78222222 0.72522523 0.78222222]

mean value: 0.7645322947867791

MCC on Blind test: 0.75

Accuracy on Blind test: 0.88

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC0...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.6125834  0.7919445  0.91690707 0.64126277 0.27586269 3.05022645
 1.53905249 2.51379037 1.85096502 1.52625299]

mean value: 1.3718847751617431

key: score_time
value: [0.01470351 0.01476789 0.01252818 0.01451087 0.01576161 0.01224637
 0.01332831 0.0270524  0.01303458 0.01383972]

mean value: 0.015177345275878907

key: test_mcc
value: [0.95652174 0.91106719 0.91106719 0.91485328 1.         0.90909091
 0.86452993 0.90909091 0.95553309 0.81818182]

mean value: 0.9149936062423393

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97777778 0.95555556 0.95555556 0.95555556 1.         0.95454545
 0.93181818 0.95454545 0.97727273 0.90909091]

mean value: 0.9571717171717172

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97777778 0.95454545 0.95454545 0.95652174 1.         0.95454545
 0.93023256 0.95454545 0.97674419 0.90909091]

mean value: 0.9568548988366986

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.95652174 0.95454545 0.95454545 0.91666667 1.         0.95454545
 0.95238095 0.95454545 1.         0.90909091]

mean value: 0.9552842085450781

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.95454545 0.95454545 1.         1.         0.95454545
 0.90909091 0.95454545 0.95454545 0.90909091]

mean value: 0.9590909090909091

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97826087 0.9555336  0.9555336  0.95652174 1.         0.95454545
 0.93181818 0.95454545 0.97727273 0.90909091]

mean value: 0.9573122529644269

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.95652174 0.91304348 0.91304348 0.91666667 1.         0.91304348
 0.86956522 0.91304348 0.95454545 0.83333333]

mean value: 0.9182806324110672

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.95

Accuracy on Blind test: 0.97

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.05299616 0.07507491 0.09356999 0.1008358  0.09290576 0.09139204
 0.07153034 0.07866263 0.10971498 0.08641815]

mean value: 0.08531007766723633

key: score_time
value: [0.02284002 0.02450013 0.01256418 0.02179718 0.02442145 0.02206469
 0.02130103 0.02369428 0.0252614  0.02076888]

mean value: 0.021921324729919433

key: test_mcc
value: [0.68911026 0.73559956 0.73320158 0.82574419 0.78530224 0.81818182
 0.63636364 0.63636364 0.68252363 0.6882472 ]

mean value: 0.7230637763883909

key: train_mcc
value: [0.90500656 0.93043262 0.91500719 0.91500719 0.90500656 0.91026694
 0.92519156 0.90522754 0.93021868 0.94034232]

mean value: 0.9181707163710942

key: test_accuracy
value: [0.84444444 0.86666667 0.86666667 0.91111111 0.88888889 0.90909091
 0.81818182 0.81818182 0.84090909 0.84090909]

mean value: 0.8605050505050506

key: train_accuracy
value: [0.9525     0.965      0.9575     0.9575     0.9525     0.95511222
 0.96259352 0.95261845 0.96508728 0.97007481]

mean value: 0.9590486284289277

key: test_fscore
value: [0.8372093  0.85714286 0.86363636 0.91304348 0.89361702 0.90909091
 0.81818182 0.81818182 0.84444444 0.85106383]

mean value: 0.8605611842328492

key: train_fscore
value: [0.95214106 0.96517413 0.95717884 0.95717884 0.95214106 0.95477387
 0.96221662 0.95189873 0.96482412 0.97      ]

mean value: 0.9587527276654001

key: test_precision
value: [0.85714286 0.9        0.86363636 0.875      0.84       0.90909091
 0.81818182 0.81818182 0.82608696 0.8       ]

mean value: 0.8507320722755506

key: train_precision
value: [0.94974874 0.95098039 0.95477387 0.95477387 0.94974874 0.95
 0.95979899 0.95431472 0.96       0.96039604]

mean value: 0.9544535373678533

key: test_recall
value: [0.81818182 0.81818182 0.86363636 0.95454545 0.95454545 0.90909091
 0.81818182 0.81818182 0.86363636 0.90909091]

mean value: 0.8727272727272728

key: train_recall
value: [0.95454545 0.97979798 0.95959596 0.95959596 0.95454545 0.95959596
 0.96464646 0.94949495 0.96969697 0.97979798]

mean value: 0.9631313131313132

key: test_roc_auc
value: [0.84387352 0.86561265 0.86660079 0.91205534 0.89031621 0.90909091
 0.81818182 0.81818182 0.84090909 0.84090909]

mean value: 0.8605731225296442

key: train_roc_auc
value: [0.95252025 0.96514651 0.95752075 0.95752075 0.95252025 0.95516744
 0.9626188  0.95257999 0.96514405 0.97019456]

mean value: 0.9590933354419185

key: test_jcc
value: [0.72       0.75       0.76       0.84       0.80769231 0.83333333
 0.69230769 0.69230769 0.73076923 0.74074074]

mean value: 0.7567150997150998

key: train_jcc
value: [0.90865385 0.93269231 0.9178744  0.9178744  0.90865385 0.91346154
 0.92718447 0.90821256 0.93203883 0.94174757]

mean value: 0.9208393764904951

MCC on Blind test: 0.7

Accuracy on Blind test: 0.85
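
Note: each "mean value" line in this log appears to be the arithmetic mean of the ten per-fold entries in the "value" array printed just above it. The following is a minimal sketch for checking that against a pasted excerpt. It is not part of the logged script; the names `LOG_EXCERPT` and `parse_metric_blocks` are hypothetical, and the excerpt is copied from the LDA test_mcc block above.

```python
import re
from statistics import fmean

# Excerpt copied from the LDA block of this log (key, per-fold values, logged mean).
LOG_EXCERPT = """\
key: test_mcc
value: [0.68911026 0.73559956 0.73320158 0.82574419 0.78530224 0.81818182
 0.63636364 0.63636364 0.68252363 0.6882472 ]

mean value: 0.7230637763883909
"""


def parse_metric_blocks(text):
    """Return {key: (fold_values, logged_mean)} for every 'key:' block in text.

    Hypothetical helper: assumes the 'key / value / mean value' layout seen in this log.
    """
    pattern = re.compile(
        r"key: (\w+)\s+value: \[([^\]]*)\]\s+mean value: ([-\d.eE+]+)"
    )
    blocks = {}
    for key, values, mean in pattern.findall(text):
        # The array is whitespace-separated and may wrap across lines.
        folds = [float(v) for v in values.split()]
        blocks[key] = (folds, float(mean))
    return blocks


blocks = parse_metric_blocks(LOG_EXCERPT)
folds, logged_mean = blocks["test_mcc"]
recomputed = fmean(folds)

# The printed fold values are rounded, so compare with a tolerance.
print(len(folds), abs(recomputed - logged_mean) < 1e-6)
```

The same helper can be pointed at any other block of this log; a tolerance is needed because the per-fold values are printed to eight decimal places while the logged mean was computed from the unrounded scores.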

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.01488352 0.01467943 0.01454282 0.01407647 0.01490998 0.01458836
 0.02097845 0.01497102 0.01447248 0.01468301]

mean value: 0.01527855396270752

key: score_time
value: [0.01296782 0.01293969 0.0132544  0.0121994  0.01727891 0.01300836
 0.01371241 0.01255512 0.01260257 0.01297855]

mean value: 0.01334972381591797

key: test_mcc
value: [0.70501339 0.73559956 0.60079051 0.64613475 0.78530224 0.77352678
 0.77352678 0.7800135  0.50471461 0.77352678]

mean value: 0.7078148917725633

key: train_mcc
value: [0.73501647 0.71071591 0.78497756 0.75023791 0.72513051 0.72622252
 0.69655581 0.75082817 0.66140847 0.76056935]

mean value: 0.7301662688450195

key: test_accuracy
value: [0.84444444 0.86666667 0.8        0.82222222 0.88888889 0.88636364
 0.88636364 0.88636364 0.75       0.88636364]

mean value: 0.8517676767676767

key: train_accuracy
value: [0.8675     0.855      0.8925     0.875      0.8625     0.86284289
 0.8478803  0.87531172 0.83042394 0.88029925]

mean value: 0.8649258104738154

key: test_fscore
value: [0.82051282 0.85714286 0.8        0.80952381 0.89361702 0.88372093
 0.88372093 0.89361702 0.73170732 0.88372093]

mean value: 0.8457283637503524

key: train_fscore
value: [0.86513995 0.84974093 0.89113924 0.87179487 0.85933504 0.85788114
 0.84155844 0.87179487 0.8238342  0.87817259]

mean value: 0.8610391268444171

key: test_precision
value: [0.94117647 0.9        0.7826087  0.85       0.84       0.9047619
 0.9047619  0.84       0.78947368 0.9047619 ]

mean value: 0.865754456473665

key: train_precision
value: [0.87179487 0.87234043 0.89340102 0.88541667 0.87046632 0.87830688
 0.86631016 0.88541667 0.84574468 0.88265306]

mean value: 0.8751850747942309

key: test_recall
value: [0.72727273 0.81818182 0.81818182 0.77272727 0.95454545 0.86363636
 0.86363636 0.95454545 0.68181818 0.86363636]

mean value: 0.8318181818181818

key: train_recall
value: [0.85858586 0.82828283 0.88888889 0.85858586 0.84848485 0.83838384
 0.81818182 0.85858586 0.8030303  0.87373737]

mean value: 0.8474747474747475

key: test_roc_auc
value: [0.84189723 0.86561265 0.80039526 0.82114625 0.89031621 0.88636364
 0.88636364 0.88636364 0.75       0.88636364]

mean value: 0.8514822134387352

key: train_roc_auc
value: [0.86741174 0.85473547 0.89246425 0.87483748 0.86236124 0.86254167
 0.84751455 0.87510574 0.83008658 0.88021844]

mean value: 0.8647277166140259

key: test_jcc
value: [0.69565217 0.75       0.66666667 0.68       0.80769231 0.79166667
 0.79166667 0.80769231 0.57692308 0.79166667]

mean value: 0.7359626532887402

key: train_jcc
value: [0.76233184 0.73873874 0.80365297 0.77272727 0.75336323 0.75113122
 0.7264574  0.77272727 0.70044053 0.78280543]

mean value: 0.7564375898815598

MCC on Blind test: 0.77

Accuracy on Blind test: 0.88

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.02090478 0.01843095 0.04428411 0.05754352 0.04021621 0.04565191
 0.04121709 0.05066919 0.06244564 0.05169058]

mean value: 0.043305397033691406

key: score_time
value: [0.01230383 0.02153182 0.01792741 0.03710651 0.03666353 0.01897454
 0.02727604 0.02798796 0.02072001 0.02464175]

mean value: 0.02451333999633789

key: test_mcc
value: [0.82506438 0.72299881 0.43884363 0.77865613 0.87476705 0.73960026
 0.68313005 0.73029674 0.75592895 0.73029674]

mean value: 0.7279582734082828

key: train_mcc
value: [0.85528899 0.79408263 0.42430608 0.86966298 0.82348041 0.80626333
 0.81054468 0.86083265 0.76826689 0.87858211]

mean value: 0.7891310741692911

key: test_accuracy
value: [0.91111111 0.84444444 0.66666667 0.88888889 0.93333333 0.86363636
 0.81818182 0.86363636 0.86363636 0.86363636]

mean value: 0.8517171717171718

key: train_accuracy
value: [0.9275     0.89       0.655      0.9325     0.91       0.89526185
 0.89775561 0.9276808  0.87281796 0.93765586]

mean value: 0.8846172069825436

key: test_fscore
value: [0.9047619  0.81081081 0.48275862 0.88888889 0.93617021 0.85
 0.77777778 0.85714286 0.84210526 0.86956522]

mean value: 0.8219981553387051

key: train_fscore
value: [0.9276808  0.87709497 0.46511628 0.928      0.91304348 0.88202247
 0.88515406 0.92225201 0.85302594 0.93946731]

mean value: 0.8592857320609378

key: test_precision
value: [0.95       1.         1.         0.86956522 0.88       0.94444444
 1.         0.9        1.         0.83333333]

mean value: 0.9377342995169082

key: train_precision
value: [0.91625616 0.98125    1.         0.98305085 0.875      0.99367089
 0.99371069 0.98285714 0.99328859 0.90232558]

mean value: 0.9621409897849462

key: test_recall
value: [0.86363636 0.68181818 0.31818182 0.90909091 1.         0.77272727
 0.63636364 0.81818182 0.72727273 0.90909091]

mean value: 0.7636363636363637

key: train_recall
value: [0.93939394 0.79292929 0.3030303  0.87878788 0.95454545 0.79292929
 0.7979798  0.86868687 0.74747475 0.97979798]

mean value: 0.8055555555555556

key: test_roc_auc
value: [0.91007905 0.84090909 0.65909091 0.88932806 0.93478261 0.86363636
 0.81818182 0.86363636 0.86363636 0.86363636]

mean value: 0.8506916996047431

key: train_roc_auc
value: [0.92761776 0.8890389  0.65151515 0.9319682  0.91044104 0.89400159
 0.89652684 0.92695427 0.87127432 0.93817485]

mean value: 0.8837512938485966

key: test_jcc
value: [0.82608696 0.68181818 0.31818182 0.8        0.88       0.73913043
 0.63636364 0.75       0.72727273 0.76923077]

mean value: 0.7128084524171481

key: train_jcc
value: [0.86511628 0.78109453 0.3030303  0.86567164 0.84       0.78894472
 0.79396985 0.85572139 0.74371859 0.88584475]

mean value: 0.7723112058976718

MCC on Blind test: 0.58

Accuracy on Blind test: 0.76

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.06124377 0.04860258 0.05644703 0.0544219  0.04448938 0.02667141
 0.06391263 0.04354548 0.06314015 0.04449439]

mean value: 0.05069687366485596

key: score_time
value: [0.04430127 0.02314234 0.02817941 0.02064776 0.02313566 0.01204276
 0.0358007  0.02944756 0.0219655  0.02373385]

mean value: 0.026239681243896484

key: test_mcc
value: [0.78405645 0.73663511 0.73320158 0.73663511 0.8360602  0.7800135
 0.68313005 0.56694671 0.82158384 0.21483446]

mean value: 0.6893097003051051

key: train_mcc
value: [0.85260278 0.85532995 0.920046   0.90106836 0.86219639 0.8725435
 0.82264299 0.80465299 0.83772405 0.44233239]

mean value: 0.8171139401076141

key: test_accuracy
value: [0.88888889 0.86666667 0.86666667 0.86666667 0.91111111 0.88636364
 0.81818182 0.77272727 0.90909091 0.56818182]

mean value: 0.8354545454545454

key: train_accuracy
value: [0.9225     0.925      0.96       0.95       0.93       0.93516209
 0.90773067 0.89526185 0.91521197 0.66084788]

mean value: 0.9001714463840399

key: test_fscore
value: [0.87804878 0.86956522 0.86363636 0.86956522 0.91666667 0.89361702
 0.77777778 0.73684211 0.91304348 0.68852459]

mean value: 0.8407287218315779

key: train_fscore
value: [0.91598916 0.92822967 0.95979899 0.94818653 0.93170732 0.93658537
 0.899729   0.88268156 0.91943128 0.7443609 ]

mean value: 0.9066699774774758

key: test_precision
value: [0.94736842 0.83333333 0.86363636 0.83333333 0.84615385 0.84
 1.         0.875      0.875      0.53846154]

mean value: 0.8452286835971047

key: train_precision
value: [0.98830409 0.88181818 0.955      0.97340426 0.9009434  0.90566038
 0.97076023 0.9875     0.86607143 0.59281437]

mean value: 0.902227633803653

key: test_recall
value: [0.81818182 0.90909091 0.86363636 0.90909091 1.         0.95454545
 0.63636364 0.63636364 0.95454545 0.95454545]

mean value: 0.8636363636363636

key: train_recall
value: [0.85353535 0.97979798 0.96464646 0.92424242 0.96464646 0.96969697
 0.83838384 0.7979798  0.97979798 1.        ]

mean value: 0.9272727272727272

key: test_roc_auc
value: [0.88735178 0.86758893 0.86660079 0.86758893 0.91304348 0.88636364
 0.81818182 0.77272727 0.90909091 0.56818182]

mean value: 0.8356719367588933

key: train_roc_auc
value: [0.92181718 0.92554255 0.960046   0.94974497 0.93034303 0.9355874
 0.90687665 0.89406379 0.91600736 0.66502463]

mean value: 0.9005053584176151

key: test_jcc
value: [0.7826087  0.76923077 0.76       0.76923077 0.84615385 0.80769231
 0.63636364 0.58333333 0.84       0.525     ]

mean value: 0.7319613357656836

key: train_jcc
value: [0.845      0.86607143 0.92270531 0.90147783 0.87214612 0.88073394
 0.81773399 0.79       0.85087719 0.59281437]

mean value: 0.833956019315672

MCC on Blind test: 0.77

Accuracy on Blind test: 0.88

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.29672027 0.4601624  0.43517661 0.41409087 0.33378577 0.4149332
 0.38982415 0.22151065 0.21765399 0.2491889 ]

mean value: 0.3433046817779541

key: score_time
value: [0.0209012  0.02083015 0.0207603  0.0409019  0.04082274 0.0407865
 0.02054024 0.0209372  0.02071619 0.04051232]

mean value: 0.028770875930786134

key: test_mcc
value: [1.         0.91452919 0.91106719 0.91485328 1.         0.91287093
 0.86452993 0.95553309 0.95553309 0.86452993]

mean value: 0.9293446631562545

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.95555556 0.95555556 0.95555556 1.         0.95454545
 0.93181818 0.97727273 0.97727273 0.93181818]

mean value: 0.963939393939394

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.95238095 0.95454545 0.95652174 1.         0.95652174
 0.93023256 0.97777778 0.97674419 0.93023256]

mean value: 0.9634956965290635

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         0.95454545 0.91666667 1.         0.91666667
 0.95238095 0.95652174 1.         0.95238095]

mean value: 0.9649162431771128

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.90909091 0.95454545 1.         1.         1.
 0.90909091 1.         0.95454545 0.90909091]

mean value: 0.9636363636363636

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.95454545 0.9555336  0.95652174 1.         0.95454545
 0.93181818 0.97727273 0.97727273 0.93181818]

mean value: 0.9639328063241107

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.90909091 0.91304348 0.91666667 1.         0.91666667
 0.86956522 0.95652174 0.95454545 0.86956522]

mean value: 0.930566534914361

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.95

Accuracy on Blind test: 0.97

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.11559272 0.07994819 0.09201694 0.09574533 0.08644199 0.09145522
 0.11973166 0.12139821 0.11484957 0.12906265]

mean value: 0.10462424755096436

key: score_time
value: [0.0227809  0.02421069 0.02284312 0.02478456 0.02595878 0.02617145
 0.0282135  0.02829266 0.02692151 0.02787161]

mean value: 0.025804877281188965

key: test_mcc
value: [0.95652174 0.91106719 0.91106719 0.91485328 1.         0.90909091
 0.86452993 0.90909091 0.86452993 0.86452993]

mean value: 0.9105281028070168

key: train_mcc
value: [0.98004502 0.989999   0.9900495  0.99501169 0.99501169 0.97506905
 0.99502376 0.98009308 0.98514815 0.98009308]

mean value: 0.9865544028626636

key: test_accuracy
value: [0.97777778 0.95555556 0.95555556 0.95555556 1.         0.95454545
 0.93181818 0.95454545 0.93181818 0.93181818]

mean value: 0.9548989898989899

key: train_accuracy
value: [0.99       0.995      0.995      0.9975     0.9975     0.98753117
 0.99750623 0.99002494 0.9925187  0.99002494]

mean value: 0.9932605985037406

key: test_fscore
value: [0.97777778 0.95454545 0.95454545 0.95652174 1.         0.95454545
 0.93023256 0.95454545 0.93023256 0.93333333]

mean value: 0.9546279784702434

key: train_fscore
value: [0.98984772 0.99494949 0.99497487 0.99746835 0.99746835 0.98734177
 0.99746835 0.98984772 0.9924812  0.98984772]

mean value: 0.9931695554980033

key: test_precision
value: [0.95652174 0.95454545 0.95454545 0.91666667 1.         0.95454545
 0.95238095 0.95454545 0.95238095 0.91304348]

mean value: 0.9509175607001694

key: train_precision
value: [0.99489796 0.99494949 0.99       1.         1.         0.98984772
 1.         0.99489796 0.98507463 0.99489796]

mean value: 0.9944565715102227

key: test_recall
value: [1.         0.95454545 0.95454545 1.         1.         0.95454545
 0.90909091 0.95454545 0.90909091 0.95454545]

mean value: 0.9590909090909091

key: train_recall
value: [0.98484848 0.99494949 1.         0.99494949 0.99494949 0.98484848
 0.99494949 0.98484848 1.         0.98484848]

mean value: 0.9919191919191919

key: test_roc_auc
value: [0.97826087 0.9555336  0.9555336  0.95652174 1.         0.95454545
 0.93181818 0.95454545 0.93181818 0.93181818]

mean value: 0.9550395256916997

key: train_roc_auc
value: [0.98994899 0.9949995  0.9950495  0.99747475 0.99747475 0.98749813
 0.99747475 0.98996119 0.99261084 0.98996119]

mean value: 0.9932453590186605

key: test_jcc
value: [0.95652174 0.91304348 0.91304348 0.91666667 1.         0.91304348
 0.86956522 0.91304348 0.86956522 0.875     ]

mean value: 0.9139492753623188

key: train_jcc
value: [0.9798995  0.98994975 0.99       0.99494949 0.99494949 0.975
 0.99494949 0.9798995  0.98507463 0.9798995 ]

mean value: 0.9864571352920186

MCC on Blind test: 0.96

Accuracy on Blind test: 0.98
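
The first entry of each test_* array in the Bagging Classifier block above can be cross-checked by hand. The sketch below is a minimal one and rests on an assumption: that fold 1 held 45 test samples with 22 true positives, 1 false positive, 0 false negatives and 22 true negatives. These counts are inferred from the logged accuracy, precision and recall; the run does not print them. The helper name fold_metrics is hypothetical. Balanced accuracy matches the logged test_roc_auc, which suggests (but does not prove) that roc_auc was scored on hard class predictions.

```python
import math

# Assumed confusion counts for fold 1 (45 test samples). They are inferred
# from the logged metrics and are NOT printed anywhere in the log itself.
tp, fp, fn, tn = 22, 1, 0, 22


def fold_metrics(tp, fp, fn, tn):
    """Return the binary metrics the log reports, from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Denominator of the Matthews correlation coefficient.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn) / mcc_den,
        "jcc": tp / (tp + fp + fn),  # Jaccard index of the positive class
        "balanced_accuracy": (recall + specificity) / 2,
    }


metrics = fold_metrics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.8f}")
```

Under these assumed counts, mcc, precision and jcc all equal 22/23, which is why the three arrays share the same first value of 0.95652174.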

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.16882658 0.22207904 0.2207644  0.23970628 0.2196691  0.18807268
 0.20044613 0.23684382 0.19207048 0.23526812]

mean value: 0.21237466335296631

key: score_time
value: [0.02288747 0.02833319 0.03328061 0.03298473 0.03413868 0.04017615
 0.03194618 0.03593063 0.03210187 0.03190231]

mean value: 0.03236818313598633

key: test_mcc
value: [0.70501339 0.64613475 0.55666994 0.51185771 0.51089209 0.68252363
 0.77352678 0.60678804 0.59152048 0.73029674]

mean value: 0.6315223557482503

key: train_mcc
value: [0.98510714 1.         0.99004752 0.99004752 0.99004752 0.99007143
 0.99007143 0.99502376 0.99007143 0.99007143]

mean value: 0.9910559193261419

key: test_accuracy
value: [0.84444444 0.82222222 0.77777778 0.75555556 0.75555556 0.84090909
 0.88636364 0.79545455 0.79545455 0.86363636]

mean value: 0.8137373737373738

key: train_accuracy
value: [0.9925     1.         0.995      0.995      0.995      0.99501247
 0.99501247 0.99750623 0.99501247 0.99501247]

mean value: 0.9955056109725686

key: test_fscore
value: [0.82051282 0.80952381 0.76190476 0.75555556 0.74418605 0.84444444
 0.88372093 0.76923077 0.79069767 0.86956522]

mean value: 0.8049342029726256

key: train_fscore
value: [0.99236641 1.         0.99492386 0.99492386 0.99492386 0.99492386
 0.99492386 0.99746835 0.99492386 0.99492386]

mean value: 0.9954301771720262

key: test_precision
value: [0.94117647 0.85       0.8        0.73913043 0.76190476 0.82608696
 0.9047619  0.88235294 0.80952381 0.83333333]

mean value: 0.8348270612592863

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.72727273 0.77272727 0.72727273 0.77272727 0.72727273 0.86363636
 0.86363636 0.68181818 0.77272727 0.90909091]

mean value: 0.7818181818181819

key: train_recall
value: [0.98484848 1.         0.98989899 0.98989899 0.98989899 0.98989899
 0.98989899 0.99494949 0.98989899 0.98989899]

mean value: 0.990909090909091

key: test_roc_auc
value: [0.84189723 0.82114625 0.77667984 0.75592885 0.75494071 0.84090909
 0.88636364 0.79545455 0.79545455 0.86363636]

mean value: 0.8132411067193676

key: train_roc_auc
value: [0.99242424 1.         0.99494949 0.99494949 0.99494949 0.99494949
 0.99494949 0.99747475 0.99494949 0.99494949]

mean value: 0.9954545454545455

key: test_jcc
value: [0.69565217 0.68       0.61538462 0.60714286 0.59259259 0.73076923
 0.79166667 0.625      0.65384615 0.76923077]

mean value: 0.676128505954593

key: train_jcc
value: [0.98484848 1.         0.98989899 0.98989899 0.98989899 0.98989899
 0.98989899 0.99494949 0.98989899 0.98989899]

mean value: 0.990909090909091

MCC on Blind test: 0.61

Accuracy on Blind test: 0.8
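
Each "mean value" line is the arithmetic mean of the ten per-fold values printed above it. As a small sketch, the snippet below recomputes the two MCC means for the Gaussian Process block from the values copied out of the log, and reports the gap between training and test scores. The variable names are illustrative, and the standard deviation is an added summary that the run itself does not log.

```python
from statistics import mean, stdev

# Per-fold values copied from the Gaussian Process block of the log.
test_mcc = [0.70501339, 0.64613475, 0.55666994, 0.51185771, 0.51089209,
            0.68252363, 0.77352678, 0.60678804, 0.59152048, 0.73029674]
train_mcc = [0.98510714, 1.0, 0.99004752, 0.99004752, 0.99004752,
             0.99007143, 0.99007143, 0.99502376, 0.99007143, 0.99007143]

test_mean = mean(test_mcc)
train_mean = mean(train_mcc)
# A large positive gap means the model scores much better on the
# training folds than on the held-out folds.
gap = train_mean - test_mean

print(f"test_mcc  mean: {test_mean:.8f}  (sd {stdev(test_mcc):.4f})")
print(f"train_mcc mean: {train_mean:.8f}  (sd {stdev(train_mcc):.4f})")
print(f"train - test gap: {gap:.4f}")
```

A training MCC near 0.99 against a cross-validated MCC near 0.63 is the pattern usually read as overfitting; the blind-test MCC of 0.61 logged above is in line with the cross-validated figure.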

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.919806   1.02738333 0.90889049 0.87410593 1.04062057 1.07893395
 1.05196571 0.85241294 0.64149427 0.65414619]

mean value: 0.9049759387969971

key: score_time
value: [0.01315594 0.01307869 0.01312375 0.02761984 0.0129056  0.01343155
 0.01265907 0.01010847 0.01001835 0.00931072]

mean value: 0.013541197776794434

key: test_mcc
value: [0.95652174 0.91106719 0.91106719 0.91485328 1.         0.91287093
 0.90909091 0.95553309 0.90909091 0.81818182]

mean value: 0.9198277056731419

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.97777778 0.95555556 0.95555556 0.95555556 1.         0.95454545
 0.95454545 0.97727273 0.95454545 0.90909091]

mean value: 0.9594444444444444

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.97777778 0.95454545 0.95454545 0.95652174 1.         0.95652174
 0.95454545 0.97777778 0.95454545 0.90909091]

mean value: 0.9595871761089152

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.95652174 0.95454545 0.95454545 0.91666667 1.         0.91666667
 0.95454545 0.95652174 0.95454545 0.90909091]

mean value: 0.9473649538866931

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.95454545 0.95454545 1.         1.         1.
 0.95454545 1.         0.95454545 0.90909091]

mean value: 0.9727272727272728

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.97826087 0.9555336  0.9555336  0.95652174 1.         0.95454545
 0.95454545 0.97727273 0.95454545 0.90909091]

mean value: 0.9595849802371541

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.95652174 0.91304348 0.91304348 0.91666667 1.         0.91666667
 0.91304348 0.95652174 0.91304348 0.83333333]

mean value: 0.9231884057971014

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.95

Accuracy on Blind test: 0.97

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.1153357  0.06680679 0.06007385 0.09345102 0.07372856 0.1125803
 0.13374567 0.08797312 0.09443545 0.10179043]

mean value: 0.09399209022521973

key: score_time
value: [0.0196929  0.02079773 0.020576   0.02017331 0.03191543 0.02789617
 0.02618909 0.0400703  0.02141643 0.02100539]

mean value: 0.024973273277282715

key: test_mcc
value: [0.28827551 0.59109821 0.44008623 0.20198059 0.28827551 0.50471461
 0.48795004 0.50051733 0.56694671 0.28347335]

mean value: 0.4153318091778042

key: train_mcc
value: [0.84199403 0.87973027 0.96074967 0.91500719 0.88626479 0.96539284
 0.93515962 0.98514265 0.92771103 0.75793176]

mean value: 0.9055083848914132

key: test_accuracy
value: [0.64444444 0.77777778 0.71111111 0.6        0.64444444 0.75
 0.72727273 0.75       0.77272727 0.63636364]

mean value: 0.7014141414141414

key: train_accuracy
value: [0.915      0.9375     0.98       0.9575     0.94       0.98254364
 0.96758105 0.9925187  0.96259352 0.86533666]

mean value: 0.9500573566084788

key: test_fscore
value: [0.61904762 0.72222222 0.64864865 0.60869565 0.61904762 0.76595745
 0.66666667 0.74418605 0.73684211 0.57894737]

mean value: 0.6710261394811038

key: train_fscore
value: [0.90607735 0.93333333 0.97938144 0.95717884 0.93548387 0.98254364
 0.96708861 0.99236641 0.96062992 0.84210526]

mean value: 0.9456188682100336

key: test_precision
value: [0.65       0.92857143 0.8        0.58333333 0.65       0.72
 0.85714286 0.76190476 0.875      0.6875    ]

mean value: 0.7513452380952381

key: train_precision
value: [1.         0.98870056 1.         0.95477387 1.         0.97044335
 0.96954315 1.         1.         1.        ]

mean value: 0.9883460931280301

key: test_recall
value: [0.59090909 0.59090909 0.54545455 0.63636364 0.59090909 0.81818182
 0.54545455 0.72727273 0.63636364 0.5       ]

mean value: 0.6181818181818182

key: train_recall
value: [0.82828283 0.88383838 0.95959596 0.95959596 0.87878788 0.99494949
 0.96464646 0.98484848 0.92424242 0.72727273]

mean value: 0.9106060606060606

key: test_roc_auc
value: [0.64328063 0.77371542 0.70750988 0.60079051 0.64328063 0.75
 0.72727273 0.75       0.77272727 0.63636364]

mean value: 0.7004940711462451

key: train_roc_auc
value: [0.91414141 0.9369687  0.97979798 0.95752075 0.93939394 0.98269642
 0.96754491 0.99242424 0.96212121 0.86363636]

mean value: 0.9496245930011721

key: test_jcc
value: [0.44827586 0.56521739 0.48       0.4375     0.44827586 0.62068966
 0.5        0.59259259 0.58333333 0.40740741]

mean value: 0.5083292103948026

key: train_jcc
value: [0.82828283 0.875      0.95959596 0.9178744  0.87878788 0.96568627
 0.93627451 0.98484848 0.92424242 0.72727273]

mean value: 0.8997865483479294

MCC on Blind test: 0.54

Accuracy on Blind test: 0.77

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.04879665 0.06443977 0.06408358 0.06347299 0.0626421  0.05391741
 0.04901862 0.06241035 0.06304216 0.06274414]

mean value: 0.05945677757263183

key: score_time
value: [0.03340101 0.03197694 0.03278589 0.03180408 0.03367066 0.03720546
 0.0333674  0.02973032 0.03171229 0.03540778]

mean value: 0.0331061840057373

key: test_mcc
value: [0.78405645 0.78405645 0.73320158 0.8360602  0.82574419 0.81818182
 0.86452993 0.72727273 0.77352678 0.77352678]

mean value: 0.7920156928908952

key: train_mcc
value: [0.86529061 0.87073544 0.85500344 0.85510344 0.84528736 0.86562671
 0.87541359 0.86032741 0.8705095  0.8903152 ]

mean value: 0.8653612710221481

key: test_accuracy
value: [0.88888889 0.88888889 0.86666667 0.91111111 0.91111111 0.90909091
 0.93181818 0.86363636 0.88636364 0.88636364]

mean value: 0.8943939393939394

key: train_accuracy
value: [0.9325     0.935      0.9275     0.9275     0.9225     0.93266833
 0.93765586 0.93017456 0.93516209 0.94513716]

mean value: 0.9325798004987531

key: test_fscore
value: [0.87804878 0.87804878 0.86363636 0.91666667 0.91304348 0.90909091
 0.93023256 0.86363636 0.88888889 0.88888889]

mean value: 0.8930181678184095

key: train_fscore
value: [0.93266833 0.93564356 0.92695214 0.9273183  0.92269327 0.93266833
 0.93734336 0.92929293 0.935      0.94472362]

mean value: 0.9324303832120122

key: test_precision
value: [0.94736842 0.94736842 0.86363636 0.84615385 0.875      0.90909091
 0.95238095 0.86363636 0.86956522 0.86956522]

mean value: 0.8943765711786307

key: train_precision
value: [0.92118227 0.91747573 0.92462312 0.92039801 0.91133005 0.92118227
 0.93034826 0.92929293 0.92574257 0.94      ]

mean value: 0.9241575197221089

key: test_recall
value: [0.81818182 0.81818182 0.86363636 1.         0.95454545 0.90909091
 0.90909091 0.86363636 0.90909091 0.90909091]

mean value: 0.8954545454545455

key: train_recall
value: [0.94444444 0.95454545 0.92929293 0.93434343 0.93434343 0.94444444
 0.94444444 0.92929293 0.94444444 0.94949495]

mean value: 0.9409090909090909

key: test_roc_auc
value: [0.88735178 0.88735178 0.86660079 0.91304348 0.91205534 0.90909091
 0.93181818 0.86363636 0.88636364 0.88636364]

mean value: 0.8943675889328064

key: train_roc_auc
value: [0.93261826 0.93519352 0.92751775 0.92756776 0.92261726 0.93281336
 0.93773946 0.93016371 0.93527641 0.94519082]

mean value: 0.9326698310225111

key: test_jcc
value: [0.7826087  0.7826087  0.76       0.84615385 0.84       0.83333333
 0.86956522 0.76       0.8        0.8       ]

mean value: 0.8074269788182832

key: train_jcc
value: [0.87383178 0.87906977 0.86384977 0.86448598 0.85648148 0.87383178
 0.88207547 0.86792453 0.87793427 0.8952381 ]

mean value: 0.8734722914430403

MCC on Blind test: 0.79

Accuracy on Blind test: 0.89

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.55836701 0.52241182 0.42924356 0.42723894 0.42407441 0.45521879
 0.46713424 0.43198848 0.44248724 0.43643761]

mean value: 0.4594602108001709

key: score_time
value: [0.03078747 0.03211427 0.03577662 0.03534436 0.03439927 0.0340023
 0.0342772  0.03501654 0.03661156 0.03467274]

mean value: 0.03430023193359375

key: test_mcc
value: [0.78405645 0.78405645 0.60000118 0.8360602  0.82574419 0.81818182
 0.81818182 0.72727273 0.77352678 0.77352678]

mean value: 0.774060840755457

key: train_mcc
value: [0.86529061 0.87073544 0.81510094 0.809981   0.84528736 0.86562671
 0.90023387 0.86032741 0.8705095  0.8903152 ]

mean value: 0.8593408045055162

key: test_accuracy
value: [0.88888889 0.88888889 0.8        0.91111111 0.91111111 0.90909091
 0.90909091 0.86363636 0.88636364 0.88636364]

mean value: 0.8854545454545455

key: train_accuracy
value: [0.9325     0.935      0.9075     0.905      0.9225     0.93266833
 0.95012469 0.93017456 0.93516209 0.94513716]

mean value: 0.9295766832917706

key: test_fscore
value: [0.87804878 0.87804878 0.79069767 0.91666667 0.91304348 0.90909091
 0.90909091 0.86363636 0.88888889 0.88888889]

mean value: 0.883610133991771

key: train_fscore
value: [0.93266833 0.93564356 0.90726817 0.9040404  0.92269327 0.93266833
 0.94949495 0.92929293 0.935      0.94472362]

mean value: 0.9293493560888269

/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:107: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:110: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

key: test_precision
value: [0.94736842 0.94736842 0.80952381 0.84615385 0.875      0.90909091
 0.90909091 0.86363636 0.86956522 0.86956522]

mean value: 0.884636311438371

key: train_precision
value: [0.92118227 0.91747573 0.90049751 0.9040404  0.91133005 0.92118227
 0.94949495 0.92929293 0.92574257 0.94      ]

mean value: 0.9220238678959648

key: test_recall
value: [0.81818182 0.81818182 0.77272727 1.         0.95454545 0.90909091
 0.90909091 0.86363636 0.90909091 0.90909091]

mean value: 0.8863636363636364

key: train_recall
value: [0.94444444 0.95454545 0.91414141 0.9040404  0.93434343 0.94444444
 0.94949495 0.92929293 0.94444444 0.94949495]

mean value: 0.9368686868686869

key: test_roc_auc
value: [0.88735178 0.88735178 0.79940711 0.91304348 0.91205534 0.90909091
 0.90909091 0.86363636 0.88636364 0.88636364]

mean value: 0.8853754940711462

key: train_roc_auc
value: [0.93261826 0.93519352 0.90756576 0.9049905  0.92261726 0.93281336
 0.95011693 0.93016371 0.93527641 0.94519082]

mean value: 0.9296546526573839

key: test_jcc
value: [0.7826087  0.7826087  0.65384615 0.84615385 0.84       0.83333333
 0.83333333 0.76       0.8        0.8       ]

mean value: 0.7931884057971015

key: train_jcc
value: [0.87383178 0.87906977 0.83027523 0.82488479 0.85648148 0.87383178
 0.90384615 0.86792453 0.87793427 0.8952381 ]

mean value: 0.8683317871996343

MCC on Blind test: 0.79

Accuracy on Blind test: 0.89

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.1082418  0.11077762 0.11822391 0.08075857 0.08925223 0.06861591
 0.10095501 0.08981323 0.08645415 0.09447336]

mean value: 0.09475657939910889

key: score_time
value: [0.02976584 0.03569388 0.0412004  0.01653552 0.02434349 0.02014518
 0.02598023 0.02477598 0.03175187 0.02436733]

mean value: 0.027455973625183105

key: test_mcc
value: [0.77865613 0.64426877 0.86732843 0.77821935 0.86758893 0.82574419
 0.69583743 0.68911026 0.95652174 0.82213439]

mean value: 0.7925409622270596

key: train_mcc
value: [0.85688852 0.8716498  0.86172755 0.86177295 0.87664317 0.85687806
 0.86188899 0.871768   0.85679795 0.86188899]

mean value: 0.8637903997161932

key: test_accuracy
value: [0.88888889 0.82222222 0.93333333 0.88888889 0.93333333 0.91111111
 0.84444444 0.84444444 0.97777778 0.91111111]

mean value: 0.8955555555555555

key: train_accuracy
value: [0.92839506 0.93580247 0.9308642  0.9308642  0.9382716  0.92839506
 0.9308642  0.93580247 0.92839506 0.9308642 ]

mean value: 0.9318518518518518

key: test_fscore
value: [0.88888889 0.82608696 0.93617021 0.89361702 0.93333333 0.91304348
 0.85106383 0.8372093  0.97777778 0.90909091]

mean value: 0.8966281710028886

key: train_fscore
value: [0.92874693 0.93596059 0.93069307 0.93103448 0.93857494 0.92909535
 0.93170732 0.93658537 0.92874693 0.93170732]

mean value: 0.932285229379058

key: test_precision
value: [0.90909091 0.82608696 0.91666667 0.875      0.95454545 0.875
 0.8        0.85714286 0.95652174 0.90909091]

mean value: 0.887914549218897

key: train_precision
value: [0.92195122 0.93137255 0.93069307 0.92647059 0.93170732 0.9223301
 0.92270531 0.92753623 0.92647059 0.92270531]

mean value: 0.9263942288373254

key: test_recall
value: [0.86956522 0.82608696 0.95652174 0.91304348 0.91304348 0.95454545
 0.90909091 0.81818182 1.         0.90909091]

mean value: 0.9069169960474308

key: train_recall
value: [0.93564356 0.94059406 0.93069307 0.93564356 0.94554455 0.93596059
 0.9408867  0.94581281 0.93103448 0.9408867 ]

mean value: 0.9382700092669365

key: test_roc_auc
value: [0.88932806 0.82213439 0.93280632 0.88833992 0.93379447 0.91205534
 0.8458498  0.84387352 0.97826087 0.91106719]

mean value: 0.8957509881422925

key: train_roc_auc
value: [0.92841292 0.93581427 0.93086378 0.93087597 0.93828952 0.92837634
 0.93083939 0.93577769 0.92838853 0.93083939]

mean value: 0.9318477783738965

key: test_jcc
value: [0.8        0.7037037  0.88       0.80769231 0.875      0.84
 0.74074074 0.72       0.95652174 0.83333333]

mean value: 0.815699182460052

key: train_jcc
value: [0.86697248 0.87962963 0.87037037 0.87096774 0.88425926 0.86757991
 0.87214612 0.88073394 0.86697248 0.87214612]

mean value: 0.8731778046396034

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
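
Note on the ConvergenceWarning above: the pipelines printed in this log already apply MinMaxScaler to the numeric columns, and the logistic models are built with only random_state=42, so they run with scikit-learn's default max_iter of 100. That suggests the remaining remedy named in the warning text is a larger max_iter. The sketch below is illustrative only: it uses synthetic data from make_classification rather than the rpoB feature table, the helper name count_convergence_warnings is hypothetical, and the values 5 and 5000 are arbitrary choices made to show the warning appearing and then disappearing.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption: NOT the data used in this log).
X, y = make_classification(
    n_samples=400, n_features=50, n_informative=20, random_state=42
)


def count_convergence_warnings(max_iter):
    """Fit a scaled logistic regression and count ConvergenceWarnings raised."""
    pipe = Pipeline(
        steps=[
            ("prep", MinMaxScaler()),
            ("model", LogisticRegression(max_iter=max_iter, random_state=42)),
        ]
    )
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no warning is suppressed
        pipe.fit(X, y)
    return sum(issubclass(w.category, ConvergenceWarning) for w in caught)


# A deliberately tiny iteration budget triggers the lbfgs warning;
# a generous budget lets the solver stop on its tolerance instead.
low = count_convergence_warnings(max_iter=5)
high = count_convergence_warnings(max_iter=5000)
print("warnings with max_iter=5:", low)
print("warnings with max_iter=5000:", high)
```

For the models in this log, the equivalent change would be passing max_iter when constructing LogisticRegression and LogisticRegressionCV. The value that is actually sufficient for this dataset is not known from the log and would need to be found by rerunning.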
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])
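
The pipeline repr above scales the numerical columns with `MinMaxScaler` and one-hot encodes the categorical columns inside a `ColumnTransformer`. A minimal sketch of that preprocessing step follows. It assumes a toy frame with two of the logged numerical columns and one categorical column; all values and the category labels are made up for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy data (assumption): column names come from the log, values do not.
X = pd.DataFrame({
    "ligand_distance": [3.2, 10.5, 7.1, 15.0, 4.4, 12.3, 6.0, 9.9],
    "ddg_foldx": [-1.2, 0.4, 2.3, -0.7, 1.1, 0.0, -2.5, 0.9],
    "ss_class": ["helix", "sheet", "loop", "helix",
                 "loop", "sheet", "helix", "loop"],
})
num_cols = ["ligand_distance", "ddg_foldx"]
cat_cols = ["ss_class"]

# Same layout as the logged 'prep' step: scale numbers, encode categories,
# pass any remaining columns through unchanged.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ],
)

Xt = prep.fit_transform(X)
# Two scaled columns plus one indicator column per category level.
print(Xt.shape)
```

The transformed matrix places the scaled numerical columns first and the one-hot columns after them, in the order the transformers are listed.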

key: fit_time
value: [1.91950297 2.00819826 2.98527718 1.89961886 0.94165349 1.18352866
 0.94931173 1.02501869 0.95997047 0.98270178]

mean value: 1.4854782104492188

key: score_time
value: [0.01864243 0.04957104 0.03117299 0.01511431 0.01509213 0.0130074
 0.01559639 0.01299739 0.01515102 0.01567769]

mean value: 0.020202279090881348

key: test_mcc
value: [0.82574419 0.68911026 0.86732843 0.77821935 0.82574419 0.82574419
 0.73663511 0.68911026 0.95652174 0.77821935]

mean value: 0.797237707853288

key: train_mcc
value: [0.8965753  0.90127552 0.89135736 0.89139819 0.90627515 0.82225691
 0.89630533 0.84716163 0.88164702 0.89152603]

mean value: 0.8825778442400698

key: test_accuracy
value: [0.91111111 0.84444444 0.93333333 0.88888889 0.91111111 0.91111111
 0.86666667 0.84444444 0.97777778 0.88888889]

mean value: 0.8977777777777778

key: train_accuracy
value: [0.94814815 0.95061728 0.94567901 0.94567901 0.95308642 0.91111111
 0.94814815 0.92345679 0.94074074 0.94567901]

mean value: 0.9412345679012346

key: test_fscore
value: [0.90909091 0.85106383 0.93617021 0.89361702 0.90909091 0.91304348
 0.86956522 0.8372093  0.97777778 0.88372093]

mean value: 0.8980349587999696

key: train_fscore
value: [0.94865526 0.95024876 0.94554455 0.94527363 0.95331695 0.91176471
 0.94840295 0.92457421 0.94146341 0.94634146]

mean value: 0.9415585894135641

key: test_precision
value: [0.95238095 0.83333333 0.91666667 0.875      0.95238095 0.875
 0.83333333 0.85714286 0.95652174 0.9047619 ]

mean value: 0.8956521739130434

key: train_precision
value: [0.93719807 0.955      0.94554455 0.95       0.94634146 0.90731707
 0.94607843 0.91346154 0.93236715 0.93719807]

mean value: 0.9370506345899053

key: test_recall
value: [0.86956522 0.86956522 0.95652174 0.91304348 0.86956522 0.95454545
 0.90909091 0.81818182 1.         0.86363636]

mean value: 0.9023715415019763

key: train_recall
value: [0.96039604 0.94554455 0.94554455 0.94059406 0.96039604 0.91625616
 0.95073892 0.93596059 0.95073892 0.95566502]

mean value: 0.9461834853436082

key: test_roc_auc
value: [0.91205534 0.84387352 0.93280632 0.88833992 0.91205534 0.91205534
 0.86758893 0.84387352 0.97826087 0.88833992]

mean value: 0.8979249011857707

key: train_roc_auc
value: [0.94817832 0.95060479 0.94567868 0.94566649 0.95310442 0.91109838
 0.94814174 0.92342584 0.94071599 0.94565429]

mean value: 0.9412268936253231

key: test_jcc
value: [0.83333333 0.74074074 0.88       0.80769231 0.83333333 0.84
 0.76923077 0.72       0.95652174 0.79166667]

mean value: 0.8172518890127586

key: train_jcc
value: [0.90232558 0.90521327 0.89671362 0.89622642 0.91079812 0.83783784
 0.90186916 0.85972851 0.88940092 0.89814815]

mean value: 0.8898261577031877
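
The ninth-fold test scores above are consistent with a single confusion matrix over 45 test samples. A minimal worked check follows. It assumes the counts TP=22, FP=1, FN=0, TN=22, which are inferred from the logged precision (22/23), recall (1.0) and accuracy (44/45) and are not printed by the script. The `test_roc_auc` value matches the mean of sensitivity and specificity, which would be the case if that score was computed from hard class predictions; this is also an inference.

```python
from math import sqrt

# Inferred (not logged) confusion counts for the ninth CV fold.
tp, fp, fn, tn = 22, 1, 0, 22

computed = {
    "test_accuracy": (tp + tn) / (tp + fp + fn + tn),
    "test_precision": tp / (tp + fp),
    "test_recall": tp / (tp + fn),
    "test_fscore": 2 * tp / (2 * tp + fp + fn),
    "test_jcc": tp / (tp + fp + fn),  # Jaccard index of the positive class
    "test_mcc": (tp * tn - fp * fn)
    / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    # Mean of sensitivity and specificity (AUC of a hard 0/1 prediction).
    "test_roc_auc": 0.5 * (tp / (tp + fn) + tn / (tn + fp)),
}

# Ninth entry of each 'value' array in the log above.
logged = {
    "test_accuracy": 0.97777778,
    "test_precision": 0.95652174,
    "test_recall": 1.0,
    "test_fscore": 0.97777778,
    "test_jcc": 0.95652174,
    "test_mcc": 0.95652174,
    "test_roc_auc": 0.97826087,
}

for key, value in computed.items():
    print(f"{key}: computed={value:.8f} logged={logged[key]:.8f}")
```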

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.02403355 0.01210332 0.01167107 0.01175499 0.01173353 0.01152921
 0.01138949 0.01154685 0.01153827 0.01136518]

mean value: 0.012866544723510741

key: score_time
value: [0.01191664 0.01043248 0.01031709 0.01031232 0.00995064 0.01007748
 0.00997305 0.00997305 0.01015615 0.00996041]

mean value: 0.010306930541992188

key: test_mcc
value: [0.86758893 0.55841694 0.60079051 0.61706091 0.74605372 0.73320158
 0.42993591 0.60000118 0.59109821 0.69404997]

mean value: 0.6438197861131165

key: train_mcc
value: [0.67340117 0.66345741 0.67340117 0.68334493 0.69787618 0.66984267
 0.647501   0.71790239 0.68673529 0.6682388 ]

mean value: 0.6781701020371597

key: test_accuracy
value: [0.93333333 0.77777778 0.8        0.8        0.86666667 0.86666667
 0.71111111 0.8        0.77777778 0.84444444]

mean value: 0.8177777777777778

key: train_accuracy
value: [0.8345679  0.82962963 0.8345679  0.83950617 0.84691358 0.83209877
 0.81481481 0.85679012 0.84197531 0.83209877]

mean value: 0.8362962962962963

key: test_fscore
value: [0.93333333 0.77272727 0.8        0.7804878  0.85714286 0.86363636
 0.66666667 0.79069767 0.72222222 0.82926829]

mean value: 0.8016182487708297

key: train_fscore
value: [0.82414698 0.81889764 0.82414698 0.82939633 0.83769634 0.82105263
 0.79108635 0.84895833 0.83505155 0.82291667]

mean value: 0.8253349790533351

key: test_precision
value: [0.95454545 0.80952381 0.81818182 0.88888889 0.94736842 0.86363636
 0.76470588 0.80952381 0.92857143 0.89473684]

mean value: 0.8679682718382409

key: train_precision
value: [0.87709497 0.87150838 0.87709497 0.88268156 0.88888889 0.88135593
 0.91025641 0.90055249 0.87567568 0.87292818]

mean value: 0.8838037458275947

key: test_recall
value: [0.91304348 0.73913043 0.7826087  0.69565217 0.7826087  0.86363636
 0.59090909 0.77272727 0.59090909 0.77272727]

mean value: 0.750395256916996

key: train_recall
value: [0.77722772 0.77227723 0.77722772 0.78217822 0.79207921 0.76847291
 0.69950739 0.80295567 0.79802956 0.77832512]

mean value: 0.774828073940399

key: test_roc_auc
value: [0.93379447 0.77865613 0.80039526 0.80237154 0.86857708 0.86660079
 0.70849802 0.79940711 0.77371542 0.84288538]

mean value: 0.8174901185770751

key: train_roc_auc
value: [0.83442667 0.82948837 0.83442667 0.83936497 0.84677852 0.83225626
 0.81510023 0.85692338 0.84208409 0.83223187]

mean value: 0.8363081012534751

key: test_jcc
value: [0.875      0.62962963 0.66666667 0.64       0.75       0.76
 0.5        0.65384615 0.56521739 0.70833333]

mean value: 0.6748693174780132

key: train_jcc
value: [0.70089286 0.69333333 0.70089286 0.70852018 0.72072072 0.69642857
 0.65437788 0.73755656 0.71681416 0.69911504]

mean value: 0.7028652163950665
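
Each `mean value` line is the arithmetic mean of the ten per-fold scores printed above it. The log does not report the spread across folds, which is wide for this model. A minimal sketch that recomputes the Gaussian NB MCC means and adds the sample standard deviation and range, using only the values copied from the log:

```python
from statistics import mean, stdev

# Per-fold MCC values copied from the Gaussian NB block above.
test_mcc = [0.86758893, 0.55841694, 0.60079051, 0.61706091, 0.74605372,
            0.73320158, 0.42993591, 0.60000118, 0.59109821, 0.69404997]
train_mcc = [0.67340117, 0.66345741, 0.67340117, 0.68334493, 0.69787618,
             0.66984267, 0.647501, 0.71790239, 0.68673529, 0.6682388]

test_mean = mean(test_mcc)
train_mean = mean(train_mcc)

print(f"test_mcc  mean={test_mean:.4f} sd={stdev(test_mcc):.4f} "
      f"min={min(test_mcc):.4f} max={max(test_mcc):.4f}")
print(f"train_mcc mean={train_mean:.4f} sd={stdev(train_mcc):.4f} "
      f"min={min(train_mcc):.4f} max={max(train_mcc):.4f}")
print(f"train minus test mean: {train_mean - test_mean:.4f}")
```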

MCC on Blind test: 0.68

Accuracy on Blind test: 0.84

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01156735 0.01157784 0.01160192 0.01164412 0.01163125 0.01170754
 0.01092172 0.01057267 0.0117619  0.0108943 ]

mean value: 0.011388063430786133

key: score_time
value: [0.00963807 0.01005244 0.00998616 0.01013279 0.01015091 0.01013255
 0.01019764 0.0097158  0.01023817 0.01015043]

mean value: 0.010039496421813964

key: test_mcc
value: [0.74605372 0.4229249  0.68972332 0.69404997 0.78530224 0.78530224
 0.55841694 0.64426877 0.73559956 0.60637261]

mean value: 0.666801427789491

key: train_mcc
value: [0.7385111  0.69500224 0.73847923 0.75849711 0.72839898 0.76296152
 0.77288136 0.7777832  0.72358281 0.73337398]

mean value: 0.7429471510582903

key: test_accuracy
value: [0.86666667 0.71111111 0.84444444 0.84444444 0.88888889 0.88888889
 0.77777778 0.82222222 0.86666667 0.8       ]

mean value: 0.8311111111111111

key: train_accuracy
value: [0.8691358  0.84691358 0.8691358  0.87901235 0.86419753 0.88148148
 0.88641975 0.88888889 0.8617284  0.86666667]

mean value: 0.871358024691358

key: test_fscore
value: [0.85714286 0.71111111 0.84444444 0.85714286 0.88372093 0.89361702
 0.7826087  0.81818182 0.85714286 0.80851064]

mean value: 0.8313623230625146

key: train_fscore
value: [0.87041565 0.84183673 0.86716792 0.88077859 0.86352357 0.8817734
 0.88613861 0.88943489 0.86341463 0.86633663]

mean value: 0.8710820634544677

key: test_precision
value: [0.94736842 0.72727273 0.86363636 0.80769231 0.95       0.84
 0.75       0.81818182 0.9        0.76      ]

mean value: 0.8364151637835848

key: train_precision
value: [0.85990338 0.86842105 0.87817259 0.86602871 0.86567164 0.8817734
 0.89054726 0.8872549  0.85507246 0.87064677]

mean value: 0.8723492167626019

key: test_recall
value: [0.7826087  0.69565217 0.82608696 0.91304348 0.82608696 0.95454545
 0.81818182 0.81818182 0.81818182 0.86363636]

mean value: 0.8316205533596838

key: train_recall
value: [0.88118812 0.81683168 0.85643564 0.8960396  0.86138614 0.8817734
 0.8817734  0.89162562 0.87192118 0.86206897]

mean value: 0.8701043749695166

key: test_roc_auc
value: [0.86857708 0.71146245 0.84486166 0.84288538 0.89031621 0.89031621
 0.77865613 0.82213439 0.86561265 0.8013834 ]

mean value: 0.8316205533596839

key: train_roc_auc
value: [0.86916549 0.84683949 0.86910452 0.87905428 0.86419061 0.88148076
 0.88643125 0.88888211 0.86170317 0.86667805]

mean value: 0.8713529727356972

key: test_jcc
value: [0.75       0.55172414 0.73076923 0.75       0.79166667 0.80769231
 0.64285714 0.69230769 0.75       0.67857143]

mean value: 0.7145588606795503

key: train_jcc
value: [0.77056277 0.72687225 0.76548673 0.78695652 0.75982533 0.78854626
 0.79555556 0.80088496 0.75965665 0.76419214]

mean value: 0.7718539151085453

MCC on Blind test: 0.73

Accuracy on Blind test: 0.87

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.01085877 0.0109148  0.0109849  0.01096702 0.01075697 0.01008892
 0.01093984 0.01145196 0.01152682 0.01099157]

mean value: 0.01094815731048584

key: score_time
value: [0.01825953 0.01752472 0.01740408 0.01779437 0.01755857 0.01792526
 0.01943898 0.01912856 0.01847744 0.01721025]

mean value: 0.018072175979614257

key: test_mcc
value: [0.5169078  0.48698902 0.51185771 0.44008623 0.64752602 0.54071329
 0.37774032 0.37774032 0.58158    0.60000118]

mean value: 0.5081141873870801

key: train_mcc
value: [0.69876844 0.71387102 0.68398976 0.72358281 0.68953436 0.71448494
 0.70937299 0.6938666  0.69877579 0.72349713]

mean value: 0.7049743847733497

key: test_accuracy
value: [0.75555556 0.73333333 0.75555556 0.71111111 0.82222222 0.75555556
 0.68888889 0.68888889 0.75555556 0.8       ]

mean value: 0.7466666666666667

key: train_accuracy
value: [0.84938272 0.85679012 0.84197531 0.8617284  0.84444444 0.85679012
 0.85432099 0.84691358 0.84938272 0.8617284 ]

mean value: 0.8523456790123457

key: test_fscore
value: [0.74418605 0.7        0.75555556 0.75471698 0.81818182 0.78431373
 0.66666667 0.66666667 0.66666667 0.79069767]

mean value: 0.7347651801289877

key: train_fscore
value: [0.84863524 0.85427136 0.84236453 0.86       0.84050633 0.85353535
 0.85138539 0.84653465 0.84938272 0.86138614]

mean value: 0.8508001705741715

key: test_precision
value: [0.8        0.82352941 0.77272727 0.66666667 0.85714286 0.68965517
 0.7        0.7        1.         0.80952381]

mean value: 0.7819245190239105

key: train_precision
value: [0.85074627 0.86734694 0.83823529 0.86868687 0.86010363 0.87564767
 0.87113402 0.85074627 0.85148515 0.86567164]

mean value: 0.8599803745154699

key: test_recall
value: [0.69565217 0.60869565 0.73913043 0.86956522 0.7826087  0.90909091
 0.63636364 0.63636364 0.5        0.77272727]

mean value: 0.7150197628458498

key: train_recall
value: [0.84653465 0.84158416 0.84653465 0.85148515 0.82178218 0.83251232
 0.83251232 0.84236453 0.84729064 0.85714286]

mean value: 0.841974345217773

key: test_roc_auc
value: [0.756917   0.73616601 0.75592885 0.70750988 0.82312253 0.75889328
 0.68774704 0.68774704 0.75       0.79940711]

mean value: 0.7463438735177865

key: train_roc_auc
value: [0.8493757  0.85675267 0.84198654 0.86170317 0.84438863 0.85685022
 0.85437497 0.84692484 0.84938789 0.86173975]

mean value: 0.8523484368141248

key: test_jcc
value: [0.59259259 0.53846154 0.60714286 0.60606061 0.69230769 0.64516129
 0.5        0.5        0.5        0.65384615]

mean value: 0.5835572730734021

key: train_jcc
value: [0.73706897 0.74561404 0.72765957 0.75438596 0.72489083 0.74449339
 0.74122807 0.73390558 0.73819742 0.75652174]

mean value: 0.7403965575347853

MCC on Blind test: 0.41

Accuracy on Blind test: 0.71

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.0228219  0.02206445 0.02394438 0.02210402 0.02183819 0.02219343
 0.02140975 0.01981401 0.02201772 0.0217278 ]

mean value: 0.021993565559387206

key: score_time
value: [0.01345062 0.01252699 0.01257086 0.01266456 0.01265097 0.01267982
 0.01264071 0.01242614 0.01275897 0.01246643]

mean value: 0.012683606147766114

key: test_mcc
value: [0.73663511 0.68972332 0.86732843 0.73559956 0.86758893 0.82574419
 0.77865613 0.68911026 0.95652174 0.73320158]

mean value: 0.7880109260079755

key: train_mcc
value: [0.79762457 0.82239025 0.80251189 0.80246793 0.80766419 0.81237958
 0.80741373 0.82237294 0.80261491 0.80741373]

mean value: 0.8084853722573764

key: test_accuracy
value: [0.86666667 0.84444444 0.93333333 0.86666667 0.93333333 0.91111111
 0.88888889 0.84444444 0.97777778 0.86666667]

mean value: 0.8933333333333333

key: train_accuracy
value: [0.89876543 0.91111111 0.90123457 0.90123457 0.9037037  0.90617284
 0.9037037  0.91111111 0.90123457 0.9037037 ]

mean value: 0.9041975308641975

key: test_fscore
value: [0.86363636 0.84444444 0.93617021 0.875      0.93333333 0.91304348
 0.88888889 0.8372093  0.97777778 0.86363636]

mean value: 0.893314016506958

key: train_fscore
value: [0.8992629  0.91176471 0.90147783 0.9009901  0.90464548 0.90686275
 0.9041769  0.91219512 0.90243902 0.9041769 ]

mean value: 0.9047991713233395

key: test_precision
value: [0.9047619  0.86363636 0.91666667 0.84       0.95454545 0.875
 0.86956522 0.85714286 0.95652174 0.86363636]

mean value: 0.8901476566911349

key: train_precision
value: [0.89268293 0.90291262 0.89705882 0.9009901  0.89371981 0.90243902
 0.90196078 0.90338164 0.89371981 0.90196078]

mean value: 0.8990826319784146

key: test_recall
value: [0.82608696 0.82608696 0.95652174 0.91304348 0.91304348 0.95454545
 0.90909091 0.81818182 1.         0.86363636]

mean value: 0.8980237154150198

key: train_recall
value: [0.90594059 0.92079208 0.90594059 0.9009901  0.91584158 0.91133005
 0.90640394 0.92118227 0.91133005 0.90640394]

mean value: 0.9106155196800468

key: test_roc_auc
value: [0.86758893 0.84486166 0.93280632 0.86561265 0.93379447 0.91205534
 0.88932806 0.84387352 0.97826087 0.86660079]

mean value: 0.8934782608695653

key: train_roc_auc
value: [0.8987831  0.91113496 0.90124616 0.90123397 0.9037336  0.90616007
 0.90369702 0.91108618 0.90120958 0.90369702]

mean value: 0.9041981661220309

key: test_jcc
value: [0.76       0.73076923 0.88       0.77777778 0.875      0.84
 0.8        0.72       0.95652174 0.76      ]

mean value: 0.8100068747677444

key: train_jcc
value: [0.81696429 0.83783784 0.8206278  0.81981982 0.82589286 0.82959641
 0.82511211 0.83856502 0.82222222 0.82511211]

mean value: 0.8261750475651821

MCC on Blind test: 0.79

Accuracy on Blind test: 0.89

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.65939665 1.75964832 0.57226229 0.91595173 0.58495402 0.47136617
 0.58204484 0.84763002 0.33321142 0.77313566]

mean value: 0.7499601125717164

key: score_time
value: [0.01269627 0.01352549 0.01349235 0.01917243 0.01356983 0.01362419
 0.01330686 0.01411915 0.01365542 0.01274872]

mean value: 0.013991069793701173

key: test_mcc
value: [0.73663511 0.670374   0.86732843 0.82506438 0.86758893 0.82574419
 0.77821935 0.73559956 1.         0.74410286]

mean value: 0.805065681579282

key: train_mcc
value: [0.83730123 0.85146676 0.82742221 0.8520244  0.83086317 0.81729057
 0.83012449 0.82799641 0.7927359  0.81956701]

mean value: 0.8286792157204221

key: test_accuracy
value: [0.86666667 0.82222222 0.93333333 0.91111111 0.93333333 0.91111111
 0.88888889 0.86666667 1.         0.86666667]

mean value: 0.9

key: train_accuracy
value: [0.91851852 0.92345679 0.91358025 0.92592593 0.91358025 0.90864198
 0.91358025 0.91358025 0.8962963  0.90864198]

mean value: 0.9135802469135802

key: test_fscore
value: [0.86363636 0.8        0.93617021 0.91666667 0.93333333 0.91304348
 0.88372093 0.85714286 1.         0.85      ]

mean value: 0.8953713842038605

key: train_fscore
value: [0.9193154  0.91906005 0.91442543 0.92647059 0.91725768 0.90909091
 0.91002571 0.91183879 0.89756098 0.90537084]

mean value: 0.9130416381528887

key: test_precision
value: [0.9047619  0.94117647 0.91666667 0.88       0.95454545 0.875
 0.9047619  0.9        1.         0.94444444]

mean value: 0.922135684576861

key: train_precision
value: [0.90821256 0.97237569 0.90338164 0.91747573 0.87782805 0.90686275
 0.9516129  0.93298969 0.88888889 0.94148936]

mean value: 0.920111726559678

key: test_recall
value: [0.82608696 0.69565217 0.95652174 0.95652174 0.91304348 0.95454545
 0.86363636 0.81818182 1.         0.77272727]

mean value: 0.8756916996047431

key: train_recall
value: [0.93069307 0.87128713 0.92574257 0.93564356 0.96039604 0.91133005
 0.87192118 0.89162562 0.90640394 0.87192118]

mean value: 0.9076964346680974

key: test_roc_auc
value: [0.86758893 0.82509881 0.93280632 0.91007905 0.93379447 0.91205534
 0.88833992 0.86561265 1.         0.86462451]

mean value: 0.9

key: train_roc_auc
value: [0.91854851 0.92332829 0.9136102  0.92594986 0.91369556 0.90863532
 0.91368336 0.91363459 0.89627128 0.90873287]

mean value: 0.9136089840511145

key: test_jcc
value: [0.76       0.66666667 0.88       0.84615385 0.875      0.84
 0.79166667 0.75       1.         0.73913043]

mean value: 0.8148617614269789

key: train_jcc
value: [0.85067873 0.85024155 0.84234234 0.8630137  0.84716157 0.83333333
 0.83490566 0.83796296 0.81415929 0.8271028 ]

mean value: 0.8400901944397646

MCC on Blind test: 0.73

Accuracy on Blind test: 0.87

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.02700424 0.02184582 0.02058792 0.02255273 0.01992631 0.01929307
 0.0199616  0.0216949  0.02253938 0.02359295]

mean value: 0.021899890899658204

key: score_time
value: [0.01251793 0.00972891 0.00979257 0.00993633 0.00901151 0.00906038
 0.00911379 0.00917554 0.00938892 0.00946093]

mean value: 0.009718680381774902

key: test_mcc
value: [0.77865613 0.91106719 0.82506438 0.95643752 0.82213439 0.91485328
 0.95652174 0.77821935 0.95643752 0.91452919]

mean value: 0.8813920675315654

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.88888889 0.95555556 0.91111111 0.97777778 0.91111111 0.95555556
 0.97777778 0.88888889 0.97777778 0.95555556]

mean value: 0.94

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.88888889 0.95652174 0.91666667 0.9787234  0.91304348 0.95652174
 0.97777778 0.88372093 0.97674419 0.95238095]

mean value: 0.9400989762770414

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.90909091 0.95652174 0.88       0.95833333 0.91304348 0.91666667
 0.95652174 0.9047619  1.         1.        ]

mean value: 0.9394939770374553

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.86956522 0.95652174 0.95652174 1.         0.91304348 1.
 1.         0.86363636 0.95454545 0.90909091]

mean value: 0.9422924901185771

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.88932806 0.9555336  0.91007905 0.97727273 0.91106719 0.95652174
 0.97826087 0.88833992 0.97727273 0.95454545]

mean value: 0.9398221343873517

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.8        0.91666667 0.84615385 0.95833333 0.84       0.91666667
 0.95652174 0.79166667 0.95454545 0.90909091]

mean value: 0.8889645282253977

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.91

Accuracy on Blind test: 0.96

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.12013102 0.11766171 0.11774349 0.11807728 0.11954403 0.11755204
 0.13109112 0.13012242 0.13104153 0.13023996]

mean value: 0.12332046031951904

key: score_time
value: [0.01799774 0.0182426  0.01839256 0.01811171 0.01826453 0.02004719
 0.02000332 0.01999044 0.01991343 0.01994133]

mean value: 0.019090485572814942

key: test_mcc
value: [0.82574419 0.64426877 0.91106719 0.78405645 0.78530224 0.8360602
 0.73663511 0.64426877 0.91106719 0.82213439]

mean value: 0.7900604515356031

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91111111 0.82222222 0.95555556 0.88888889 0.88888889 0.91111111
 0.86666667 0.82222222 0.95555556 0.91111111]

mean value: 0.8933333333333333

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.90909091 0.82608696 0.95652174 0.89795918 0.88372093 0.91666667
 0.86956522 0.81818182 0.95454545 0.90909091]

mean value: 0.8941429784525263

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.95238095 0.82608696 0.95652174 0.84615385 0.95       0.84615385
 0.83333333 0.81818182 0.95454545 0.90909091]

mean value: 0.8892448855492334

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.86956522 0.82608696 0.95652174 0.95652174 0.82608696 1.
 0.90909091 0.81818182 0.95454545 0.90909091]

mean value: 0.9025691699604743

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.91205534 0.82213439 0.9555336  0.88735178 0.89031621 0.91304348
 0.86758893 0.82213439 0.9555336  0.91106719]

mean value: 0.8936758893280633

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.83333333 0.7037037  0.91666667 0.81481481 0.79166667 0.84615385
 0.76923077 0.69230769 0.91304348 0.83333333]

mean value: 0.8114254304471695

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.01040983 0.01041746 0.01033044 0.01143312 0.01089478 0.01075268
 0.01148319 0.01071334 0.01060081 0.01162791]

mean value: 0.010866355895996094

key: score_time
value: [0.00902987 0.00913858 0.00908208 0.00915885 0.00940609 0.00922942
 0.00991488 0.0096333  0.00902796 0.00981998]

mean value: 0.009344100952148438

key: test_mcc
value: [0.46930785 0.51185771 0.82506438 0.60000118 0.43557241 0.37774032
 0.24655092 0.60000118 0.33824342 0.56604076]

mean value: 0.4970380107838396

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.73333333 0.75555556 0.91111111 0.8        0.71111111 0.68888889
 0.62222222 0.8        0.66666667 0.77777778]

mean value: 0.7466666666666667

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.72727273 0.75555556 0.91666667 0.80851064 0.68292683 0.66666667
 0.56410256 0.79069767 0.61538462 0.79166667]

mean value: 0.7319450604300232

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.76190476 0.77272727 0.88       0.79166667 0.77777778 0.7
 0.64705882 0.80952381 0.70588235 0.73076923]

mean value: 0.7577310695840107

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.69565217 0.73913043 0.95652174 0.82608696 0.60869565 0.63636364
 0.5        0.77272727 0.54545455 0.86363636]

mean value: 0.7144268774703557

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.73418972 0.75592885 0.91007905 0.79940711 0.71343874 0.68774704
 0.61956522 0.79940711 0.66403162 0.77964427]

mean value: 0.7463438735177865

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.57142857 0.60714286 0.84615385 0.67857143 0.51851852 0.5
 0.39285714 0.65384615 0.44444444 0.65517241]

mean value: 0.5868135376756066

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.58

Accuracy on Blind test: 0.79

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.74629807 1.81791091 1.8328793  1.83209157 1.79719567 1.78587461
 1.83592582 1.77011418 1.84957242 1.9802525 ]

mean value: 1.8248115062713623

key: score_time
value: [0.09639883 0.09250951 0.13806105 0.10098362 0.12009382 0.10071588
 0.10475492 0.10313916 0.1076386  0.10528183]

mean value: 0.10695772171020508

key: test_mcc
value: [0.86758893 0.91106719 0.86732843 0.95643752 0.82574419 0.95652174
 0.82213439 0.77821935 1.         0.95643752]

mean value: 0.894147926437764

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.93333333 0.95555556 0.93333333 0.97777778 0.91111111 0.97777778
 0.91111111 0.88888889 1.         0.97777778]

mean value: 0.9466666666666667

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.93333333 0.95652174 0.93617021 0.9787234  0.90909091 0.97777778
 0.90909091 0.88372093 1.         0.97674419]

mean value: 0.946117340172371

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.95454545 0.95652174 0.91666667 0.95833333 0.95238095 0.95652174
 0.90909091 0.9047619  1.         1.        ]

mean value: 0.950882269904009

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.91304348 0.95652174 0.95652174 1.         0.86956522 1.
 0.90909091 0.86363636 1.         0.95454545]

mean value: 0.9422924901185771

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.93379447 0.9555336  0.93280632 0.97727273 0.91205534 0.97826087
 0.91106719 0.88833992 1.         0.97727273]

mean value: 0.9466403162055336

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.875      0.91666667 0.88       0.95833333 0.83333333 0.95652174
 0.83333333 0.79166667 1.         0.95454545]

mean value: 0.8999400527009223

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.89

Accuracy on Blind test: 0.95

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC0...05', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [1.19596362 1.18671298 1.15252948 1.52557898 1.15283728 0.99552059
 2.06139922 0.94499493 1.02960014 0.97231007]

mean value: 1.2217447280883789

key: score_time
value: [0.16833639 0.15999269 0.18151951 0.1587429  0.12559843 0.14782834
 0.18785882 0.23321199 0.17717195 0.24588728]

mean value: 0.17861483097076417

key: test_mcc
value: [0.86758893 0.82213439 0.86732843 0.95643752 0.82574419 0.95652174
 0.82574419 0.77821935 1.         0.91452919]

mean value: 0.8814247933518943

key: train_mcc
value: [0.96049359 0.95061698 0.94568955 0.94078482 0.95556639 0.94569087
 0.95066455 0.96544324 0.94078771 0.94078771]

mean value: 0.9496525422288566

key: test_accuracy
value: [0.93333333 0.91111111 0.93333333 0.97777778 0.91111111 0.97777778
 0.91111111 0.88888889 1.         0.95555556]

mean value: 0.94

key: train_accuracy
value: [0.98024691 0.97530864 0.97283951 0.97037037 0.97777778 0.97283951
 0.97530864 0.98271605 0.97037037 0.97037037]

mean value: 0.9748148148148148

key: test_fscore
value: [0.93333333 0.91304348 0.93617021 0.9787234  0.90909091 0.97777778
 0.91304348 0.88372093 1.         0.95238095]

mean value: 0.9397284476358546

key: train_fscore
value: [0.98019802 0.97524752 0.97270471 0.97014925 0.97766749 0.97283951
 0.97524752 0.98280098 0.97029703 0.97029703]

mean value: 0.9747449079854762

key: test_precision
value: [0.95454545 0.91304348 0.91666667 0.95833333 0.95238095 0.95652174
 0.875      0.9047619  1.         1.        ]

mean value: 0.9431253529079616

key: train_precision
value: [0.98019802 0.97524752 0.97512438 0.975      0.9800995  0.97524752
 0.9800995  0.98039216 0.97512438 0.97512438]

mean value: 0.9771657365473159

key: test_recall
value: [0.91304348 0.91304348 0.95652174 1.         0.86956522 1.
 0.95454545 0.86363636 1.         0.90909091]

mean value: 0.9379446640316206

key: train_recall
value: [0.98019802 0.97524752 0.97029703 0.96534653 0.97524752 0.97044335
 0.97044335 0.98522167 0.96551724 0.96551724]

mean value: 0.9723479490806224

key: test_roc_auc
value: [0.93379447 0.91106719 0.93280632 0.97727273 0.91205534 0.97826087
 0.91205534 0.88833992 1.         0.95454545]

mean value: 0.9400197628458498

key: train_roc_auc
value: [0.98024679 0.97530849 0.97283324 0.970358   0.97777155 0.97284544
 0.97532068 0.98270985 0.97038238 0.97038238]

mean value: 0.9748158806028386

key: test_jcc
value: [0.875      0.84       0.88       0.95833333 0.83333333 0.95652174
 0.84       0.79166667 1.         0.90909091]

mean value: 0.8883945981554677

key: train_jcc
value: [0.96116505 0.95169082 0.9468599  0.94202899 0.95631068 0.94711538
 0.95169082 0.96618357 0.94230769 0.94230769]

mean value: 0.9507660603666303

MCC on Blind test: 0.91

Accuracy on Blind test: 0.96

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.02524042 0.02612805 0.02330494 0.02426648 0.0341692  0.03412199
 0.02433777 0.02824688 0.03730392 0.03393507]

mean value: 0.029105472564697265

key: score_time
value: [0.0212667  0.03293991 0.02683496 0.03234029 0.0220933  0.02196932
 0.02211404 0.02257228 0.02409673 0.02418518]

mean value: 0.02504127025604248

key: test_mcc
value: [0.74605372 0.4229249  0.68972332 0.69404997 0.78530224 0.78530224
 0.55841694 0.64426877 0.73559956 0.60637261]

mean value: 0.666801427789491

key: train_mcc
value: [0.7385111  0.69500224 0.73847923 0.75849711 0.72839898 0.76296152
 0.77288136 0.7777832  0.72358281 0.73337398]

mean value: 0.7429471510582903

key: test_accuracy
value: [0.86666667 0.71111111 0.84444444 0.84444444 0.88888889 0.88888889
 0.77777778 0.82222222 0.86666667 0.8       ]

mean value: 0.8311111111111111

key: train_accuracy
value: [0.8691358  0.84691358 0.8691358  0.87901235 0.86419753 0.88148148
 0.88641975 0.88888889 0.8617284  0.86666667]

mean value: 0.871358024691358

key: test_fscore
value: [0.85714286 0.71111111 0.84444444 0.85714286 0.88372093 0.89361702
 0.7826087  0.81818182 0.85714286 0.80851064]

mean value: 0.8313623230625146

key: train_fscore
value: [0.87041565 0.84183673 0.86716792 0.88077859 0.86352357 0.8817734
 0.88613861 0.88943489 0.86341463 0.86633663]

mean value: 0.8710820634544677

key: test_precision
value: [0.94736842 0.72727273 0.86363636 0.80769231 0.95       0.84
 0.75       0.81818182 0.9        0.76      ]

mean value: 0.8364151637835848

key: train_precision
value: [0.85990338 0.86842105 0.87817259 0.86602871 0.86567164 0.8817734
 0.89054726 0.8872549  0.85507246 0.87064677]

mean value: 0.8723492167626019

key: test_recall
value: [0.7826087  0.69565217 0.82608696 0.91304348 0.82608696 0.95454545
 0.81818182 0.81818182 0.81818182 0.86363636]

mean value: 0.8316205533596838

key: train_recall
value: [0.88118812 0.81683168 0.85643564 0.8960396  0.86138614 0.8817734
 0.8817734  0.89162562 0.87192118 0.86206897]

mean value: 0.8701043749695166

key: test_roc_auc
value: [0.86857708 0.71146245 0.84486166 0.84288538 0.89031621 0.89031621
 0.77865613 0.82213439 0.86561265 0.8013834 ]

mean value: 0.8316205533596839

key: train_roc_auc
value: [0.86916549 0.84683949 0.86910452 0.87905428 0.86419061 0.88148076
 0.88643125 0.88888211 0.86170317 0.86667805]

mean value: 0.8713529727356972

key: test_jcc
value: [0.75       0.55172414 0.73076923 0.75       0.79166667 0.80769231
 0.64285714 0.69230769 0.75       0.67857143]

mean value: 0.7145588606795503

key: train_jcc
value: [0.77056277 0.72687225 0.76548673 0.78695652 0.75982533 0.78854626
 0.79555556 0.80088496 0.75965665 0.76419214]

mean value: 0.7718539151085453

MCC on Blind test: 0.73

Accuracy on Blind test: 0.87

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC0...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [5.04259777 5.1556654  4.94334316 4.85878825 4.41032028 4.46716976
 1.56981826 3.87288809 5.31235576 4.7354219 ]

mean value: 4.436836862564087

key: score_time
value: [0.02622247 0.01853395 0.02413154 0.02783108 0.02140975 0.02586436
 0.0147388  0.01681423 0.02183342 0.02106977]

mean value: 0.021844935417175294

key: test_mcc
value: [0.82213439 0.91106719 0.95643752 1.         0.86758893 0.91485328
 0.95652174 0.77821935 1.         0.95643752]

mean value: 0.9163259916823262

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91111111 0.95555556 0.97777778 1.         0.93333333 0.95555556
 0.97777778 0.88888889 1.         0.97777778]

mean value: 0.9577777777777777

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.91304348 0.95652174 0.9787234  1.         0.93333333 0.95652174
 0.97777778 0.88372093 1.         0.97674419]

mean value: 0.9576386588167239

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.91304348 0.95652174 0.95833333 1.         0.95454545 0.91666667
 0.95652174 0.9047619  1.         1.        ]

mean value: 0.9560394315829098

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.91304348 0.95652174 1.         1.         0.91304348 1.
 1.         0.86363636 1.         0.95454545]

mean value: 0.9600790513833992

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.91106719 0.9555336  0.97727273 1.         0.93379447 0.95652174
 0.97826087 0.88833992 1.         0.97727273]

mean value: 0.957806324110672

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.84       0.91666667 0.95833333 1.         0.875      0.91666667
 0.95652174 0.79166667 1.         0.95454545]

mean value: 0.9209400527009223

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.95

Accuracy on Blind test: 0.97

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.0814774  0.10963106 0.09837937 0.11255074 0.09978104 0.06058836
 0.06621599 0.05549955 0.05097103 0.07600141]

mean value: 0.08110959529876709

key: score_time
value: [0.03861284 0.04575562 0.03341079 0.03420973 0.02950549 0.01280618
 0.02309275 0.01282167 0.01282978 0.02283335]

mean value: 0.026587820053100585

key: test_mcc
value: [0.86758893 0.69404997 0.82213439 0.69404997 0.82213439 0.64752602
 0.73663511 0.69404997 0.82213439 0.73559956]

mean value: 0.7535902709223832

key: train_mcc
value: [0.92103402 0.91129269 0.91111057 0.92103017 0.93581427 0.92117074
 0.91605902 0.93126766 0.92602981 0.89630533]

mean value: 0.9191114276703818

key: test_accuracy
value: [0.93333333 0.84444444 0.91111111 0.84444444 0.91111111 0.82222222
 0.86666667 0.84444444 0.91111111 0.86666667]

mean value: 0.8755555555555555

key: train_accuracy
value: [0.96049383 0.95555556 0.95555556 0.96049383 0.96790123 0.96049383
 0.95802469 0.9654321  0.96296296 0.94814815]

mean value: 0.9595061728395062

key: test_fscore
value: [0.93333333 0.85714286 0.91304348 0.85714286 0.91304348 0.82608696
 0.86956522 0.82926829 0.90909091 0.85714286]

mean value: 0.8764860236970523

key: train_fscore
value: [0.96059113 0.95588235 0.95544554 0.960199   0.96790123 0.960199
 0.95823096 0.96601942 0.96277916 0.94840295]

mean value: 0.9595650755455886

key: test_precision
value: [0.95454545 0.80769231 0.91304348 0.80769231 0.91304348 0.79166667
 0.83333333 0.89473684 0.90909091 0.9       ]

mean value: 0.8724844777647981

key: train_precision
value: [0.95588235 0.94660194 0.95544554 0.965      0.96551724 0.96984925
 0.95588235 0.95215311 0.97       0.94607843]

mean value: 0.9582410221215243

key: test_recall
value: [0.91304348 0.91304348 0.91304348 0.91304348 0.91304348 0.86363636
 0.90909091 0.77272727 0.90909091 0.81818182]

mean value: 0.8837944664031621

key: train_recall
value: [0.96534653 0.96534653 0.95544554 0.95544554 0.97029703 0.95073892
 0.96059113 0.98029557 0.95566502 0.95073892]

mean value: 0.9609910744769058

key: test_roc_auc
value: [0.93379447 0.84288538 0.91106719 0.84288538 0.91106719 0.82312253
 0.86758893 0.84288538 0.91106719 0.86561265]

mean value: 0.875197628458498

key: train_roc_auc
value: [0.96050578 0.95557967 0.95555528 0.96048139 0.96790714 0.96051797
 0.95801834 0.96539531 0.96298103 0.94814174]

mean value: 0.9595083646295663

key: test_jcc
value: [0.875      0.75       0.84       0.75       0.84       0.7037037
 0.76923077 0.70833333 0.83333333 0.75      ]

mean value: 0.781960113960114

key: train_jcc
value: [0.92417062 0.91549296 0.91469194 0.92344498 0.93779904 0.92344498
 0.91981132 0.9342723  0.92822967 0.90186916]

mean value: 0.9223226957377971

MCC on Blind test: 0.72

Accuracy on Blind test: 0.86

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.02436852 0.01083708 0.01080036 0.01135302 0.01035428 0.01042557
 0.01047254 0.01276612 0.01002574 0.01004267]

mean value: 0.0121445894241333

key: score_time
value: [0.01228738 0.009691   0.00953341 0.00930977 0.00900745 0.00928497
 0.00902605 0.01075983 0.00912833 0.00952101]

mean value: 0.00975492000579834

key: test_mcc
value: [0.78530224 0.46640316 0.82213439 0.82213439 0.78530224 0.77865613
 0.55841694 0.64752602 0.79670588 0.60000118]

mean value: 0.7062582554009066

key: train_mcc
value: [0.72859901 0.7001606  0.67485592 0.74815266 0.71871879 0.75811526
 0.71448494 0.76814813 0.73836061 0.71961678]

mean value: 0.726921270950553

key: test_accuracy
value: [0.88888889 0.73333333 0.91111111 0.91111111 0.88888889 0.88888889
 0.77777778 0.82222222 0.88888889 0.8       ]

mean value: 0.851111111111111

key: train_accuracy
value: [0.86419753 0.84938272 0.83703704 0.87407407 0.85925926 0.87901235
 0.85679012 0.88395062 0.8691358  0.85925926]

mean value: 0.8632098765432099

key: test_fscore
value: [0.88372093 0.73913043 0.91304348 0.91304348 0.88372093 0.88888889
 0.7826087  0.82608696 0.87179487 0.79069767]

mean value: 0.8492736339045742

key: train_fscore
value: [0.86215539 0.84398977 0.83248731 0.87344913 0.85714286 0.87841191
 0.85353535 0.88279302 0.86848635 0.8556962 ]

mean value: 0.8608147293143978

key: test_precision
value: [0.95       0.73913043 0.91304348 0.91304348 0.95       0.86956522
 0.75       0.79166667 1.         0.80952381]

mean value: 0.8685973084886128

key: train_precision
value: [0.87309645 0.87301587 0.85416667 0.87562189 0.8680203  0.885
 0.87564767 0.89393939 0.875      0.88020833]

mean value: 0.8753716577165349

key: test_recall
value: [0.82608696 0.73913043 0.91304348 0.91304348 0.82608696 0.90909091
 0.81818182 0.86363636 0.77272727 0.77272727]

mean value: 0.8353754940711462

key: train_recall
value: [0.85148515 0.81683168 0.81188119 0.87128713 0.84653465 0.87192118
 0.83251232 0.87192118 0.86206897 0.83251232]

mean value: 0.8468955762571331

key: test_roc_auc
value: [0.89031621 0.73320158 0.91106719 0.91106719 0.89031621 0.88932806
 0.77865613 0.82312253 0.88636364 0.79940711]

mean value: 0.8512845849802372

key: train_roc_auc
value: [0.86416622 0.84930254 0.83697508 0.87406721 0.85922792 0.8790299
 0.85685022 0.88398039 0.86915329 0.85932546]

mean value: 0.8632078232453787

key: test_jcc
value: [0.79166667 0.5862069  0.84       0.84       0.79166667 0.8
 0.64285714 0.7037037  0.77272727 0.65384615]

mean value: 0.7422674503019331

key: train_jcc
value: [0.75770925 0.7300885  0.71304348 0.7753304  0.75       0.78318584
 0.74449339 0.79017857 0.76754386 0.74778761]

mean value: 0.7559360895888796

MCC on Blind test: 0.77

Accuracy on Blind test: 0.88

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01528907 0.02077127 0.01836395 0.01844668 0.01875925 0.02194405
 0.01897907 0.02040768 0.02031064 0.02082705]

mean value: 0.019409871101379393

key: score_time
value: [0.00912023 0.01180029 0.01194692 0.01217914 0.01215005 0.01223612
 0.01204348 0.01269436 0.01220083 0.01210046]

mean value: 0.011847186088562011

key: test_mcc
value: [0.78530224 0.64752602 0.86732843 0.73320158 0.59725988 0.78405645
 0.70780516 0.64752602 0.82213439 0.70501339]

mean value: 0.7297153566397614

key: train_mcc
value: [0.86377146 0.84895551 0.82265468 0.87431362 0.80684222 0.81706101
 0.86843671 0.82696893 0.88164702 0.87837337]

mean value: 0.8489024517724476

key: test_accuracy
value: [0.88888889 0.82222222 0.93333333 0.86666667 0.77777778 0.88888889
 0.84444444 0.82222222 0.91111111 0.84444444]

mean value: 0.86

key: train_accuracy
value: [0.9308642  0.92098765 0.90864198 0.93580247 0.8962963  0.90123457
 0.93333333 0.90864198 0.94074074 0.9382716 ]

mean value: 0.9214814814814815

key: test_fscore
value: [0.88372093 0.81818182 0.93617021 0.86956522 0.73684211 0.87804878
 0.85714286 0.82608696 0.90909091 0.82051282]

mean value: 0.8535362607590927

key: train_fscore
value: [0.92820513 0.91534392 0.91334895 0.93298969 0.8852459  0.89130435
 0.93556086 0.91533181 0.94146341 0.93638677]

mean value: 0.9195180779922804

key: test_precision
value: [0.95       0.85714286 0.91666667 0.86956522 0.93333333 0.94736842
 0.77777778 0.79166667 0.90909091 0.94117647]

mean value: 0.8893788319710382

key: train_precision
value: [0.96276596 0.98295455 0.86666667 0.97311828 0.98780488 0.99393939
 0.90740741 0.85470085 0.93236715 0.96842105]

mean value: 0.9430146185624383

key: test_recall
value: [0.82608696 0.7826087  0.95652174 0.86956522 0.60869565 0.81818182
 0.95454545 0.86363636 0.90909091 0.72727273]

mean value: 0.8316205533596838

key: train_recall
value: [0.8960396  0.85643564 0.96534653 0.8960396  0.8019802  0.80788177
 0.96551724 0.98522167 0.95073892 0.90640394]

mean value: 0.9031605130956446

key: test_roc_auc
value: [0.89031621 0.82312253 0.93280632 0.86660079 0.78162055 0.88735178
 0.84683794 0.82312253 0.91106719 0.84189723]

mean value: 0.8604743083003953

key: train_roc_auc
value: [0.93077842 0.92082866 0.90878164 0.93570453 0.89606399 0.90146564
 0.93325367 0.90845242 0.94071599 0.93835049]

mean value: 0.9214395454323757

key: test_jcc
value: [0.79166667 0.69230769 0.88       0.76923077 0.58333333 0.7826087
 0.75       0.7037037  0.83333333 0.69565217]

mean value: 0.7481836368140716

key: train_jcc
value: [0.86602871 0.84390244 0.84051724 0.87439614 0.79411765 0.80392157
 0.87892377 0.84388186 0.88940092 0.88038278]

mean value: 0.8515473059624479

MCC on Blind test: 0.75

Accuracy on Blind test: 0.87

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.023592   0.01849461 0.01916718 0.02027512 0.01857185 0.01852775
 0.01686931 0.01843333 0.02005339 0.01996136]

mean value: 0.019394588470458985

key: score_time
value: [0.00936127 0.01204491 0.01203346 0.01205182 0.01208186 0.01245189
 0.01205373 0.01204967 0.01202798 0.01207948]

mean value: 0.011823606491088868

key: test_mcc
value: [0.76206649 0.38361073 0.69583743 0.73320158 0.73320158 0.87476705
 0.70780516 0.60637261 0.72645449 0.82574419]

mean value: 0.7049061319646861

key: train_mcc
value: [0.71591321 0.28354195 0.87676217 0.89347743 0.88642848 0.89639025
 0.81462126 0.75056333 0.63579921 0.85520525]

mean value: 0.760870254618254

key: test_accuracy
value: [0.86666667 0.62222222 0.84444444 0.86666667 0.86666667 0.93333333
 0.84444444 0.8        0.84444444 0.91111111]

mean value: 0.84

key: train_accuracy
value: [0.84197531 0.57530864 0.93580247 0.94567901 0.94320988 0.94814815
 0.9037037  0.86419753 0.79012346 0.92592593]

mean value: 0.8674074074074074

key: test_fscore
value: [0.85       0.4137931  0.8372093  0.86956522 0.86956522 0.93617021
 0.85714286 0.80851064 0.8627451  0.91304348]

mean value: 0.8217745125063238

key: train_fscore
value: [0.81395349 0.25862069 0.93193717 0.94358974 0.94292804 0.94865526
 0.90993072 0.87912088 0.82617587 0.92924528]

mean value: 0.8384157138013564

key: test_precision
value: [1.         1.         0.9        0.86956522 0.86956522 0.88
 0.77777778 0.76       0.75862069 0.875     ]

mean value: 0.8690528902215559

key: train_precision
value: [0.98591549 1.         0.98888889 0.9787234  0.94527363 0.94174757
 0.85652174 0.79365079 0.70629371 0.89140271]

mean value: 0.9088417944765346

key: test_recall
value: [0.73913043 0.26086957 0.7826087  0.86956522 0.86956522 1.
 0.95454545 0.86363636 1.         0.95454545]

mean value: 0.8294466403162055

key: train_recall
value: [0.69306931 0.14851485 0.88118812 0.91089109 0.94059406 0.95566502
 0.97044335 0.98522167 0.99507389 0.97044335]

mean value: 0.8451104716382969

key: test_roc_auc
value: [0.86956522 0.63043478 0.8458498  0.86660079 0.86660079 0.93478261
 0.84683794 0.8013834  0.84782609 0.91205534]

mean value: 0.842193675889328

key: train_roc_auc
value: [0.84160855 0.57425743 0.93566795 0.94559333 0.94320343 0.94812954
 0.90353851 0.86389797 0.78961615 0.92581573]

mean value: 0.8671328586060576

key: test_jcc
value: [0.73913043 0.26086957 0.72       0.76923077 0.76923077 0.88
 0.75       0.67857143 0.75862069 0.84      ]

mean value: 0.7165653656688139

key: train_jcc
value: [0.68627451 0.14851485 0.87254902 0.89320388 0.89201878 0.90232558
 0.83474576 0.78431373 0.70383275 0.86784141]

mean value: 0.7585620275637062

MCC on Blind test: 0.75

Accuracy on Blind test: 0.88

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.18216228 0.1695621  0.17470765 0.17133498 0.17510653 0.1782279
 0.17793107 0.173594   0.16895962 0.1765089 ]

mean value: 0.17480950355529784

key: score_time
value: [0.01529431 0.01603556 0.01582551 0.01555681 0.01668692 0.01690388
 0.01652122 0.0161798  0.01550436 0.01662135]

mean value: 0.016112971305847167

key: test_mcc
value: [0.86758893 0.82213439 0.95643752 1.         0.86758893 0.87476705
 0.95652174 0.86732843 1.         1.        ]

mean value: 0.9212366993733405

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.93333333 0.91111111 0.97777778 1.         0.93333333 0.93333333
 0.97777778 0.93333333 1.         1.        ]

mean value: 0.96

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.93333333 0.91304348 0.9787234  1.         0.93333333 0.93617021
 0.97777778 0.93023256 1.         1.        ]

mean value: 0.9602614097866126

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.95454545 0.91304348 0.95833333 1.         0.95454545 0.88
 0.95652174 0.95238095 1.         1.        ]

mean value: 0.95693704121965

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.91304348 0.91304348 1.         1.         0.91304348 1.
 1.         0.90909091 1.         1.        ]

mean value: 0.9648221343873518

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.93379447 0.91106719 0.97727273 1.         0.93379447 0.93478261
 0.97826087 0.93280632 1.         1.        ]

mean value: 0.9601778656126483

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.875      0.84       0.95833333 1.         0.875      0.88
 0.95652174 0.86956522 1.         1.        ]

mean value: 0.9254420289855072

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.95

Accuracy on Blind test: 0.97

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.05877137 0.06073856 0.05816627 0.06933212 0.05485106 0.0548265
 0.07156682 0.08174944 0.06895924 0.07406449]

mean value: 0.06530258655548096

key: score_time
value: [0.01848102 0.02689838 0.02581835 0.02910113 0.01847243 0.02071047
 0.03578973 0.02482224 0.03958559 0.02383661]

mean value: 0.02635159492492676

key: test_mcc
value: [0.82213439 0.91106719 0.91106719 1.         0.86758893 0.91485328
 0.91485328 0.82213439 1.         0.87406293]

mean value: 0.9037761587267465

key: train_mcc
value: [0.98024679 0.98519693 0.98519693 0.99017145 0.97560447 0.98029413
 0.98024679 0.99507389 0.98029509 0.98519729]

mean value: 0.9837523754632594

key: test_accuracy
value: [0.91111111 0.95555556 0.95555556 1.         0.93333333 0.95555556
 0.95555556 0.91111111 1.         0.93333333]

mean value: 0.9511111111111111

key: train_accuracy
value: [0.99012346 0.99259259 0.99259259 0.99506173 0.98765432 0.99012346
 0.99012346 0.99753086 0.99012346 0.99259259]

mean value: 0.9918518518518519

key: test_fscore
value: [0.91304348 0.95652174 0.95652174 1.         0.93333333 0.95652174
 0.95652174 0.90909091 1.         0.92682927]

mean value: 0.9508383945499534

key: train_fscore
value: [0.99009901 0.99255583 0.99255583 0.99502488 0.98746867 0.99019608
 0.99014778 0.99753086 0.99009901 0.99259259]

mean value: 0.9918270548106813

key: test_precision
value: [0.91304348 0.95652174 0.95652174 1.         0.95454545 0.91666667
 0.91666667 0.90909091 1.         1.        ]

mean value: 0.9523056653491436

key: train_precision
value: [0.99009901 0.99502488 0.99502488 1.         1.         0.98536585
 0.99014778 1.         0.99502488 0.9950495 ]

mean value: 0.9945736778626925

key: test_recall
value: [0.91304348 0.95652174 0.95652174 1.         0.91304348 1.
 1.         0.90909091 1.         0.86363636]

mean value: 0.9511857707509881

key: train_recall
value: [0.99009901 0.99009901 0.99009901 0.99009901 0.97524752 0.99507389
 0.99014778 0.99507389 0.98522167 0.99014778]

mean value: 0.9891308588986978

key: test_roc_auc
value: [0.91106719 0.9555336  0.9555336  1.         0.93379447 0.95652174
 0.95652174 0.91106719 1.         0.93181818]

mean value: 0.9511857707509882

key: train_roc_auc
value: [0.9901234  0.99258645 0.99258645 0.9950495  0.98762376 0.9901112
 0.9901234  0.99753695 0.99013559 0.99259864]

mean value: 0.9918475345071454

key: test_jcc
value: [0.84       0.91666667 0.91666667 1.         0.875      0.91666667
 0.91666667 0.83333333 1.         0.86363636]

mean value: 0.9078636363636363

key: train_jcc
value: [0.98039216 0.98522167 0.98522167 0.99009901 0.97524752 0.98058252
 0.9804878  0.99507389 0.98039216 0.98529412]

mean value: 0.9838012536555218

MCC on Blind test: 0.95

Accuracy on Blind test: 0.97

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.11244583 0.14630103 0.19385266 0.16284895 0.19661665 0.19063306
 0.19858932 0.12693048 0.16871762 0.17700362]

mean value: 0.16739392280578613

key: score_time
value: [0.04026365 0.02383375 0.02751327 0.02340198 0.03419042 0.0237093
 0.02365375 0.02253604 0.02923608 0.02369428]

mean value: 0.02720324993133545

key: test_mcc
value: [0.56604076 0.55841694 0.63358389 0.6133209  0.73663511 0.69156407
 0.37747036 0.55533597 0.687125   0.64613475]

mean value: 0.6065627735892382

key: train_mcc
value: [0.99017145 0.99017145 0.98529269 0.98529269 0.98529269 0.98529376
 0.99017193 0.99017193 0.99507389 0.99017193]

mean value: 0.9887104432367875

key: test_accuracy
value: [0.77777778 0.77777778 0.8        0.8        0.86666667 0.82222222
 0.68888889 0.77777778 0.82222222 0.82222222]

mean value: 0.7955555555555556

key: train_accuracy
value: [0.99506173 0.99506173 0.99259259 0.99259259 0.99259259 0.99259259
 0.99506173 0.99506173 0.99753086 0.99506173]

mean value: 0.994320987654321

key: test_fscore
value: [0.76190476 0.77272727 0.76923077 0.82352941 0.86363636 0.84615385
 0.68181818 0.77272727 0.77777778 0.80952381]

mean value: 0.7879029467264761

key: train_fscore
value: [0.99502488 0.99502488 0.9925187  0.9925187  0.9925187  0.99255583
 0.9950495  0.9950495  0.99753086 0.9950495 ]

mean value: 0.9942841071283992

key: test_precision
value: [0.84210526 0.80952381 0.9375     0.75       0.9047619  0.73333333
 0.68181818 0.77272727 1.         0.85      ]

mean value: 0.8281769765322397

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.69565217 0.73913043 0.65217391 0.91304348 0.82608696 1.
 0.68181818 0.77272727 0.63636364 0.77272727]

mean value: 0.7689723320158103

key: train_recall
value: [0.99009901 0.99009901 0.98514851 0.98514851 0.98514851 0.98522167
 0.99014778 0.99014778 0.99507389 0.99014778]

mean value: 0.9886382480612593

key: test_roc_auc
value: [0.77964427 0.77865613 0.80335968 0.79743083 0.86758893 0.82608696
 0.68873518 0.77766798 0.81818182 0.82114625]

mean value: 0.7958498023715415

key: train_roc_auc
value: [0.9950495  0.9950495  0.99257426 0.99257426 0.99257426 0.99261084
 0.99507389 0.99507389 0.99753695 0.99507389]

mean value: 0.9943191240306297

key: test_jcc
value: [0.61538462 0.62962963 0.625      0.7        0.76       0.73333333
 0.51724138 0.62962963 0.63636364 0.68      ]

mean value: 0.652658222365119

key: train_jcc
value: [0.99009901 0.99009901 0.98514851 0.98514851 0.98514851 0.98522167
 0.99014778 0.99014778 0.99507389 0.99014778]

mean value: 0.9886382480612593

MCC on Blind test: 0.59

Accuracy on Blind test: 0.79

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.66729116 0.64972425 0.65732384 0.6665225  0.65949416 0.65429211
 0.65473723 0.65853596 0.664325   0.65511894]

mean value: 0.6587365150451661

key: score_time
value: [0.01024175 0.00957489 0.00951242 0.00985265 0.00966835 0.00935936
 0.00952435 0.00940204 0.00994349 0.00998735]

mean value: 0.009706664085388183

key: test_mcc
value: [0.82213439 0.86732843 0.95643752 1.         0.82574419 0.91485328
 0.91485328 0.77821935 1.         0.95643752]

mean value: 0.9036007956373757

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91111111 0.93333333 0.97777778 1.         0.91111111 0.95555556
 0.95555556 0.88888889 1.         0.97777778]

mean value: 0.9511111111111111

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.91304348 0.93617021 0.9787234  1.         0.90909091 0.95652174
 0.95652174 0.88372093 1.         0.97674419]

mean value: 0.9510536598912994

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.91304348 0.91666667 0.95833333 1.         0.95238095 0.91666667
 0.91666667 0.9047619  1.         1.        ]

mean value: 0.947851966873706

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.91304348 0.95652174 1.         1.         0.86956522 1.
 1.         0.86363636 1.         0.95454545]

mean value: 0.9557312252964427

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.91106719 0.93280632 0.97727273 1.         0.91205534 0.95652174
 0.95652174 0.88833992 1.         0.97727273]

mean value: 0.9511857707509881

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.84       0.88       0.95833333 1.         0.83333333 0.91666667
 0.91666667 0.79166667 1.         0.95454545]

mean value: 0.9091212121212121

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.95

Accuracy on Blind test: 0.97

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.06929517 0.06151557 0.07104921 0.03313804 0.04500461 0.03204679
 0.03788662 0.03866816 0.03848624 0.04184365]

mean value: 0.04689340591430664

key: score_time
value: [0.01687455 0.02114534 0.02216387 0.01866746 0.01421213 0.0171454
 0.01545548 0.01302433 0.01309681 0.01861858]

mean value: 0.017040395736694337

key: test_mcc
value: [0.65604724 0.35497208 0.2540839  0.46640316 0.21191154 0.11393242
 0.33797818 0.15717365 0.4000988  0.06320859]

mean value: 0.3015809563232226

key: train_mcc
value: [0.95177249 0.72466772 0.97079432 0.90113034 0.8354634  0.62435788
 0.97541464 0.76507358 0.89222145 0.77839025]

mean value: 0.8419286071129487

key: test_accuracy
value: [0.82222222 0.66666667 0.62222222 0.73333333 0.6        0.55555556
 0.66666667 0.57777778 0.66666667 0.53333333]

mean value: 0.6444444444444444

key: train_accuracy
value: [0.97530864 0.84444444 0.98518519 0.94814815 0.91111111 0.78024691
 0.98765432 0.8691358  0.94320988 0.88395062]

mean value: 0.9128395061728395

key: test_fscore
value: [0.80952381 0.61538462 0.58536585 0.73913043 0.55       0.41176471
 0.68085106 0.48648649 0.51612903 0.4       ]

mean value: 0.5794636001806261

key: train_fscore
value: [0.97461929 0.81524927 0.98492462 0.94516971 0.90217391 0.7192429
 0.98777506 0.84985836 0.93994778 0.87399464]

mean value: 0.8992955544177024

key: test_precision
value: [0.89473684 0.75       0.66666667 0.73913043 0.64705882 0.58333333
 0.64       0.6        0.88888889 0.53846154]

mean value: 0.694827652776771

key: train_precision
value: [1.         1.         1.         1.         1.         1.
 0.98058252 1.         1.         0.95882353]

mean value: 0.9939406053683609

key: test_recall
value: [0.73913043 0.52173913 0.52173913 0.73913043 0.47826087 0.31818182
 0.72727273 0.40909091 0.36363636 0.31818182]

mean value: 0.5136363636363637

key: train_recall
value: [0.95049505 0.68811881 0.97029703 0.8960396  0.82178218 0.56157635
 0.99507389 0.73891626 0.88669951 0.80295567]

mean value: 0.8311954348144174

key: test_roc_auc
value: [0.82411067 0.66996047 0.62450593 0.73320158 0.6027668  0.55039526
 0.66798419 0.57411067 0.66007905 0.52865613]

mean value: 0.6435770750988142

key: train_roc_auc
value: [0.97524752 0.84405941 0.98514851 0.9480198  0.91089109 0.78078818
 0.98763596 0.86945813 0.94334975 0.8841511 ]

mean value: 0.9128749451299809

key: test_jcc
value: [0.68       0.44444444 0.4137931  0.5862069  0.37931034 0.25925926
 0.51612903 0.32142857 0.34782609 0.25      ]

mean value: 0.41983977391744476

key: train_jcc
value: [0.95049505 0.68811881 0.97029703 0.8960396  0.82178218 0.56157635
 0.97584541 0.73891626 0.88669951 0.77619048]

mean value: 0.8265960678312423

MCC on Blind test: 0.49

Accuracy on Blind test: 0.74

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.02726126 0.04403973 0.06417942 0.03625083 0.05212307 0.04398036
 0.05202127 0.05687022 0.04713082 0.05117822]

mean value: 0.04750351905822754

key: score_time
value: [0.02289844 0.02290368 0.03534818 0.02457857 0.02440238 0.01246762
 0.02006912 0.02470255 0.02361917 0.02284813]

mean value: 0.023383784294128417

key: test_mcc
value: [0.77865613 0.77821935 0.86732843 0.73320158 0.82213439 0.77865613
 0.73663511 0.68911026 0.95652174 0.73559956]

mean value: 0.7876062675297144

key: train_mcc
value: [0.86211613 0.8717805  0.87664317 0.86667805 0.89175679 0.87164354
 0.871768   0.871768   0.86176621 0.85704185]

mean value: 0.8702962246613828

key: test_accuracy
value: [0.88888889 0.88888889 0.93333333 0.86666667 0.91111111 0.88888889
 0.86666667 0.84444444 0.97777778 0.86666667]

mean value: 0.8933333333333333

key: train_accuracy
value: [0.9308642  0.93580247 0.9382716  0.93333333 0.94567901 0.93580247
 0.93580247 0.93580247 0.9308642  0.92839506]

mean value: 0.9350617283950617

key: test_fscore
value: [0.88888889 0.89361702 0.93617021 0.86956522 0.91304348 0.88888889
 0.86956522 0.8372093  0.97777778 0.85714286]

mean value: 0.8931868862110025

key: train_fscore
value: [0.93170732 0.93627451 0.93857494 0.93333333 0.94634146 0.93627451
 0.93658537 0.93658537 0.93137255 0.92944039]

mean value: 0.9356489742025249

key: test_precision
value: [0.90909091 0.875      0.91666667 0.86956522 0.91304348 0.86956522
 0.83333333 0.85714286 0.95652174 0.9       ]

mean value: 0.8899929418407679

key: train_precision
value: [0.91826923 0.92718447 0.93170732 0.93103448 0.93269231 0.93170732
 0.92753623 0.92753623 0.92682927 0.91826923]

mean value: 0.9272766084215948

key: test_recall
value: [0.86956522 0.91304348 0.95652174 0.86956522 0.91304348 0.90909091
 0.90909091 0.81818182 1.         0.81818182]

mean value: 0.8976284584980238

key: train_recall
value: [0.94554455 0.94554455 0.94554455 0.93564356 0.96039604 0.9408867
 0.94581281 0.94581281 0.93596059 0.9408867 ]

mean value: 0.9442032873238062

key: test_roc_auc
value: [0.88932806 0.88833992 0.93280632 0.86660079 0.91106719 0.88932806
 0.86758893 0.84387352 0.97826087 0.86561265]

mean value: 0.8932806324110671

key: train_roc_auc
value: [0.93090036 0.93582646 0.93828952 0.93333902 0.94571526 0.93578988
 0.93577769 0.93577769 0.93085158 0.92836414]

mean value: 0.9350631614885626

key: test_jcc
value: [0.8        0.80769231 0.88       0.76923077 0.84       0.8
 0.76923077 0.72       0.95652174 0.75      ]

mean value: 0.8092675585284281

key: train_jcc
value: [0.87214612 0.88018433 0.88425926 0.875      0.89814815 0.88018433
 0.88073394 0.88073394 0.87155963 0.86818182]

mean value: 0.8791131530840937

MCC on Blind test: 0.79

Accuracy on Blind test: 0.89

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.45930314 0.3597424  0.36897445 0.41965199 0.49196625 0.47425413
 0.46620059 0.42436361 0.34532022 0.39895368]

mean value: 0.4208730459213257

key: score_time
value: [0.02300835 0.02374816 0.02298093 0.02300811 0.0216465  0.02457333
 0.01253843 0.0218122  0.02498055 0.02483153]

mean value: 0.0223128080368042

key: test_mcc
value: [0.77865613 0.77821935 0.86732843 0.73320158 0.82213439 0.77865613
 0.73663511 0.64613475 0.95652174 0.73559956]

mean value: 0.7833087165986725

key: train_mcc
value: [0.86211613 0.8717805  0.87664317 0.86667805 0.93581427 0.87164354
 0.80250226 0.92620337 0.81237958 0.85704185]

mean value: 0.8682802726942564

key: test_accuracy
value: [0.88888889 0.88888889 0.93333333 0.86666667 0.91111111 0.88888889
 0.86666667 0.82222222 0.97777778 0.86666667]

mean value: 0.8911111111111111

key: train_accuracy
value: [0.9308642  0.93580247 0.9382716  0.93333333 0.96790123 0.93580247
 0.90123457 0.96296296 0.90617284 0.92839506]

mean value: 0.9340740740740741

key: test_fscore
value: [0.88888889 0.89361702 0.93617021 0.86956522 0.91304348 0.88888889
 0.86956522 0.80952381 0.97777778 0.85714286]

mean value: 0.8904183369308254

key: train_fscore
value:
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:128: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:131: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[0.93170732 0.93627451 0.93857494 0.93333333 0.96790123 0.93627451
 0.90196078 0.96350365 0.90686275 0.92944039]

mean value: 0.9345833411498392

key: test_precision
value: [0.90909091 0.875      0.91666667 0.86956522 0.91304348 0.86956522
 0.83333333 0.85       0.95652174 0.9       ]

mean value: 0.8892786561264822

key: train_precision
value: [0.91826923 0.92718447 0.93170732 0.93103448 0.96551724 0.93170732
 0.89756098 0.95192308 0.90243902 0.91826923]

mean value: 0.9275612362765229

key: test_recall
value: [0.86956522 0.91304348 0.95652174 0.86956522 0.91304348 0.90909091
 0.90909091 0.77272727 1.         0.81818182]

mean value: 0.8930830039525691

key: train_recall
value: [0.94554455 0.94554455 0.94554455 0.93564356 0.97029703 0.9408867
 0.90640394 0.97536946 0.91133005 0.9408867 ]

mean value: 0.9417451104716383

key: test_roc_auc
value: [0.88932806 0.88833992 0.93280632 0.86660079 0.91106719 0.88932806
 0.86758893 0.82114625 0.97826087 0.86561265]

mean value: 0.8910079051383399

key: train_roc_auc
value: [0.93090036 0.93582646 0.93828952 0.93333902 0.96790714 0.93578988
 0.90122177 0.96293225 0.90616007 0.92836414]

mean value: 0.9340730624786616

key: test_jcc
value: [0.8        0.80769231 0.88       0.76923077 0.84       0.8
 0.76923077 0.68       0.95652174 0.75      ]

mean value: 0.8052675585284281

key: train_jcc
value: [0.87214612 0.88018433 0.88425926 0.875      0.93779904 0.88018433
 0.82142857 0.92957746 0.82959641 0.86818182]

mean value: 0.8778357351592567

MCC on Blind test: 0.79

Accuracy on Blind test: 0.89
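
The key/value blocks above (fit_time, score_time, test_* and train_* per fold) have the shape of what scikit-learn's cross_validate returns when it is given a scoring dictionary and return_train_score=True. The sketch below shows one way such output could be produced. It is an assumption about the general approach, not the code in rpob_8020.py: the data are synthetic, the scoring names (mcc, fscore, jcc) are chosen only to mirror the keys in this log, and the fold setup is illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifierCV
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data: the real feature table is not part of this log.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Hypothetical scoring dictionary; each name becomes a test_<name>/train_<name> key.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "jcc": "jaccard",
}

# Scaling step plus one of the models listed in the log.
pipe = Pipeline([
    ("prep", MinMaxScaler()),
    ("model", RidgeClassifierCV(cv=10)),
])

# Assumed 10-fold stratified split (the log shows 10 values per key).
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Comparing each train_* mean against its test_* mean, as the log allows, is a quick check on how much a model overfits the training folds.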

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.04131389 0.09903002 0.04775429 0.06405687 0.17856431 0.06696653
 0.1108768  0.04235101 0.0490725  0.04135633]

mean value: 0.0741342544555664

key: score_time
value: [0.01273394 0.02248192 0.02041554 0.01248813 0.01281857 0.01917338
 0.01525044 0.01229382 0.01236773 0.01228189]

mean value: 0.015230536460876465

key: test_mcc
value: [0.77865613 0.64426877 0.86732843 0.77821935 0.86758893 0.82574419
 0.69583743 0.68911026 0.95652174 0.82213439]

mean value: 0.7925409622270596

key: train_mcc
value: [0.86190245 0.8716498  0.86172755 0.86177295 0.87664317 0.85188889
 0.86188899 0.87680228 0.85679795 0.86176621]

mean value: 0.8642840257675023

key: test_accuracy
value: [0.88888889 0.82222222 0.93333333 0.88888889 0.93333333 0.91111111
 0.84444444 0.84444444 0.97777778 0.91111111]

mean value: 0.8955555555555555

key: train_accuracy
value: [0.9308642  0.93580247 0.9308642  0.9308642  0.9382716  0.92592593
 0.9308642  0.9382716  0.92839506 0.9308642 ]

mean value: 0.9320987654320988

key: test_fscore
value: [0.88888889 0.82608696 0.93617021 0.89361702 0.93333333 0.91304348
 0.85106383 0.8372093  0.97777778 0.90909091]

mean value: 0.8966281710028886

key: train_fscore
value: [0.93137255 0.93596059 0.93069307 0.93103448 0.93857494 0.92647059
 0.93170732 0.93917275 0.92874693 0.93137255]

mean value: 0.9325105763259832

key: test_precision
value: [0.90909091 0.82608696 0.91666667 0.875      0.95454545 0.875
 0.8        0.85714286 0.95652174 0.90909091]

mean value: 0.887914549218897

key: train_precision
value: [0.9223301  0.93137255 0.93069307 0.92647059 0.93170732 0.92195122
 0.92270531 0.92788462 0.92647059 0.92682927]

mean value: 0.9268414626156831

key: test_recall
value: [0.86956522 0.82608696 0.95652174 0.91304348 0.91304348 0.95454545
 0.90909091 0.81818182 1.         0.90909091]

mean value: 0.9069169960474308

key: train_recall
value: [0.94059406 0.94059406 0.93069307 0.93564356 0.94554455 0.93103448
 0.9408867  0.95073892 0.93103448 0.93596059]

mean value: 0.9382724479344486

key: test_roc_auc
value: [0.88932806 0.82213439 0.93280632 0.88833992 0.93379447 0.91205534
 0.8458498  0.84387352 0.97826087 0.91106719]

mean value: 0.8957509881422925

key: train_roc_auc
value: [0.93088816 0.93581427 0.93086378 0.93087597 0.93828952 0.92591328
 0.93083939 0.93824075 0.92838853 0.93085158]

mean value: 0.9320965224601278

key: test_jcc
value: [0.8        0.7037037  0.88       0.80769231 0.875      0.84
 0.74074074 0.72       0.95652174 0.83333333]

mean value: 0.815699182460052

key: train_jcc
value: [0.87155963 0.87962963 0.87037037 0.87096774 0.88425926 0.8630137
 0.87214612 0.8853211  0.86697248 0.87155963]

mean value: 0.8735799662583039

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9
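
The ConvergenceWarning messages in this log say that lbfgs stopped at its iteration limit and suggest raising max_iter or scaling the data. The pipeline shown above already applies MinMaxScaler to the numeric columns, so raising max_iter is the remaining suggestion. A minimal sketch follows, assuming synthetic data and illustrative max_iter values (2 and 10000); neither the data nor those values come from the original scripts.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data: the real feature table is not part of this log.
X, y = make_classification(n_samples=500, n_features=50, n_informative=25,
                           random_state=42)


def fit_and_count_warnings(max_iter):
    """Fit a scaled logistic regression and count ConvergenceWarning events."""
    pipe = Pipeline([
        ("prep", MinMaxScaler()),
        ("model", LogisticRegression(max_iter=max_iter, random_state=42)),
    ])
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        pipe.fit(X, y)
    n_warnings = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
    n_iter = int(pipe.named_steps["model"].n_iter_[0])
    return n_warnings, n_iter


# Deliberately tiny limit: the solver is expected to stop before converging.
few_warnings, few_iter = fit_and_count_warnings(max_iter=2)
# Generous limit: the solver has room to converge on this small problem.
many_warnings, many_iter = fit_and_count_warnings(max_iter=10000)

print("max_iter=2     -> warnings:", few_warnings, "iterations used:", few_iter)
print("max_iter=10000 -> warnings:", many_warnings, "iterations used:", many_iter)
```

If the warning persists even with a large limit, the solver options page linked in the warning text lists alternatives to lbfgs.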

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [2.01325941 1.78600645 1.50203204 1.71028256 1.41295767 1.62343645
 1.50840998 1.89245319 1.44183517 1.61304617]

mean value: 1.6503719091415405

key: score_time
value: [0.01474905 0.01609755 0.02074885 0.02179193 0.01591682 0.01258969
 0.01645231 0.02142692 0.01871753 0.01511955]

mean value: 0.017361021041870116

key: test_mcc
value: [0.82574419 0.68911026 0.86732843 0.77821935 0.82574419 0.82574419
 0.73663511 0.68911026 1.         0.77821935]

mean value: 0.8015855339402445

key: train_mcc
value: [0.8965753  0.89630533 0.89630786 0.89139819 0.90627515 0.82225691
 0.89630533 0.91128405 0.88164702 0.89152603]

mean value: 0.8889881173544825

key: test_accuracy
value: [0.91111111 0.84444444 0.93333333 0.88888889 0.91111111 0.91111111
 0.86666667 0.84444444 1.         0.88888889]

mean value: 0.9

key: train_accuracy
value: [0.94814815 0.94814815 0.94814815 0.94567901 0.95308642 0.91111111
 0.94814815 0.95555556 0.94074074 0.94567901]

mean value: 0.9444444444444444

key: test_fscore
value: [0.90909091 0.85106383 0.93617021 0.89361702 0.90909091 0.91304348
 0.86956522 0.8372093  1.         0.88372093]

mean value: 0.9002571810221919

key: train_fscore
value: [0.94865526 0.94789082 0.94814815 0.94527363 0.95331695 0.91176471
 0.94840295 0.95609756 0.94146341 0.94634146]

mean value: 0.9447354902197866

key: test_precision
value: [0.95238095 0.83333333 0.91666667 0.875      0.95238095 0.875
 0.83333333 0.85714286 1.         0.9047619 ]

mean value: 0.9

key: train_precision
value: [0.93719807 0.95024876 0.94581281 0.95       0.94634146 0.90731707
 0.94607843 0.9468599  0.93236715 0.93719807]

mean value: 0.939942172046439

key: test_recall
value: [0.86956522 0.86956522 0.95652174 0.91304348 0.86956522 0.95454545
 0.90909091 0.81818182 1.         0.86363636]

mean value: 0.9023715415019763

key: train_recall
value: [0.96039604 0.94554455 0.95049505 0.94059406 0.96039604 0.91625616
 0.95073892 0.96551724 0.95073892 0.95566502]

mean value: 0.9496341998731893

key: test_roc_auc
value: [0.91205534 0.84387352 0.93280632 0.88833992 0.91205534 0.91205534
 0.86758893 0.84387352 1.         0.88833992]

mean value: 0.900098814229249

key: train_roc_auc
value: [0.94817832 0.94814174 0.94815393 0.94566649 0.95310442 0.91109838
 0.94814174 0.9555309  0.94071599 0.94565429]

mean value: 0.944438618738721

key: test_jcc
value: [0.83333333 0.74074074 0.88       0.80769231 0.83333333 0.84
 0.76923077 0.72       1.         0.79166667]

mean value: 0.8215997150997151

key: train_jcc
value: [0.90232558 0.9009434  0.90140845 0.89622642 0.91079812 0.83783784
 0.90186916 0.91588785 0.88940092 0.89814815]

mean value: 0.8954845882476823

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9
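
Note on the test_fscore and test_jcc keys above: for a binary problem the Jaccard index J and the F1 score F of the same fold are related by J = F / (2 - F), so the two keys carry the same per-fold information. The sketch below checks this on the fold values printed for LogisticRegressionCV (copied from this log, rounded to 8 decimals). jaccard_from_f1 is a hypothetical helper name. The relation holds fold by fold, not for the means.

```python
from statistics import mean

# Per-fold values copied from the LogisticRegressionCV block of this log.
test_fscore = [0.90909091, 0.85106383, 0.93617021, 0.89361702, 0.90909091,
               0.91304348, 0.86956522, 0.8372093, 1.0, 0.88372093]
test_jcc = [0.83333333, 0.74074074, 0.88, 0.80769231, 0.83333333,
            0.84, 0.76923077, 0.72, 1.0, 0.79166667]


def jaccard_from_f1(f1):
    """Binary-case identity: J = F1 / (2 - F1)."""
    return f1 / (2.0 - f1)


# Largest per-fold difference between the derived and the logged Jaccard.
derived = [jaccard_from_f1(f) for f in test_fscore]
max_gap = max(abs(d - j) for d, j in zip(derived, test_jcc))
print("largest per-fold gap:", max_gap)

# The identity is non-linear, so it does not carry over to the fold means.
print("mean of logged test_jcc:    ", mean(test_jcc))
print("J computed from mean fscore:", jaccard_from_f1(mean(test_fscore)))
```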

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01473141 0.01323748 0.01226425 0.01224113 0.01230812 0.01203942
 0.01149964 0.0120852  0.01192832 0.01190805]

mean value: 0.012424302101135255

key: score_time
value: [0.01256609 0.0105772  0.01097941 0.01110196 0.01087761 0.01021123
 0.0105443  0.01062822 0.01049519 0.01022792]

mean value: 0.010820913314819335

key: test_mcc
value: [0.77865613 0.60079051 0.60079051 0.65604724 0.70780516 0.73320158
 0.42993591 0.60000118 0.59109821 0.64613475]

mean value: 0.6344461181857074

key: train_mcc
value: [0.67799996 0.63062266 0.64891459 0.66881392 0.69328869 0.66530582
 0.64321841 0.71511705 0.67817152 0.66530582]

mean value: 0.6686758440784181

key: test_accuracy
value: [0.88888889 0.8        0.8        0.82222222 0.84444444 0.86666667
 0.71111111 0.8        0.77777778 0.82222222]

mean value: 0.8133333333333334

key: train_accuracy
value: [0.83703704 0.81234568 0.82222222 0.83209877 0.84444444 0.82962963
 0.81234568 0.85432099 0.83703704 0.82962963]

mean value: 0.8311111111111111

key: test_fscore
value: [0.88888889 0.8        0.8        0.80952381 0.82926829 0.86363636
 0.66666667 0.79069767 0.72222222 0.80952381]

mean value: 0.7980427727563293

key: train_fscore
value: [0.82722513 0.79787234 0.81052632 0.82105263 0.83464567 0.81794195
 0.7877095  0.84432718 0.828125   0.81794195]

mean value: 0.8187367666976243

key: test_precision
value: [0.90909091 0.81818182 0.81818182 0.89473684 0.94444444 0.86363636
 0.76470588 0.80952381 0.92857143 0.85      ]

mean value: 0.8601073316088796

key: train_precision
value: [0.87777778 0.86206897 0.86516854 0.87640449 0.88826816 0.88068182
 0.90967742 0.90909091 0.87845304 0.88068182]

mean value: 0.8828272936910883

key: test_recall
value: [0.86956522 0.7826087  0.7826087  0.73913043 0.73913043 0.86363636
 0.59090909 0.77272727 0.59090909 0.77272727]

mean value: 0.750395256916996

key: train_recall
value: [0.78217822 0.74257426 0.76237624 0.77227723 0.78712871 0.7635468
 0.69458128 0.78817734 0.78325123 0.7635468 ]

mean value: 0.7639638101741208

key: test_roc_auc
value: [0.88932806 0.80039526 0.80039526 0.82411067 0.84683794 0.86660079
 0.70849802 0.79940711 0.77371542 0.82114625]

mean value: 0.8130434782608695

key: train_roc_auc
value: [0.83690192 0.81217383 0.82207482 0.83195142 0.84430327 0.8297932
 0.81263718 0.85448471 0.83717017 0.8297932 ]

mean value: 0.8311283714578355

key: test_jcc
value: [0.8        0.66666667 0.66666667 0.68       0.70833333 0.76
 0.5        0.65384615 0.56521739 0.68      ]

mean value: 0.6680730211817169

key: train_jcc
value: [0.70535714 0.66371681 0.68141593 0.69642857 0.71621622 0.69196429
 0.64976959 0.73059361 0.70666667 0.69196429]

mean value: 0.6934093104519392

MCC on Blind test: 0.68

Accuracy on Blind test: 0.84
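
Note on reading the per-fold metrics above: the first Gaussian NB test fold is consistent with a 45-sample fold holding 23 positives and 22 negatives, with TP=20, FN=3, TN=20, FP=2. These counts are inferred from the printed values and are not printed by the log itself. The sketch below recomputes each metric from those assumed counts in plain Python. The test_roc_auc value matches the balanced accuracy (mean of sensitivity and specificity), which suggests it was computed from hard class predictions rather than from probabilities, but the log does not confirm this.

```python
from math import sqrt

# Inferred confusion-matrix counts for Gaussian NB, test fold 1 (assumption).
TP, FN, TN, FP = 20, 3, 20, 2

precision = TP / (TP + FP)
recall = TP / (TP + FN)            # sensitivity
specificity = TN / (TN + FP)

metrics = {
    "accuracy": (TP + TN) / (TP + TN + FP + FN),
    "precision": precision,
    "recall": recall,
    "fscore": 2 * TP / (2 * TP + FP + FN),
    # Matthews correlation coefficient from the four counts.
    "mcc": (TP * TN - FP * FN)
           / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)),
    # Balanced accuracy; equals ROC AUC when scored on hard 0/1 predictions.
    "roc_auc": (recall + specificity) / 2,
    "jcc": TP / (TP + FP + FN),
}

for name, value in metrics.items():
    print(f"test_{name}: {value:.8f}")
```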

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01203966 0.01195383 0.01233935 0.01260471 0.01238465 0.01232862
 0.01189423 0.01254082 0.01260471 0.0126884 ]

mean value: 0.012337899208068848

key: score_time
value: [0.01046038 0.01066375 0.01018858 0.01057577 0.01045346 0.0106113
 0.01064634 0.01088023 0.01084447 0.01107097]

mean value: 0.010639524459838868

key: test_mcc
value: [0.70780516 0.4229249  0.68972332 0.73559956 0.78530224 0.69583743
 0.55841694 0.64426877 0.69404997 0.55841694]

mean value: 0.6492345229004394

key: train_mcc
value: [0.7284056  0.69072841 0.70964919 0.73836061 0.7234551  0.75324391
 0.75343373 0.76814813 0.70374345 0.72863208]

mean value: 0.7297800213531322

key: test_accuracy
value: [0.84444444 0.71111111 0.84444444 0.86666667 0.88888889 0.84444444
 0.77777778 0.82222222 0.84444444 0.77777778]

mean value: 0.8222222222222222

key: train_accuracy
value: [0.86419753 0.84444444 0.85432099 0.8691358  0.8617284  0.87654321
 0.87654321 0.88395062 0.85185185 0.86419753]

mean value: 0.8646913580246913

key: test_fscore
value: [0.82926829 0.71111111 0.84444444 0.875      0.88372093 0.85106383
 0.7826087  0.81818182 0.82926829 0.7826087 ]

mean value: 0.8207276110427367

key: train_fscore
value: [0.86419753 0.83804627 0.84987277 0.86977887 0.86138614 0.87562189
 0.875      0.88279302 0.85148515 0.86284289]

mean value: 0.8631024534573951

key: test_precision
value: [0.94444444 0.72727273 0.86363636 0.84       0.95       0.8
 0.75       0.81818182 0.89473684 0.75      ]

mean value: 0.8338272195640617

key: train_precision
value: [0.86206897 0.87165775 0.87434555 0.86341463 0.86138614 0.88442211
 0.88832487 0.89393939 0.85572139 0.87373737]

mean value: 0.8729018186387163

key: test_recall
value: [0.73913043 0.69565217 0.82608696 0.91304348 0.82608696 0.90909091
 0.81818182 0.81818182 0.77272727 0.81818182]

mean value: 0.8136363636363636

key: train_recall
value: [0.86633663 0.80693069 0.82673267 0.87623762 0.86138614 0.86699507
 0.86206897 0.87192118 0.84729064 0.85221675]

mean value: 0.8538116373213676

key: test_roc_auc
value: [0.84683794 0.71146245 0.84486166 0.86561265 0.89031621 0.8458498
 0.77865613 0.82213439 0.84288538 0.77865613]

mean value: 0.8227272727272728

key: train_roc_auc
value: [0.8642028  0.84435205 0.85425304 0.86915329 0.86172755 0.87656684
 0.87657904 0.88398039 0.85186314 0.86422719]

mean value: 0.864690533092718

key: test_jcc
value: [0.70833333 0.55172414 0.73076923 0.77777778 0.79166667 0.74074074
 0.64285714 0.69230769 0.70833333 0.64285714]

mean value: 0.6987367198574095

key: train_jcc
value: [0.76086957 0.72123894 0.73893805 0.76956522 0.75652174 0.77876106
 0.77777778 0.79017857 0.74137931 0.75877193]

mean value: 0.7594002164212214

MCC on Blind test: 0.73

Accuracy on Blind test: 0.87
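
Note on the key/value/mean layout above: the key names (fit_time, score_time, test_*, train_*) and the ten values per key match what scikit-learn's cross_validate returns for a 10-fold run with a scoring dictionary and return_train_score=True. The sketch below reproduces that layout. It is an illustration under stated assumptions, not the script that wrote this log: the data is synthetic, and only three of the scorers are included.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB

# Synthetic stand-in data (assumption: NOT the features used in this run).
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# The dictionary keys become the suffixes of the test_*/train_* result keys.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "jcc": make_scorer(jaccard_score),
}

scores = cross_validate(
    BernoulliNB(),
    X,
    y,
    cv=10,                      # ten folds, hence ten values per key
    scoring=scoring,
    return_train_score=True,    # adds the train_* keys
)

# Print in the same key / value / mean layout as the log.
for key, value in scores.items():
    print("\nkey:", key)
    print("value:", value)
    print("\nmean value:", value.mean())
```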

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.01207423 0.01183081 0.01203465 0.01215744 0.01158547 0.01139283
 0.01193786 0.01136422 0.01127291 0.01138806]

mean value: 0.011703848838806152

key: score_time
value: [0.01589179 0.01923251 0.01931143 0.01723266 0.01773953 0.01866388
 0.01476097 0.01722598 0.01753187 0.01763558]

mean value: 0.017522621154785156

key: test_mcc
value: [0.47603428 0.48698902 0.51185771 0.38799274 0.64752602 0.48698902
 0.42178301 0.33402405 0.58158    0.60000118]

mean value: 0.49347770092446475

key: train_mcc
value: [0.69385167 0.70422287 0.66913791 0.71410816 0.68986411 0.70498382
 0.70498382 0.6842722  0.68897398 0.70403264]

mean value: 0.6958431183577889

key: test_accuracy
value: [0.73333333 0.73333333 0.75555556 0.68888889 0.82222222 0.73333333
 0.71111111 0.66666667 0.75555556 0.8       ]

mean value: 0.74

key: train_accuracy
value: [0.84691358 0.85185185 0.8345679  0.85679012 0.84444444 0.85185185
 0.85185185 0.84197531 0.84444444 0.85185185]

mean value: 0.8476543209876544

key: test_fscore
value: [0.71428571 0.7        0.75555556 0.73076923 0.81818182 0.76
 0.69767442 0.63414634 0.66666667 0.79069767]

mean value: 0.7267977419945656

key: train_fscore
value: [0.84577114 0.84848485 0.8337469  0.85353535 0.83969466 0.84771574
 0.84771574 0.84       0.84367246 0.85      ]

mean value: 0.8450336829707287

key: test_precision
value: [0.78947368 0.82352941 0.77272727 0.65517241 0.85714286 0.67857143
 0.71428571 0.68421053 1.         0.80952381]

mean value: 0.7784637118335207

key: train_precision
value: [0.85       0.86597938 0.8358209  0.87113402 0.86387435 0.87434555
 0.87434555 0.85279188 0.85       0.86294416]

mean value: 0.8601235783219559

key: test_recall
value: [0.65217391 0.60869565 0.73913043 0.82608696 0.7826087  0.86363636
 0.68181818 0.59090909 0.5        0.77272727]

mean value: 0.7017786561264823

key: train_recall
value: [0.84158416 0.83168317 0.83168317 0.83663366 0.81683168 0.8226601
 0.8226601  0.82758621 0.83743842 0.83743842]

mean value: 0.8306199092815685

key: test_roc_auc
value: [0.73517787 0.73616601 0.75592885 0.68577075 0.82312253 0.73616601
 0.71047431 0.66501976 0.75       0.79940711]

mean value: 0.7397233201581027

key: train_roc_auc
value: [0.84690045 0.85180218 0.8345608  0.85674048 0.84437643 0.85192411
 0.85192411 0.84201093 0.84446179 0.85188753]

mean value: 0.8476588791884114

key: test_jcc
value: [0.55555556 0.53846154 0.60714286 0.57575758 0.69230769 0.61290323
 0.53571429 0.46428571 0.5        0.65384615]

mean value: 0.5735974598877824

key: train_jcc
value: [0.73275862 0.73684211 0.71489362 0.74449339 0.72368421 0.73568282
 0.73568282 0.72413793 0.72961373 0.73913043]

mean value: 0.731691968406008

MCC on Blind test: 0.41

Accuracy on Blind test: 0.71

Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.02174282 0.02407169 0.01865649 0.01932096 0.01913285 0.01908374
 0.01905656 0.01899648 0.01942968 0.02316642]

mean value: 0.020265769958496094

key: score_time
value: [0.01160741 0.01241922 0.01122999 0.01308322 0.01144195 0.01153398
 0.01152825 0.01177454 0.01203108 0.01304173]

mean value: 0.01196913719177246

key: test_mcc
value: [0.73663511 0.68972332 0.86732843 0.77821935 0.86758893 0.82574419
 0.77865613 0.68911026 0.95652174 0.73320158]

mean value: 0.7922729043083154

key: train_mcc
value: [0.79762457 0.81737922 0.80251189 0.80246793 0.80766419 0.81237958
 0.80246793 0.81736586 0.79262493 0.80741373]

mean value: 0.8059899831841102

key: test_accuracy
value: [0.86666667 0.84444444 0.93333333 0.88888889 0.93333333 0.91111111
 0.88888889 0.84444444 0.97777778 0.86666667]

mean value: 0.8955555555555555

key: train_accuracy
value: [0.89876543 0.90864198 0.90123457 0.90123457 0.9037037  0.90617284
 0.90123457 0.90864198 0.8962963  0.9037037 ]

mean value: 0.902962962962963

key: test_fscore
value: [0.86363636 0.84444444 0.93617021 0.89361702 0.93333333 0.91304348
 0.88888889 0.8372093  0.97777778 0.86363636]

mean value: 0.8951757186346175

key: train_fscore
value: [0.8992629  0.90909091 0.90147783 0.9009901  0.90464548 0.90686275
 0.90147783 0.90953545 0.89705882 0.9041769 ]

mean value: 0.903457897428805

key: test_precision
value: [0.9047619  0.86363636 0.91666667 0.875      0.95454545 0.875
 0.86956522 0.85714286 0.95652174 0.86363636]

mean value: 0.893647656691135

key: train_precision
value: [0.89268293 0.90243902 0.89705882 0.9009901  0.89371981 0.90243902
 0.90147783 0.90291262 0.89268293 0.90196078]

mean value: 0.8988363869926886

key: test_recall
value: [0.82608696 0.82608696 0.95652174 0.91304348 0.91304348 0.95454545
 0.90909091 0.81818182 1.         0.86363636]

mean value: 0.8980237154150198

key: train_recall
value: [0.90594059 0.91584158 0.90594059 0.9009901  0.91584158 0.91133005
 0.90147783 0.91625616 0.90147783 0.90640394]

mean value: 0.9081500268253426

key: test_roc_auc
value: [0.86758893 0.84486166 0.93280632 0.88833992 0.93379447 0.91205534
 0.88932806 0.84387352 0.97826087 0.86660079]

mean value: 0.8957509881422925

key: train_roc_auc
value: [0.8987831  0.90865971 0.90124616 0.90123397 0.9037336  0.90616007
 0.90123397 0.90862313 0.89628347 0.90369702]

mean value: 0.9029654196946788

key: test_jcc
value: [0.76       0.73076923 0.88       0.80769231 0.875      0.84
 0.8        0.72       0.95652174 0.76      ]

mean value: 0.8129983277591973

key: train_jcc
value: [0.81696429 0.83333333 0.8206278  0.81981982 0.82589286 0.82959641
 0.8206278  0.83408072 0.81333333 0.82511211]

mean value: 0.8239388472392957

MCC on Blind test: 0.79

Accuracy on Blind test: 0.89

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [1.31762528 2.59691238 1.63772154 1.77554345 1.59011173 1.17077398
 2.11147809 1.44927478 0.70069718 1.59665656]

mean value: 1.5946794986724853

key: score_time
value: [0.01862407 0.02381897 0.01357055 0.01248121 0.02450323 0.02156973
 0.01986384 0.02018833 0.01420259 0.01296401]

mean value: 0.018178653717041016

key: test_mcc
value: [0.73663511 0.64426877 0.86732843 0.77821935 0.86758893 0.82574419
 0.77821935 0.73559956 1.         0.69404997]

mean value: 0.7927653674534292

key: train_mcc
value: [0.83247548 0.8716498  0.82799641 0.8520244  0.83086317 0.79284035
 0.83012449 0.83313446 0.78773172 0.81956701]

mean value: 0.8278407283052743

key: test_accuracy
value: [0.86666667 0.82222222 0.93333333 0.88888889 0.93333333 0.91111111
 0.88888889 0.86666667 1.         0.84444444]

mean value: 0.8955555555555555

key: train_accuracy
value: [0.91604938 0.93580247 0.91358025 0.92592593 0.91358025 0.89382716
 0.91358025 0.91604938 0.89382716 0.90864198]

mean value: 0.9130864197530864

key: test_fscore
value: [0.86363636 0.82608696 0.93617021 0.89361702 0.93333333 0.91304348
 0.88372093 0.85714286 1.         0.82926829]

mean value: 0.89360194458532

key: train_fscore
value: [0.91707317 0.93596059 0.91525424 0.92647059 0.91725768 0.88772846
 0.91002571 0.91414141 0.89486553 0.90537084]

mean value: 0.9124148220877728

key: test_precision
value: [0.9047619  0.82608696 0.91666667 0.875      0.95454545 0.875
 0.9047619  0.9        1.         0.89473684]

mean value: 0.9051559729362934

key: train_precision
value: [0.90384615 0.93137255 0.8957346  0.91747573 0.87782805 0.94444444
 0.9516129  0.93782383 0.88834951 0.94148936]

mean value: 0.9189977140608518

key: test_recall
value: [0.82608696 0.82608696 0.95652174 0.91304348 0.91304348 0.95454545
 0.86363636 0.81818182 1.         0.77272727]

mean value: 0.8843873517786561

key: train_recall
value: [0.93069307 0.94059406 0.93564356 0.93564356 0.96039604 0.83743842
 0.87192118 0.89162562 0.90147783 0.87192118]

mean value: 0.9077354533482905

key: test_roc_auc
value: [0.86758893 0.82213439 0.93280632 0.88833992 0.93379447 0.91205534
 0.88833992 0.86561265 1.         0.84288538]

mean value: 0.8953557312252964

key: train_roc_auc
value: [0.91608545 0.93581427 0.91363459 0.92594986 0.91369556 0.89396674
 0.91368336 0.91610984 0.89380822 0.90873287]

mean value: 0.913148075891333

key: test_jcc
value: [0.76       0.7037037  0.88       0.80769231 0.875      0.84
 0.79166667 0.75       1.         0.70833333]

mean value: 0.8116396011396011

key: train_jcc
value: [0.84684685 0.87962963 0.84375    0.8630137  0.84716157 0.79812207
 0.83490566 0.84186047 0.80973451 0.8271028 ]

mean value: 0.8392127255393006

MCC on Blind test: 0.75

Accuracy on Blind test: 0.88

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.03762817 0.0222249  0.02336216 0.02422976 0.02328444 0.02246857
 0.02320361 0.02444267 0.02557874 0.02451849]

mean value: 0.025094151496887207

key: score_time
value: [0.01054001 0.01044989 0.0104568  0.01041985 0.01052427 0.01045942
 0.01032162 0.0105691  0.01049876 0.01033282]

mean value: 0.010457253456115723

key: test_mcc
value: [0.77865613 0.91106719 0.82506438 0.91106719 0.82213439 0.91485328
 0.95652174 0.77821935 0.95643752 0.95643752]

mean value: 0.8810458680630867

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.88888889 0.95555556 0.91111111 0.95555556 0.91111111 0.95555556
 0.97777778 0.88888889 0.97777778 0.97777778]

mean value: 0.94

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.88888889 0.95652174 0.91666667 0.95652174 0.91304348 0.95652174
 0.97777778 0.88372093 0.97674419 0.97674419]

mean value: 0.9403151331311089

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.90909091 0.95652174 0.88       0.95652174 0.91304348 0.91666667
 0.95652174 0.9047619  1.         1.        ]

mean value: 0.9393128176171655

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.86956522 0.95652174 0.95652174 0.95652174 0.91304348 1.
 1.         0.86363636 0.95454545 0.95454545]

mean value: 0.9424901185770751

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.88932806 0.9555336  0.91007905 0.9555336  0.91106719 0.95652174
 0.97826087 0.88833992 0.97727273 0.97727273]

mean value: 0.9399209486166008

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.8        0.91666667 0.84615385 0.91666667 0.84       0.91666667
 0.95652174 0.79166667 0.95454545 0.95454545]

mean value: 0.8893433161041857

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.91

Accuracy on Blind test: 0.96

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.14324355 0.14098144 0.14159226 0.14177227 0.14036822 0.14066744
 0.13828254 0.1379981  0.14093471 0.14011884]

mean value: 0.14059593677520751

key: score_time
value: [0.02067876 0.02081728 0.02086139 0.02107191 0.02067494 0.0208261
 0.01972151 0.02075052 0.02083349 0.02075028]

mean value: 0.02069861888885498

key: test_mcc
value: [0.82574419 0.73320158 0.86758893 0.68911026 0.82574419 0.78530224
 0.78530224 0.64426877 0.86732843 0.82213439]

mean value: 0.784572523307409

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91111111 0.86666667 0.93333333 0.84444444 0.91111111 0.88888889
 0.88888889 0.82222222 0.93333333 0.91111111]

mean value: 0.8911111111111111

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.90909091 0.86956522 0.93333333 0.85106383 0.90909091 0.89361702
 0.89361702 0.81818182 0.93023256 0.90909091]

mean value: 0.8916883526659143

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.95238095 0.86956522 0.95454545 0.83333333 0.95238095 0.84
 0.84       0.81818182 0.95238095 0.90909091]

mean value: 0.8921859589685677

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.86956522 0.86956522 0.91304348 0.86956522 0.86956522 0.95454545
 0.95454545 0.81818182 0.90909091 0.90909091]

mean value: 0.8936758893280632

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.91205534 0.86660079 0.93379447 0.84387352 0.91205534 0.89031621
 0.89031621 0.82213439 0.93280632 0.91106719]

mean value: 0.8915019762845849

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.83333333 0.76923077 0.875      0.74074074 0.83333333 0.80769231
 0.80769231 0.69230769 0.86956522 0.83333333]

mean value: 0.8062229035055122

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.01231742 0.01216841 0.0121696  0.01225901 0.01252937 0.01230884
 0.01246262 0.01228213 0.01244259 0.01268888]

mean value: 0.01236288547515869

key: score_time
value: [0.0103507  0.01036882 0.01041508 0.0105629  0.0106926  0.01041341
 0.01040411 0.0103581  0.01045394 0.01082134]

mean value: 0.010484099388122559

key: test_mcc
value: [0.43557241 0.68972332 0.73559956 0.51185771 0.43557241 0.38019877
 0.2903816  0.73663511 0.33824342 0.670374  ]

mean value: 0.5224158297692548

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.71111111 0.84444444 0.86666667 0.75555556 0.71111111 0.68888889
 0.64444444 0.86666667 0.66666667 0.82222222]

mean value: 0.7577777777777778

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.68292683 0.84444444 0.875      0.75555556 0.68292683 0.69565217
 0.6        0.86956522 0.61538462 0.84      ]

mean value: 0.7461455665225548

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.77777778 0.86363636 0.84       0.77272727 0.77777778 0.66666667
 0.66666667 0.83333333 0.70588235 0.75      ]

mean value: 0.7654468211527035

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.60869565 0.82608696 0.91304348 0.73913043 0.60869565 0.72727273
 0.54545455 0.90909091 0.54545455 0.95454545]

mean value: 0.7377470355731225

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.71343874 0.84486166 0.86561265 0.75592885 0.71343874 0.68972332
 0.64229249 0.86758893 0.66403162 0.82509881]

mean value: 0.758201581027668

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.51851852 0.73076923 0.77777778 0.60714286 0.51851852 0.53333333
 0.42857143 0.76923077 0.44444444 0.72413793]

mean value: 0.6052444809341361

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.55

Accuracy on Blind test: 0.77

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.9608767  1.95501304 2.26225162 1.84889507 2.05728984 1.85792685
 4.34983826 2.38245821 2.61548281 2.64022255]

mean value: 2.3930254936218263

key: score_time
value: [0.10743237 0.22150254 0.09776163 0.13473463 0.10169816 0.17076182
 0.25548291 0.19140983 0.15087056 0.12873602]

mean value: 0.15603904724121093

key: test_mcc
value: [0.86758893 0.91106719 0.86732843 0.95643752 0.82574419 0.95652174
 0.86758893 0.77821935 1.         0.95643752]

mean value: 0.8986933809832185

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.93333333 0.95555556 0.93333333 0.97777778 0.91111111 0.97777778
 0.93333333 0.88888889 1.         0.97777778]

mean value: 0.9488888888888889

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.93333333 0.95652174 0.93617021 0.9787234  0.90909091 0.97777778
 0.93333333 0.88372093 1.         0.97674419]

mean value: 0.9485415825966135

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.95454545 0.95652174 0.91666667 0.95833333 0.95238095 0.95652174
 0.91304348 0.9047619  1.         1.        ]

mean value: 0.9512775268210051

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.91304348 0.95652174 0.95652174 1.         0.86956522 1.
 0.95454545 0.86363636 1.         0.95454545]

mean value: 0.9468379446640316

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.93379447 0.9555336  0.93280632 0.97727273 0.91205534 0.97826087
 0.93379447 0.88833992 1.         0.97727273]

mean value: 0.9489130434782609

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.875      0.91666667 0.88       0.95833333 0.83333333 0.95652174
 0.875      0.79166667 1.         0.95454545]

mean value: 0.9041067193675889

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.89

Accuracy on Blind test: 0.95

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC0...05', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [1.02290845 1.19409299 1.70667768 1.77199078 1.73614359 2.20723319
 1.984725   1.91829991 1.85758781 1.00323558]

mean value: 1.6402894973754882

key: score_time
value: [0.15309381 0.17785645 0.17673326 0.22488499 0.2151401  0.18284273
 0.1778214  0.29034138 0.14019394 0.16655993]

mean value: 0.19054679870605468

key: test_mcc
value: [0.86758893 0.82213439 0.86732843 0.95643752 0.82574419 0.91106719
 0.86758893 0.77821935 1.         0.87406293]

mean value: 0.8770171874100532

key: train_mcc
value: [0.95556748 0.95061698 0.94568955 0.94078482 0.95556639 0.95066455
 0.95556748 0.97532008 0.94578446 0.93590713]

mean value: 0.951146893201595

key: test_accuracy
value: [0.93333333 0.91111111 0.93333333 0.97777778 0.91111111 0.95555556
 0.93333333 0.88888889 1.         0.93333333]

mean value: 0.9377777777777778

key: train_accuracy
value: [0.97777778 0.97530864 0.97283951 0.97037037 0.97777778 0.97530864
 0.97777778 0.98765432 0.97283951 0.96790123]

mean value: 0.9755555555555555

key: test_fscore
value: [0.93333333 0.91304348 0.93617021 0.9787234  0.90909091 0.95454545
 0.93333333 0.88372093 1.         0.92682927]

mean value: 0.9368790324110416

key: train_fscore
value: [0.97777778 0.97524752 0.97270471 0.97014925 0.97766749 0.97524752
 0.97777778 0.98771499 0.97270471 0.96774194]

mean value: 0.9754733705067631

key: test_precision
value: [0.95454545 0.91304348 0.91666667 0.95833333 0.95238095 0.95454545
 0.91304348 0.9047619  1.         1.        ]

mean value: 0.9467320722755506

key: train_precision
value: [0.97536946 0.97524752 0.97512438 0.975      0.9800995  0.9800995
 0.98019802 0.98529412 0.98       0.975     ]

mean value: 0.978143250341417

key: test_recall
value: [0.91304348 0.91304348 0.95652174 1.         0.86956522 0.95454545
 0.95454545 0.86363636 1.         0.86363636]

mean value: 0.9288537549407114

key: train_recall
value: [0.98019802 0.97524752 0.97029703 0.96534653 0.97524752 0.97044335
 0.97536946 0.99014778 0.96551724 0.96059113]

mean value: 0.9728405599180607

key: test_roc_auc
value: [0.93379447 0.91106719 0.93280632 0.97727273 0.91205534 0.9555336
 0.93379447 0.88833992 1.         0.93181818]

mean value: 0.9376482213438735

key: train_roc_auc
value: [0.97778374 0.97530849 0.97283324 0.970358   0.97777155 0.97532068
 0.97778374 0.98764815 0.97285763 0.96791933]

mean value: 0.9755584548602644

key: test_jcc
value: [0.875      0.84       0.88       0.95833333 0.83333333 0.91304348
 0.875      0.79166667 1.         0.86363636]

mean value: 0.8830013175230567

key: train_jcc
value: [0.95652174 0.95169082 0.9468599  0.94202899 0.95631068 0.95169082
 0.95652174 0.97572816 0.9468599  0.9375    ]

mean value: 0.9521712747994935

MCC on Blind test: 0.93

Accuracy on Blind test: 0.96

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.02517962 0.0100472  0.01011395 0.01026773 0.00996494 0.01005149
 0.01018238 0.00989962 0.00991511 0.00988817]

mean value: 0.01155102252960205

key: score_time
value: [0.00965428 0.00895143 0.00905299 0.00892019 0.0089519  0.00888443
 0.00896573 0.0087378  0.00882101 0.00882554]

mean value: 0.00897653102874756

key: test_mcc
value: [0.70780516 0.4229249  0.68972332 0.73559956 0.78530224 0.69583743
 0.55841694 0.64426877 0.69404997 0.55841694]

mean value: 0.6492345229004394

key: train_mcc
value: [0.7284056  0.69072841 0.70964919 0.73836061 0.7234551  0.75324391
 0.75343373 0.76814813 0.70374345 0.72863208]

mean value: 0.7297800213531322

key: test_accuracy
value: [0.84444444 0.71111111 0.84444444 0.86666667 0.88888889 0.84444444
 0.77777778 0.82222222 0.84444444 0.77777778]

mean value: 0.8222222222222222

key: train_accuracy
value: [0.86419753 0.84444444 0.85432099 0.8691358  0.8617284  0.87654321
 0.87654321 0.88395062 0.85185185 0.86419753]

mean value: 0.8646913580246913

key: test_fscore
value: [0.82926829 0.71111111 0.84444444 0.875      0.88372093 0.85106383
 0.7826087  0.81818182 0.82926829 0.7826087 ]

mean value: 0.8207276110427367

key: train_fscore
value: [0.86419753 0.83804627 0.84987277 0.86977887 0.86138614 0.87562189
 0.875      0.88279302 0.85148515 0.86284289]

mean value: 0.8631024534573951

key: test_precision
value: [0.94444444 0.72727273 0.86363636 0.84       0.95       0.8
 0.75       0.81818182 0.89473684 0.75      ]

mean value: 0.8338272195640617

key: train_precision
value: [0.86206897 0.87165775 0.87434555 0.86341463 0.86138614 0.88442211
 0.88832487 0.89393939 0.85572139 0.87373737]

mean value: 0.8729018186387163

key: test_recall
value: [0.73913043 0.69565217 0.82608696 0.91304348 0.82608696 0.90909091
 0.81818182 0.81818182 0.77272727 0.81818182]

mean value: 0.8136363636363636

key: train_recall
value: [0.86633663 0.80693069 0.82673267 0.87623762 0.86138614 0.86699507
 0.86206897 0.87192118 0.84729064 0.85221675]

mean value: 0.8538116373213676

key: test_roc_auc
value: [0.84683794 0.71146245 0.84486166 0.86561265 0.89031621 0.8458498
 0.77865613 0.82213439 0.84288538 0.77865613]

mean value: 0.8227272727272728

key: train_roc_auc
value: [0.8642028  0.84435205 0.85425304 0.86915329 0.86172755 0.87656684
 0.87657904 0.88398039 0.85186314 0.86422719]

mean value: 0.864690533092718

key: test_jcc
value: [0.70833333 0.55172414 0.73076923 0.77777778 0.79166667 0.74074074
 0.64285714 0.69230769 0.70833333 0.64285714]

mean value: 0.6987367198574095

key: train_jcc
value: [0.76086957 0.72123894 0.73893805 0.76956522 0.75652174 0.77876106
 0.77777778 0.79017857 0.74137931 0.75877193]

mean value: 0.7594002164212214

MCC on Blind test: 0.73

Accuracy on Blind test: 0.87

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC0...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [1.47456765 1.54497075 1.56331825 1.58203959 1.53839636 1.51352477
 1.57174468 1.49755764 1.60622501 1.62361526]

mean value: 1.5515959978103637

key: score_time
value: [0.01256537 0.0133667  0.01274014 0.01221132 0.01370311 0.01287436
 0.01300812 0.01307845 0.01412868 0.01365328]

mean value: 0.013132953643798828

key: test_mcc
value: [0.82213439 0.91106719 0.95643752 1.         0.86758893 0.91485328
 0.95652174 0.77821935 1.         0.95643752]

mean value: 0.9163259916823262

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91111111 0.95555556 0.97777778 1.         0.93333333 0.95555556
 0.97777778 0.88888889 1.         0.97777778]

mean value: 0.9577777777777777

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.91304348 0.95652174 0.9787234  1.         0.93333333 0.95652174
 0.97777778 0.88372093 1.         0.97674419]

mean value: 0.9576386588167239

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.91304348 0.95652174 0.95833333 1.         0.95454545 0.91666667
 0.95652174 0.9047619  1.         1.        ]

mean value: 0.9560394315829098

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.91304348 0.95652174 1.         1.         0.91304348 1.
 1.         0.86363636 1.         0.95454545]

mean value: 0.9600790513833992

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.91106719 0.9555336  0.97727273 1.         0.93379447 0.95652174
 0.97826087 0.88833992 1.         0.97727273]

mean value: 0.957806324110672

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.84       0.91666667 0.95833333 1.         0.875      0.91666667
 0.95652174 0.79166667 1.         0.95454545]

mean value: 0.9209400527009223

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.95

Accuracy on Blind test: 0.97

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.12674332 0.10389161 0.06731343 0.08088517 0.083004   0.08519578
 0.10193396 0.09383464 0.2874248  0.05935693]

mean value: 0.10895836353302002

key: score_time
value: [0.02858162 0.0240407  0.0236733  0.03531265 0.02388859 0.0231142
 0.0168426  0.01304436 0.01560354 0.01269364]

mean value: 0.02167952060699463

key: test_mcc
value: [0.86758893 0.69404997 0.82213439 0.69404997 0.82213439 0.69583743
 0.73663511 0.69404997 0.82213439 0.73559956]

mean value: 0.7584214113139539

key: train_mcc
value: [0.91606106 0.91129269 0.91605902 0.92103017 0.93581427 0.92117074
 0.91605902 0.93126766 0.92602981 0.89139819]

mean value: 0.9186182632580657

key: test_accuracy
value: [0.93333333 0.84444444 0.91111111 0.84444444 0.91111111 0.84444444
 0.86666667 0.84444444 0.91111111 0.86666667]

mean value: 0.8777777777777778

key: train_accuracy
value: [0.95802469 0.95555556 0.95802469 0.96049383 0.96790123 0.96049383
 0.95802469 0.9654321  0.96296296 0.94567901]

mean value: 0.9592592592592593

key: test_fscore
value: [0.93333333 0.85714286 0.91304348 0.85714286 0.91304348 0.85106383
 0.86956522 0.82926829 0.90909091 0.85714286]

mean value: 0.8789837110236018

key: train_fscore
value: [0.95802469 0.95588235 0.95781638 0.960199   0.96790123 0.960199
 0.95823096 0.96601942 0.96277916 0.94607843]

mean value: 0.9593130629395346

key: test_precision
value: [0.95454545 0.80769231 0.91304348 0.80769231 0.91304348 0.8
 0.83333333 0.89473684 0.90909091 0.9       ]

mean value: 0.8733178110981314

key: train_precision
value: [0.95566502 0.94660194 0.960199   0.965      0.96551724 0.96984925
 0.95588235 0.95215311 0.97       0.94146341]

mean value: 0.9582331336586875

key: test_recall
value: [0.91304348 0.91304348 0.91304348 0.91304348 0.91304348 0.90909091
 0.90909091 0.77272727 0.90909091 0.81818182]

mean value: 0.8883399209486166

key: train_recall
value: [0.96039604 0.96534653 0.95544554 0.95544554 0.97029703 0.95073892
 0.96059113 0.98029557 0.95566502 0.95073892]

mean value: 0.9604960249719553

key: test_roc_auc
value: [0.93379447 0.84288538 0.91106719 0.84288538 0.91106719 0.8458498
 0.86758893 0.84288538 0.91106719 0.86561265]

mean value: 0.8774703557312253

key: train_roc_auc
value: [0.95803053 0.95557967 0.95801834 0.96048139 0.96790714 0.96051797
 0.95801834 0.96539531 0.96298103 0.94566649]

mean value: 0.9592596205433351

key: test_jcc
value: [0.875      0.75       0.84       0.75       0.84       0.74074074
 0.76923077 0.70833333 0.83333333 0.75      ]

mean value: 0.7856638176638177

key: train_jcc
value: [0.91943128 0.91549296 0.91904762 0.92344498 0.93779904 0.92344498
 0.91981132 0.9342723  0.92822967 0.89767442]

mean value: 0.9218648556530884

MCC on Blind test: 0.7

Accuracy on Blind test: 0.85
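
Note: the key/value/mean blocks above have the shape of the dictionary returned by scikit-learn's cross_validate when it is given several scorers and return_train_score=True. The sketch below is a minimal reconstruction under assumptions, not the script that produced this log: the data are synthetic (make_classification), the preprocessing is reduced to a single MinMaxScaler, and the scorer definitions and the shuffled StratifiedKFold are guesses. The fold accuracies in the log are multiples of 1/45 (test) and 1/405 (train), which is consistent with 10 folds over 450 samples, so the sketch uses that size.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): the real feature table is not in this log.
X, y = make_classification(n_samples=450, n_features=20, random_state=42)

# Scorer names chosen so the result keys match the log (test_mcc, train_jcc, ...).
# The exact scorer definitions used by the original script are an assumption.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

# Simplified pipeline: one scaler for all columns instead of a ColumnTransformer.
pipe = Pipeline(steps=[("prep", MinMaxScaler()),
                       ("model", LinearDiscriminantAnalysis())])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print each metric in the same key / value / mean layout as the log.
for key, value in scores.items():
    print("\nkey:", key)
    print("value:", value)
    print("\nmean value:", np.mean(value))
```

Each entry of the returned dictionary is an array with one value per fold, which is why every "value:" line in this log holds ten numbers.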

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.02621555 0.01159739 0.01051664 0.01107073 0.01101708 0.01104641
 0.01100183 0.01075768 0.01108098 0.01129007]

mean value: 0.01255943775177002

key: score_time
value: [0.02028179 0.01012492 0.00889969 0.00954199 0.00957394 0.0095036
 0.00955319 0.0093534  0.00974369 0.0098424 ]

mean value: 0.010641860961914062

key: test_mcc
value: [0.74605372 0.51089209 0.82213439 0.82213439 0.78530224 0.82574419
 0.55841694 0.64752602 0.83484711 0.64426877]

mean value: 0.7197319857368281

key: train_mcc
value: [0.72859901 0.7001606  0.6847458  0.74821952 0.72358281 0.75811526
 0.72914356 0.77300001 0.73836061 0.73425986]

mean value: 0.7318187035926328

key: test_accuracy
value: [0.86666667 0.75555556 0.91111111 0.91111111 0.88888889 0.91111111
 0.77777778 0.82222222 0.91111111 0.82222222]

mean value: 0.8577777777777778

key: train_accuracy
value: [0.86419753 0.84938272 0.84197531 0.87407407 0.8617284  0.87901235
 0.86419753 0.88641975 0.8691358  0.86666667]

mean value: 0.865679012345679

key: test_fscore
value: [0.85714286 0.76595745 0.91304348 0.91304348 0.88372093 0.91304348
 0.7826087  0.82608696 0.9        0.81818182]

mean value: 0.8572829139322266

key: train_fscore
value: [0.86215539 0.84398977 0.83756345 0.87281796 0.86       0.87841191
 0.86146096 0.88557214 0.86848635 0.86363636]

mean value: 0.8634094288327001

key: test_precision
value: [0.94736842 0.75       0.91304348 0.91304348 0.95       0.875
 0.75       0.79166667 1.         0.81818182]

mean value: 0.8708303862422855

key: train_precision
value: [0.87309645 0.87301587 0.859375   0.87939698 0.86868687 0.885
 0.8814433  0.89447236 0.875      0.88601036]

mean value: 0.877549719680029

key: test_recall
value: [0.7826087  0.7826087  0.91304348 0.91304348 0.82608696 0.95454545
 0.81818182 0.86363636 0.81818182 0.81818182]

mean value: 0.8490118577075099

key: train_recall
value: [0.85148515 0.81683168 0.81683168 0.86633663 0.85148515 0.87192118
 0.84236453 0.87684729 0.86206897 0.84236453]

mean value: 0.8498536799492757

key: test_roc_auc
value: [0.86857708 0.75494071 0.91106719 0.91106719 0.89031621 0.91205534
 0.77865613 0.82312253 0.90909091 0.82213439]

mean value: 0.858102766798419

key: train_roc_auc
value: [0.86416622 0.84930254 0.84191338 0.87405502 0.86170317 0.8790299
 0.86425157 0.88644345 0.86915329 0.86672682]

mean value: 0.8656745354338389

key: test_jcc
value: [0.75       0.62068966 0.84       0.84       0.79166667 0.84
 0.64285714 0.7037037  0.81818182 0.69230769]

mean value: 0.7539406678889438

key: train_jcc
value: [0.75770925 0.7300885  0.72052402 0.77433628 0.75438596 0.78318584
 0.75663717 0.79464286 0.76754386 0.76      ]

mean value: 0.7599053737883451

MCC on Blind test: 0.77

Accuracy on Blind test: 0.88

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.0166471  0.02165747 0.01920605 0.01865745 0.01901174 0.02208638
 0.01923394 0.02118444 0.03245139 0.02038074]

mean value: 0.021051669120788576

key: score_time
value: [0.01027346 0.01216531 0.0124557  0.01228547 0.01229858 0.01237655
 0.01228023 0.01235175 0.02047372 0.01226282]

mean value: 0.012922358512878419

key: test_mcc
value: [0.78530224 0.64752602 0.86732843 0.73320158 0.59725988 0.78405645
 0.70780516 0.64752602 0.82213439 0.70501339]

mean value: 0.7297153566397614

key: train_mcc
value: [0.86377146 0.84895551 0.81816266 0.86902982 0.80684222 0.81282858
 0.86843671 0.81827627 0.88164702 0.87785481]

mean value: 0.8465805044965292

key: test_accuracy
value: [0.88888889 0.82222222 0.93333333 0.86666667 0.77777778 0.88888889
 0.84444444 0.82222222 0.91111111 0.84444444]

mean value: 0.86

key: train_accuracy
value: [0.9308642  0.92098765 0.90617284 0.93333333 0.8962963  0.89876543
 0.93333333 0.9037037  0.94074074 0.9382716 ]

mean value: 0.9202469135802469

key: test_fscore
value: [0.88372093 0.81818182 0.93617021 0.86956522 0.73684211 0.87804878
 0.85714286 0.82608696 0.90909091 0.82051282]

mean value: 0.8535362607590927

key: train_fscore
value: [0.92820513 0.91534392 0.91121495 0.93059126 0.8852459  0.88828338
 0.93556086 0.91116173 0.94146341 0.93670886]

mean value: 0.9183779402635586

key: test_precision
value: [0.95       0.85714286 0.91666667 0.86956522 0.93333333 0.94736842
 0.77777778 0.79166667 0.90909091 0.94117647]

mean value: 0.8893788319710382

key: train_precision
value: [0.96276596 0.98295455 0.86283186 0.96791444 0.98780488 0.99390244
 0.90740741 0.84745763 0.93236715 0.96354167]

mean value: 0.940894796783545

key: test_recall
value: [0.82608696 0.7826087  0.95652174 0.86956522 0.60869565 0.81818182
 0.95454545 0.86363636 0.90909091 0.72727273]

mean value: 0.8316205533596838

key: train_recall
value: [0.8960396  0.85643564 0.96534653 0.8960396  0.8019802  0.80295567
 0.96551724 0.98522167 0.95073892 0.91133005]

mean value: 0.9031605130956446

key: test_roc_auc
value: [0.89031621 0.82312253 0.93280632 0.86660079 0.78162055 0.88735178
 0.84683794 0.82312253 0.91106719 0.84189723]

mean value: 0.8604743083003953

key: train_roc_auc
value: [0.93077842 0.92082866 0.90631859 0.93324148 0.89606399 0.89900258
 0.93325367 0.90350193 0.94071599 0.93833829]

mean value: 0.9202043603375115

key: test_jcc
value: [0.79166667 0.69230769 0.88       0.76923077 0.58333333 0.7826087
 0.75       0.7037037  0.83333333 0.69565217]

mean value: 0.7481836368140716

key: train_jcc
value: [0.86602871 0.84390244 0.83690987 0.87019231 0.79411765 0.79901961
 0.87892377 0.83682008 0.88940092 0.88095238]

mean value: 0.8496267734106784

MCC on Blind test: 0.75

Accuracy on Blind test: 0.87

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.03038692 0.02198148 0.02177858 0.01954556 0.02221322 0.02213001
 0.02184844 0.02069044 0.01776695 0.01800418]

mean value: 0.021634578704833984

key: score_time
value: [0.01426578 0.01328993 0.01486778 0.01319575 0.01274991 0.01249719
 0.01248622 0.01217604 0.01213098 0.01217985]

mean value: 0.012983942031860351

key: test_mcc
value: [0.78530224 0.69404997 0.82213439 0.78405645 0.78530224 0.82213439
 0.69583743 0.64752602 0.87406293 0.62869461]

mean value: 0.7539100674057231

key: train_mcc
value: [0.82016416 0.89684043 0.91614635 0.88695876 0.90644294 0.89949116
 0.9023231  0.81395079 0.84022048 0.79853924]

mean value: 0.868107740114024

key: test_accuracy
value: [0.88888889 0.84444444 0.91111111 0.88888889 0.88888889 0.91111111
 0.84444444 0.82222222 0.93333333 0.8       ]

mean value: 0.8733333333333333

key: train_accuracy
value: [0.90617284 0.94814815 0.95802469 0.94320988 0.95308642 0.94814815
 0.95061728 0.90123457 0.91604938 0.89135802]

mean value: 0.931604938271605

key: test_fscore
value: [0.88372093 0.85714286 0.91304348 0.89795918 0.88372093 0.90909091
 0.85106383 0.82608696 0.92682927 0.75675676]

mean value: 0.8705415099991635

key: train_fscore
value: [0.89893617 0.94890511 0.95760599 0.94403893 0.95238095 0.94601542
 0.95192308 0.90909091 0.91005291 0.87978142]

mean value: 0.9298730887557013

key: test_precision
value: [0.95       0.80769231 0.91304348 0.84615385 0.95       0.90909091
 0.8        0.79166667 1.         0.93333333]

mean value: 0.8900980541197933

key: train_precision
value: [0.97126437 0.93301435 0.96482412 0.92822967 0.96446701 0.98924731
 0.92957746 0.84388186 0.98285714 0.98773006]

mean value: 0.9495093349997615

key: test_recall
value: [0.82608696 0.91304348 0.91304348 0.95652174 0.82608696 0.90909091
 0.90909091 0.86363636 0.86363636 0.63636364]

mean value: 0.8616600790513834

key: train_recall
value: [0.83663366 0.96534653 0.95049505 0.96039604 0.94059406 0.90640394
 0.97536946 0.98522167 0.84729064 0.79310345]

mean value: 0.916085450909623

key: test_roc_auc
value: [0.89031621 0.84288538 0.91106719 0.88735178 0.89031621 0.91106719
 0.8458498  0.82312253 0.93181818 0.79644269]

mean value: 0.8730237154150198

key: train_roc_auc
value: [0.90600156 0.94819051 0.95800615 0.94325221 0.95305565 0.94825148
 0.95055602 0.90102668 0.91621958 0.89160123]

mean value: 0.9316161049602497

key: test_jcc
value: [0.79166667 0.75       0.84       0.81481481 0.79166667 0.83333333
 0.74074074 0.7037037  0.86363636 0.60869565]

mean value: 0.7738257941736203

key: train_jcc
value: [0.81642512 0.90277778 0.91866029 0.89400922 0.90909091 0.89756098
 0.90825688 0.83333333 0.83495146 0.78536585]

mean value: 0.8700431810959086

MCC on Blind test: 0.84

Accuracy on Blind test: 0.92

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.1846261  0.16588259 0.17944908 0.16845226 0.20947981 0.17672706
 0.16525626 0.16993427 0.16802812 0.17575526]

mean value: 0.17635908126831054

key: score_time
value: [0.0156827  0.01687837 0.01521039 0.01514649 0.02424717 0.01539278
 0.01648426 0.01540637 0.01509404 0.02262664]

mean value: 0.017216920852661133

key: test_mcc
value: [0.82213439 0.86732843 0.95643752 1.         0.86758893 0.91485328
 0.91106719 0.73559956 1.         1.        ]

mean value: 0.9075009309091597

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91111111 0.93333333 0.97777778 1.         0.93333333 0.95555556
 0.95555556 0.86666667 1.         1.        ]

mean value: 0.9533333333333334

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.91304348 0.93617021 0.9787234  1.         0.93333333 0.95652174
 0.95454545 0.85714286 1.         1.        ]

mean value: 0.9529480479434226

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.91304348 0.91666667 0.95833333 1.         0.95454545 0.91666667
 0.95454545 0.9        1.         1.        ]

mean value: 0.9513801054018445

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.91304348 0.95652174 1.         1.         0.91304348 1.
 0.95454545 0.81818182 1.         1.        ]

mean value: 0.9555335968379447

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.91106719 0.93280632 0.97727273 1.         0.93379447 0.95652174
 0.9555336  0.86561265 1.         1.        ]

mean value: 0.9532608695652174

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.84       0.88       0.95833333 1.         0.875      0.91666667
 0.91304348 0.75       1.         1.        ]

mean value: 0.9133043478260869

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.93

Accuracy on Blind test: 0.96
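
Note: for the AdaBoost run above, every train metric is exactly 1.0 while the test folds vary, so the train/test gap is a quick overfitting check. The snippet below is a small sketch that recomputes the mean test MCC and the gap from the per-fold values printed in this log; the variable names are illustrative and not taken from the original script.

```python
import numpy as np

# Per-fold MCC values copied from the AdaBoost Classifier block of this log.
test_mcc = np.array([0.82213439, 0.86732843, 0.95643752, 1.0, 0.86758893,
                     0.91485328, 0.91106719, 0.73559956, 1.0, 1.0])
train_mcc = np.ones(10)  # the log reports 1.0 for every training fold

mean_test = test_mcc.mean()
gap = train_mcc.mean() - mean_test   # train minus test; larger means more overfitting
spread = test_mcc.std(ddof=1)        # fold-to-fold sample standard deviation

print(f"mean test MCC: {mean_test:.4f}")
print(f"train-test gap: {gap:.4f}")
print(f"fold std: {spread:.4f}")
```

The same arithmetic applies to any other model block in this log by pasting in its test_mcc and train_mcc arrays.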

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.05411959 0.06604648 0.08949947 0.05077219 0.0609808  0.0644958
 0.07106686 0.05174732 0.07121181 0.07347536]

mean value: 0.06534156799316407

key: score_time
value: [0.02211094 0.03330112 0.0351975  0.02889776 0.02099347 0.02941012
 0.02830839 0.02586412 0.02633166 0.02234125]

mean value: 0.027275633811950684

key: test_mcc
value: [0.82213439 0.91106719 0.91106719 1.         0.86758893 0.91485328
 0.91485328 0.82213439 1.         0.87406293]

mean value: 0.9037761587267465

key: train_mcc
value: [0.98024679 0.98519693 0.98519693 0.99017145 0.97560447 0.98029413
 0.98024679 0.99507389 0.98029509 0.98519729]

mean value: 0.9837523754632594

key: test_accuracy
value: [0.91111111 0.95555556 0.95555556 1.         0.93333333 0.95555556
 0.95555556 0.91111111 1.         0.93333333]

mean value: 0.9511111111111111

key: train_accuracy
value: [0.99012346 0.99259259 0.99259259 0.99506173 0.98765432 0.99012346
 0.99012346 0.99753086 0.99012346 0.99259259]

mean value: 0.9918518518518519

key: test_fscore
value: [0.91304348 0.95652174 0.95652174 1.         0.93333333 0.95652174
 0.95652174 0.90909091 1.         0.92682927]

mean value: 0.9508383945499534

key: train_fscore
value: [0.99009901 0.99255583 0.99255583 0.99502488 0.98746867 0.99019608
 0.99014778 0.99753086 0.99009901 0.99259259]

mean value: 0.9918270548106813

key: test_precision
value: [0.91304348 0.95652174 0.95652174 1.         0.95454545 0.91666667
 0.91666667 0.90909091 1.         1.        ]

mean value: 0.9523056653491436

key: train_precision
value: [0.99009901 0.99502488 0.99502488 1.         1.         0.98536585
 0.99014778 1.         0.99502488 0.9950495 ]

mean value: 0.9945736778626925

key: test_recall
value: [0.91304348 0.95652174 0.95652174 1.         0.91304348 1.
 1.         0.90909091 1.         0.86363636]

mean value: 0.9511857707509881

key: train_recall
value: [0.99009901 0.99009901 0.99009901 0.99009901 0.97524752 0.99507389
 0.99014778 0.99507389 0.98522167 0.99014778]

mean value: 0.9891308588986978

key: test_roc_auc
value: [0.91106719 0.9555336  0.9555336  1.         0.93379447 0.95652174
 0.95652174 0.91106719 1.         0.93181818]

mean value: 0.9511857707509882

key: train_roc_auc
value: [0.9901234  0.99258645 0.99258645 0.9950495  0.98762376 0.9901112
 0.9901234  0.99753695 0.99013559 0.99259864]

mean value: 0.9918475345071454

key: test_jcc
value: [0.84       0.91666667 0.91666667 1.         0.875      0.91666667
 0.91666667 0.83333333 1.         0.86363636]

mean value: 0.9078636363636363

key: train_jcc
value: [0.98039216 0.98522167 0.98522167 0.99009901 0.97524752 0.98058252
 0.9804878  0.99507389 0.98039216 0.98529412]

mean value: 0.9838012536555218

MCC on Blind test: 0.95

Accuracy on Blind test: 0.97

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.10892105 0.16002131 0.14787793 0.20454001 0.19232368 0.16132021
 0.145684   0.18226552 0.45024633 0.14779735]

mean value: 0.19009974002838134

key: score_time
value: [0.01462197 0.01497269 0.02363658 0.03207684 0.01924872 0.03103209
 0.02994967 0.02701902 0.04569006 0.01469874]

mean value: 0.02529463768005371

key: test_mcc
value: [0.670374   0.55841694 0.63358389 0.6133209  0.73663511 0.69156407
 0.4229249  0.55533597 0.72299881 0.64613475]

mean value: 0.625128933432823

key: train_mcc
value: [0.99017145 0.99017145 0.98529269 0.98529269 0.98529269 0.98529376
 0.99017193 0.99017193 0.99507389 0.99017193]

mean value: 0.9887104432367875

key: test_accuracy
value: [0.82222222 0.77777778 0.8        0.8        0.86666667 0.82222222
 0.71111111 0.77777778 0.84444444 0.82222222]

mean value: 0.8044444444444444

key: train_accuracy
value: [0.99506173 0.99506173 0.99259259 0.99259259 0.99259259 0.99259259
 0.99506173 0.99506173 0.99753086 0.99506173]

mean value: 0.994320987654321

key: test_fscore
value: [0.8        0.77272727 0.76923077 0.82352941 0.86363636 0.84615385
 0.71111111 0.77272727 0.81081081 0.80952381]

mean value: 0.7979450667685961

key: train_fscore
value: [0.99502488 0.99502488 0.9925187  0.9925187  0.9925187  0.99255583
 0.9950495  0.9950495  0.99753086 0.9950495 ]

mean value: 0.9942841071283992

key: test_precision
value: [0.94117647 0.80952381 0.9375     0.75       0.9047619  0.73333333
 0.69565217 0.77272727 1.         0.85      ]

mean value: 0.8394674964847599

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.69565217 0.73913043 0.65217391 0.91304348 0.82608696 1.
 0.72727273 0.77272727 0.68181818 0.77272727]

mean value: 0.7780632411067193

key: train_recall
value: [0.99009901 0.99009901 0.98514851 0.98514851 0.98514851 0.98522167
 0.99014778 0.99014778 0.99507389 0.99014778]

mean value: 0.9886382480612593

key: test_roc_auc
value: [0.82509881 0.77865613 0.80335968 0.79743083 0.86758893 0.82608696
 0.71146245 0.77766798 0.84090909 0.82114625]

mean value: 0.8049407114624506

key: train_roc_auc
value: [0.9950495  0.9950495  0.99257426 0.99257426 0.99257426 0.99261084
 0.99507389 0.99507389 0.99753695 0.99507389]

mean value: 0.9943191240306297

key: test_jcc
value: [0.66666667 0.62962963 0.625      0.7        0.76       0.73333333
 0.55172414 0.62962963 0.68181818 0.68      ]

mean value: 0.6657801579008475

key: train_jcc
value: [0.99009901 0.99009901 0.98514851 0.98514851 0.98514851 0.98522167
 0.99014778 0.99014778 0.99507389 0.99014778]

mean value: 0.9886382480612593

MCC on Blind test: 0.62

Accuracy on Blind test: 0.81

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.69311619 0.70693564 0.67376351 0.71329546 0.67440891 0.70312452
 0.67457604 0.7066412  0.69384813 0.67763829]

mean value: 0.691734790802002

key: score_time
value: [0.00995064 0.00967097 0.01044416 0.00957108 0.014184   0.01022768
 0.00957584 0.0112319  0.01048136 0.01039362]

mean value: 0.010573124885559082

key: test_mcc
value: [0.82213439 0.82506438 0.95643752 1.         0.82574419 0.91485328
 0.91485328 0.82213439 1.         0.95643752]

mean value: 0.9037658939474779

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91111111 0.91111111 0.97777778 1.         0.91111111 0.95555556
 0.95555556 0.91111111 1.         0.97777778]

mean value: 0.9511111111111111

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.91304348 0.91666667 0.9787234  1.         0.90909091 0.95652174
 0.95652174 0.90909091 1.         0.97674419]

mean value: 0.9516403031672055

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.91304348 0.88       0.95833333 1.         0.95238095 0.91666667
 0.91666667 0.90909091 1.         1.        ]

mean value: 0.9446182006399397

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.91304348 0.95652174 1.         1.         0.86956522 1.
 1.         0.90909091 1.         0.95454545]

mean value: 0.9602766798418972

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.91106719 0.91007905 0.97727273 1.         0.91205534 0.95652174
 0.95652174 0.91106719 1.         0.97727273]

mean value: 0.9511857707509881

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.84       0.84615385 0.95833333 1.         0.83333333 0.91666667
 0.91666667 0.83333333 1.         0.95454545]

mean value: 0.9099032634032634

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.96

Accuracy on Blind test: 0.98

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.1388762  0.1777842  0.13639593 0.05101848 0.0506146  0.0657537
 0.11698508 0.15806198 0.03034258 0.0545578 ]

mean value: 0.09803905487060546

key: score_time
value: [0.01342249 0.02337122 0.01860356 0.03219318 0.02325344 0.01318908
 0.01368833 0.02427459 0.01500654 0.0172019 ]

mean value: 0.019420433044433593

key: test_mcc
value: [0.60000118 0.55666994 0.38019877 0.56261436 0.22004311 0.2903816
 0.21191154 0.24356483 0.5216284  0.33797818]

mean value: 0.3924991910418209

key: train_mcc
value: [0.9901234  0.97541644 0.99017145 0.98519693 0.72864068 0.76507358
 0.93772687 0.89576137 0.98529376 0.78773172]

mean value: 0.9041136193904215

key: test_accuracy
value: [0.8        0.77777778 0.68888889 0.77777778 0.6        0.64444444
 0.6        0.62222222 0.75555556 0.66666667]

mean value: 0.6933333333333334

key: train_accuracy
value: [0.99506173 0.98765432 0.99506173 0.99259259 0.84691358 0.8691358
 0.96790123 0.94567901 0.99259259 0.89382716]

mean value: 0.9486419753086419

key: test_fscore
value: [0.80851064 0.79166667 0.68181818 0.8        0.52631579 0.6
 0.64       0.60465116 0.71794872 0.68085106]

mean value: 0.6851762220825608

key: train_fscore
value: [0.9950495  0.98771499 0.99502488 0.99255583 0.81871345 0.84985836
 0.96897375 0.94300518 0.99255583 0.89486553]

mean value: 0.9438317292087527

key: test_precision
value: [0.79166667 0.76       0.71428571 0.74074074 0.66666667 0.66666667
 0.57142857 0.61904762 0.82352941 0.64      ]

mean value: 0.6994032057267351

key: train_precision
value: [0.9950495  0.9804878  1.         0.99502488 1.         1.
 0.93981481 0.99453552 1.         0.88834951]

mean value: 0.9793262033954039

key: test_recall
value: [0.82608696 0.82608696 0.65217391 0.86956522 0.43478261 0.54545455
 0.72727273 0.59090909 0.63636364 0.72727273]

mean value: 0.6835968379446641

key: train_recall
value: [0.9950495  0.9950495  0.99009901 0.99009901 0.69306931 0.73891626
 1.         0.89655172 0.98522167 0.90147783]

mean value: 0.9185533824318393

key: test_roc_auc
value: [0.79940711 0.77667984 0.68972332 0.7756917  0.60375494 0.64229249
 0.6027668  0.6215415  0.75296443 0.66798419]

mean value: 0.6932806324110672

key: train_roc_auc
value: [0.9950617  0.98767254 0.9950495  0.99258645 0.84653465 0.86945813
 0.96782178 0.94580061 0.99261084 0.89380822]

mean value: 0.9486404428620202

key: test_jcc
value: [0.67857143 0.65517241 0.51724138 0.66666667 0.35714286 0.42857143
 0.47058824 0.43333333 0.56       0.51612903]

mean value: 0.5283416774941345

key: train_jcc
value: [0.99014778 0.97572816 0.99009901 0.98522167 0.69306931 0.73891626
 0.93981481 0.89215686 0.98522167 0.80973451]

mean value: 0.90001100521683

MCC on Blind test: 0.54

Accuracy on Blind test: 0.77

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.05427146 0.05056667 0.02967739 0.04050016 0.04019976 0.04014754
 0.04688692 0.03133988 0.03438783 0.03365636]

mean value: 0.040163397789001465

key: score_time
value: [0.02168655 0.02730417 0.03774905 0.0235827  0.02416897 0.02359438
 0.0278523  0.02273226 0.02274179 0.02754211]

mean value: 0.02589542865753174

key: test_mcc
value: [0.82574419 0.77821935 0.86732843 0.73320158 0.82213439 0.77865613
 0.73663511 0.68911026 0.95652174 0.77821935]

mean value: 0.7965770525024884

key: train_mcc
value: [0.85731376 0.86693826 0.88152664 0.86177295 0.89175679 0.87164354
 0.871768   0.871768   0.86176621 0.85221434]

mean value: 0.8688468510791374

key: test_accuracy
value: [0.91111111 0.88888889 0.93333333 0.86666667 0.91111111 0.88888889
 0.86666667 0.84444444 0.97777778 0.88888889]

mean value: 0.8977777777777778

key: train_accuracy
value: [0.92839506 0.93333333 0.94074074 0.9308642  0.94567901 0.93580247
 0.93580247 0.93580247 0.9308642  0.92592593]

mean value: 0.934320987654321

key: test_fscore
value: [0.90909091 0.89361702 0.93617021 0.86956522 0.91304348 0.88888889
 0.86956522 0.8372093  0.97777778 0.88372093]

mean value: 0.8978648955401747

key: train_fscore
value: [0.92944039 0.93398533 0.9408867  0.93103448 0.94634146 0.93627451
 0.93658537 0.93658537 0.93137255 0.92718447]

mean value: 0.9349690621598662

key: test_precision
value: [0.95238095 0.875      0.91666667 0.86956522 0.91304348 0.86956522
 0.83333333 0.85714286 0.95652174 0.9047619 ]

mean value: 0.8947981366459627

key: train_precision
value: [0.9138756  0.92270531 0.93627451 0.92647059 0.93269231 0.93170732
 0.92753623 0.92753623 0.92682927 0.9138756 ]

mean value: 0.9259502965047404

key: test_recall
value: [0.86956522 0.91304348 0.95652174 0.86956522 0.91304348 0.90909091
 0.90909091 0.81818182 1.         0.86363636]

mean value: 0.9021739130434783

key: train_recall
value: [0.94554455 0.94554455 0.94554455 0.93564356 0.96039604 0.9408867
 0.94581281 0.94581281 0.93596059 0.9408867 ]

mean value: 0.9442032873238062

key: test_roc_auc
value: [0.91205534 0.88833992 0.93280632 0.86660079 0.91106719 0.88932806
 0.86758893 0.84387352 0.97826087 0.88833992]

mean value: 0.8978260869565218

key: train_roc_auc
value: [0.9284373  0.93336341 0.94075257 0.93087597 0.94571526 0.93578988
 0.93577769 0.93577769 0.93085158 0.92588889]

mean value: 0.934323025898649

key: test_jcc
value: [0.83333333 0.80769231 0.88       0.76923077 0.84       0.8
 0.76923077 0.72       0.95652174 0.79166667]

mean value: 0.8167675585284281

key: train_jcc
value: [0.86818182 0.87614679 0.88837209 0.87096774 0.89814815 0.88018433
 0.88073394 0.88073394 0.87155963 0.86425339]

mean value: 0.8779281838677705

MCC on Blind test: 0.79

Accuracy on Blind test: 0.89

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.43447208 0.54856634 0.45230627 0.30222797 0.46330261 0.58453083
 0.51913071 0.37581491 0.53456688 0.40650988]

mean value: 0.4621428489685059

key: score_time
value: [0.02360606 0.03693914 0.02723217 0.01753807 0.02427649 0.02521634
 0.02481031 0.03186703 0.04090238 0.03361082]

mean value: 0.028599882125854494

key: test_mcc
value: [0.82574419 0.77821935 0.86732843 0.77821935 0.82213439 0.78530224
 0.73663511 0.64613475 0.95652174 0.77821935]

mean value: 0.7974458893590411

key: train_mcc
value: [0.85731376 0.86693826 0.88152664 0.90618217 0.93581427 0.91606106
 0.80250226 0.92620337 0.86176621 0.85221434]

mean value: 0.8806522363523859

key: test_accuracy
value: [0.91111111 0.88888889 0.93333333 0.88888889 0.91111111 0.88888889
 0.86666667 0.82222222 0.97777778 0.88888889]

mean value: 0.8977777777777778

key: train_accuracy
value: [0.92839506 0.93333333 0.94074074 0.95308642 0.96790123 0.95802469
 0.90123457 0.96296296 0.9308642  0.92592593]

mean value: 0.9402469135802469

key: test_fscore
value: [0.90909091 0.89361702 0.93617021 0.89361702 0.91304348 0.89361702
 0.86956522 0.80952381 0.97777778 0.88372093]

mean value: 0.8979743398872972

/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:148: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:151: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

key: train_fscore
value: [0.92944039 0.93398533 0.9408867  0.9528536  0.96790123 0.95802469
 0.90196078 0.96350365 0.93137255 0.92718447]

mean value: 0.9407113391803744

key: test_precision
value: [0.95238095 0.875      0.91666667 0.875      0.91304348 0.84
 0.83333333 0.85       0.95652174 0.9047619 ]

mean value: 0.8916708074534161

key: train_precision
value: [0.9138756  0.92270531 0.93627451 0.95522388 0.96551724 0.96039604
 0.89756098 0.95192308 0.92682927 0.9138756 ]

mean value: 0.9344181502391634

key: test_recall
value: [0.86956522 0.91304348 0.95652174 0.91304348 0.91304348 0.95454545
 0.90909091 0.77272727 1.         0.86363636]

mean value: 0.9065217391304348

key: train_recall
value: [0.94554455 0.94554455 0.94554455 0.95049505 0.97029703 0.95566502
 0.90640394 0.97536946 0.93596059 0.9408867 ]

mean value: 0.9471711456859971

key: test_roc_auc
value: [0.91205534 0.88833992 0.93280632 0.88833992 0.91106719 0.89031621
 0.86758893 0.82114625 0.97826087 0.88833992]

mean value: 0.8978260869565218

key: train_roc_auc
value: [0.9284373  0.93336341 0.94075257 0.95308004 0.96790714 0.95803053
 0.90122177 0.96293225 0.93085158 0.92588889]

mean value: 0.9402465492854705

key: test_jcc
value: [0.83333333 0.80769231 0.88       0.80769231 0.84       0.80769231
 0.76923077 0.68       0.95652174 0.79166667]

mean value: 0.8173829431438128

key: train_jcc
value: [0.86818182 0.87614679 0.88837209 0.90995261 0.93779904 0.91943128
 0.82142857 0.92957746 0.87155963 0.86425339]

mean value: 0.888670269242401

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.04649901 0.10190964 0.07873082 0.10802031 0.04602194 0.10666394
 0.04753137 0.04401898 0.05204272 0.05098581]

mean value: 0.06824245452880859

key: score_time
value: [0.01550841 0.02776861 0.01159406 0.01329851 0.01062369 0.01794052
 0.01888156 0.01782799 0.01066375 0.0106461 ]

mean value: 0.015475320816040038

key: test_mcc
value: [0.86452993 0.77352678 0.77352678 0.6882472  0.86452993 0.68252363
 0.77352678 0.90909091 0.81818182 0.95553309]

mean value: 0.8103216868800538

key: train_mcc
value: [0.85858586 0.88929729 0.87383768 0.86873119 0.86391186 0.87374852
 0.86373551 0.85876112 0.85354624 0.85380763]

mean value: 0.8657962897237084

key: test_accuracy
value: [0.93181818 0.88636364 0.88636364 0.84090909 0.93181818 0.84090909
 0.88636364 0.95454545 0.90909091 0.97727273]

mean value: 0.9045454545454545

key: train_accuracy
value: [0.92929293 0.94444444 0.93686869 0.93434343 0.93181818 0.93686869
 0.93181818 0.92929293 0.92676768 0.92676768]

mean value: 0.9328282828282828

key: test_fscore
value: [0.93333333 0.88372093 0.88372093 0.85106383 0.93023256 0.84444444
 0.88372093 0.95454545 0.90909091 0.97777778]

mean value: 0.9051651097816362

key: train_fscore
value: [0.92929293 0.94527363 0.93734336 0.93467337 0.93266833 0.93702771
 0.93233083 0.93       0.92695214 0.9276808 ]

mean value: 0.9333243089480099

key: test_precision
value: [0.91304348 0.9047619  0.9047619  0.8        0.95238095 0.82608696
 0.9047619  0.95454545 0.90909091 0.95652174]

mean value: 0.9025955204216074

key: train_precision
value: [0.92929293 0.93137255 0.93034826 0.93       0.92118227 0.93467337
 0.92537313 0.92079208 0.92462312 0.91625616]

mean value: 0.9263913856612664

key: test_recall
value: [0.95454545 0.86363636 0.86363636 0.90909091 0.90909091 0.86363636
 0.86363636 0.95454545 0.90909091 1.        ]

mean value: 0.9090909090909091

key: train_recall
value: [0.92929293 0.95959596 0.94444444 0.93939394 0.94444444 0.93939394
 0.93939394 0.93939394 0.92929293 0.93939394]

mean value: 0.9404040404040405

key: test_roc_auc
value: [0.93181818 0.88636364 0.88636364 0.84090909 0.93181818 0.84090909
 0.88636364 0.95454545 0.90909091 0.97727273]

mean value: 0.9045454545454545

key: train_roc_auc
value: [0.92929293 0.94444444 0.93686869 0.93434343 0.93181818 0.93686869
 0.93181818 0.92929293 0.92676768 0.92676768]

mean value: 0.9328282828282828

key: test_jcc
value: [0.875      0.79166667 0.79166667 0.74074074 0.86956522 0.73076923
 0.79166667 0.91304348 0.83333333 0.95652174]

mean value: 0.8293973739625913

key: train_jcc
value: [0.86792453 0.89622642 0.88207547 0.87735849 0.87383178 0.88151659
 0.87323944 0.86915888 0.86384977 0.86511628]

mean value: 0.8750297628491411

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
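
Note on the lbfgs ConvergenceWarning: the pipeline logged for this run already applies MinMaxScaler to the numerical columns, so of the two remedies the warning suggests, raising max_iter is the one left to try. A minimal sketch under stated assumptions: the data are synthetic (make_classification), the value max_iter=5000 is illustrative, and this is not the code in ml_data_8020.py. LogisticRegressionCV accepts the same max_iter argument.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real feature matrix (assumption).
X, y = make_classification(n_samples=440, n_features=20, random_state=42)

pipe = Pipeline(steps=[
    ("prep", MinMaxScaler()),
    # max_iter raised from the default of 100; 5000 is an illustrative value.
    ("model", LogisticRegression(max_iter=5000, random_state=42)),
])

# Turn ConvergenceWarning into an error so a non-converged fit cannot go unnoticed.
with warnings.catch_warnings():
    warnings.simplefilter("error", category=ConvergenceWarning)
    pipe.fit(X, y)

# Number of lbfgs iterations actually used by the fitted model.
n_iter_used = int(pipe.named_steps["model"].n_iter_[0])
print("lbfgs iterations used:", n_iter_used)
```

If the fit still stops at the limit, the warning's other pointer (a different solver) is the next thing to check.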
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [2.08121514 2.80702353 3.87393689 2.53061247 2.8521018  2.31932235
 3.42499232 2.182899   3.24924803 2.97179842]

mean value: 2.8293149948120115

key: score_time
value: [0.01505876 0.01849389 0.02364302 0.03510594 0.01178694 0.0123539
 0.03271341 0.02871919 0.02154374 0.01550484]

mean value: 0.021492362022399902

key: test_mcc
value: [0.86452993 0.77352678 0.81818182 0.68252363 0.82158384 0.68252363
 0.77352678 0.90909091 0.81818182 0.95553309]

mean value: 0.8099202235722511

key: train_mcc
value: [0.81822356 0.84865804 0.88393985 0.8939508  0.82866339 0.82832509
 0.89903576 0.88393985 0.89903576 0.88393985]

mean value: 0.8667711957739941

key: test_accuracy
value: [0.93181818 0.88636364 0.90909091 0.84090909 0.90909091 0.84090909
 0.88636364 0.95454545 0.90909091 0.97727273]

mean value: 0.9045454545454545

key: train_accuracy
value: [0.90909091 0.92424242 0.94191919 0.9469697  0.91414141 0.91414141
 0.94949495 0.94191919 0.94949495 0.94191919]

mean value: 0.9333333333333333

key: test_fscore
value: [0.93333333 0.88372093 0.90909091 0.84444444 0.9047619  0.84444444
 0.88372093 0.95454545 0.90909091 0.97777778]

mean value: 0.9044931037954294

key: train_fscore
value: [0.90954774 0.925      0.94235589 0.94710327 0.91542289 0.91457286
 0.94974874 0.94235589 0.94974874 0.94235589]

mean value: 0.9338211919756527

key: test_precision
value: [0.91304348 0.9047619  0.90909091 0.82608696 0.95       0.82608696
 0.9047619  0.95454545 0.90909091 0.95652174]

mean value: 0.9053990212685865

key: train_precision
value: [0.905      0.91584158 0.93532338 0.94472362 0.90196078 0.91
 0.945      0.93532338 0.945      0.93532338]

mean value: 0.9273496135816325

key: test_recall
value: [0.95454545 0.86363636 0.90909091 0.86363636 0.86363636 0.86363636
 0.86363636 0.95454545 0.90909091 1.        ]

mean value: 0.9045454545454545

key: train_recall
value: [0.91414141 0.93434343 0.94949495 0.94949495 0.92929293 0.91919192
 0.95454545 0.94949495 0.95454545 0.94949495]

mean value: 0.9404040404040405

key: test_roc_auc
value: [0.93181818 0.88636364 0.90909091 0.84090909 0.90909091 0.84090909
 0.88636364 0.95454545 0.90909091 0.97727273]

mean value: 0.9045454545454545

key: train_roc_auc
value: [0.90909091 0.92424242 0.94191919 0.9469697  0.91414141 0.91414141
 0.94949495 0.94191919 0.94949495 0.94191919]

mean value: 0.9333333333333333

key: test_jcc
value: [0.875      0.79166667 0.83333333 0.73076923 0.82608696 0.73076923
 0.79166667 0.91304348 0.83333333 0.95652174]

mean value: 0.8282190635451505

key: train_jcc
value: [0.83410138 0.86046512 0.89099526 0.89952153 0.8440367  0.84259259
 0.90430622 0.89099526 0.90430622 0.89099526]

mean value: 0.8762315541890235

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9
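
Note on the key/value blocks: the per-fold arrays above (fit_time, score_time, test_* and train_* with 10 values each) have the shape of the dictionary returned by sklearn's cross_validate with return_train_score=True. A minimal sketch of how such output could be produced, assuming synthetic data, a plain LogisticRegression, and a 10-fold StratifiedKFold; the scorer names mirror the log keys, but the actual script may define them differently. In the log, test_roc_auc equals test_accuracy fold for fold, which is what scoring ROC AUC on hard class predictions with balanced classes would give, so the sketch assumes make_scorer(roc_auc_score).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (jaccard_score, make_scorer, matthews_corrcoef,
                             roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the training data (assumption).
X, y = make_classification(n_samples=440, n_features=20, random_state=42)

# Scorer names chosen to match the keys printed in the log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),  # scored on predicted labels
    "jcc": make_scorer(jaccard_score),
}

pipe = Pipeline(steps=[
    ("prep", MinMaxScaler()),
    ("model", LogisticRegression(max_iter=1000, random_state=42)),
])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print each metric in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("\nkey:", key)
    print("value:", value)
    print("\nmean value:", np.mean(value))
```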

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01285553 0.01226926 0.01220226 0.01206255 0.01200318 0.01196909
 0.01202083 0.01223373 0.01212716 0.01206827]

mean value: 0.01218118667602539

key: score_time
value: [0.0109396  0.01060057 0.01065397 0.01066351 0.01072145 0.01063704
 0.01068616 0.01052976 0.01061583 0.01066804]

mean value: 0.010671591758728028

key: test_mcc
value: [0.72727273 0.46225016 0.62330229 0.6882472  0.5547002  0.45454545
 0.60678804 0.68252363 0.6882472  0.6882472 ]

mean value: 0.6176124107532563

key: train_mcc
value: [0.6873189  0.70131223 0.65677139 0.73180407 0.66882888 0.71147617
 0.6771364  0.6771364  0.66144272 0.6724898 ]

mean value: 0.6845716943284235

key: test_accuracy
value: [0.86363636 0.72727273 0.79545455 0.84090909 0.77272727 0.72727273
 0.79545455 0.84090909 0.84090909 0.84090909]

mean value: 0.8045454545454546

key: train_accuracy
value: [0.84090909 0.84848485 0.82575758 0.86363636 0.83080808 0.85353535
 0.83585859 0.83585859 0.82828283 0.83333333]

mean value: 0.8396464646464646

key: test_fscore
value: [0.86363636 0.7        0.75675676 0.82926829 0.75       0.72727273
 0.76923077 0.8372093  0.82926829 0.82926829]

mean value: 0.7891910797270979

key: train_fscore
value: [0.83018868 0.83957219 0.81401617 0.85561497 0.81743869 0.84491979
 0.82479784 0.82479784 0.8172043  0.82162162]

mean value: 0.8290172105750199

key: test_precision
value: [0.86363636 0.77777778 0.93333333 0.89473684 0.83333333 0.72727273
 0.88235294 0.85714286 0.89473684 0.89473684]

mean value: 0.8559059859988652

key: train_precision
value: [0.89017341 0.89204545 0.87283237 0.90909091 0.88757396 0.89772727
 0.88439306 0.88439306 0.87356322 0.88372093]

mean value: 0.8875513656998492

key: test_recall
value: [0.86363636 0.63636364 0.63636364 0.77272727 0.68181818 0.72727273
 0.68181818 0.81818182 0.77272727 0.77272727]

mean value: 0.7363636363636363

key: train_recall
value: [0.77777778 0.79292929 0.76262626 0.80808081 0.75757576 0.7979798
 0.77272727 0.77272727 0.76767677 0.76767677]

mean value: 0.7777777777777778

key: test_roc_auc
value: [0.86363636 0.72727273 0.79545455 0.84090909 0.77272727 0.72727273
 0.79545455 0.84090909 0.84090909 0.84090909]

mean value: 0.8045454545454546

key: train_roc_auc
value: [0.84090909 0.84848485 0.82575758 0.86363636 0.83080808 0.85353535
 0.83585859 0.83585859 0.82828283 0.83333333]

mean value: 0.8396464646464646

key: test_jcc
value: [0.76       0.53846154 0.60869565 0.70833333 0.6        0.57142857
 0.625      0.72       0.70833333 0.70833333]

mean value: 0.6548585762064023

key: train_jcc
value: [0.70967742 0.7235023  0.68636364 0.74766355 0.69124424 0.73148148
 0.70183486 0.70183486 0.69090909 0.69724771]

mean value: 0.7081759154482379

MCC on Blind test: 0.68

Accuracy on Blind test: 0.84
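
Worked check of the Gaussian NB fold-1 metrics: the first test fold reports precision, recall, accuracy and F-score all equal to 0.86363636, with test_mcc 0.72727273 and test_jcc 0.76. These are consistent with a 44-sample fold of 22 positives and 22 negatives in which TP = 19, FN = 3, FP = 3, TN = 19. The counts are inferred from the reported ratios, not read from the log, so treat them as an assumption. The sketch rebuilds label vectors with those counts and recomputes the metrics.

```python
from sklearn.metrics import (accuracy_score, jaccard_score,
                             matthews_corrcoef, precision_score, recall_score)

# Inferred confusion counts for Gaussian NB, test fold 1 (assumption).
tp, fn, fp, tn = 19, 3, 3, 19

# Rebuild true and predicted labels that realise those counts.
y_true = [1] * (tp + fn) + [0] * (fp + tn)
y_pred = [1] * tp + [0] * fn + [1] * fp + [0] * tn

mcc = matthews_corrcoef(y_true, y_pred)   # (tp*tn - fp*fn) / sqrt of marginal products
jcc = jaccard_score(y_true, y_pred)       # tp / (tp + fp + fn)
acc = accuracy_score(y_true, y_pred)      # (tp + tn) / total
prec = precision_score(y_true, y_pred)    # tp / (tp + fp)
rec = recall_score(y_true, y_pred)        # tp / (tp + fn)

print(f"mcc={mcc:.8f} jcc={jcc:.2f} acc={acc:.8f} prec={prec:.8f} rec={rec:.8f}")
```

With these counts, MCC is (19*19 - 3*3) / (22*22) = 352/484 and the Jaccard score is 19/25, matching the logged 0.72727273 and 0.76.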

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01238537 0.01245689 0.01242876 0.01219463 0.01234508 0.01249957
 0.01238346 0.01238561 0.01246309 0.01236248]

mean value: 0.012390494346618652

key: score_time
value: [0.01070952 0.01077819 0.01080084 0.01070762 0.01050544 0.01057076
 0.0105443  0.01068759 0.01069641 0.01076317]

mean value: 0.010676383972167969

key: test_mcc
value: [0.82158384 0.32118203 0.81818182 0.59152048 0.59152048 0.50051733
 0.63636364 0.77352678 0.77352678 0.77352678]

mean value: 0.6601449963922943

key: train_mcc
value: [0.74250948 0.68434524 0.75299597 0.77793654 0.70837286 0.74243371
 0.74243371 0.72230514 0.75253485 0.7577304 ]

mean value: 0.7383597900292419

key: test_accuracy
value: [0.90909091 0.65909091 0.90909091 0.79545455 0.79545455 0.75
 0.81818182 0.88636364 0.88636364 0.88636364]

mean value: 0.8295454545454546

key: train_accuracy
value: [0.87121212 0.84090909 0.87626263 0.88888889 0.85353535 0.87121212
 0.87121212 0.86111111 0.87626263 0.87878788]

mean value: 0.8689393939393939

key: test_fscore
value: [0.91304348 0.63414634 0.90909091 0.8        0.79069767 0.75555556
 0.81818182 0.88372093 0.88888889 0.88888889]

mean value: 0.8282214484981508

key: train_fscore
value: [0.87218045 0.83377309 0.87841191 0.89       0.84895833 0.87088608
 0.87088608 0.86005089 0.87657431 0.88      ]

mean value: 0.868172113199113

key: test_precision
value: [0.875      0.68421053 0.90909091 0.7826087  0.80952381 0.73913043
 0.81818182 0.9047619  0.86956522 0.86956522]

mean value: 0.8261638533091622

key: train_precision
value: [0.86567164 0.87292818 0.86341463 0.88118812 0.87634409 0.87309645
 0.87309645 0.86666667 0.87437186 0.87128713]

mean value: 0.8718065205643388

key: test_recall
value: [0.95454545 0.59090909 0.90909091 0.81818182 0.77272727 0.77272727
 0.81818182 0.86363636 0.90909091 0.90909091]

mean value: 0.8318181818181818

key: train_recall
value: [0.87878788 0.7979798  0.89393939 0.8989899  0.82323232 0.86868687
 0.86868687 0.85353535 0.87878788 0.88888889]

mean value: 0.8651515151515151

key: test_roc_auc
value: [0.90909091 0.65909091 0.90909091 0.79545455 0.79545455 0.75
 0.81818182 0.88636364 0.88636364 0.88636364]

mean value: 0.8295454545454546

key: train_roc_auc
value: [0.87121212 0.84090909 0.87626263 0.88888889 0.85353535 0.87121212
 0.87121212 0.86111111 0.87626263 0.87878788]

mean value: 0.8689393939393939

key: test_jcc
value: [0.84       0.46428571 0.83333333 0.66666667 0.65384615 0.60714286
 0.69230769 0.79166667 0.8        0.8       ]

mean value: 0.7149249084249084

key: train_jcc
value: [0.77333333 0.71493213 0.78318584 0.8018018  0.73755656 0.77130045
 0.77130045 0.75446429 0.78026906 0.78571429]

mean value: 0.7673858190211428

MCC on Blind test: 0.73

Accuracy on Blind test: 0.87

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.01081586 0.01136518 0.01138735 0.01129532 0.01170659 0.01099205
 0.01889086 0.01352763 0.03923583 0.01395297]

mean value: 0.015316963195800781

key: score_time
value: [0.01401186 0.03292465 0.02555895 0.02621388 0.0230813  0.01509142
 0.07618952 0.06447315 0.09006906 0.05066252]

mean value: 0.0418276309967041

key: test_mcc
value: [0.59648091 0.50051733 0.54545455 0.50471461 0.27386128 0.47245559
 0.60678804 0.50051733 0.54772256 0.5547002 ]

mean value: 0.510321238961513

key: train_mcc
value: [0.68718427 0.69199863 0.68700889 0.66697297 0.70710678 0.71366109
 0.70739557 0.67275618 0.66182722 0.66670068]

mean value: 0.6862612279819491

key: test_accuracy
value: [0.79545455 0.75       0.77272727 0.75       0.63636364 0.72727273
 0.79545455 0.75       0.77272727 0.77272727]

mean value: 0.7522727272727272

key: train_accuracy
value: [0.84343434 0.8459596  0.84343434 0.83333333 0.85353535 0.85606061
 0.85353535 0.83585859 0.83080808 0.83333333]

mean value: 0.8429292929292929

key: test_fscore
value: [0.80851064 0.74418605 0.77272727 0.73170732 0.61904762 0.68421053
 0.76923077 0.75555556 0.7826087  0.75      ]

mean value: 0.7417784440411851

key: train_fscore
value: [0.84102564 0.84711779 0.84183673 0.83076923 0.85279188 0.85117493
 0.85128205 0.83116883 0.8286445  0.83248731]

mean value: 0.8408298907247728

key: test_precision
value: [0.76       0.76190476 0.77272727 0.78947368 0.65       0.8125
 0.88235294 0.73913043 0.75       0.83333333]

mean value: 0.7751422428134973

key: train_precision
value: [0.85416667 0.84079602 0.85051546 0.84375    0.85714286 0.88108108
 0.86458333 0.85561497 0.83937824 0.83673469]

mean value: 0.8523763327523514

key: test_recall
value: [0.86363636 0.72727273 0.77272727 0.68181818 0.59090909 0.59090909
 0.68181818 0.77272727 0.81818182 0.68181818]

mean value: 0.7181818181818181

key: train_recall
value: [0.82828283 0.85353535 0.83333333 0.81818182 0.84848485 0.82323232
 0.83838384 0.80808081 0.81818182 0.82828283]

mean value: 0.8297979797979798

key: test_roc_auc
value: [0.79545455 0.75       0.77272727 0.75       0.63636364 0.72727273
 0.79545455 0.75       0.77272727 0.77272727]

mean value: 0.7522727272727273

key: train_roc_auc
value: [0.84343434 0.8459596  0.84343434 0.83333333 0.85353535 0.85606061
 0.85353535 0.83585859 0.83080808 0.83333333]

mean value: 0.842929292929293

key: test_jcc
value: [0.67857143 0.59259259 0.62962963 0.57692308 0.44827586 0.52
 0.625      0.60714286 0.64285714 0.6       ]

mean value: 0.5920992589785693

key: train_jcc
value: [0.72566372 0.73478261 0.72687225 0.71052632 0.74336283 0.74090909
 0.74107143 0.71111111 0.70742358 0.71304348]

mean value: 0.7254766409492254

MCC on Blind test: 0.41

Accuracy on Blind test: 0.71

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.0494976  0.01854587 0.01976705 0.01910186 0.01809573 0.01886225
 0.01888537 0.0258832  0.02618337 0.02646255]

mean value: 0.024128484725952148

key: score_time
value: [0.03666711 0.01181078 0.01129031 0.0110836  0.01145768 0.01238108
 0.01312828 0.015697   0.01585555 0.01609397]

mean value: 0.015546536445617676

key: test_mcc
value: [0.86452993 0.77352678 0.81818182 0.6882472  0.7800135  0.68252363
 0.73029674 0.90909091 0.81818182 0.77352678]

mean value: 0.7838119120880734

key: train_mcc
value: [0.7979798  0.81322466 0.81322466 0.8133907  0.80824576 0.81314168
 0.80812204 0.79814268 0.80812204 0.7979798 ]

mean value: 0.8071573819260374

key: test_accuracy
value: [0.93181818 0.88636364 0.90909091 0.84090909 0.88636364 0.84090909
 0.86363636 0.95454545 0.90909091 0.88636364]

mean value: 0.8909090909090909

key: train_accuracy
value: [0.8989899  0.90656566 0.90656566 0.90656566 0.9040404  0.90656566
 0.9040404  0.8989899  0.9040404  0.8989899 ]

mean value: 0.9035353535353535

key: test_fscore
value: [0.93333333 0.88372093 0.90909091 0.85106383 0.87804878 0.84444444
 0.85714286 0.95454545 0.90909091 0.88888889]

mean value: 0.8909370337044393

key: train_fscore
value: [0.8989899  0.90726817 0.90726817 0.90537084 0.905      0.90632911
 0.90452261 0.9        0.90452261 0.8989899 ]

mean value: 0.9038261322876402

key: test_precision
value: [0.91304348 0.9047619  0.90909091 0.8        0.94736842 0.82608696
 0.9        0.95454545 0.90909091 0.86956522]

mean value: 0.8933553250715722

key: train_precision
value: [0.8989899  0.90049751 0.90049751 0.91709845 0.8960396  0.90862944
 0.9        0.89108911 0.9        0.8989899 ]

mean value: 0.9011831422946928

key: test_recall
value: [0.95454545 0.86363636 0.90909091 0.90909091 0.81818182 0.86363636
 0.81818182 0.95454545 0.90909091 0.90909091]

mean value: 0.8909090909090909

key: train_recall
value: [0.8989899  0.91414141 0.91414141 0.89393939 0.91414141 0.9040404
 0.90909091 0.90909091 0.90909091 0.8989899 ]

mean value: 0.9065656565656566

key: test_roc_auc
value: [0.93181818 0.88636364 0.90909091 0.84090909 0.88636364 0.84090909
 0.86363636 0.95454545 0.90909091 0.88636364]

mean value: 0.890909090909091

key: train_roc_auc
value: [0.8989899  0.90656566 0.90656566 0.90656566 0.9040404  0.90656566
 0.9040404  0.8989899  0.9040404  0.8989899 ]

mean value: 0.9035353535353535

key: test_jcc
value: [0.875      0.79166667 0.83333333 0.74074074 0.7826087  0.73076923
 0.75       0.91304348 0.83333333 0.8       ]

mean value: 0.8050495478756349

key: train_jcc
value: [0.81651376 0.83027523 0.83027523 0.8271028  0.82648402 0.8287037
 0.82568807 0.81818182 0.82568807 0.81651376]

mean value: 0.8245426472329047

MCC on Blind test: 0.77

Accuracy on Blind test: 0.88

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [2.06339478 2.06864357 2.89429951 1.91644716 1.80405879 2.09717655
 4.19174433 4.01769924 1.65347528 3.00509381]

mean value: 2.5712033033370973

key: score_time
value: [0.01534939 0.01772881 0.0145216  0.01324105 0.01330876 0.02885771
 0.02873945 0.01542616 0.02645946 0.02501845]

mean value: 0.019865083694458007

key: test_mcc
value: [0.82158384 0.7800135  0.77352678 0.73029674 0.86452993 0.63900965
 0.81818182 0.86452993 0.81818182 0.86452993]

mean value: 0.7974383949974557

key: train_mcc
value: [1.         0.99496218 1.         1.         0.99496218 1.
 0.99496218 1.         1.         1.        ]

mean value: 0.9984886553739265

key: test_accuracy
value: [0.90909091 0.88636364 0.88636364 0.86363636 0.93181818 0.81818182
 0.90909091 0.93181818 0.90909091 0.93181818]

mean value: 0.8977272727272727

key: train_accuracy
value: [1.         0.99747475 1.         1.         0.99747475 1.
 0.99747475 1.         1.         1.        ]

mean value: 0.9992424242424243

key: test_fscore
value: [0.91304348 0.87804878 0.88888889 0.85714286 0.93023256 0.82608696
 0.90909091 0.93023256 0.90909091 0.93023256]

mean value: 0.8972090453902583

key: train_fscore
value: [1.         0.99746835 1.         1.         0.99746835 1.
 0.99746835 1.         1.         1.        ]

mean value: 0.9992405063291139

key: test_precision
value: [0.875      0.94736842 0.86956522 0.9        0.95238095 0.79166667
 0.90909091 0.95238095 0.90909091 0.95238095]

mean value: 0.9058924980435278

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.95454545 0.81818182 0.90909091 0.81818182 0.90909091 0.86363636
 0.90909091 0.90909091 0.90909091 0.90909091]

mean value: 0.8909090909090909

key: train_recall
value: [1.         0.99494949 1.         1.         0.99494949 1.
 0.99494949 1.         1.         1.        ]

mean value: 0.9984848484848485

key: test_roc_auc
value: [0.90909091 0.88636364 0.88636364 0.86363636 0.93181818 0.81818182
 0.90909091 0.93181818 0.90909091 0.93181818]

mean value: 0.8977272727272728

key: train_roc_auc
value: [1.         0.99747475 1.         1.         0.99747475 1.
 0.99747475 1.         1.         1.        ]

mean value: 0.9992424242424243

key: test_jcc
value: [0.84       0.7826087  0.8        0.75       0.86956522 0.7037037
 0.83333333 0.86956522 0.83333333 0.86956522]

mean value: 0.8151674718196458

key: train_jcc
value: [1.         0.99494949 1.         1.         0.99494949 1.
 0.99494949 1.         1.         1.        ]

mean value: 0.9984848484848485

MCC on Blind test: 0.73

Accuracy on Blind test: 0.87

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.03948331 0.0421505  0.02812696 0.02692723 0.0463903  0.02427006
 0.02560449 0.04395103 0.02709556 0.02732158]

mean value: 0.033132100105285646

key: score_time
value: [0.01248169 0.01416612 0.01280332 0.01256847 0.01260996 0.0124898
 0.01261306 0.01800823 0.01289749 0.01290941]

mean value: 0.013354754447937012

key: test_mcc
value: [0.91287093 0.81818182 0.81818182 0.82158384 0.95553309 0.77352678
 0.81818182 0.77352678 0.86452993 0.87177979]

mean value: 0.8427896597116962

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.95454545 0.90909091 0.90909091 0.90909091 0.97727273 0.88636364
 0.90909091 0.88636364 0.93181818 0.93181818]

mean value: 0.9204545454545454

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.95238095 0.90909091 0.90909091 0.9047619  0.97674419 0.88888889
 0.90909091 0.88372093 0.93023256 0.93617021]

mean value: 0.9200172360489035

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.90909091 0.90909091 0.95       1.         0.86956522
 0.90909091 0.9047619  0.95238095 0.88      ]

mean value: 0.9283980801806888

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 0.90909091 0.90909091 0.86363636 0.95454545 0.90909091
 0.90909091 0.86363636 0.90909091 1.        ]

mean value: 0.9136363636363636

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.95454545 0.90909091 0.90909091 0.90909091 0.97727273 0.88636364
 0.90909091 0.88636364 0.93181818 0.93181818]

mean value: 0.9204545454545455

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.90909091 0.83333333 0.83333333 0.82608696 0.95454545 0.8
 0.83333333 0.79166667 0.86956522 0.88      ]

mean value: 0.8530955204216074

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.91

Accuracy on Blind test: 0.96

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.20858884 0.16574883 0.16682291 0.16697645 0.16598296 0.16459179
 0.17711449 0.17074442 0.16750717 0.21235633]

mean value: 0.17664341926574706

key: score_time
value: [0.02438045 0.0246768  0.02469969 0.02466249 0.02455401 0.02451372
 0.02488565 0.02486062 0.02491927 0.02723241]

mean value: 0.024938511848449706

key: test_mcc
value: [0.77352678 0.7800135  0.81818182 0.6882472  0.77352678 0.73960026
 0.77352678 0.90909091 0.81818182 0.77352678]

mean value: 0.7847422639174152

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.88636364 0.88636364 0.90909091 0.84090909 0.88636364 0.86363636
 0.88636364 0.95454545 0.90909091 0.88636364]

mean value: 0.8909090909090909

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.88888889 0.87804878 0.90909091 0.85106383 0.88372093 0.875
 0.88372093 0.95454545 0.90909091 0.88888889]

mean value: 0.8922059521245206

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.86956522 0.94736842 0.90909091 0.8        0.9047619  0.80769231
 0.9047619  0.95454545 0.90909091 0.86956522]

mean value: 0.8876442245778631

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 0.81818182 0.90909091 0.90909091 0.86363636 0.95454545
 0.86363636 0.95454545 0.90909091 0.90909091]

mean value: 0.9

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.88636364 0.88636364 0.90909091 0.84090909 0.88636364 0.86363636
 0.88636364 0.95454545 0.90909091 0.88636364]

mean value: 0.890909090909091

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.8        0.7826087  0.83333333 0.74074074 0.79166667 0.77777778
 0.79166667 0.91304348 0.83333333 0.8       ]

mean value: 0.8064170692431563

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.82

Accuracy on Blind test: 0.91

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.0146656  0.02988982 0.01498175 0.01482368 0.01461172 0.01457262
 0.01456785 0.01468992 0.01473713 0.01458812]

mean value: 0.016212821006774902

key: score_time
value: [0.01279306 0.0220902  0.02778959 0.02868342 0.01232696 0.01224971
 0.01235223 0.01237464 0.01246667 0.01231074]

mean value: 0.01654372215270996

key: test_mcc
value: [0.45643546 0.36363636 0.50051733 0.31851103 0.41294832 0.36980013
 0.63900965 0.63636364 0.50471461 0.54772256]

mean value: 0.4749659098162487

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.72727273 0.68181818 0.75       0.65909091 0.70454545 0.68181818
 0.81818182 0.81818182 0.75       0.77272727]

mean value: 0.7363636363636363

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.71428571 0.68181818 0.74418605 0.65116279 0.68292683 0.70833333
 0.80952381 0.81818182 0.76595745 0.7826087 ]

mean value: 0.7358984666081136

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.75       0.68181818 0.76190476 0.66666667 0.73684211 0.65384615
 0.85       0.81818182 0.72       0.75      ]

mean value: 0.738925968768074

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.68181818 0.68181818 0.72727273 0.63636364 0.63636364 0.77272727
 0.77272727 0.81818182 0.81818182 0.81818182]

mean value: 0.7363636363636363

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.72727273 0.68181818 0.75       0.65909091 0.70454545 0.68181818
 0.81818182 0.81818182 0.75       0.77272727]

mean value: 0.7363636363636363

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.55555556 0.51724138 0.59259259 0.48275862 0.51851852 0.5483871
 0.68       0.69230769 0.62068966 0.64285714]

mean value: 0.5850908253778109

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
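Assuming `test_jcc` is the Jaccard score of the positive class and `test_fscore` is the F1 score, the two are tied by the identity J = F / (2 - F), so one can be cross-checked against the other fold by fold. A small sketch using the first three folds printed above (the helper name `jaccard_from_f1` is hypothetical, not part of the original script):

```python
def jaccard_from_f1(f1: float) -> float:
    """Convert an F1 score to the Jaccard index via J = F1 / (2 - F1)."""
    return f1 / (2.0 - f1)


# First three folds of test_fscore and test_jcc, copied from the log above.
test_fscore = [0.71428571, 0.68181818, 0.74418605]
test_jcc = [0.55555556, 0.51724138, 0.59259259]

for f1, jcc in zip(test_fscore, test_jcc):
    # Printed values are rounded to 8 decimals, so allow a small tolerance.
    assert abs(jaccard_from_f1(f1) - jcc) < 1e-6
```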

MCC on Blind test: 0.52

Accuracy on Blind test: 0.76

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [2.39341021 2.50459886 1.97631717 2.46608806 2.39209723 2.58320475
 2.78075337 2.49290299 3.35915875 2.54243302]

mean value: 2.549096441268921

key: score_time
value: [0.12756467 0.15043545 0.09386826 0.12860036 0.1271193  0.13112164
 0.22609282 0.12711215 0.23179317 0.12944078]

mean value: 0.14731485843658448

key: test_mcc
value: [1.         0.91287093 0.90909091 0.82158384 0.86452993 0.82158384
 0.86452993 1.         0.95553309 0.91287093]

mean value: 0.9062593395597373

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.95454545 0.95454545 0.90909091 0.93181818 0.90909091
 0.93181818 1.         0.97727273 0.95454545]

mean value: 0.9522727272727273

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.95238095 0.95454545 0.91304348 0.93023256 0.91304348
 0.93333333 1.         0.97674419 0.95652174]

mean value: 0.952984518009796

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         0.95454545 0.875      0.95238095 0.875
 0.91304348 1.         1.         0.91666667]

mean value: 0.9486636551853943

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.90909091 0.95454545 0.95454545 0.90909091 0.95454545
 0.95454545 1.         0.95454545 1.        ]

mean value: 0.9590909090909091

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.95454545 0.95454545 0.90909091 0.93181818 0.90909091
 0.93181818 1.         0.97727273 0.95454545]

mean value: 0.9522727272727273

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.90909091 0.91304348 0.84       0.86956522 0.84
 0.875      1.         0.95454545 0.91666667]

mean value: 0.9117911725955204

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.91

Accuracy on Blind test: 0.96

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC0...05', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.94391346 0.95903611 0.93460608 0.94266438 0.93002152 0.95997882
 1.03751016 0.9221313  0.98442721 0.95550966]

mean value: 0.9569798707962036

key: score_time
value: [0.15100384 0.11824775 0.21430612 0.18143821 0.22737098 0.19880962
 0.22887874 0.20321155 0.23416042 0.13647127]

mean value: 0.18938984870910644

key: test_mcc
value: [1.         0.87177979 0.82158384 0.7800135  0.81818182 0.82158384
 0.86452993 0.95553309 0.95553309 0.86452993]

mean value: 0.8753268816111682

key: train_mcc
value: [0.94445649 0.95465504 0.94949495 0.94954339 0.94954339 0.95465504
 0.94954339 0.94445649 0.94949495 0.94949495]

mean value: 0.9495338084295475

key: test_accuracy
value: [1.         0.93181818 0.90909091 0.88636364 0.90909091 0.90909091
 0.93181818 0.97727273 0.97727273 0.93181818]

mean value: 0.9363636363636363

key: train_accuracy
value: [0.97222222 0.97727273 0.97474747 0.97474747 0.97474747 0.97727273
 0.97474747 0.97222222 0.97474747 0.97474747]

mean value: 0.9747474747474747

key: test_fscore
value: [1.         0.92682927 0.9047619  0.89361702 0.90909091 0.91304348
 0.93333333 0.97777778 0.97674419 0.93333333]

mean value: 0.9368531212173918

key: train_fscore
value: [0.9721519  0.97744361 0.97474747 0.97461929 0.97461929 0.97709924
 0.97461929 0.9721519  0.97474747 0.97474747]

mean value: 0.9746946935394861

key: test_precision
value: [1.         1.         0.95       0.84       0.90909091 0.875
 0.91304348 0.95652174 1.         0.91304348]

mean value: 0.9356699604743083

key: train_precision
value: [0.97461929 0.97014925 0.97474747 0.97959184 0.97959184 0.98461538
 0.97959184 0.97461929 0.97474747 0.97474747]

mean value: 0.9767021151473436

key: test_recall
value: [1.         0.86363636 0.86363636 0.95454545 0.90909091 0.95454545
 0.95454545 1.         0.95454545 0.95454545]

mean value: 0.9409090909090909

key: train_recall
value: [0.96969697 0.98484848 0.97474747 0.96969697 0.96969697 0.96969697
 0.96969697 0.96969697 0.97474747 0.97474747]

mean value: 0.9727272727272728

key: test_roc_auc
value: [1.         0.93181818 0.90909091 0.88636364 0.90909091 0.90909091
 0.93181818 0.97727273 0.97727273 0.93181818]

mean value: 0.9363636363636364

key: train_roc_auc
value: [0.97222222 0.97727273 0.97474747 0.97474747 0.97474747 0.97727273
 0.97474747 0.97222222 0.97474747 0.97474747]

mean value: 0.9747474747474747

key: test_jcc
value: [1.         0.86363636 0.82608696 0.80769231 0.83333333 0.84
 0.875      0.95652174 0.95454545 0.875     ]

mean value: 0.8831816154859633

key: train_jcc
value: [0.94581281 0.95588235 0.95073892 0.95049505 0.95049505 0.95522388
 0.95049505 0.94581281 0.95073892 0.95073892]

mean value: 0.9506433746585062

MCC on Blind test: 0.89

Accuracy on Blind test: 0.95
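
The FutureWarning about `max_features='auto'` is triggered by the Random Forest2 configuration shown above. Following the warning's own advice, a sketch of the same estimator with `max_features='sqrt'` (this assumes scikit-learn is installed; all other parameters are copied from the `Model func` line, and the variable name `rf2` is illustrative):

```python
from sklearn.ensemble import RandomForestClassifier

# Same settings as Random Forest2 in the log, with the deprecated
# max_features='auto' replaced by 'sqrt' as the warning text suggests.
rf2 = RandomForestClassifier(
    max_features="sqrt",
    min_samples_leaf=5,
    n_estimators=1000,
    n_jobs=10,
    oob_score=True,
    random_state=42,
)

print(rf2.get_params()["max_features"])
```

Per the warning text, 'sqrt' is also the default for RandomForestClassifier, so omitting the argument should behave the same way.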

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01132703 0.01045918 0.01127529 0.01106215 0.01119161 0.01149607
 0.01135707 0.01120472 0.01115823 0.01116991]

mean value: 0.011170125007629395

key: score_time
value: [0.00996137 0.0095737  0.01019239 0.00985885 0.00994515 0.01000214
 0.01003671 0.01025009 0.01001143 0.0100646 ]

mean value: 0.009989643096923828

key: test_mcc
value: [0.82158384 0.32118203 0.81818182 0.59152048 0.59152048 0.50051733
 0.63636364 0.77352678 0.77352678 0.77352678]

mean value: 0.6601449963922943

key: train_mcc
value: [0.74250948 0.68434524 0.75299597 0.77793654 0.70837286 0.74243371
 0.74243371 0.72230514 0.75253485 0.7577304 ]

mean value: 0.7383597900292419

key: test_accuracy
value: [0.90909091 0.65909091 0.90909091 0.79545455 0.79545455 0.75
 0.81818182 0.88636364 0.88636364 0.88636364]

mean value: 0.8295454545454546

key: train_accuracy
value: [0.87121212 0.84090909 0.87626263 0.88888889 0.85353535 0.87121212
 0.87121212 0.86111111 0.87626263 0.87878788]

mean value: 0.8689393939393939

key: test_fscore
value: [0.91304348 0.63414634 0.90909091 0.8        0.79069767 0.75555556
 0.81818182 0.88372093 0.88888889 0.88888889]

mean value: 0.8282214484981508

key: train_fscore
value: [0.87218045 0.83377309 0.87841191 0.89       0.84895833 0.87088608
 0.87088608 0.86005089 0.87657431 0.88      ]

mean value: 0.868172113199113

key: test_precision
value: [0.875      0.68421053 0.90909091 0.7826087  0.80952381 0.73913043
 0.81818182 0.9047619  0.86956522 0.86956522]

mean value: 0.8261638533091622

key: train_precision
value: [0.86567164 0.87292818 0.86341463 0.88118812 0.87634409 0.87309645
 0.87309645 0.86666667 0.87437186 0.87128713]

mean value: 0.8718065205643388

key: test_recall
value: [0.95454545 0.59090909 0.90909091 0.81818182 0.77272727 0.77272727
 0.81818182 0.86363636 0.90909091 0.90909091]

mean value: 0.8318181818181818

key: train_recall
value: [0.87878788 0.7979798  0.89393939 0.8989899  0.82323232 0.86868687
 0.86868687 0.85353535 0.87878788 0.88888889]

mean value: 0.8651515151515151

key: test_roc_auc
value: [0.90909091 0.65909091 0.90909091 0.79545455 0.79545455 0.75
 0.81818182 0.88636364 0.88636364 0.88636364]

mean value: 0.8295454545454546

key: train_roc_auc
value: [0.87121212 0.84090909 0.87626263 0.88888889 0.85353535 0.87121212
 0.87121212 0.86111111 0.87626263 0.87878788]

mean value: 0.8689393939393939

key: test_jcc
value: [0.84       0.46428571 0.83333333 0.66666667 0.65384615 0.60714286
 0.69230769 0.79166667 0.8        0.8       ]

mean value: 0.7149249084249084

key: train_jcc
value: [0.77333333 0.71493213 0.78318584 0.8018018  0.73755656 0.77130045
 0.77130045 0.75446429 0.78026906 0.78571429]

mean value: 0.7673858190211428

MCC on Blind test: 0.73

Accuracy on Blind test: 0.87

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC0...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [1.60504842 1.49270439 1.54242682 1.5466218  2.61647606 0.22439003
 1.24004292 1.27762127 0.6597116  1.25325251]

mean value: 1.3458295822143556

key: score_time
value: [0.01259804 0.01326776 0.0133779  0.01286626 0.01313877 0.01219416
 0.01778555 0.01312637 0.01357436 0.01212931]

mean value: 0.013405847549438476

key: test_mcc
value: [1.         0.86452993 0.86452993 0.86452993 0.95553309 0.82158384
 0.90909091 0.95553309 0.95553309 1.        ]

mean value: 0.9190863807668139

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.93181818 0.93181818 0.93181818 0.97727273 0.90909091
 0.95454545 0.97727273 0.97727273 1.        ]

mean value: 0.9590909090909091

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.93023256 0.93333333 0.93023256 0.97674419 0.91304348
 0.95454545 0.97674419 0.97674419 1.        ]

mean value: 0.9591619940558263

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.95238095 0.91304348 0.95238095 1.         0.875
 0.95454545 1.         1.         1.        ]

mean value: 0.9647350837568229

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.90909091 0.95454545 0.90909091 0.95454545 0.95454545
 0.95454545 0.95454545 0.95454545 1.        ]

mean value: 0.9545454545454546

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.93181818 0.93181818 0.93181818 0.97727273 0.90909091
 0.95454545 0.97727273 0.97727273 1.        ]

mean value: 0.9590909090909091

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.86956522 0.875      0.86956522 0.95454545 0.84
 0.91304348 0.95454545 0.95454545 1.        ]

mean value: 0.9230810276679842

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.95

Accuracy on Blind test: 0.97

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.0410018  0.07751536 0.08132076 0.07177877 0.071033   0.06880021
 0.0640676  0.07675624 0.0497086  0.07524252]

mean value: 0.067722487449646

key: score_time
value: [0.01247358 0.02159452 0.02081919 0.01254892 0.02184677 0.01257586
 0.03190947 0.01236224 0.0167625  0.01244116]

mean value: 0.017533421516418457

key: test_mcc
value: [0.68252363 0.87177979 0.77352678 0.68252363 0.87177979 0.6882472
 0.72727273 0.86452993 0.81818182 0.90909091]

mean value: 0.7889456217849123

key: train_mcc
value: [0.92434853 0.91937955 0.89903576 0.91415307 0.92434853 0.93435535
 0.92948262 0.90909091 0.90414419 0.90913729]

mean value: 0.9167475808639807

key: test_accuracy
value: [0.84090909 0.93181818 0.88636364 0.84090909 0.93181818 0.84090909
 0.86363636 0.93181818 0.90909091 0.95454545]

mean value: 0.8931818181818182

key: train_accuracy
value: [0.96212121 0.95959596 0.94949495 0.95707071 0.96212121 0.96717172
 0.96464646 0.95454545 0.9520202  0.95454545]

mean value: 0.9583333333333334

key: test_fscore
value: [0.84444444 0.92682927 0.88888889 0.84444444 0.92682927 0.85106383
 0.86363636 0.93333333 0.90909091 0.95454545]

mean value: 0.8943106204756438

key: train_fscore
value: [0.96240602 0.96       0.94974874 0.95717884 0.96240602 0.96708861
 0.965      0.95454545 0.95238095 0.95477387]

mean value: 0.9585528498971682

key: test_precision
value: [0.82608696 1.         0.86956522 0.82608696 1.         0.8
 0.86363636 0.91304348 0.90909091 0.95454545]

mean value: 0.896205533596838

key: train_precision
value: [0.95522388 0.95049505 0.945      0.95477387 0.95522388 0.96954315
 0.95544554 0.95454545 0.94527363 0.95      ]

mean value: 0.9535524458194542

key: test_recall
value: [0.86363636 0.86363636 0.90909091 0.86363636 0.86363636 0.90909091
 0.86363636 0.95454545 0.90909091 0.95454545]

mean value: 0.8954545454545455

key: train_recall
value: [0.96969697 0.96969697 0.95454545 0.95959596 0.96969697 0.96464646
 0.97474747 0.95454545 0.95959596 0.95959596]

mean value: 0.9636363636363636

key: test_roc_auc
value: [0.84090909 0.93181818 0.88636364 0.84090909 0.93181818 0.84090909
 0.86363636 0.93181818 0.90909091 0.95454545]

mean value: 0.8931818181818182

key: train_roc_auc
value: [0.96212121 0.95959596 0.94949495 0.95707071 0.96212121 0.96717172
 0.96464646 0.95454545 0.9520202  0.95454545]

mean value: 0.9583333333333334

key: test_jcc
value: [0.73076923 0.86363636 0.8        0.73076923 0.86363636 0.74074074
 0.76       0.875      0.83333333 0.91304348]

mean value: 0.8110928741146133

key: train_jcc
value: [0.92753623 0.92307692 0.90430622 0.9178744  0.92753623 0.93627451
 0.93236715 0.91304348 0.90909091 0.91346154]

mean value: 0.9204567588451691

MCC on Blind test: 0.7

Accuracy on Blind test: 0.85

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.01386571 0.01546836 0.01592708 0.01628208 0.01617885 0.0131793
 0.01606131 0.0107379  0.01122975 0.01280212]

mean value: 0.014173245429992676

key: score_time
value: [0.01187468 0.01450491 0.01421571 0.01432848 0.01400518 0.01220989
 0.0135181  0.00963759 0.00962281 0.01071024]

mean value: 0.012462759017944336

key: test_mcc
value: [0.81818182 0.50051733 0.81818182 0.63900965 0.50471461 0.54772256
 0.6882472  0.81818182 0.86452993 0.86452993]

mean value: 0.7063816679047357

key: train_mcc
value: [0.74751288 0.68774638 0.76286954 0.77297377 0.67365307 0.73771253
 0.77281598 0.72786709 0.73308094 0.72786709]

mean value: 0.7344099272046745

key: test_accuracy
value: [0.90909091 0.75       0.90909091 0.81818182 0.75       0.77272727
 0.84090909 0.90909091 0.93181818 0.93181818]

mean value: 0.8522727272727273

key: train_accuracy
value: [0.87373737 0.84343434 0.88131313 0.88636364 0.83585859 0.86868687
 0.88636364 0.86363636 0.86616162 0.86363636]

mean value: 0.8669191919191919

key: test_fscore
value: [0.90909091 0.74418605 0.90909091 0.82608696 0.73170732 0.76190476
 0.82926829 0.90909091 0.93333333 0.93333333]

mean value: 0.8487092768633621

key: train_fscore
value: [0.87309645 0.83937824 0.8797954  0.88491049 0.82939633 0.86666667
 0.88549618 0.86082474 0.8630491  0.86082474]

mean value: 0.8643438322870827

key: test_precision
value: [0.90909091 0.76190476 0.90909091 0.79166667 0.78947368 0.8
 0.89473684 0.90909091 0.91304348 0.91304348]

mean value: 0.8591141638681684

key: train_precision
value: [0.87755102 0.86170213 0.89119171 0.89637306 0.86338798 0.88020833
 0.89230769 0.87894737 0.88359788 0.87894737]

mean value: 0.8804214539130206

key: test_recall
value: [0.90909091 0.72727273 0.90909091 0.86363636 0.68181818 0.72727273
 0.77272727 0.90909091 0.95454545 0.95454545]

mean value: 0.8409090909090909

key: train_recall
value: [0.86868687 0.81818182 0.86868687 0.87373737 0.7979798  0.85353535
 0.87878788 0.84343434 0.84343434 0.84343434]

mean value: 0.848989898989899

key: test_roc_auc
value: [0.90909091 0.75       0.90909091 0.81818182 0.75       0.77272727
 0.84090909 0.90909091 0.93181818 0.93181818]

mean value: 0.8522727272727273

key: train_roc_auc
value: [0.87373737 0.84343434 0.88131313 0.88636364 0.83585859 0.86868687
 0.88636364 0.86363636 0.86616162 0.86363636]

mean value: 0.8669191919191919

key: test_jcc
value: [0.83333333 0.59259259 0.83333333 0.7037037  0.57692308 0.61538462
 0.70833333 0.83333333 0.875      0.875     ]

mean value: 0.7446937321937322

key: train_jcc
value: [0.77477477 0.72321429 0.78538813 0.79357798 0.70852018 0.76470588
 0.79452055 0.75565611 0.75909091 0.75565611]

mean value: 0.7615104905950141

MCC on Blind test: 0.77

Accuracy on Blind test: 0.88

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01290369 0.02171993 0.0183773  0.02184057 0.02121806 0.01895308
 0.01816034 0.01787543 0.02349734 0.01725245]

mean value: 0.019179821014404297

key: score_time
value: [0.00926185 0.01158953 0.0118134  0.01216078 0.01294613 0.01253366
 0.01261091 0.01218534 0.01236916 0.01217318]

mean value: 0.01196439266204834

key: test_mcc
value: [0.54794903 0.77352678 0.62330229 0.64715023 0.58554004 0.32539569
 0.77352678 0.91287093 0.77352678 0.79349205]

mean value: 0.6756280609159855

key: train_mcc
value: [0.73305263 0.92036649 0.81060226 0.8786935  0.5976219  0.57346234
 0.87374852 0.82790197 0.86140292 0.73125738]

mean value: 0.7808109898653016

key: test_accuracy
value: [0.75       0.88636364 0.79545455 0.81818182 0.77272727 0.63636364
 0.88636364 0.95454545 0.88636364 0.88636364]

mean value: 0.8272727272727273

key: train_accuracy
value: [0.8510101  0.95959596 0.90151515 0.93686869 0.76767677 0.74747475
 0.93686869 0.91161616 0.92929293 0.85606061]

mean value: 0.8797979797979798

key: test_fscore
value: [0.68571429 0.88372093 0.82352941 0.8        0.80769231 0.5
 0.88372093 0.95238095 0.88888889 0.89795918]

mean value: 0.8123606890579727

key: train_fscore
value: [0.8259587  0.95854922 0.90780142 0.93333333 0.80991736 0.66216216
 0.93670886 0.90666667 0.93203883 0.8707483 ]

mean value: 0.8743884855867281

key: test_precision
value: [0.92307692 0.9047619  0.72413793 0.88888889 0.7        0.8
 0.9047619  1.         0.86956522 0.81481481]

mean value: 0.8530007584730224

key: train_precision
value: [0.9929078  0.98404255 0.85333333 0.98870056 0.68531469 1.
 0.93908629 0.96045198 0.89719626 0.79012346]

mean value: 0.9091156928519439

key: test_recall
value: [0.54545455 0.86363636 0.95454545 0.72727273 0.95454545 0.36363636
 0.86363636 0.90909091 0.90909091 1.        ]

mean value: 0.8090909090909091

key: train_recall
value: [0.70707071 0.93434343 0.96969697 0.88383838 0.98989899 0.49494949
 0.93434343 0.85858586 0.96969697 0.96969697]

mean value: 0.8712121212121212

key: test_roc_auc
value: [0.75       0.88636364 0.79545455 0.81818182 0.77272727 0.63636364
 0.88636364 0.95454545 0.88636364 0.88636364]

mean value: 0.8272727272727273

key: train_roc_auc
value: [0.8510101  0.95959596 0.90151515 0.93686869 0.76767677 0.74747475
 0.93686869 0.91161616 0.92929293 0.85606061]

mean value: 0.8797979797979798

key: test_jcc
value: [0.52173913 0.79166667 0.7        0.66666667 0.67741935 0.33333333
 0.79166667 0.90909091 0.8        0.81481481]

mean value: 0.7006397542512549

key: train_jcc
value: [0.70351759 0.92039801 0.83116883 0.875      0.68055556 0.49494949
 0.88095238 0.82926829 0.87272727 0.77108434]

mean value: 0.7859621763275807

MCC on Blind test: 0.73

Accuracy on Blind test: 0.86

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.02216911 0.01927686 0.01920557 0.02066255 0.01956725 0.01928711
 0.01948738 0.02105021 0.02236152 0.0231297 ]

mean value: 0.020619726181030272

key: score_time
value: [0.01260042 0.01255798 0.01213145 0.01228118 0.01218748 0.01229119
 0.01216459 0.01225376 0.0122602  0.01242185]

mean value: 0.012315011024475098

key: test_mcc
value: [0.70014004 0.79349205 0.47140452 0.68252363 0.82158384 0.73029674
 0.60678804 0.87177979 0.75592895 0.66143783]

mean value: 0.7095375421713689

key: train_mcc
value: [0.81587826 0.79415212 0.62017367 0.91471323 0.89002473 0.89940294
 0.72894554 0.89180538 0.77045723 0.78127257]

mean value: 0.8106825677849934

key: test_accuracy
value: [0.84090909 0.88636364 0.68181818 0.84090909 0.90909091 0.86363636
 0.79545455 0.93181818 0.86363636 0.81818182]

mean value: 0.8431818181818181

key: train_accuracy
value: [0.90151515 0.88888889 0.77777778 0.95707071 0.94444444 0.94949495
 0.84848485 0.94444444 0.87373737 0.88131313]

mean value: 0.8967171717171717

key: test_fscore
value: [0.82051282 0.87179487 0.53333333 0.8372093  0.9047619  0.86956522
 0.76923077 0.92682927 0.84210526 0.78947368]

mean value: 0.8164816435011689

key: train_fscore
value: [0.89196676 0.87640449 0.71428571 0.9562982  0.94300518 0.94871795
 0.82248521 0.94210526 0.85632184 0.86685552]

mean value: 0.8818446131668011

key: test_precision
value: [0.94117647 1.         1.         0.85714286 0.95       0.83333333
 0.88235294 1.         1.         0.9375    ]

mean value: 0.9401505602240896

key: train_precision
value: [0.98773006 0.98734177 1.         0.97382199 0.96808511 0.96354167
 0.99285714 0.98351648 0.99333333 0.98709677]

mean value: 0.9837324329980541

key: test_recall
value: [0.72727273 0.77272727 0.36363636 0.81818182 0.86363636 0.90909091
 0.68181818 0.86363636 0.72727273 0.68181818]

mean value: 0.740909090909091

key: train_recall
value: [0.81313131 0.78787879 0.55555556 0.93939394 0.91919192 0.93434343
 0.7020202  0.9040404  0.75252525 0.77272727]

mean value: 0.8080808080808081

key: test_roc_auc
value: [0.84090909 0.88636364 0.68181818 0.84090909 0.90909091 0.86363636
 0.79545455 0.93181818 0.86363636 0.81818182]

mean value: 0.8431818181818181

key: train_roc_auc
value: [0.90151515 0.88888889 0.77777778 0.95707071 0.94444444 0.94949495
 0.84848485 0.94444444 0.87373737 0.88131313]

mean value: 0.8967171717171717

key: test_jcc
value: [0.69565217 0.77272727 0.36363636 0.72       0.82608696 0.76923077
 0.625      0.86363636 0.72727273 0.65217391]

mean value: 0.7015416539981757

key: train_jcc
value: [0.805      0.78       0.55555556 0.91625616 0.89215686 0.90243902
 0.69849246 0.89054726 0.74874372 0.765     ]

mean value: 0.7954191044912481

MCC on Blind test: 0.77

Accuracy on Blind test: 0.88

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.23656082 0.17917037 0.17021155 0.16980386 0.17501688 0.16724229
 0.17781687 0.22106504 0.16424084 0.16352606]

mean value: 0.18246545791625976

key: score_time
value: [0.02395177 0.01514435 0.01610875 0.01673985 0.01671529 0.01640296
 0.02107787 0.01673555 0.01646042 0.01520896]

mean value: 0.01745457649230957

key: test_mcc
value: [1.         0.86452993 0.77352678 0.77352678 0.95553309 0.86452993
 0.90909091 0.95553309 0.95553309 0.95553309]

mean value: 0.9007336690106234

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.93181818 0.88636364 0.88636364 0.97727273 0.93181818
 0.95454545 0.97727273 0.97727273 0.97727273]

mean value: 0.95

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.93023256 0.88372093 0.88888889 0.97674419 0.93333333
 0.95454545 0.97674419 0.97674419 0.97777778]

mean value: 0.9498731501057083

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.95238095 0.9047619  0.86956522 1.         0.91304348
 0.95454545 1.         1.         0.95652174]

mean value: 0.955081874647092

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.90909091 0.86363636 0.90909091 0.95454545 0.95454545
 0.95454545 0.95454545 0.95454545 1.        ]

mean value: 0.9454545454545454

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.93181818 0.88636364 0.88636364 0.97727273 0.93181818
 0.95454545 0.97727273 0.97727273 0.97727273]

mean value: 0.9500000000000001

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.86956522 0.79166667 0.8        0.95454545 0.875
 0.91304348 0.95454545 0.95454545 0.95652174]

mean value: 0.9069433465085639

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.95

Accuracy on Blind test: 0.97

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.05210972 0.05436349 0.05617356 0.05209541 0.07101512 0.03992128
 0.05424643 0.06785297 0.05630231 0.06146455]

mean value: 0.05655448436737061

key: score_time
value: [0.01932168 0.02250862 0.01773334 0.02535009 0.02827978 0.01752615
 0.0307796  0.03320765 0.02754951 0.02019429]

mean value: 0.024245071411132812

key: test_mcc
value: [1.         0.86452993 0.86452993 0.86452993 0.95553309 0.82158384
 0.90909091 0.91287093 0.95553309 0.95553309]

mean value: 0.9103734736843416

key: train_mcc
value: [0.98496155 1.         0.98994949 1.         0.98994949 0.99496218
 1.         0.99496218 1.         0.96974644]

mean value: 0.9924531348618916

key: test_accuracy
value: [1.         0.93181818 0.93181818 0.93181818 0.97727273 0.90909091
 0.95454545 0.95454545 0.97727273 0.97727273]

mean value: 0.9545454545454546

key: train_accuracy
value: [0.99242424 1.         0.99494949 1.         0.99494949 0.99747475
 1.         0.99747475 1.         0.98484848]

mean value: 0.9962121212121212

key: test_fscore
value: [1.         0.93023256 0.93333333 0.93333333 0.97777778 0.91304348
 0.95454545 0.95238095 0.97674419 0.97777778]

mean value: 0.9549168851595545

key: train_fscore
value: [0.99236641 1.         0.99497487 1.         0.99492386 0.99746835
 1.         0.99746835 1.         0.98477157]

mean value: 0.996197342691844

key: test_precision
value: [1.         0.95238095 0.91304348 0.91304348 0.95652174 0.875
 0.95454545 1.         1.         0.95652174]

mean value: 0.9521056841709016

key: train_precision
value: [1.         1.         0.99       1.         1.         1.
 1.         1.         1.         0.98979592]

mean value: 0.9979795918367347

key: test_recall
value: [1.         0.90909091 0.95454545 0.95454545 1.         0.95454545
 0.95454545 0.90909091 0.95454545 1.        ]

mean value: 0.9590909090909091

key: train_recall
value: [0.98484848 1.         1.         1.         0.98989899 0.99494949
 1.         0.99494949 1.         0.97979798]

mean value: 0.9944444444444445

key: test_roc_auc
value: [1.         0.93181818 0.93181818 0.93181818 0.97727273 0.90909091
 0.95454545 0.95454545 0.97727273 0.97727273]

mean value: 0.9545454545454546

key: train_roc_auc
value: [0.99242424 1.         0.99494949 1.         0.99494949 0.99747475
 1.         0.99747475 1.         0.98484848]

mean value: 0.9962121212121212

key: test_jcc
value: [1.         0.86956522 0.875      0.875      0.95652174 0.84
 0.91304348 0.90909091 0.95454545 0.95652174]

mean value: 0.9149288537549407

key: train_jcc
value: [0.98484848 1.         0.99       1.         0.98989899 0.99494949
 1.         0.99494949 1.         0.97      ]

mean value: 0.9924646464646465

MCC on Blind test: 0.96

Accuracy on Blind test: 0.98

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.07493091 0.11247015 0.18619466 0.14949703 0.53921151 0.20738435
 0.13005686 0.14361334 0.1468308  0.12836099]

mean value: 0.18185505867004395

key: score_time
value: [0.01473236 0.01457143 0.02343249 0.02896667 0.05819082 0.03038406
 0.02774858 0.02561402 0.01513457 0.02963424]

mean value: 0.026840925216674805

key: test_mcc
value: [0.83205029 0.60678804 0.63636364 0.45643546 0.45454545 0.45643546
 0.77352678 0.72727273 0.63900965 0.59152048]

mean value: 0.617394799410964

key: train_mcc
value: [0.98496155 0.98496155 0.98496155 0.98496155 0.99496218 0.98994949
 0.98994949 0.98496155 0.98994949 0.98994949]

mean value: 0.9879567906059172

key: test_accuracy
value: [0.90909091 0.79545455 0.81818182 0.72727273 0.72727273 0.72727273
 0.88636364 0.86363636 0.81818182 0.79545455]

mean value: 0.8068181818181819

key: train_accuracy
value: [0.99242424 0.99242424 0.99242424 0.99242424 0.99747475 0.99494949
 0.99494949 0.99242424 0.99494949 0.99494949]

mean value: 0.9939393939393939

key: test_fscore
value: [0.91666667 0.76923077 0.81818182 0.71428571 0.72727273 0.71428571
 0.88888889 0.86363636 0.80952381 0.79069767]

mean value: 0.8012670146391077

key: train_fscore
value: [0.99236641 0.99236641 0.99236641 0.99236641 0.99746835 0.99492386
 0.99492386 0.99236641 0.99492386 0.99492386]

mean value: 0.9938995846971164

key: test_precision
value: [0.84615385 0.88235294 0.81818182 0.75       0.72727273 0.75
 0.86956522 0.86363636 0.85       0.80952381]

mean value: 0.8166686723336339

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.68181818 0.81818182 0.68181818 0.72727273 0.68181818
 0.90909091 0.86363636 0.77272727 0.77272727]

mean value: 0.7909090909090909

key: train_recall
value: [0.98484848 0.98484848 0.98484848 0.98484848 0.99494949 0.98989899
 0.98989899 0.98484848 0.98989899 0.98989899]

mean value: 0.9878787878787879

key: test_roc_auc
value: [0.90909091 0.79545455 0.81818182 0.72727273 0.72727273 0.72727273
 0.88636364 0.86363636 0.81818182 0.79545455]

mean value: 0.8068181818181818

key: train_roc_auc
value: [0.99242424 0.99242424 0.99242424 0.99242424 0.99747475 0.99494949
 0.99494949 0.99242424 0.99494949 0.99494949]

mean value: 0.993939393939394

key: test_jcc
value: [0.84615385 0.625      0.69230769 0.55555556 0.57142857 0.55555556
 0.8        0.76       0.68       0.65384615]

mean value: 0.6739847374847375

key: train_jcc
value: [0.98484848 0.98484848 0.98484848 0.98484848 0.99494949 0.98989899
 0.98989899 0.98484848 0.98989899 0.98989899]

mean value: 0.9878787878787879

MCC on Blind test: 0.59

Accuracy on Blind test: 0.79

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.72530508 0.64669752 0.66160345 0.69841456 0.70706224 0.64407086
 0.68081927 0.73172593 0.72187114 0.73497915]

mean value: 0.695254921913147

key: score_time
value: [0.00961423 0.0095017  0.0108285  0.01087284 0.0114634  0.00980496
 0.01087284 0.01086545 0.01108456 0.01084566]

mean value: 0.010575413703918457

key: test_mcc
value: [1.         0.86452993 0.86452993 0.81818182 0.95553309 0.82158384
 0.90909091 1.         0.95553309 0.91287093]

mean value: 0.9101853534252073

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.93181818 0.93181818 0.90909091 0.97727273 0.90909091
 0.95454545 1.         0.97727273 0.95454545]

mean value: 0.9545454545454546

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.93023256 0.93333333 0.90909091 0.97674419 0.91304348
 0.95454545 1.         0.97674419 0.95652174]

mean value: 0.9550255844593559

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.95238095 0.91304348 0.90909091 1.         0.875
 0.95454545 1.         1.         0.91666667]

mean value: 0.9520727460944852

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.90909091 0.95454545 0.90909091 0.95454545 0.95454545
 0.95454545 1.         0.95454545 1.        ]

mean value: 0.9590909090909091

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.93181818 0.93181818 0.90909091 0.97727273 0.90909091
 0.95454545 1.         0.97727273 0.95454545]

mean value: 0.9545454545454546

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.86956522 0.875      0.83333333 0.95454545 0.84
 0.91304348 1.         0.95454545 0.91666667]

mean value: 0.9156699604743083

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.95

Accuracy on Blind test: 0.97

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.06525111 0.04378963 0.03224754 0.06311655 0.06546879 0.1236515
 0.13866115 0.07320833 0.10163593 0.10575056]

mean value: 0.08127810955047607

key: score_time
value: [0.02526879 0.01164603 0.01404119 0.01201868 0.01778102 0.02148128
 0.01306319 0.03238225 0.0200932  0.01537132]

mean value: 0.018314695358276366

key: test_mcc
value: [ 0.41294832  0.30618622  0.40951418  0.59152048  0.2773501   0.37796447
  0.29277002 -0.05634362  0.33562431  0.22750788]

mean value: 0.3175042364901061

key: train_mcc
value: [0.97485938 0.6751906  0.97984797 0.97984797 0.84241805 0.8407714
 0.66332496 0.73135745 0.9459053  0.97984797]

mean value: 0.8613371053146376

key: test_accuracy
value: [0.70454545 0.63636364 0.70454545 0.79545455 0.63636364 0.68181818
 0.63636364 0.47727273 0.65909091 0.61363636]

mean value: 0.6545454545454545

key: train_accuracy
value: [0.98737374 0.81313131 0.98989899 0.98989899 0.91666667 0.91414141
 0.80555556 0.84848485 0.97222222 0.98989899]

mean value: 0.9227272727272727

key: test_fscore
value: [0.72340426 0.52941176 0.69767442 0.8        0.66666667 0.63157895
 0.55555556 0.25806452 0.70588235 0.62222222]

mean value: 0.6190460699512758

key: train_fscore
value: [0.98746867 0.77018634 0.98994975 0.98994975 0.91008174 0.90607735
 0.75862069 0.82142857 0.97297297 0.98994975]

mean value: 0.9096685579306306

key: test_precision
value: [0.68       0.75       0.71428571 0.7826087  0.61538462 0.75
 0.71428571 0.44444444 0.62068966 0.60869565]

mean value: 0.6680394491398989

key: train_precision
value: [0.9800995  1.         0.985      0.985      0.98816568 1.
 1.         1.         0.94736842 0.985     ]

mean value: 0.9870633604013567

key: test_recall
value: [0.77272727 0.40909091 0.68181818 0.81818182 0.72727273 0.54545455
 0.45454545 0.18181818 0.81818182 0.63636364]

mean value: 0.6045454545454545

key: train_recall
value: [0.99494949 0.62626263 0.99494949 0.99494949 0.84343434 0.82828283
 0.61111111 0.6969697  1.         0.99494949]

mean value: 0.8585858585858586

key: test_roc_auc
value: [0.70454545 0.63636364 0.70454545 0.79545455 0.63636364 0.68181818
 0.63636364 0.47727273 0.65909091 0.61363636]

mean value: 0.6545454545454545

key: train_roc_auc
value: [0.98737374 0.81313131 0.98989899 0.98989899 0.91666667 0.91414141
 0.80555556 0.84848485 0.97222222 0.98989899]

mean value: 0.9227272727272727

key: test_jcc
value: [0.56666667 0.36       0.53571429 0.66666667 0.5        0.46153846
 0.38461538 0.14814815 0.54545455 0.4516129 ]

mean value: 0.4620417062029965

key: train_jcc
value: [0.97524752 0.62626263 0.9800995  0.9800995  0.835      0.82828283
 0.61111111 0.6969697  0.94736842 0.9800995 ]

mean value: 0.8460540715894056

MCC on Blind test: 0.52

Accuracy on Blind test: 0.76

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.0181272  0.01742435 0.03097105 0.05920911 0.03365254 0.03182578
 0.04450655 0.05021977 0.03421092 0.03448963]

mean value: 0.035463690757751465

key: score_time
value: [0.01237655 0.01228476 0.01283073 0.03200722 0.0200367  0.01936555
 0.03299618 0.03163648 0.03088164 0.03133059]

mean value: 0.023574638366699218

key: test_mcc
value: [0.81818182 0.77352678 0.77352678 0.68252363 0.86452993 0.73029674
 0.77352678 0.95553309 0.81818182 0.91287093]

mean value: 0.810269831392801

key: train_mcc
value: [0.87374852 0.8693968  0.86391186 0.86873119 0.85876112 0.89404202
 0.86886419 0.86373551 0.87896726 0.85363334]

mean value: 0.8693791805866737

key: test_accuracy
value: [0.90909091 0.88636364 0.88636364 0.84090909 0.93181818 0.86363636
 0.88636364 0.97727273 0.90909091 0.95454545]

mean value: 0.9045454545454545

key: train_accuracy
value: [0.93686869 0.93434343 0.93181818 0.93434343 0.92929293 0.9469697
 0.93434343 0.93181818 0.93939394 0.92676768]

mean value: 0.9345959595959596

key: test_fscore
value: [0.90909091 0.88372093 0.88888889 0.84444444 0.93023256 0.86956522
 0.88372093 0.97674419 0.90909091 0.95652174]

mean value: 0.9052020712688054

/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:168: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:171: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)

key: train_fscore
value: [0.93670886 0.93564356 0.93266833 0.93467337 0.93       0.94656489
 0.935      0.93233083 0.94       0.9273183 ]

mean value: 0.9350908129430359

key: test_precision
value: [0.90909091 0.9047619  0.86956522 0.82608696 0.95238095 0.83333333
 0.9047619  1.         0.90909091 0.91666667]

mean value: 0.9025738753999624

key: train_precision
value: [0.93908629 0.91747573 0.92118227 0.93       0.92079208 0.95384615
 0.92574257 0.92537313 0.93069307 0.92039801]

mean value: 0.9284589309478474

key: test_recall
value: [0.90909091 0.86363636 0.90909091 0.86363636 0.90909091 0.90909091
 0.86363636 0.95454545 0.90909091 1.        ]

mean value: 0.9090909090909091

key: train_recall
value: [0.93434343 0.95454545 0.94444444 0.93939394 0.93939394 0.93939394
 0.94444444 0.93939394 0.94949495 0.93434343]

mean value: 0.9419191919191919

key: test_roc_auc
value: [0.90909091 0.88636364 0.88636364 0.84090909 0.93181818 0.86363636
 0.88636364 0.97727273 0.90909091 0.95454545]

mean value: 0.9045454545454545

key: train_roc_auc
value: [0.93686869 0.93434343 0.93181818 0.93434343 0.92929293 0.9469697
 0.93434343 0.93181818 0.93939394 0.92676768]

mean value: 0.9345959595959596

key: test_jcc
value: [0.83333333 0.79166667 0.8        0.73076923 0.86956522 0.76923077
 0.79166667 0.95454545 0.83333333 0.91666667]

mean value: 0.8290777338603426

key: train_jcc
value: [0.88095238 0.87906977 0.87383178 0.87735849 0.86915888 0.89855072
 0.87793427 0.87323944 0.88679245 0.86448598]

mean value: 0.8781374160862355

MCC on Blind test: 0.79

Accuracy on Blind test: 0.89

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.40553069 0.27984118 0.32102966 0.60245919 0.34661484 0.62287879
 0.34234953 0.59961581 0.49291015 0.4358778 ]

mean value: 0.44491076469421387

key: score_time
value: [0.01224136 0.02761769 0.02338171 0.01102257 0.03149271 0.0273664
 0.01604629 0.02581143 0.03535819 0.01974392]

mean value: 0.023008227348327637

key: test_mcc
value: [0.81818182 0.77352678 0.77352678 0.64715023 0.86452993 0.73029674
 0.77352678 0.95553309 0.81818182 0.91287093]

mean value: 0.8067324910067508

key: train_mcc
value: [0.87374852 0.8693968  0.86391186 0.82332683 0.85876112 0.89404202
 0.86886419 0.86373551 0.87896726 0.85363334]

mean value: 0.8648387451125236

key: test_accuracy
value: [0.90909091 0.88636364 0.88636364 0.81818182 0.93181818 0.86363636
 0.88636364 0.97727273 0.90909091 0.95454545]

mean value: 0.9022727272727272

key: train_accuracy
value: [0.93686869 0.93434343 0.93181818 0.91161616 0.92929293 0.9469697
 0.93434343 0.93181818 0.93939394 0.92676768]

mean value: 0.9323232323232323

key: test_fscore
value: [0.90909091 0.88372093 0.88888889 0.83333333 0.93023256 0.86956522
 0.88372093 0.97674419 0.90909091 0.95652174]

mean value: 0.9040909601576942

key: train_fscore
value: [0.93670886 0.93564356 0.93266833 0.91094148 0.93       0.94656489
 0.935      0.93233083 0.94       0.9273183 ]

mean value: 0.9327176238423159

key: test_precision
value: [0.90909091 0.9047619  0.86956522 0.76923077 0.95238095 0.83333333
 0.9047619  1.         0.90909091 0.91666667]

mean value: 0.8968882566708654

key: train_precision
value: [0.93908629 0.91747573 0.92118227 0.91794872 0.92079208 0.95384615
 0.92574257 0.92537313 0.93069307 0.92039801]

mean value: 0.9272538027427192

key: test_recall
value: [0.90909091 0.86363636 0.90909091 0.90909091 0.90909091 0.90909091
 0.86363636 0.95454545 0.90909091 1.        ]

mean value: 0.9136363636363636

key: train_recall
value: [0.93434343 0.95454545 0.94444444 0.9040404  0.93939394 0.93939394
 0.94444444 0.93939394 0.94949495 0.93434343]

mean value: 0.9383838383838384

key: test_roc_auc
value: [0.90909091 0.88636364 0.88636364 0.81818182 0.93181818 0.86363636
 0.88636364 0.97727273 0.90909091 0.95454545]

mean value: 0.9022727272727273

key: train_roc_auc
value: [0.93686869 0.93434343 0.93181818 0.91161616 0.92929293 0.9469697
 0.93434343 0.93181818 0.93939394 0.92676768]

mean value: 0.9323232323232323

key: test_jcc
value: [0.83333333 0.79166667 0.8        0.71428571 0.86956522 0.76923077
 0.79166667 0.95454545 0.83333333 0.91666667]

mean value: 0.8274293822119909

key: train_jcc
value: [0.88095238 0.87906977 0.87383178 0.8364486  0.86915888 0.89855072
 0.87793427 0.87323944 0.88679245 0.86448598]

mean value: 0.8740464268427159

MCC on Blind test: 0.79

Accuracy on Blind test: 0.89

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])
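
Note on the pipeline printed above: the repr shows a ColumnTransformer that min-max scales the numeric columns, one-hot encodes the categorical ones, and passes the result to the model. The following is a minimal sketch of building a pipeline of that shape. The three column names are borrowed from the log, but the data values, the category labels and the target are invented for illustration and are not the rpoB dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy stand-in data (hypothetical values, not taken from the log).
rng = np.random.default_rng(42)
n = 60
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 30, n),
    "ddg_foldx": rng.normal(0, 2, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),
})
y = (X["ligand_distance"] < 15).astype(int)  # invented binary target

num_cols = ["ligand_distance", "ddg_foldx"]
cat_cols = ["ss_class"]

# Same step names ('prep', 'num', 'cat', 'model') as in the logged repr.
prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ],
    remainder="passthrough",
)
pipe = Pipeline(steps=[("prep", prep),
                       ("model", LogisticRegression(random_state=42))])
pipe.fit(X, y)

# Two scaled numeric columns plus one indicator column per category.
n_features_out = pipe.named_steps["prep"].transform(X).shape[1]
print("features after preprocessing:", n_features_out)
```

Keeping the scaler and encoder inside the pipeline means they are refit on each training fold during cross-validation, so no information from the held-out fold leaks into the preprocessing.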

key: fit_time
value: [0.17021847 0.0488503  0.04700518 0.10988641 0.11036181 0.18732977
 0.1769886  0.13736916 0.13100767 0.08153677]

mean value: 0.12005541324615479

key: score_time
value: [0.01927829 0.01233315 0.01237607 0.02285433 0.01247287 0.02293801
 0.01881552 0.01801705 0.01239252 0.02268243]

mean value: 0.01741602420806885

key: test_mcc
value: [0.91452919 0.91106719 1.         0.95652174 0.77865613 0.82506438
 0.68911026 0.64426877 0.74410286 0.68972332]

mean value: 0.8153043839779892

key: train_mcc
value: [0.862096   0.86750864 0.84716163 0.85188889 0.87164354 0.85185095
 0.85680144 0.88165855 0.87664317 0.86676585]

mean value: 0.8634018662617092

key: test_accuracy
value: [0.95555556 0.95555556 1.         0.97777778 0.88888889 0.91111111
 0.84444444 0.82222222 0.86666667 0.84444444]

mean value: 0.9066666666666666

key: train_accuracy
value: [0.9308642  0.93333333 0.92345679 0.92592593 0.93580247 0.92592593
 0.92839506 0.94074074 0.9382716  0.93333333]

mean value: 0.931604938271605

key: test_fscore
value: [0.95238095 0.95454545 1.         0.97777778 0.88888889 0.91666667
 0.85106383 0.82608696 0.88       0.84444444]

mean value: 0.9091854971013158

key: train_fscore
value: [0.93203883 0.93493976 0.92457421 0.92647059 0.93627451 0.92574257
 0.92839506 0.94117647 0.93857494 0.93366093]

mean value: 0.9321847880082487

key: test_precision
value: [1.         0.95454545 1.         0.95652174 0.86956522 0.88
 0.83333333 0.82608696 0.81481481 0.86363636]

mean value: 0.8998503879373445

key: train_precision
value: [0.91866029 0.91509434 0.91346154 0.92195122 0.93170732 0.92574257
 0.92610837 0.93203883 0.93170732 0.92682927]

mean value: 0.9243301070709858

key: test_recall
value: [0.90909091 0.95454545 1.         1.         0.90909091 0.95652174
 0.86956522 0.82608696 0.95652174 0.82608696]

mean value: 0.9207509881422925

key: train_recall
value: [0.94581281 0.95566502 0.93596059 0.93103448 0.9408867  0.92574257
 0.93069307 0.95049505 0.94554455 0.94059406]

mean value: 0.9402428912842022

key: test_roc_auc
value: [0.95454545 0.9555336  1.         0.97826087 0.88932806 0.91007905
 0.84387352 0.82213439 0.86462451 0.84486166]

mean value: 0.9063241106719367

key: train_roc_auc
value: [0.9308272  0.93327806 0.92342584 0.92591328 0.93578988 0.92592547
 0.92840072 0.94076477 0.93828952 0.93335122]

mean value: 0.9315965956201532

key: test_jcc
value: [0.90909091 0.91304348 1.         0.95652174 0.8        0.84615385
 0.74074074 0.7037037  0.78571429 0.73076923]

mean value: 0.838573793356402

key: train_jcc
value: [0.87272727 0.87782805 0.85972851 0.8630137  0.88018433 0.86175115
 0.86635945 0.88888889 0.88425926 0.87557604]

mean value: 0.8730316648333466

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9
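
Note on the key/value/mean blocks above: the key names (fit_time, score_time, test_mcc, train_mcc, ...) match the dictionary returned by scikit-learn's cross_validate when it is given a dict of scorers and return_train_score=True. The sketch below shows one way such output could be produced, followed by a held-out ("blind") evaluation. The synthetic data, the 80/20 split call, the fold settings and the exact scorer definitions are assumptions; the logged script may define them differently (its roc_auc values, for instance, may be computed from hard predictions).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import (StratifiedKFold, cross_validate,
                                     train_test_split)

# Synthetic stand-in data; 20% is held out as a blind test set.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Scorer names chosen so the result keys mirror the log (test_mcc, train_jcc, ...).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
model = LogisticRegression(random_state=42, max_iter=1000)
scores = cross_validate(model, X_train, y_train, cv=cv,
                        scoring=scoring, return_train_score=True)

# One array of 10 fold values per key, then its mean, as in the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))

# Refit on the full training portion and score once on the blind set.
model.fit(X_train, y_train)
y_pred = model.predict(X_blind)
blind_mcc = matthews_corrcoef(y_blind, y_pred)
blind_acc = accuracy_score(y_blind, y_pred)
print("MCC on Blind test:", round(blind_mcc, 2))
print("Accuracy on Blind test:", round(blind_acc, 2))
```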

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
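
Note on the ConvergenceWarning above: the message suggests raising max_iter or scaling the data. Since the logged pipeline already applies MinMaxScaler to the numeric columns, raising max_iter (the lbfgs default is 100) is probably the first thing to try, though that is an assumption about this run. The sketch below counts ConvergenceWarning occurrences for a default model and for a scaled model with a higher iteration cap; the synthetic, deliberately badly scaled data and the helper name count_convergence_warnings are illustrative only.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data whose feature scales span several orders of magnitude.
X, y = make_classification(n_samples=400, n_features=30, random_state=42)
X = X * np.logspace(0, 4, X.shape[1])


def count_convergence_warnings(estimator):
    """Fit the estimator and return how many ConvergenceWarnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        estimator.fit(X, y)
    return sum(issubclass(w.category, ConvergenceWarning) for w in caught)


# Default settings, as in the logged LogisticRegression(random_state=42).
default_lr = LogisticRegression(random_state=42)

# Scaled inputs plus a larger iteration budget.
scaled_lr = make_pipeline(
    StandardScaler(),
    LogisticRegression(random_state=42, max_iter=1000),
)

n_default = count_convergence_warnings(default_lr)
n_scaled = count_convergence_warnings(scaled_lr)
print("warnings with defaults:", n_default)
print("warnings with scaling and max_iter=1000:", n_scaled)
```

For LogisticRegressionCV the same max_iter parameter applies to every internal fit, which is why a single cross-validated run can emit the warning dozens of times.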
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [2.99451971 2.03133249 2.58804703 2.36343884 2.9534328  3.19177508
 2.325104   2.89092827 3.53216934 2.62339878]

mean value: 2.74941463470459

key: score_time
value: [0.02300715 0.01974368 0.01187634 0.01202106 0.01932669 0.02432084
 0.02360487 0.04886913 0.01317501 0.03754377]

mean value: 0.02334885597229004

key: test_mcc
value: [0.91452919 0.86732843 1.         0.95652174 0.77865613 0.82506438
 0.73559956 0.68972332 0.74410286 0.77865613]

mean value: 0.8290181733629296

key: train_mcc
value: [0.89656272 0.89152603 0.81736586 0.87655164 0.90127552 0.88152087
 0.90618446 0.95556639 0.89140349 0.89639783]

mean value: 0.8914354807454282

key: test_accuracy
value: [0.95555556 0.93333333 1.         0.97777778 0.88888889 0.91111111
 0.86666667 0.84444444 0.86666667 0.88888889]

mean value: 0.9133333333333333

key: train_accuracy
value: [0.94814815 0.94567901 0.90864198 0.9382716  0.95061728 0.94074074
 0.95308642 0.97777778 0.94567901 0.94814815]

mean value: 0.945679012345679

key: test_fscore
value: [0.95238095 0.93023256 1.         0.97777778 0.88888889 0.91666667
 0.875      0.84444444 0.88       0.88888889]

mean value: 0.9154280177187154

key: train_fscore
value: [0.94890511 0.94634146 0.90953545 0.93857494 0.95098039 0.94029851
 0.95308642 0.97766749 0.94581281 0.94840295]

mean value: 0.9459605533255245

key: test_precision
value: [1.         0.95238095 1.         0.95652174 0.86956522 0.88
 0.84       0.86363636 0.81481481 0.90909091]

mean value: 0.9086009996444779

key: train_precision
value: [0.9375     0.93719807 0.90291262 0.93627451 0.94634146 0.945
 0.95073892 0.9800995  0.94117647 0.94146341]

mean value: 0.9418704966176731

key: test_recall
value: [0.90909091 0.90909091 1.         1.         0.90909091 0.95652174
 0.91304348 0.82608696 0.95652174 0.86956522]

mean value: 0.924901185770751

key: train_recall
value: [0.96059113 0.95566502 0.91625616 0.9408867  0.95566502 0.93564356
 0.95544554 0.97524752 0.95049505 0.95544554]

mean value: 0.9501341267131639

key: test_roc_auc
value: [0.95454545 0.93280632 1.         0.97826087 0.88932806 0.91007905
 0.86561265 0.84486166 0.86462451 0.88932806]

mean value: 0.9129446640316206

key: train_roc_auc
value: [0.94811735 0.94565429 0.90862313 0.93826513 0.95060479 0.94072819
 0.95309223 0.97777155 0.94569087 0.94816612]

mean value: 0.9456713651660732

key: test_jcc
value: [0.90909091 0.86956522 1.         0.95652174 0.8        0.84615385
 0.77777778 0.73076923 0.78571429 0.8       ]

mean value: 0.8475593006027788

key: train_jcc
value: [0.90277778 0.89814815 0.83408072 0.88425926 0.90654206 0.88732394
 0.91037736 0.95631068 0.89719626 0.90186916]

mean value: 0.8978885361073676

MCC on Blind test: 0.79

Accuracy on Blind test: 0.89
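
Note: the key / value / mean value blocks above follow a regular layout, so they can be checked mechanically. The sketch below is not part of the logged run; it is a minimal, stdlib-only illustration, and the names `parse_metric_blocks`, `BLOCK`, and `SAMPLE` are hypothetical. It parses one block copied verbatim from this log and recomputes the mean from the printed fold values. Because the fold values are printed to 8 decimals, the recomputed mean should only be expected to agree with the logged mean to roughly that precision.

```python
import re
from statistics import fmean

# One metric block copied from the Logistic RegressionCV section of this log.
SAMPLE = """\
key: test_mcc
value: [0.91452919 0.86732843 1.         0.95652174 0.77865613 0.82506438
 0.73559956 0.68972332 0.74410286 0.77865613]

mean value: 0.8290181733629296
"""

# key name, bracketed fold values (may wrap across lines), then the logged mean.
BLOCK = re.compile(r"key: (\w+)\s+value: \[([^\]]*)\]\s+mean value: (\S+)")


def parse_metric_blocks(text):
    """Return {metric_name: (fold_values, logged_mean)} for every block found."""
    parsed = {}
    for name, values, mean in BLOCK.findall(text):
        parsed[name] = ([float(v) for v in values.split()], float(mean))
    return parsed


parsed = parse_metric_blocks(SAMPLE)
fold_values, logged_mean = parsed["test_mcc"]
recomputed_mean = fmean(fold_values)
print(len(fold_values), recomputed_mean, logged_mean)
```

The same function can be pointed at the full log text to collect every metric per model, provided the text is first split on the `Model_name:` lines so that blocks from different models are not merged under the same key.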

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01388836 0.01219487 0.01202846 0.01193786 0.01194143 0.01207137
 0.01265931 0.01293683 0.01277041 0.01270103]

mean value: 0.012512993812561036

key: score_time
value: [0.01055479 0.01051068 0.01049137 0.01057005 0.01054406 0.01070571
 0.01106882 0.01103139 0.01090407 0.01119542]

mean value: 0.010757637023925782

key: test_mcc
value: [0.73320158 0.70501339 0.55533597 0.72299881 0.62869461 0.77821935
 0.3860278  0.60637261 0.64426877 0.60637261]

mean value: 0.6366505501355761

key: train_mcc
value: [0.69394577 0.68810424 0.64177606 0.65988684 0.66444098 0.69047787
 0.69047787 0.69787618 0.68793807 0.68334493]

mean value: 0.6798268830360812

key: test_accuracy
value: [0.86666667 0.84444444 0.77777778 0.84444444 0.8        0.88888889
 0.68888889 0.8        0.82222222 0.8       ]

mean value: 0.8133333333333334

key: train_accuracy
value: [0.84691358 0.84197531 0.81728395 0.82716049 0.82962963 0.84197531
 0.84197531 0.84691358 0.84197531 0.83950617]

mean value: 0.8375308641975309

key: test_fscore
value: [0.86363636 0.82051282 0.77272727 0.81081081 0.75675676 0.89361702
 0.66666667 0.79069767 0.82608696 0.79069767]

mean value: 0.7992210017746235

key: train_fscore
value: [0.84878049 0.83333333 0.80319149 0.81578947 0.81889764 0.82978723
 0.82978723 0.83769634 0.83246073 0.82939633]

mean value: 0.8279120283586651

key: test_precision
value: [0.86363636 0.94117647 0.77272727 1.         0.93333333 0.875
 0.73684211 0.85       0.82608696 0.85      ]

mean value: 0.8648802502070102

key: train_precision
value: [0.84057971 0.8839779  0.87283237 0.87570621 0.87640449 0.89655172
 0.89655172 0.88888889 0.88333333 0.88268156]

mean value: 0.8797507924454793

key: test_recall
value: [0.86363636 0.72727273 0.77272727 0.68181818 0.63636364 0.91304348
 0.60869565 0.73913043 0.82608696 0.73913043]

mean value: 0.7507905138339921

key: train_recall
value: [0.85714286 0.78817734 0.74384236 0.7635468  0.76847291 0.77227723
 0.77227723 0.79207921 0.78712871 0.78217822]

mean value: 0.7827122860069258

key: test_roc_auc
value: [0.86660079 0.84189723 0.77766798 0.84090909 0.79644269 0.88833992
 0.69071146 0.8013834  0.82213439 0.8013834 ]

mean value: 0.8127470355731226

key: train_roc_auc
value: [0.84688826 0.84210847 0.81746574 0.82731795 0.82978101 0.84180364
 0.84180364 0.84677852 0.84184022 0.83936497]

mean value: 0.8375152416719505

key: test_jcc
value: [0.76       0.69565217 0.62962963 0.68181818 0.60869565 0.80769231
 0.5        0.65384615 0.7037037  0.65384615]

mean value: 0.6694883956623087

key: train_jcc
value: [0.73728814 0.71428571 0.67111111 0.68888889 0.69333333 0.70909091
 0.70909091 0.72072072 0.71300448 0.70852018]

mean value: 0.7065334385791937

MCC on Blind test: 0.68

Accuracy on Blind test: 0.84
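
Note: the per-fold scores above are all functions of one confusion matrix per fold, which makes them easy to cross-check. The sketch below is an illustration only, not code from the logged run. The counts TP=19, FP=3, FN=3, TN=20 are not printed anywhere in this log; they are inferred here (an assumption) as the integers that reproduce the first-fold Gaussian NB values for test_accuracy, test_precision and test_recall. The function name `binary_metrics` is hypothetical.

```python
from math import sqrt

# Inferred (not logged) confusion counts for fold 1 of the Gaussian NB run.
TP, FP, FN, TN = 19, 3, 3, 20


def binary_metrics(tp, fp, fn, tn):
    """Standard binary-classification scores computed from confusion counts."""
    recall = tp / (tp + fn)          # sensitivity
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),  # Jaccard index of the positive class
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "balanced_accuracy": (recall + specificity) / 2,
    }


metrics = binary_metrics(TP, FP, FN, TN)
for name, value in metrics.items():
    print(f"{name}: {value:.8f}")
```

Under these assumed counts the recomputed fscore, jcc and mcc agree with the first entries of test_fscore, test_jcc and test_mcc above. The balanced accuracy also agrees with the first entry of test_roc_auc, which is consistent with ROC AUC having been scored on hard class labels rather than on probabilities or decision scores; the log alone does not confirm this.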

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01217747 0.01220322 0.01207113 0.01213813 0.01209164 0.01210999
 0.01217937 0.01207447 0.01208973 0.01212859]

mean value: 0.01212637424468994

key: score_time
value: [0.01052928 0.01042199 0.01046658 0.01045585 0.01039958 0.01040077
 0.01052475 0.01056433 0.01042843 0.01073074]

mean value: 0.010492229461669922

key: test_mcc
value: [0.64613475 0.78405645 0.82213439 0.86758893 0.5169078  0.64613475
 0.68911026 0.51089209 0.73320158 0.51185771]

mean value: 0.6728018722919369

key: train_mcc
value: [0.71373171 0.71391286 0.70403264 0.72839898 0.76296152 0.72399345
 0.73363435 0.73363435 0.718529   0.75311563]

mean value: 0.7285944480486638

key: test_accuracy
value: [0.82222222 0.88888889 0.91111111 0.93333333 0.75555556 0.82222222
 0.84444444 0.75555556 0.86666667 0.75555556]

mean value: 0.8355555555555555

key: train_accuracy
value: [0.85679012 0.85679012 0.85185185 0.86419753 0.88148148 0.8617284
 0.86666667 0.86666667 0.85925926 0.87654321]

mean value: 0.8641975308641976

key: test_fscore
value: [0.80952381 0.87804878 0.90909091 0.93333333 0.76595745 0.83333333
 0.85106383 0.76595745 0.86956522 0.75555556]

mean value: 0.8371429662120305

key: train_fscore
value: [0.85572139 0.855      0.85       0.86486486 0.8817734  0.85858586
 0.86432161 0.86432161 0.85925926 0.87562189]

mean value: 0.8629469881387253

key: test_precision
value: [0.85       0.94736842 0.90909091 0.91304348 0.72       0.8
 0.83333333 0.75       0.86956522 0.77272727]

mean value: 0.836512863185632

key: train_precision
value: [0.86432161 0.8680203  0.86294416 0.8627451  0.8817734  0.87628866
 0.87755102 0.87755102 0.85714286 0.88      ]

mean value: 0.8708338129852269

key: test_recall
value: [0.77272727 0.81818182 0.90909091 0.95454545 0.81818182 0.86956522
 0.86956522 0.7826087  0.86956522 0.73913043]

mean value: 0.8403162055335969

key: train_recall
value: [0.84729064 0.84236453 0.83743842 0.86699507 0.8817734  0.84158416
 0.85148515 0.85148515 0.86138614 0.87128713]

mean value: 0.8553089791737795

key: test_roc_auc
value: [0.82114625 0.88735178 0.91106719 0.93379447 0.756917   0.82114625
 0.84387352 0.75494071 0.86660079 0.75592885]

mean value: 0.8352766798418972

key: train_roc_auc
value: [0.85681364 0.85682583 0.85188753 0.86419061 0.88148076 0.86167878
 0.86662927 0.86662927 0.8592645  0.87653026]

mean value: 0.8641930449202556

key: test_jcc
value: [0.68       0.7826087  0.83333333 0.875      0.62068966 0.71428571
 0.74074074 0.62068966 0.76923077 0.60714286]

mean value: 0.7243721420730417

key: train_jcc
value: [0.74782609 0.74672489 0.73913043 0.76190476 0.78854626 0.75221239
 0.76106195 0.76106195 0.75324675 0.77876106]

mean value: 0.7590476528359691

MCC on Blind test: 0.73

Accuracy on Blind test: 0.87

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.01150799 0.01243472 0.01137662 0.01154757 0.01141047 0.01107907
 0.01314235 0.01315427 0.01413846 0.03310156]

mean value: 0.014289307594299316

key: score_time
value: [0.02538848 0.02856398 0.02112675 0.02574253 0.02009082 0.01918626
 0.05908275 0.04989839 0.05723858 0.07202363]

mean value: 0.03783421516418457

key: test_mcc
value: [0.46640316 0.68972332 0.60079051 0.60000118 0.24655092 0.3860278
 0.55666994 0.33399209 0.55533597 0.42744299]

mean value: 0.4862937886203532

key: train_mcc
value: [0.68398976 0.68932545 0.68482256 0.65994656 0.70428051 0.68422603
 0.72399345 0.67997157 0.65931708 0.72375269]

mean value: 0.6893625650465149

key: test_accuracy
value: [0.73333333 0.84444444 0.8        0.8        0.62222222 0.68888889
 0.77777778 0.66666667 0.77777778 0.71111111]

mean value: 0.7422222222222222

key: train_accuracy
value: [0.84197531 0.84444444 0.84197531 0.82962963 0.85185185 0.84197531
 0.8617284  0.83950617 0.82962963 0.8617284 ]

mean value: 0.8444444444444444

key: test_fscore
value: [0.72727273 0.84444444 0.8        0.79069767 0.56410256 0.66666667
 0.79166667 0.66666667 0.7826087  0.69767442]

mean value: 0.7331800524495166

key: train_fscore
value: [0.84158416 0.84210526 0.83838384 0.82619647 0.84924623 0.83919598
 0.85858586 0.8346056  0.82793017 0.85929648]

mean value: 0.8417130058090375

key: test_precision
value: [0.72727273 0.82608696 0.7826087  0.80952381 0.64705882 0.73684211
 0.76       0.68181818 0.7826087  0.75      ]

mean value: 0.7503819995233375

key: train_precision
value: [0.84577114 0.85714286 0.86010363 0.84536082 0.86666667 0.85204082
 0.87628866 0.85863874 0.83417085 0.87244898]

mean value: 0.856863317321244

key: test_recall
value: [0.72727273 0.86363636 0.81818182 0.77272727 0.5        0.60869565
 0.82608696 0.65217391 0.7826087  0.65217391]

mean value: 0.7203557312252965

key: train_recall
value: [0.83743842 0.82758621 0.81773399 0.80788177 0.83251232 0.82673267
 0.84158416 0.81188119 0.82178218 0.84653465]

mean value: 0.8271667560844754

key: test_roc_auc
value: [0.73320158 0.84486166 0.80039526 0.79940711 0.61956522 0.69071146
 0.77667984 0.66699605 0.77766798 0.71245059]

mean value: 0.742193675889328

key: train_roc_auc
value: [0.84198654 0.84448617 0.84203531 0.82968346 0.85189972 0.84193777
 0.86167878 0.83943813 0.8296103  0.86169097]

mean value: 0.8444447154075013

key: test_jcc
value: [0.57142857 0.73076923 0.66666667 0.65384615 0.39285714 0.5
 0.65517241 0.5        0.64285714 0.53571429]

mean value: 0.5849311607932297

key: train_jcc
value: [0.72649573 0.72727273 0.72173913 0.70386266 0.73799127 0.72294372
 0.75221239 0.71615721 0.70638298 0.75330396]

mean value: 0.726836177256853

MCC on Blind test: 0.41

Accuracy on Blind test: 0.71

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.04429531 0.05560946 0.05552363 0.02630138 0.01901889 0.01773357
 0.0175724  0.01781559 0.01794982 0.01811028]

mean value: 0.02899303436279297

key: score_time
value: [0.04254222 0.02727342 0.02701807 0.0157671  0.01172519 0.01093841
 0.01104927 0.01114202 0.01128674 0.01135397]

mean value: 0.018009638786315917

key: test_mcc
value: [0.83484711 0.91452919 1.         0.91106719 0.73663511 0.82506438
 0.68972332 0.64426877 0.78405645 0.64426877]

mean value: 0.7984460299700171

key: train_mcc
value: [0.7927359  0.80280601 0.78773172 0.78766004 0.80741843 0.80741843
 0.81238873 0.81234453 0.81234453 0.81234453]

mean value: 0.8035192858986034

key: test_accuracy
value: [0.91111111 0.95555556 1.         0.95555556 0.86666667 0.91111111
 0.84444444 0.82222222 0.88888889 0.82222222]

mean value: 0.8977777777777778

key: train_accuracy
value: [0.8962963  0.90123457 0.89382716 0.89382716 0.9037037  0.9037037
 0.90617284 0.90617284 0.90617284 0.90617284]

mean value: 0.9017283950617284

key: test_fscore
value: [0.9        0.95238095 1.         0.95454545 0.86956522 0.91666667
 0.84444444 0.82608696 0.89795918 0.82608696]

mean value: 0.898773583214577

key: train_fscore
value: [0.89756098 0.90291262 0.89486553 0.89434889 0.9037037  0.9037037
 0.90640394 0.90594059 0.90594059 0.90594059]

mean value: 0.9021321147462571

key: test_precision
value: [1.         1.         1.         0.95454545 0.83333333 0.88
 0.86363636 0.82608696 0.84615385 0.82608696]

mean value: 0.9029842910712476

key: train_precision
value: [0.88888889 0.88995215 0.88834951 0.89215686 0.90594059 0.90147783
 0.90196078 0.90594059 0.90594059 0.90594059]

mean value: 0.8986548412370806

key: test_recall
value: [0.81818182 0.90909091 1.         0.95454545 0.90909091 0.95652174
 0.82608696 0.82608696 0.95652174 0.82608696]

mean value: 0.8982213438735178

key: train_recall
value: [0.90640394 0.91625616 0.90147783 0.89655172 0.90147783 0.90594059
 0.91089109 0.90594059 0.90594059 0.90594059]

mean value: 0.9056820953031264

key: test_roc_auc
value: [0.90909091 0.95454545 1.         0.9555336  0.86758893 0.91007905
 0.84486166 0.82213439 0.88735178 0.82213439]

mean value: 0.8973320158102767

key: train_roc_auc
value: [0.89627128 0.90119739 0.89380822 0.89382042 0.90370921 0.90370921
 0.90618446 0.90617227 0.90617227 0.90617227]

mean value: 0.9017216992635224

key: test_jcc
value: [0.81818182 0.90909091 1.         0.91304348 0.76923077 0.84615385
 0.73076923 0.7037037  0.81481481 0.7037037 ]

mean value: 0.8208692273909666

key: train_jcc
value: [0.81415929 0.82300885 0.80973451 0.80888889 0.82432432 0.82432432
 0.82882883 0.8280543  0.8280543  0.8280543 ]

mean value: 0.8217431917161224

MCC on Blind test: 0.79

Accuracy on Blind test: 0.89

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [1.16594219 1.53831148 0.54525065 0.66455746 0.58896971 1.97863793
 1.58019543 2.03711033 1.07067323 0.89870596]

mean value: 1.206835436820984

key: score_time
value: [0.02138186 0.01350737 0.01318359 0.01283979 0.02119589 0.02027488
 0.02557015 0.01267719 0.0126822  0.0126617 ]

mean value: 0.016597461700439454

key: test_mcc
value: [0.83484711 0.91106719 0.95652174 0.95652174 0.68911026 0.86732843
 0.60079051 0.68911026 0.61657545 0.56604076]

mean value: 0.7687913462185909

key: train_mcc
value: [0.81736586 0.86188899 0.80263415 0.81331421 0.81271657 0.80904514
 0.83406549 0.86381736 0.84818518 0.86717283]

mean value: 0.8330205780457784

key: test_accuracy
value: [0.91111111 0.95555556 0.97777778 0.97777778 0.84444444 0.93333333
 0.8        0.84444444 0.77777778 0.77777778]

mean value: 0.88

key: train_accuracy
value: [0.90864198 0.9308642  0.90123457 0.90617284 0.9037037  0.9037037
 0.91604938 0.9308642  0.92345679 0.93333333]

mean value: 0.9158024691358024

key: test_fscore
value: [0.9        0.95454545 0.97777778 0.97777778 0.8372093  0.93617021
 0.8        0.85106383 0.82142857 0.76190476]

mean value: 0.8817877688313116

key: train_fscore
value: [0.90953545 0.93170732 0.90049751 0.90865385 0.89817232 0.90025575
 0.91282051 0.93301435 0.9253012  0.93198992]

mean value: 0.9151948202363086

key: test_precision
value: [1.         0.95454545 0.95652174 0.95652174 0.85714286 0.91666667
 0.81818182 0.83333333 0.6969697  0.84210526]

mean value: 0.8831988568258591

key: train_precision
value: [0.90291262 0.92270531 0.90954774 0.88732394 0.95555556 0.93121693
 0.94680851 0.90277778 0.90140845 0.94871795]

mean value: 0.9208974792335061

key: test_recall
value: [0.81818182 0.95454545 1.         1.         0.81818182 0.95652174
 0.7826087  0.86956522 1.         0.69565217]

mean value: 0.8895256916996047

key: train_recall
value: [0.91625616 0.9408867  0.89162562 0.93103448 0.84729064 0.87128713
 0.88118812 0.96534653 0.95049505 0.91584158]

mean value: 0.9111252011900697

key: test_roc_auc
value: [0.90909091 0.9555336  0.97826087 0.97826087 0.84387352 0.93280632
 0.80039526 0.84387352 0.77272727 0.77964427]

mean value: 0.8794466403162056

key: train_roc_auc
value: [0.90862313 0.93083939 0.90125835 0.9061113  0.90384334 0.90362386
 0.91596352 0.93094913 0.92352339 0.93329025]

mean value: 0.9158025654782227

key: test_jcc
value: [0.81818182 0.91304348 0.95652174 0.95652174 0.72       0.88
 0.66666667 0.74074074 0.6969697  0.61538462]

mean value: 0.7964030494465277

key: train_jcc
value: [0.83408072 0.87214612 0.81900452 0.83259912 0.81516588 0.81860465
 0.83962264 0.87443946 0.86098655 0.87264151]

mean value: 0.8439291167891907

MCC on Blind test: 0.75

Accuracy on Blind test: 0.88

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.02708101 0.02197504 0.0223949  0.02293539 0.02220106 0.01959419
 0.02083325 0.01980853 0.01847649 0.02072978]

mean value: 0.021602964401245116

key: score_time
value: [0.01235509 0.00963283 0.01003337 0.01024127 0.00966859 0.0089643
 0.00935817 0.00911379 0.00966048 0.00908685]

mean value: 0.009811472892761231

key: test_mcc
value: [0.82506438 0.82213439 0.91106719 0.95643752 0.87476705 0.95643752
 0.91106719 0.86732843 0.74605372 0.78530224]

mean value: 0.8655659627181006

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91111111 0.91111111 0.95555556 0.97777778 0.93333333 0.97777778
 0.95555556 0.93333333 0.86666667 0.88888889]

mean value: 0.9311111111111111

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.9047619  0.90909091 0.95454545 0.97674419 0.93617021 0.9787234
 0.95652174 0.93617021 0.85714286 0.88372093]

mean value: 0.9293591810737865

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.95       0.90909091 0.95454545 1.         0.88       0.95833333
 0.95652174 0.91666667 0.94736842 0.95      ]

mean value: 0.942252652381943

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.86363636 0.90909091 0.95454545 0.95454545 1.         1.
 0.95652174 0.95652174 0.7826087  0.82608696]

mean value: 0.9203557312252965

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.91007905 0.91106719 0.9555336  0.97727273 0.93478261 0.97727273
 0.9555336  0.93280632 0.86857708 0.89031621]

mean value: 0.9313241106719368

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.82608696 0.83333333 0.91304348 0.95454545 0.88       0.95833333
 0.91666667 0.88       0.75       0.79166667]

mean value: 0.8703675889328063

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.91

Accuracy on Blind test: 0.96

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.11638784 0.12223196 0.12821341 0.12665248 0.12380767 0.12398219
 0.12610722 0.1174705  0.12105989 0.12040401]

mean value: 0.12263171672821045

key: score_time
value: [0.01924944 0.01850915 0.01916957 0.01890802 0.01916337 0.01787496
 0.01930618 0.0202477  0.01806045 0.01934838]

mean value: 0.01898372173309326

key: test_mcc
value: [0.86732843 0.95643752 0.91106719 0.95652174 0.73320158 0.73559956
 0.86732843 0.68972332 0.83484711 0.55841694]

mean value: 0.8110471829947312

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.93333333 0.97777778 0.95555556 0.97777778 0.86666667 0.86666667
 0.93333333 0.84444444 0.91111111 0.77777778]

mean value: 0.9044444444444445

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.93023256 0.97674419 0.95454545 0.97777778 0.86363636 0.875
 0.93617021 0.84444444 0.92       0.77272727]

mean value: 0.9051278270083317

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.95238095 1.         0.95454545 0.95652174 0.86363636 0.84
 0.91666667 0.86363636 0.85185185 0.80952381]

mean value: 0.9008763201371897

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 0.95454545 0.95454545 1.         0.86363636 0.91304348
 0.95652174 0.82608696 1.         0.73913043]

mean value: 0.9116600790513834

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.93280632 0.97727273 0.9555336  0.97826087 0.86660079 0.86561265
 0.93280632 0.84486166 0.90909091 0.77865613]

mean value: 0.9041501976284585

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.86956522 0.95454545 0.91304348 0.95652174 0.76       0.77777778
 0.88       0.73076923 0.85185185 0.62962963]

mean value: 0.8323704379356553

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.01029634 0.01115966 0.01019788 0.01129127 0.01026201 0.0103786
 0.01133752 0.01088929 0.01124072 0.01106381]

mean value: 0.010811710357666015

key: score_time
value: [0.00888944 0.00963187 0.00929451 0.00969863 0.0097034  0.00970888
 0.00967789 0.0094142  0.00970745 0.00901723]

mean value: 0.009474349021911622

key: test_mcc
value: [0.52631666 0.60000118 0.77865613 0.60000118 0.24356483 0.64613475
 0.33402405 0.43557241 0.77821935 0.51185771]

mean value: 0.5454348232923768

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.75555556 0.8        0.88888889 0.8        0.62222222 0.82222222
 0.66666667 0.71111111 0.88888889 0.75555556]

mean value: 0.7711111111111111

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.7755102  0.79069767 0.88888889 0.79069767 0.60465116 0.83333333
 0.69387755 0.68292683 0.89361702 0.75555556]

mean value: 0.7709755895052613

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.7037037  0.80952381 0.86956522 0.80952381 0.61904762 0.8
 0.65384615 0.77777778 0.875      0.77272727]

mean value: 0.769071536354145

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.86363636 0.77272727 0.90909091 0.77272727 0.59090909 0.86956522
 0.73913043 0.60869565 0.91304348 0.73913043]

mean value: 0.7778656126482213

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.75790514 0.79940711 0.88932806 0.79940711 0.6215415  0.82114625
 0.66501976 0.71343874 0.88833992 0.75592885]

mean value: 0.7711462450592885

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.63333333 0.65384615 0.8        0.65384615 0.43333333 0.71428571
 0.53125    0.51851852 0.80769231 0.60714286]

mean value: 0.6353248371998372

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.55

Accuracy on Blind test: 0.77

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [2.70288324 4.30727792 2.46852684 2.61855292 2.55521131 2.43811393
 2.36803675 1.75058436 2.72210979 2.69686818]

mean value: 2.6628165245056152

key: score_time
value: [0.25447559 0.19142795 0.16145873 0.14909005 0.13192844 0.12770748
 0.09348536 0.12782025 0.173311   0.17790127]

mean value: 0.15886061191558837

key: test_mcc
value: [0.91452919 0.95643752 0.95652174 1.         0.86758893 0.91452919
 0.82506438 0.82213439 0.95643752 0.82574419]

mean value: 0.9038987043172344

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.95555556 0.97777778 0.97777778 1.         0.93333333 0.95555556
 0.91111111 0.91111111 0.97777778 0.91111111]

mean value: 0.9511111111111111

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.95238095 0.97674419 0.97777778 1.         0.93333333 0.95833333
 0.91666667 0.91304348 0.9787234  0.90909091]

mean value: 0.9516094041145673

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         0.95652174 1.         0.91304348 0.92
 0.88       0.91304348 0.95833333 0.95238095]

mean value: 0.949332298136646

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 0.95454545 1.         1.         0.95454545 1.
 0.95652174 0.91304348 1.         0.86956522]

mean value: 0.9557312252964427

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.95454545 0.97727273 0.97826087 1.         0.93379447 0.95454545
 0.91007905 0.91106719 0.97727273 0.91205534]

mean value: 0.9508893280632411

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.90909091 0.95454545 0.95652174 1.         0.875      0.92
 0.84615385 0.84       0.95833333 0.83333333]

mean value: 0.9092978615587312

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.89

Accuracy on Blind test: 0.95
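
Note: the per-fold metrics above are all functions of one confusion matrix per fold. The sketch below shows how test_mcc, test_jcc, test_fscore and test_accuracy relate for the first Random Forest fold. The counts (tp, fn, fp, tn) are NOT printed in this log. They are an assumption, chosen as the smallest integers consistent with the reported ratios for a 45-sample test fold, and the helper names are hypothetical.

```python
from math import sqrt

# ASSUMPTION: confusion-matrix counts inferred from the logged ratios of the
# first Random Forest fold (precision 1.0, recall 0.90909091, accuracy
# 0.95555556). The log itself never prints these counts.
tp, fn, fp, tn = 20, 2, 0, 23


def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal is empty (denominator is zero).
    return (tp * tn - fp * fn) / denom if denom else 0.0


def jaccard(tp, fn, fp):
    """Jaccard index of the positive class: TP / (TP + FP + FN)."""
    return tp / (tp + fp + fn)


def fscore(tp, fn, fp):
    """F1 score of the positive class: 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


fold_mcc = mcc(tp, fn, fp, tn)
fold_jcc = jaccard(tp, fn, fp)
fold_f1 = fscore(tp, fn, fp)
fold_acc = (tp + tn) / (tp + fn + fp + tn)

print(f"mcc={fold_mcc:.8f} jcc={fold_jcc:.8f} f1={fold_f1:.8f} acc={fold_acc:.8f}")
```

Under these assumed counts the four values agree with the first entries of test_mcc, test_jcc, test_fscore and test_accuracy logged for Random Forest.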

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC0...05', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [1.58991671 1.86406493 2.03459716 1.74722981 2.07112956 2.12889957
 1.83267713 1.98935246 1.70792747 2.10284638]

mean value: 1.9068641185760498

key: score_time
value: [0.21701336 0.21829891 0.22647643 0.18004084 0.16667223 0.17558956
 0.19052815 0.21188164 0.21953034 0.15258312]

mean value: 0.19586145877838135

key: test_mcc
value: [0.91452919 0.95643752 0.95652174 1.         0.86758893 0.91452919
 0.77821935 0.82213439 0.91452919 0.69583743]

mean value: 0.8820326916601581

key: train_mcc
value: [0.95556639 0.95061698 0.95061698 0.94569087 0.95556748 0.9457805
 0.94568955 0.96053948 0.95066215 0.96049359]

mean value: 0.9521223971905592

key: test_accuracy
value: [0.95555556 0.97777778 0.97777778 1.         0.93333333 0.95555556
 0.88888889 0.91111111 0.95555556 0.84444444]

mean value: 0.94

key: train_accuracy
value: [0.97777778 0.97530864 0.97530864 0.97283951 0.97777778 0.97283951
 0.97283951 0.98024691 0.97530864 0.98024691]

mean value: 0.9760493827160494

key: test_fscore
value: [0.95238095 0.97674419 0.97777778 1.         0.93333333 0.95833333
 0.89361702 0.91304348 0.95833333 0.8372093 ]

mean value: 0.9400772718068289

key: train_fscore
value: [0.97788698 0.97536946 0.97536946 0.97283951 0.97777778 0.97256858
 0.97270471 0.9800995  0.97512438 0.98019802]

mean value: 0.9759938371686563

key: test_precision
value: [1.         1.         0.95652174 1.         0.91304348 0.92
 0.875      0.91304348 0.92       0.9       ]

mean value: 0.9397608695652174

key: train_precision
value: [0.9754902  0.97536946 0.97536946 0.97524752 0.98019802 0.9798995
 0.97512438 0.985      0.98       0.98019802]

mean value: 0.9781896552287914

key: test_recall
value: [0.90909091 0.95454545 1.         1.         0.95454545 1.
 0.91304348 0.91304348 1.         0.7826087 ]

mean value: 0.9426877470355731

key: train_recall
value: [0.98029557 0.97536946 0.97536946 0.97044335 0.97536946 0.96534653
 0.97029703 0.97524752 0.97029703 0.98019802]

mean value: 0.9738233429254255

key: test_roc_auc
value: [0.95454545 0.97727273 0.97826087 1.         0.93379447 0.95454545
 0.88833992 0.91106719 0.95454545 0.8458498 ]

mean value: 0.9398221343873517

key: train_roc_auc
value: [0.97777155 0.97530849 0.97530849 0.97284544 0.97778374 0.97282105
 0.97283324 0.9802346  0.9752963  0.98024679]

mean value: 0.9760449690289226

key: test_jcc
value: [0.90909091 0.95454545 0.95652174 1.         0.875      0.92
 0.80769231 0.84       0.92       0.72      ]

mean value: 0.8902850410459107

key: train_jcc
value: [0.95673077 0.95192308 0.95192308 0.94711538 0.95652174 0.94660194
 0.9468599  0.96097561 0.95145631 0.96116505]

mean value: 0.9531272860931357

MCC on Blind test: 0.88

Accuracy on Blind test: 0.94

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.03431368 0.02331161 0.01544642 0.01508474 0.01589894 0.01519084
 0.01523089 0.02336693 0.02334094 0.02675939]

mean value: 0.020794439315795898

key: score_time
value: [0.02263975 0.015486   0.01317906 0.02044368 0.01301241 0.02103996
 0.01305509 0.01366353 0.01381397 0.01300645]

mean value: 0.015933990478515625

key: test_mcc
value: [0.64613475 0.78405645 0.82213439 0.86758893 0.5169078  0.64613475
 0.68911026 0.51089209 0.73320158 0.51185771]

mean value: 0.6728018722919369

key: train_mcc
value: [0.71373171 0.71391286 0.70403264 0.72839898 0.76296152 0.72399345
 0.73363435 0.73363435 0.718529   0.75311563]

mean value: 0.7285944480486638

key: test_accuracy
value: [0.82222222 0.88888889 0.91111111 0.93333333 0.75555556 0.82222222
 0.84444444 0.75555556 0.86666667 0.75555556]

mean value: 0.8355555555555555

key: train_accuracy
value: [0.85679012 0.85679012 0.85185185 0.86419753 0.88148148 0.8617284
 0.86666667 0.86666667 0.85925926 0.87654321]

mean value: 0.8641975308641976

key: test_fscore
value: [0.80952381 0.87804878 0.90909091 0.93333333 0.76595745 0.83333333
 0.85106383 0.76595745 0.86956522 0.75555556]

mean value: 0.8371429662120305

key: train_fscore
value: [0.85572139 0.855      0.85       0.86486486 0.8817734  0.85858586
 0.86432161 0.86432161 0.85925926 0.87562189]

mean value: 0.8629469881387253

key: test_precision
value: [0.85       0.94736842 0.90909091 0.91304348 0.72       0.8
 0.83333333 0.75       0.86956522 0.77272727]

mean value: 0.836512863185632

key: train_precision
value: [0.86432161 0.8680203  0.86294416 0.8627451  0.8817734  0.87628866
 0.87755102 0.87755102 0.85714286 0.88      ]

mean value: 0.8708338129852269

key: test_recall
value: [0.77272727 0.81818182 0.90909091 0.95454545 0.81818182 0.86956522
 0.86956522 0.7826087  0.86956522 0.73913043]

mean value: 0.8403162055335969

key: train_recall
value: [0.84729064 0.84236453 0.83743842 0.86699507 0.8817734  0.84158416
 0.85148515 0.85148515 0.86138614 0.87128713]

mean value: 0.8553089791737795

key: test_roc_auc
value: [0.82114625 0.88735178 0.91106719 0.93379447 0.756917   0.82114625
 0.84387352 0.75494071 0.86660079 0.75592885]

mean value: 0.8352766798418972

key: train_roc_auc
value: [0.85681364 0.85682583 0.85188753 0.86419061 0.88148076 0.86167878
 0.86662927 0.86662927 0.8592645  0.87653026]

mean value: 0.8641930449202556

key: test_jcc
value: [0.68       0.7826087  0.83333333 0.875      0.62068966 0.71428571
 0.74074074 0.62068966 0.76923077 0.60714286]

mean value: 0.7243721420730417

key: train_jcc
value: [0.74782609 0.74672489 0.73913043 0.76190476 0.78854626 0.75221239
 0.76106195 0.76106195 0.75324675 0.77876106]

mean value: 0.7590476528359691

MCC on Blind test: 0.73

Accuracy on Blind test: 0.87

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC0...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [4.27274823 1.59566879 1.63563442 1.6189487  1.60230374 1.59007549
 1.5283258  1.495116   1.52452278 1.55737209]

mean value: 1.8420716047286987

key: score_time
value: [0.01275396 0.01314974 0.0131228  0.01300788 0.01266623 0.01288438
 0.01313043 0.01349545 0.01266217 0.01410031]

mean value: 0.013097333908081054

key: test_mcc
value: [0.87406293 0.95643752 1.         0.95643752 0.91485328 0.95643752
 0.86732843 0.82213439 0.95643752 0.77865613]

mean value: 0.908278523558777

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.93333333 0.97777778 1.         0.97777778 0.95555556 0.97777778
 0.93333333 0.91111111 0.97777778 0.88888889]

mean value: 0.9533333333333334

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.92682927 0.97674419 1.         0.97674419 0.95652174 0.9787234
 0.93617021 0.91304348 0.9787234  0.88888889]

mean value: 0.9532388767942495

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         1.         1.         0.91666667 0.95833333
 0.91666667 0.91304348 0.95833333 0.90909091]

mean value: 0.9572134387351778

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.86363636 0.95454545 1.         0.95454545 1.         1.
 0.95652174 0.91304348 1.         0.86956522]

mean value: 0.9511857707509881

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.93181818 0.97727273 1.         0.97727273 0.95652174 0.97727273
 0.93280632 0.91106719 0.97727273 0.88932806]

mean value: 0.9530632411067194

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.86363636 0.95454545 1.         0.95454545 0.91666667 0.95833333
 0.88       0.84       0.95833333 0.8       ]

mean value: 0.9126060606060606

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.95

Accuracy on Blind test: 0.97
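
Note: every train_* metric for XGBoost is 1.0 while the test_* metrics vary by fold. A minimal sketch, using only the test_mcc and train_mcc values copied from the block above, of how the logged mean and the train/test gap can be recomputed. The gap and the fold-to-fold spread are extra summaries that the pipeline does not print; the variable names are hypothetical.

```python
from statistics import mean, stdev

# Values copied from the XGBoost block of this log (rounded to 8 decimals there).
test_mcc = [0.87406293, 0.95643752, 1.0, 0.95643752, 0.91485328,
            0.95643752, 0.86732843, 0.82213439, 0.95643752, 0.77865613]
train_mcc = [1.0] * 10

mean_test = mean(test_mcc)          # should match the logged "mean value"
gap = mean(train_mcc) - mean_test   # train/test gap, a rough overfitting signal
spread = stdev(test_mcc)            # sample standard deviation across folds

print(f"mean test MCC = {mean_test:.6f}")
print(f"train - test  = {gap:.6f}")
print(f"fold spread   = {spread:.6f}")
```

A perfect training score with a lower cross-validated score is expected for unregularised tree ensembles, so the gap is best read alongside the blind-test MCC rather than on its own.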

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.05823278 0.07864523 0.08365369 0.06540322 0.09378934 0.07705927
 0.10100937 0.11681724 0.09124517 0.08683014]

mean value: 0.08526854515075684

key: score_time
value: [0.02650523 0.02493405 0.01329923 0.02620196 0.03271937 0.02158356
 0.02505922 0.02166438 0.02495146 0.02136326]

mean value: 0.02382817268371582

key: test_mcc
value: [0.82506438 0.73320158 0.91106719 0.86732843 0.73663511 0.77821935
 0.68911026 0.73663511 0.55666994 0.64426877]

mean value: 0.7478200124484804

key: train_mcc
value: [0.93608359 0.91614635 0.90123397 0.91614635 0.9062683  0.92620337
 0.92593586 0.91606106 0.91115718 0.91615248]

mean value: 0.9171388520151313

key: test_accuracy
value: [0.91111111 0.86666667 0.95555556 0.93333333 0.86666667 0.88888889
 0.84444444 0.86666667 0.77777778 0.82222222]

mean value: 0.8733333333333333

key: train_accuracy
value: [0.96790123 0.95802469 0.95061728 0.95802469 0.95308642 0.96296296
 0.96296296 0.95802469 0.95555556 0.95802469]

mean value: 0.9585185185185185

key: test_fscore
value: [0.9047619  0.86363636 0.95454545 0.93023256 0.86956522 0.89361702
 0.85106383 0.86363636 0.79166667 0.82608696]

mean value: 0.8748812336363161

key: train_fscore
value: [0.96836983 0.95843521 0.95073892 0.95843521 0.95354523 0.96240602
 0.96277916 0.95802469 0.95566502 0.95823096]

mean value: 0.9586630239446279

key: test_precision
value: [0.95       0.86363636 0.95454545 0.95238095 0.83333333 0.875
 0.83333333 0.9047619  0.76       0.82608696]

mean value: 0.8753078298513082

key: train_precision
value: [0.95673077 0.95145631 0.95073892 0.95145631 0.94660194 0.97461929
 0.96517413 0.95566502 0.95098039 0.95121951]

mean value: 0.9554642596269585

key: test_recall
value: [0.86363636 0.86363636 0.95454545 0.90909091 0.90909091 0.91304348
 0.86956522 0.82608696 0.82608696 0.82608696]

mean value: 0.8760869565217391

key: train_recall
value: [0.98029557 0.96551724 0.95073892 0.96551724 0.96059113 0.95049505
 0.96039604 0.96039604 0.96039604 0.96534653]

mean value: 0.9619689801492465

key: test_roc_auc
value: [0.91007905 0.86660079 0.9555336  0.93280632 0.86758893 0.88833992
 0.84387352 0.86758893 0.77667984 0.82213439]

mean value: 0.8731225296442688

key: train_roc_auc
value: [0.96787056 0.95800615 0.95061698 0.95800615 0.95306784 0.96293225
 0.96295664 0.95803053 0.95556748 0.95804273]

mean value: 0.9585097302833732

key: test_jcc
value: [0.82608696 0.76       0.91304348 0.86956522 0.76923077 0.80769231
 0.74074074 0.76       0.65517241 0.7037037 ]

mean value: 0.7805235587334538

key: train_jcc
value: [0.93867925 0.92018779 0.90610329 0.92018779 0.91121495 0.92753623
 0.92822967 0.91943128 0.91509434 0.91981132]

mean value: 0.9206475908747523

MCC on Blind test: 0.7

Accuracy on Blind test: 0.85
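
Note: the "key / value / mean value" blocks in this log follow a regular layout, so they can be pulled back into numbers for comparison across models. The sketch below is one possible way to do that with the standard library only. The regular expression and the function name parse_metrics are assumptions based on the layout seen in this log, not part of the original pipeline.

```python
import re

# One block copied from the LDA section of this log.
sample = """key: test_mcc
value: [0.82506438 0.73320158 0.91106719 0.86732843 0.73663511 0.77821935
 0.68911026 0.73663511 0.55666994 0.64426877]

mean value: 0.7478200124484804
"""

# ASSUMPTION: each block is "key: NAME", a bracketed array that may wrap over
# several lines, a blank line, then "mean value: NUMBER".
BLOCK = re.compile(r"key: (\w+)\nvalue: \[([^\]]*)\]\n\nmean value: (\S+)")


def parse_metrics(text):
    """Return {metric_name: (per_fold_values, logged_mean)} for each block."""
    parsed = {}
    for key, values, mean_value in BLOCK.findall(text):
        parsed[key] = ([float(v) for v in values.split()], float(mean_value))
    return parsed


metrics = parse_metrics(sample)
folds, logged_mean = metrics["test_mcc"]
print(len(folds), logged_mean)
```

Because metric names repeat for every model, a real run over the whole file would also need to split the text on the "Model_name:" lines first.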

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.01486421 0.01032543 0.01009607 0.01081514 0.01028252 0.01089168
 0.01108813 0.01036739 0.01119781 0.01117921]

mean value: 0.011110758781433106

key: score_time
value: [0.011132   0.00922537 0.00892806 0.00906157 0.00955486 0.00952983
 0.00965738 0.00968456 0.00975657 0.00976205]

mean value: 0.009629225730895996

key: test_mcc
value: [0.79670588 0.77821935 0.73663511 0.86732843 0.77865613 0.87406293
 0.46930785 0.64426877 0.73320158 0.51185771]

mean value: 0.7190243743973286

key: train_mcc
value: [0.6994877  0.68482256 0.68960241 0.7385111  0.74355351 0.76791201
 0.70964919 0.75845593 0.74835945 0.78285689]

mean value: 0.7323210752164393

key: test_accuracy
value: [0.88888889 0.88888889 0.86666667 0.93333333 0.88888889 0.93333333
 0.73333333 0.82222222 0.86666667 0.75555556]

mean value: 0.8577777777777778

key: train_accuracy
value: [0.84938272 0.84197531 0.84444444 0.8691358  0.87160494 0.88395062
 0.85432099 0.87901235 0.87407407 0.89135802]

mean value: 0.865925925925926

key: test_fscore
value: [0.87179487 0.88372093 0.86956522 0.93023256 0.88888889 0.93877551
 0.72727273 0.82608696 0.86956522 0.75555556]

mean value: 0.8561458433392566

key: train_fscore
value: [0.84634761 0.83838384 0.84130982 0.86783042 0.87       0.88395062
 0.84987277 0.87657431 0.87218045 0.89      ]

mean value: 0.8636449842307918

key: test_precision
value: [1.         0.9047619  0.83333333 0.95238095 0.86956522 0.88461538
 0.76190476 0.82608696 0.86956522 0.77272727]

mean value: 0.8674941001027957

key: train_precision
value: [0.86597938 0.86010363 0.86082474 0.87878788 0.88324873 0.8817734
 0.87434555 0.89230769 0.88324873 0.8989899 ]

mean value: 0.8779609631421748

key: test_recall
value: [0.77272727 0.86363636 0.90909091 0.90909091 0.90909091 1.
 0.69565217 0.82608696 0.86956522 0.73913043]

mean value: 0.8494071146245059

key: train_recall
value: [0.82758621 0.81773399 0.8226601  0.85714286 0.85714286 0.88613861
 0.82673267 0.86138614 0.86138614 0.88118812]

mean value: 0.8499097693020533

key: test_roc_auc
value: [0.88636364 0.88833992 0.86758893 0.93280632 0.88932806 0.93181818
 0.73418972 0.82213439 0.86660079 0.75592885]

mean value: 0.857509881422925

key: train_roc_auc
value: [0.84943667 0.84203531 0.84449837 0.86916549 0.87164074 0.88395601
 0.85425304 0.87896893 0.87404282 0.89133298]

mean value: 0.8659330341901185

key: test_jcc
value: [0.77272727 0.79166667 0.76923077 0.86956522 0.8        0.88461538
 0.57142857 0.7037037  0.76923077 0.60714286]

mean value: 0.7539311212137298

key: train_jcc
value: [0.73362445 0.72173913 0.72608696 0.76651982 0.7699115  0.7920354
 0.73893805 0.78026906 0.77333333 0.8018018 ]

mean value: 0.7604259514076851

MCC on Blind test: 0.77

Accuracy on Blind test: 0.88

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01319623 0.02258158 0.01745749 0.02201819 0.02212381 0.02121663
 0.02074075 0.01941252 0.02034569 0.02611756]

mean value: 0.020521044731140137

key: score_time
value: [0.00947046 0.01199651 0.01194668 0.01197267 0.01198769 0.01198697
 0.01204848 0.0119555  0.0119822  0.01206636]

mean value: 0.011741352081298829

key: test_mcc
value: [0.79670588 0.82213439 0.87406293 0.95643752 0.59725988 0.73663511
 0.73559956 0.55362003 0.77865613 0.77865613]

mean value: 0.7629767557040962

key: train_mcc
value: [0.77582446 0.8644041  0.81918005 0.88888095 0.61326848 0.85131769
 0.88257176 0.69882885 0.85568499 0.92648542]

mean value: 0.8176446738630777

key: test_accuracy
value: [0.88888889 0.91111111 0.93333333 0.97777778 0.77777778 0.86666667
 0.86666667 0.75555556 0.88888889 0.88888889]

mean value: 0.8755555555555555

key: train_accuracy
value: [0.87901235 0.9308642  0.90617284 0.94320988 0.77777778 0.92098765
 0.94074074 0.82962963 0.92345679 0.96296296]

mean value: 0.9014814814814814

key: test_fscore
value: [0.87179487 0.90909091 0.92682927 0.97674419 0.80769231 0.86363636
 0.875      0.8        0.88888889 0.88888889]

mean value: 0.8808565684331424

key: train_fscore
value: [0.86501377 0.93364929 0.9        0.94117647 0.81707317 0.9144385
 0.94202899 0.85350318 0.91733333 0.96350365]

mean value: 0.904772036038694

key: test_precision
value: [1.         0.90909091 1.         1.         0.7        0.9047619
 0.84       0.6875     0.90909091 0.90909091]

mean value: 0.8859534632034631

key: train_precision
value: [0.98125    0.89954338 0.96610169 0.9787234  0.69550173 0.99418605
 0.91981132 0.7472119  0.99421965 0.94736842]

mean value: 0.9123917545678761

key: test_recall
value: [0.77272727 0.90909091 0.86363636 0.95454545 0.95454545 0.82608696
 0.91304348 0.95652174 0.86956522 0.86956522]

mean value: 0.8889328063241106

key: train_recall
value: [0.77339901 0.97044335 0.84236453 0.90640394 0.99014778 0.84653465
 0.96534653 0.9950495  0.85148515 0.98019802]

mean value: 0.9121372482075794

key: test_roc_auc
value: [0.88636364 0.91106719 0.93181818 0.97727273 0.78162055 0.86758893
 0.86561265 0.75098814 0.88932806 0.88932806]

mean value: 0.875098814229249

key: train_roc_auc
value: [0.87927376 0.93076623 0.90633078 0.94330098 0.77725211 0.92080427
 0.94080135 0.83003707 0.92327952 0.96300541]

mean value: 0.9014851485148515

key: test_jcc
value: [0.77272727 0.83333333 0.86363636 0.95454545 0.67741935 0.76
 0.77777778 0.66666667 0.8        0.8       ]

mean value: 0.7906106223525579

key: train_jcc
value: [0.76213592 0.87555556 0.81818182 0.88888889 0.69072165 0.84236453
 0.89041096 0.74444444 0.84729064 0.92957746]

mean value: 0.8289571874991976

MCC on Blind test: 0.81

Accuracy on Blind test: 0.9

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01864982 0.01952457 0.01760387 0.018224   0.01664209 0.01940393
 0.01955223 0.03549552 0.02013016 0.0204246 ]

mean value: 0.020565080642700195

key: score_time
value: [0.01202798 0.01201439 0.012398   0.01194167 0.01194906 0.01242661
 0.01245832 0.02830505 0.01237369 0.01226449]

mean value: 0.013815927505493163

key: test_mcc
value: [0.91452919 0.87406293 0.87476705 0.72645449 0.57868151 0.58158
 0.73559956 0.57373395 0.69404997 0.55362003]

mean value: 0.7107078686833688

key: train_mcc
value: [0.80550226 0.82411192 0.82353111 0.62805778 0.7612786  0.64806439
 0.9019476  0.77341987 0.90127552 0.66265175]

mean value: 0.7729840794536423

key: test_accuracy
value: [0.95555556 0.93333333 0.93333333 0.84444444 0.77777778 0.75555556
 0.86666667 0.77777778 0.84444444 0.75555556]

mean value: 0.8444444444444444

key: train_accuracy
value: [0.8962963  0.90617284 0.90864198 0.78518519 0.87160494 0.79753086
 0.95061728 0.87654321 0.95061728 0.80493827]

mean value: 0.8748148148148148

key: test_fscore
value: [0.95238095 0.92682927 0.93617021 0.8627451  0.8        0.80701754
 0.875      0.80769231 0.85714286 0.8       ]

mean value: 0.8624978240173622

key: train_fscore
value: [0.88648649 0.89784946 0.91415313 0.82281059 0.88444444 0.83057851
 0.95145631 0.88888889 0.95024876 0.83643892]

mean value: 0.8863355507758013

key: test_precision
value: [1.         1.         0.88       0.75862069 0.71428571 0.67647059
 0.84       0.72413793 0.80769231 0.6875    ]

mean value: 0.8088707230902972

key: train_precision
value: [0.98203593 0.98816568 0.86403509 0.70138889 0.80566802 0.71276596
 0.93333333 0.80645161 0.955      0.71886121]

mean value: 0.8467705715067385

key: test_recall
value: [0.90909091 0.86363636 1.         1.         0.90909091 1.
 0.91304348 0.91304348 0.91304348 0.95652174]

mean value: 0.9377470355731226

key: train_recall
value: [0.80788177 0.8226601  0.97044335 0.99507389 0.98029557 0.9950495
 0.97029703 0.99009901 0.94554455 1.        ]

mean value: 0.9477344778812856

key: test_roc_auc
value: [0.95454545 0.93181818 0.93478261 0.84782609 0.78063241 0.75
 0.86561265 0.77470356 0.84288538 0.75098814]

mean value: 0.8433794466403162

key: train_roc_auc
value: [0.89651514 0.90637955 0.908489   0.78466566 0.8713359  0.79801736
 0.95066576 0.8768229  0.95060479 0.80541872]

mean value: 0.8748914792957128

key: test_jcc
value: [0.90909091 0.86363636 0.88       0.75862069 0.66666667 0.67647059
 0.77777778 0.67741935 0.75       0.66666667]

mean value: 0.762634901656756

key: train_jcc
value: [0.7961165  0.81463415 0.84188034 0.69896194 0.79282869 0.71024735
 0.90740741 0.8        0.90521327 0.71886121]

mean value: 0.7986150853388724

MCC on Blind test: 0.75

Accuracy on Blind test: 0.87

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.18614674 0.18338513 0.18672371 0.18425417 0.18633842 0.17906499
 0.16986871 0.1727941  0.16615391 0.17229795]

mean value: 0.17870278358459474

key: score_time
value: [0.01630163 0.0177443  0.01695061 0.01684165 0.01702309 0.01606345
 0.01537633 0.01550055 0.015517   0.01609135]

mean value: 0.016340994834899904

key: test_mcc
value: [0.91452919 0.91452919 1.         0.91106719 0.91485328 0.91452919
 0.91106719 0.86732843 0.95643752 0.77865613]

mean value: 0.9082997309622647

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.95555556 0.95555556 1.         0.95555556 0.95555556 0.95555556
 0.95555556 0.93333333 0.97777778 0.88888889]

mean value: 0.9533333333333334

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.95238095 0.95238095 1.         0.95454545 0.95652174 0.95833333
 0.95652174 0.93617021 0.9787234  0.88888889]

mean value: 0.9534466676811728

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         1.         0.95454545 0.91666667 0.92
 0.95652174 0.91666667 0.95833333 0.90909091]

mean value: 0.9531824769433466

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 0.90909091 1.         0.95454545 1.         1.
 0.95652174 0.95652174 1.         0.86956522]

mean value: 0.9555335968379447

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.95454545 0.95454545 1.         0.9555336  0.95652174 0.95454545
 0.9555336  0.93280632 0.97727273 0.88932806]

mean value: 0.9530632411067194

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.90909091 0.90909091 1.         0.91304348 0.91666667 0.92
 0.91666667 0.88       0.95833333 0.8       ]

mean value: 0.9122891963109354

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.93

Accuracy on Blind test: 0.96

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.0584321  0.06014061 0.07051253 0.0679574  0.06842804 0.0796442
 0.0581665  0.08605957 0.07957792 0.07208705]

mean value: 0.07010059356689453

key: score_time
value: [0.03084612 0.02727365 0.02405024 0.02450156 0.03062487 0.02380657
 0.0292778  0.04444814 0.02564001 0.03785896]

mean value: 0.029832792282104493

key: test_mcc
value: [0.91452919 0.95643752 0.95643752 1.         0.86758893 0.95643752
 0.91452919 0.86732843 0.95643752 0.77865613]

mean value: 0.9168381944162244

key: train_mcc
value: [0.98519729 0.98029509 0.99507377 0.9901234  0.98029509 0.97532008
 0.98024679 0.99507377 0.9704168  0.98024679]

mean value: 0.9832288871744899

key: test_accuracy
value: [0.95555556 0.97777778 0.97777778 1.         0.93333333 0.97777778
 0.95555556 0.93333333 0.97777778 0.88888889]

mean value: 0.9577777777777777

key: train_accuracy
value: [0.99259259 0.99012346 0.99753086 0.99506173 0.99012346 0.98765432
 0.99012346 0.99753086 0.98518519 0.99012346]

mean value: 0.9916049382716049

key: test_fscore
value: [0.95238095 0.97674419 0.97674419 1.         0.93333333 0.9787234
 0.95833333 0.93617021 0.9787234  0.88888889]

mean value: 0.9580041901306127

key: train_fscore
value: [0.99259259 0.99009901 0.997543   0.99507389 0.99009901 0.98759305
 0.99009901 0.99751861 0.98507463 0.99009901]

mean value: 0.9915791810761856

key: test_precision
value: [1.         1.         1.         1.         0.91304348 0.95833333
 0.92       0.91666667 0.95833333 0.90909091]

mean value: 0.9575467720685112

key: train_precision
value: [0.9950495  0.99502488 0.99509804 0.99507389 0.99502488 0.99004975
 0.99009901 1.         0.99       0.99009901]

mean value: 0.993551895808134

key: test_recall
value: [0.90909091 0.95454545 0.95454545 1.         0.95454545 1.
 1.         0.95652174 1.         0.86956522]

mean value: 0.9598814229249012

key: train_recall
value: [0.99014778 0.98522167 1.         0.99507389 0.98522167 0.98514851
 0.99009901 0.9950495  0.98019802 0.99009901]

mean value: 0.9896259084036483

key: test_roc_auc
value: [0.95454545 0.97727273 0.97727273 1.         0.93379447 0.97727273
 0.95454545 0.93280632 0.97727273 0.88932806]

mean value: 0.9574110671936759

key: train_roc_auc
value: [0.99259864 0.99013559 0.99752475 0.9950617  0.99013559 0.98764815
 0.9901234  0.99752475 0.9851729  0.9901234 ]

mean value: 0.9916048870896942

key: test_jcc
value: [0.90909091 0.95454545 0.95454545 1.         0.875      0.95833333
 0.92       0.88       0.95833333 0.8       ]

mean value: 0.9209848484848485

key: train_jcc
value: [0.98529412 0.98039216 0.99509804 0.99019608 0.98039216 0.9754902
 0.98039216 0.9950495  0.97058824 0.98039216]

mean value: 0.9833284799068142

MCC on Blind test: 0.93

Accuracy on Blind test: 0.96

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.11871982 0.17861295 0.23517871 0.15984797 0.21449161 0.23632669
 0.18187809 0.18064666 0.17393255 0.1803596 ]

mean value: 0.1859994649887085

key: score_time
value: [0.02222514 0.02561641 0.023772   0.02483273 0.02410769 0.02581143
 0.02375984 0.02371955 0.02334571 0.03471303]

mean value: 0.025190353393554688

key: test_mcc
value: [0.65335861 0.73320158 0.77821935 0.86758893 0.46720513 0.38019877
 0.65604724 0.46640316 0.73559956 0.42744299]

mean value: 0.616526531713881

key: train_mcc
value: [0.98529376 0.98529376 0.99017193 0.99017193 0.99507389 0.98529269
 0.99017145 1.         0.98529269 0.98529269]

mean value: 0.9892054809931761

key: test_accuracy
value: [0.82222222 0.86666667 0.88888889 0.93333333 0.73333333 0.68888889
 0.82222222 0.73333333 0.86666667 0.71111111]

mean value: 0.8066666666666666

key: train_accuracy
value: [0.99259259 0.99259259 0.99506173 0.99506173 0.99753086 0.99259259
 0.99506173 1.         0.99259259 0.99259259]

mean value: 0.9945679012345678

key: test_fscore
value: [0.8        0.86363636 0.88372093 0.93333333 0.71428571 0.68181818
 0.80952381 0.73913043 0.875      0.69767442]

mean value: 0.7998123186217221

key: train_fscore
value: [0.99255583 0.99255583 0.9950495  0.9950495  0.99753086 0.9925187
 0.99502488 1.         0.9925187  0.9925187 ]

mean value: 0.9945322521977115

key: test_precision
value: [0.88888889 0.86363636 0.9047619  0.91304348 0.75       0.71428571
 0.89473684 0.73913043 0.84       0.75      ]

mean value: 0.8258483626721613

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.72727273 0.86363636 0.86363636 0.95454545 0.68181818 0.65217391
 0.73913043 0.73913043 0.91304348 0.65217391]

mean value: 0.7786561264822134

key: train_recall
value: [0.98522167 0.98522167 0.99014778 0.99014778 0.99507389 0.98514851
 0.99009901 1.         0.98514851 0.98514851]

mean value: 0.9891357362337219

key: test_roc_auc
value: [0.8201581  0.86660079 0.88833992 0.93379447 0.73221344 0.68972332
 0.82411067 0.73320158 0.86561265 0.71245059]

mean value: 0.8066205533596839

key: train_roc_auc
value: [0.99261084 0.99261084 0.99507389 0.99507389 0.99753695 0.99257426
 0.9950495  1.         0.99257426 0.99257426]

mean value: 0.9945678681168609

key: test_jcc
value: [0.66666667 0.76       0.79166667 0.875      0.55555556 0.51724138
 0.68       0.5862069  0.77777778 0.53571429]

mean value: 0.6745829228243021

key: train_jcc
value: [0.98522167 0.98522167 0.99014778 0.99014778 0.99507389 0.98514851
 0.99009901 1.         0.98514851 0.98514851]

mean value: 0.9891357362337219

MCC on Blind test: 0.62

Accuracy on Blind test: 0.81

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.67379546 0.64568853 0.6526382  0.67437029 0.65233517 0.66356587
 0.65809655 0.66884041 0.65953517 0.64904428]

mean value: 0.6597909927368164

key: score_time
value: [0.00955462 0.00969028 0.0093205  0.0094049  0.00958753 0.00942659
 0.00944066 0.01032162 0.00953197 0.00981259]

mean value: 0.009609127044677734

key: test_mcc
value: [0.91452919 0.95643752 0.95643752 1.         0.91485328 0.91452919
 0.86732843 0.82506438 0.78530224 0.82213439]

mean value: 0.8956616127817072

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.95555556 0.97777778 0.97777778 1.         0.95555556 0.95555556
 0.93333333 0.91111111 0.88888889 0.91111111]

mean value: 0.9466666666666667

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.95238095 0.97674419 0.97674419 1.         0.95652174 0.95833333
 0.93617021 0.91666667 0.88372093 0.91304348]

mean value: 0.9470325684863796

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         1.         1.         0.91666667 0.92
 0.91666667 0.88       0.95       0.91304348]

mean value: 0.9496376811594203

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 0.95454545 0.95454545 1.         1.         1.
 0.95652174 0.95652174 0.82608696 0.91304348]

mean value: 0.9470355731225296

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.95454545 0.97727273 0.97727273 1.         0.95652174 0.95454545
 0.93280632 0.91007905 0.89031621 0.91106719]

mean value: 0.9464426877470355

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.90909091 0.95454545 0.95454545 1.         0.91666667 0.92
 0.88       0.84615385 0.79166667 0.84      ]

mean value: 0.9012668997668998

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.96

Accuracy on Blind test: 0.98

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.03017378 0.05092001 0.03264356 0.0517385  0.03246832 0.07408309
 0.03232288 0.03246665 0.05935621 0.03307438]

mean value: 0.04292473793029785

key: score_time
value: [0.02918839 0.02444148 0.01487088 0.01399708 0.02390194 0.01415896
 0.0149672  0.01498842 0.02188349 0.01332688]

mean value: 0.018572473526000978

key: test_mcc
value: [0.5216284  0.46720513 0.51089209 0.43557241 0.38112585 0.55841694
 0.19960474 0.44784269 0.2903816  0.46640316]

mean value: 0.4279073012043456

key: train_mcc
value: [0.77727216 0.81448302 0.98519729 0.95177249 0.878915   0.8700435
 0.94707011 0.94707011 0.96124772 0.98519693]

mean value: 0.9118268331436081

key: test_accuracy
value: [0.75555556 0.73333333 0.75555556 0.71111111 0.68888889 0.77777778
 0.6        0.71111111 0.64444444 0.73333333]

mean value: 0.7111111111111111

key: train_accuracy
value: [0.87654321 0.89876543 0.99259259 0.97530864 0.93580247 0.9308642
 0.97283951 0.97283951 0.98024691 0.99259259]

mean value: 0.9528395061728395

key: test_fscore
value: [0.71794872 0.71428571 0.74418605 0.73469388 0.65       0.77272727
 0.60869565 0.66666667 0.68       0.73913043]

mean value: 0.7028334382647542

key: train_fscore
value: [0.85955056 0.88767123 0.99259259 0.97596154 0.93157895 0.92553191
 0.97201018 0.97201018 0.98058252 0.99255583]

mean value: 0.9490045499762084

key: test_precision
value: [0.82352941 0.75       0.76190476 0.66666667 0.72222222 0.80952381
 0.60869565 0.8125     0.62962963 0.73913043]

mean value: 0.7323802588668318

key: train_precision
value: [1.         1.         0.9950495  0.95305164 1.         1.
 1.         1.         0.96190476 0.99502488]

mean value: 0.9905030785669636

key: test_recall
value: [0.63636364 0.68181818 0.72727273 0.81818182 0.59090909 0.73913043
 0.60869565 0.56521739 0.73913043 0.73913043]

mean value: 0.6845849802371542

key: train_recall
value: [0.75369458 0.79802956 0.99014778 1.         0.87192118 0.86138614
 0.94554455 0.94554455 1.         0.99009901]

mean value: 0.9156367360874018

key: test_roc_auc
value: [0.75296443 0.73221344 0.75494071 0.71343874 0.68675889 0.77865613
 0.59980237 0.71442688 0.64229249 0.73320158]

mean value: 0.7108695652173913

key: train_roc_auc
value: [0.87684729 0.89901478 0.99259864 0.97524752 0.93596059 0.93069307
 0.97277228 0.97277228 0.98029557 0.99258645]

mean value: 0.9528788469980003

key: test_jcc
value: [0.56       0.55555556 0.59259259 0.58064516 0.48148148 0.62962963
 0.4375     0.5        0.51515152 0.5862069 ]

mean value: 0.5438762832252821

key: train_jcc
value: [0.75369458 0.79802956 0.98529412 0.95305164 0.87192118 0.86138614
 0.94554455 0.94554455 0.96190476 0.98522167]

mean value: 0.9061592765342953

MCC on Blind test: 0.54

Accuracy on Blind test: 0.77

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.0267837  0.04602504 0.04733205 0.03782296 0.05901909 0.02543259
 0.04678321 0.01800036 0.06643128 0.03821254]

mean value: 0.041184282302856444

key: score_time
value: [0.02365017 0.02335858 0.03110051 0.02066255 0.0384798  0.02338219
 0.01257777 0.02225137 0.02346373 0.02367878]

mean value: 0.024260544776916505

key: test_mcc
value: [0.91452919 0.95643752 0.91106719 0.91106719 0.77865613 0.82506438
 0.73559956 0.64752602 0.70501339 0.64426877]

mean value: 0.802922934636812

key: train_mcc
value: [0.85762118 0.86692207 0.84700001 0.85704185 0.86692207 0.87160416
 0.87199635 0.86211613 0.88643125 0.88165855]

mean value: 0.8669313605558044

key: test_accuracy
value: [0.95555556 0.97777778 0.95555556 0.95555556 0.88888889 0.91111111
 0.86666667 0.82222222 0.84444444 0.82222222]

mean value: 0.9

key: train_accuracy
value: [0.92839506 0.93333333 0.92345679 0.92839506 0.93333333 0.93580247
 0.93580247 0.9308642  0.94320988 0.94074074]

mean value: 0.9333333333333333

key: test_fscore
value: [0.95238095 0.97674419 0.95454545 0.95454545 0.88888889 0.91666667
 0.875      0.81818182 0.8627451  0.82608696]

mean value: 0.9025785475816701

key: train_fscore
value: [0.93012048 0.93430657 0.92420538 0.92944039 0.93430657 0.93564356
 0.93658537 0.93170732 0.94320988 0.94117647]

mean value: 0.9340701983296061

key: test_precision
value: [1.         1.         0.95454545 0.95454545 0.86956522 0.88
 0.84       0.85714286 0.78571429 0.82608696]

mean value: 0.8967600225861095

key: train_precision
value: [0.91037736 0.92307692 0.91747573 0.91826923 0.92307692 0.93564356
 0.92307692 0.91826923 0.9408867  0.93203883]

mean value: 0.9242191416230418

key: test_recall
value: [0.90909091 0.95454545 0.95454545 0.95454545 0.90909091 0.95652174
 0.91304348 0.7826087  0.95652174 0.82608696]

mean value: 0.9116600790513834

key: train_recall
value: [0.95073892 0.94581281 0.93103448 0.9408867  0.94581281 0.93564356
 0.95049505 0.94554455 0.94554455 0.95049505]

mean value: 0.9442008486562942

key: test_roc_auc
value: [0.95454545 0.97727273 0.9555336  0.9555336  0.88932806 0.91007905
 0.86561265 0.82312253 0.84189723 0.82213439]

mean value: 0.8995059288537549

key: train_roc_auc
value: [0.92833976 0.93330244 0.92343803 0.92836414 0.93330244 0.93580208
 0.93583866 0.93090036 0.94321563 0.94076477]

mean value: 0.9333268302199678

key: test_jcc
value: [0.90909091 0.95454545 0.91304348 0.91304348 0.8        0.84615385
 0.77777778 0.69230769 0.75862069 0.7037037 ]

mean value: 0.8268287029756295

key: train_jcc
value: [0.86936937 0.87671233 0.85909091 0.86818182 0.87671233 0.87906977
 0.88073394 0.87214612 0.89252336 0.88888889]

mean value: 0.8763428838668663

MCC on Blind test: 0.79

Accuracy on Blind test: 0.89

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.47064686 0.37768698 0.39077091 0.46480179 0.4833498  0.96630669
 0.25368643 0.32124949 0.53351617 0.37344742]

mean value: 0.46354625225067136

key: score_time
value: [0.03063655 0.02306867 0.02072167 0.02873063 0.02459288 0.01240277
 0.01261353 0.03182149 0.0360291  0.02511287]

mean value: 0.02457301616668701

key: test_mcc
value: [0.83484711 0.95643752 0.91106719 0.91106719 0.77865613 0.82506438
 0.68911026 0.64752602 0.70501339 0.64426877]

mean value: 0.7903057965349736

key: train_mcc
value: [0.79798935 0.86692207 0.84700001 0.85704185 0.90127552 0.87160416
 0.92098717 0.86211613 0.88643125 0.88165855]

mean value: 0.869302605153845

key: test_accuracy
value: [0.91111111 0.97777778 0.95555556 0.95555556 0.88888889 0.91111111
 0.84444444 0.82222222 0.84444444 0.82222222]

mean value: 0.8933333333333333

key: train_accuracy
value: [0.89876543 0.93333333 0.92345679 0.92839506 0.95061728 0.93580247
 0.96049383 0.9308642  0.94320988 0.94074074]

mean value: 0.9345679012345679

key: test_fscore
value: [0.9        0.97674419 0.95454545 0.95454545 0.88888889 0.91666667
 0.85106383 0.81818182 0.8627451  0.82608696]

mean value: 0.8949468353222984

/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:188: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)

key: train_fscore
value: [0.90072639 0.93430657 0.92420538 0.92944039 0.95098039 0.93564356
 0.96039604 0.93170732 0.94320988 0.94117647]

mean value: 0.9351792390184265

key: test_precision
value: [1.         1.         0.95454545 0.95454545 0.86956522 0.88
 0.83333333 0.85714286 0.78571429 0.82608696]

mean value: 0.8960933559194428

key: train_precision
value: [0.88571429 0.92307692 0.91747573 0.91826923 0.94634146 0.93564356
 0.96039604 0.91826923 0.9408867  0.93203883]

mean value: 0.9278112000318885

key: test_recall
value: [0.81818182 0.95454545 0.95454545 0.95454545 0.90909091 0.95652174
 0.86956522 0.7826087  0.95652174 0.82608696]

mean value: 0.8982213438735178

key: train_recall
value: [0.91625616 0.94581281 0.93103448 0.9408867  0.95566502 0.93564356
 0.96039604 0.94554455 0.94554455 0.95049505]

mean value: 0.9427278934790031

key: test_roc_auc
value: [0.90909091 0.97727273 0.9555336  0.9555336  0.88932806 0.91007905
 0.84387352 0.82312253 0.84189723 0.82213439]

mean value: 0.8927865612648221

key: train_roc_auc
value: [0.89872214 0.93330244 0.92343803 0.92836414 0.95060479 0.93580208
 0.96049359 0.93090036 0.94321563 0.94076477]

mean value: 0.9345607959810759

key: test_jcc
value: [0.81818182 0.95454545 0.91304348 0.91304348 0.8        0.84615385
 0.74074074 0.69230769 0.75862069 0.7037037 ]

mean value: 0.8140340901810167

key: train_jcc
value: [0.81938326 0.87671233 0.85909091 0.86818182 0.90654206 0.87906977
 0.92380952 0.87214612 0.89252336 0.88888889]

mean value: 0.8786348035374227

MCC on Blind test: 0.79

Accuracy on Blind test: 0.89